Sort a list of strings based on a substring

Just for learning purposes, I want to sort a list by the numbers contained in each item. I think the solution might be the compare:options:range: selector, but I don’t know the correct syntax. Perhaps a selector that requires a parameter can’t be used? I tried the integerValue property as the key but that doesn’t work. Thanks for any help.

use framework "Foundation"
set theList to {"A04", "B03", "C02", "D01"}
set theArray to current application's NSArray's arrayWithArray:theList
set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"self" ascending:true selector:"compare:"
return (theArray's sortedArrayUsingDescriptors:{theDescriptor}) as list
-- current result {"A04", "B03", "C02", "D01"}
-- desired result {"D01", "C02", "B03", "A04"}

You want ascending to be false.

Ascending order runs a, b, c, d, e and 1, 2, 3, 4, 5, etc.

If you want to sort things like A, b, D, g, a, C, 01a, 01A, you may end up with different results depending on what your locale is set to.


Hi peavine.

I don’t think NSStrings have the necessary properties to make this directly doable in ASObjC. You’d have to prepare an array of dictionaries with the sorting substrings separated out into their own properties:

use framework "Foundation"
set theList to {"A04", "B03", "C02", "D01"}
set sortingList to {}
repeat with this in theList
	set end of sortingList to {original:this's contents, numberPart:this's text 2 thru end}
end repeat

set theArray to current application's NSArray's arrayWithArray:sortingList
set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"numberPart" ascending:true selector:"compare:"
return ((theArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"original") as list
--> {"D01", "C02", "B03", "A04"}

Obviously the preparation method would depend on the nature of the strings and how you wanted them sorted.

It would be an easy job for one of my customisable vanilla sorts (e.g. this one, although it’s overkill for your purposes) if you had it in your Script Libraries folder:

use sorter : script "Custom Iterative Ternary Merge Sort"

on isGreater(a, b)
	considering numeric strings
		return (a's text 2 thru end > b's text 2 thru end)
	end considering
end isGreater

set theList to {"A04", "B03", "C02", "D01"}
-- Sort items 1 thru -1 of theList using the
-- isGreater() handler in the current script (object).
tell sorter to sort(theList, 1, -1, {comparer:me})
return theList
--> {"D01", "C02", "B03", "A04"}

@technomorph. Thanks for the suggestion, but I wanted to sort the list by the numbers contained in each item of the list. I’ve edited the script in my first post to better demonstrate this.

@Nigel. I’ve used an array of dictionaries for sorting purposes on a few occasions, and I now understand the process better. Thanks!

BTW, the reason I got onto the selector issue was Shane’s comment in his book, which is quoted below. I now understand that the compare methods Shane refers to are caseInsensitiveCompare, localizedCaseInsensitiveCompare, compare, localizedCompare, and localizedStandardCompare. The compare:options:range: method cannot be used.

The selector can be any method that compares one item to another and returns one of three results: NSOrderedAscending, NSOrderedSame or NSOrderedDescending. In practice, that means you can use any of the five variations on -compare: defined in NSString when you are sorting strings, and just -compare: for any other sortable class. (In Objective-C projects, developers can also define their own comparison methods.)

FWIW, I sometimes like to substitute ASObjC code for basic AppleScript code. This is not because it’s necessarily better (it unnecessarily complicates the matter in this case), but it helps with the learning process.

use framework "Foundation"

set theList to {"B121", "A11", "C06", "D01"}
set theArray to current application's NSMutableArray's arrayWithArray:theList
set sortingArray to current application's NSMutableArray's new()

repeat with anItem in theArray
	set theSubstring to (anItem's substringWithRange:{1, ((anItem's |length|()) - 1)})
	-- set theDictionary to (current application's NSDictionary's dictionaryWithDictionary:{original:anItem, numberPart:theSubstring}) -- instead of following 3 lines
	set theDictionary to (current application's NSMutableDictionary's new())
	(theDictionary's setObject:anItem forKey:"original")
	(theDictionary's setObject:theSubstring forKey:"numberPart")
	(sortingArray's addObject:theDictionary)
end repeat

set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"numberPart" ascending:true selector:"compare:"
return ((sortingArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"original") as list --> {"D01", "C06", "A11", "B121"}

Hi @peavine.

Don’t forget that strictly speaking, the length of an NSString is measured in UTF-16 code units, not characters, although that’s likely to be the same here.
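To illustrate the code-unit point: JavaScript strings happen to use the same UTF-16 representation as NSString, so a small sketch there shows the gap between counting code units and counting characters. The sample strings are made up for the illustration, and the spread operator iterates by code point, which is not quite the same thing as a composed character sequence.

```javascript
// JavaScript strings, like NSString, report their length in UTF-16 code units.
const plain = "A04";
const emoji = "😀04"; // first character lies outside the Basic Multilingual Plane

console.log(plain.length);      // 3
console.log(emoji.length);      // 4: the emoji occupies two code units
console.log([...emoji].length); // 3: iterating by code point

// Dropping "the first character" by code unit splits the surrogate pair,
// while dropping it by code point leaves just the digits.
const byCodeUnit = emoji.slice(1);
const byCodePoint = [...emoji].slice(1).join("");
console.log(byCodePoint);       // "04"
```

With plain ASCII items such as "A04" the two counts agree, which is why the simpler range works for the lists in this thread.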

Two “correct” alternatives that occur to me are …

set theSubstring to (anItem's stringByReplacingOccurrencesOfString:"^." withString:"" options:(current application's NSRegularExpressionSearch) range:{0, anItem's |length|()})

… and …

set c1Len to (anItem's rangeOfComposedCharacterSequenceAtIndex:0)'s |length|()
set theSubstring to (anItem's substringWithRange:{c1Len, ((anItem's |length|()) - c1Len)})

PS. By the way, if the input strings contain different numbers of digits, as above, it would be better to use “localizedStandardCompare:” as the descriptor’s selector so that the sort’s on the numerical values rather than on the lexical ones — ie. “121” > “21”.
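The difference between lexical and numeric ordering can be sketched outside ASObjC too. The JavaScript below is only an analogy: localeCompare with the numeric option treats runs of digits much as localizedStandardCompare: does, but it is a different API and the two may well differ in other respects. The sample values are invented.

```javascript
const parts = ["121", "21", "06", "011"];

// Plain sorting is lexical: "121" sorts before "21" because "1" < "2".
const lexical = [...parts].sort();

// With the numeric collation option, runs of digits compare by value,
// so "21" now precedes "121".
const numeric = [...parts].sort((a, b) =>
  a.localeCompare(b, undefined, { numeric: true })
);

console.log(lexical);
console.log(numeric);
```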

Thanks Nigel. I especially like the first alternative, because it allows me to eliminate multiple characters at the beginning of an item (and elsewhere if desired) with a simple RegEx pattern. Your suggestion to use localizedStandardCompare returns results as I would want them. So:

use framework "Foundation"

set theList to {"A-121", "BB011", "CCC6", "D01"}
set theArray to current application's NSArray's arrayWithArray:theList
set sortingArray to current application's NSMutableArray's new()

repeat with anItem in theArray
	set theSubstring to (anItem's stringByReplacingOccurrencesOfString:"^\\D*" withString:"" options:(current application's NSRegularExpressionSearch) range:{0, anItem's |length|()})
	set theDictionary to (current application's NSDictionary's dictionaryWithDictionary:{originalString:anItem, sortString:theSubstring})
	(sortingArray's addObject:theDictionary)
end repeat

set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"sortString" ascending:true selector:"localizedStandardCompare:"
return ((sortingArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"originalString") as list
--> {"D01", "CCC6", "BB011", "A-121"}

The localizedStandardCompare: selector can be used directly, without mapping the list to a dictionary.

It sorts the list the way Finder does. To sort descending, set ascending to false.

use framework "Foundation"

set theList to {"A04", "B03", "C02", "D01"}
set theArray to current application's NSArray's arrayWithArray:theList
set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"self" ascending:false selector:"localizedStandardCompare:"
return (theArray's sortedArrayUsingDescriptors:{theDescriptor}) as list

It works also with {"A-121", "D01", "CCC6", "BB011"} just as-is.

If the sort order is ascending, you can use sortedArrayUsingSelector: without creating a sort descriptor:

use framework "Foundation"

set theList to {"A-121", "D01", "CCC6", "BB011"}
set theArray to current application's NSArray's arrayWithArray:theList
return (theArray's sortedArrayUsingSelector:"localizedStandardCompare:") as list

Thanks Stefan for the suggestions.

I didn’t properly explain my request in post 1, and my test lists were poor. My goal was to completely ignore the initial letter of each item of the list and to sort on the numbers only. The following demonstrates why localizedStandardCompare can’t be used directly to achieve this goal (unless I’m missing something):

use framework "Foundation"

set theList to {"B04", "A03", "C02", "D01"}
set theArray to current application's NSArray's arrayWithArray:theList
set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"self" ascending:false selector:"localizedStandardCompare:"
return (theArray's sortedArrayUsingDescriptors:{theDescriptor}) as list
-- returns {"D01", "C02", "B04", "A03"}
-- desired result {"D01", "C02", "A03", "B04"}

I ran timing tests with a slightly modified version of the above scripts. The results with lists that contained 128 and 512 items were 12 and 39 milliseconds.

use framework "Foundation"
use scripting additions

set theList to {"Item 11", "Item 01", "Item 21"}
set thePattern to "^\\D*" -- remove leading non-digit characters from the string
set sortedList to getSortedList(theList, thePattern)

on getSortedList(theList, thePattern)
	set theArray to current application's NSArray's arrayWithArray:theList
	set sortingArray to current application's NSMutableArray's new()
	repeat with anItem in theArray
		set theSubstring to (anItem's stringByReplacingOccurrencesOfString:thePattern withString:"" options:1024 range:{0, anItem's |length|()}) -- option 1024 is RegEx search
		(sortingArray's addObject:{originalString:anItem, sortString:theSubstring})
	end repeat
	set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"sortString" ascending:true selector:"localizedStandardCompare:"
	return ((sortingArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"originalString") as list
end getSortedList

The script immediately above isolates the desired substring by using the RegEx pattern to remove unwanted portions of the string. The following script works by using the RegEx pattern to directly identify the desired substring, which in most cases is probably a better approach.

As written, the following script throws an error if a list item does not contain a matching substring. Error handling appropriate to the task at hand needs to be added for this.

use framework "Foundation"
use scripting additions

set theList to {"item 21", "item 01", "item 11"}
set thePattern to "\\d+" -- match the first run of decimal digits
set sortedList to getSortedList(theList, thePattern)

on getSortedList(theList, thePattern)
	set theArray to current application's NSArray's arrayWithArray:theList
	set sortingArray to current application's NSMutableArray's new()
	repeat with anItem in theArray
		set theRange to (anItem's rangeOfString:thePattern options:1024) -- option 1024 is RegEx search
		set theSubstring to (anItem's substringWithRange:theRange)
		(sortingArray's addObject:{originalString:anItem, sortString:theSubstring})
	end repeat
	set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"sortString" ascending:true selector:"localizedStandardCompare:"
	return ((sortingArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"originalString") as list
end getSortedList
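The script above throws when an item has no match, and how to handle that depends on the task. As one possible policy, sketched in JavaScript purely for illustration, items without digits can be given a sentinel key so that they sort last. The Infinity sentinel, the extra sample item, and the property names original and key are all assumptions of this sketch, not something taken from the ASObjC code.

```javascript
const theList = ["item 21", "no digits here", "item 01", "item 11"];
const pattern = /\d+/; // first run of decimal digits

// Hypothetical policy: items without digits get Infinity as key and sort last.
const keyed = theList.map(item => {
  const found = item.match(pattern); // null when nothing matches
  return { original: item, key: found ? Number(found[0]) : Infinity };
});

// Compare explicitly rather than subtracting, since Infinity - Infinity is NaN.
const sorted = keyed
  .sort((a, b) => (a.key === b.key ? 0 : a.key < b.key ? -1 : 1))
  .map(entry => entry.original);

console.log(sorted);
```

Skipping unmatched items or raising a descriptive error would be equally reasonable choices.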

In addition to those detailed above, there’s actually a third approach, which utilizes a capture group and both filters and sorts. I don’t know if this would ever be of actual use, so I include it FWIW.

The following script returns all items that begin with the character “i” and that contain one or more numbers in parentheses. The substring that is sorted on is the first instance of numbers in parentheses.

use framework "Foundation"
use scripting additions

set theList to {"skip item (33)", "item (22)", "item (01)", "item (11)"}
set thePattern to "^i.*?\\((\\d+)\\).*$"
set sortedList to getSortedList(theList, thePattern)

on getSortedList(theList, thePattern)
	set theArray to current application's NSArray's arrayWithArray:theList
	set sortingArray to current application's NSMutableArray's new()
	repeat with anItem in theArray
		set theSubstring to (anItem's stringByReplacingOccurrencesOfString:thePattern withString:"$1" options:1024 range:{0, anItem's |length|()}) -- option 1024 is RegEx search
		if (anItem's isEqualToString:theSubstring) is false then (sortingArray's addObject:{originalString:anItem, sortString:theSubstring})
	end repeat
	set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"sortString" ascending:true selector:"localizedStandardCompare:"
	return ((sortingArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"originalString") as list
end getSortedList

For purely inspirational purposes, the (hopefully) equivalent script in JavaScript:

const l = ["skip item (33)", "item (22)", "item (01)", "item (11)"];

const res = l.filter(x => /^i.*\(\d+\)/.test(x)).sort((a,b) => {
  const anum = +a.match(/\((\d+)\)/)[1];
  const bnum = +b.match(/\((\d+)\)/)[1];
  return anum-bnum;
})
console.log(res) 

That returns
["item (01)","item (11)","item (22)"]
which might not be exactly what the AppleScript version does.

The filter method returns a new array containing only those elements of l that begin with “i” and contain a number in parentheses. sort then sorts this array by using the anonymous function passed as a parameter. Note that this approach might not be the most performant one, because match is executed again for every comparison.
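One way to avoid the repeated match calls, offered as a sketch and not as a measured improvement, is to extract the number once per element, sort on the stored key, and then map back to the original strings. This mirrors the array-of-dictionaries approach used in the ASObjC scripts. The property names original, match and key are made up for the example.

```javascript
const l = ["skip item (33)", "item (22)", "item (01)", "item (11)"];
const pattern = /^i.*?\((\d+)\)/;

const res = l
  .map(x => ({ original: x, match: x.match(pattern) })) // one regex run per element
  .filter(entry => entry.match !== null)                // drop non-matching items
  .map(entry => ({ original: entry.original, key: Number(entry.match[1]) }))
  .sort((a, b) => a.key - b.key)                        // compare the stored numbers
  .map(entry => entry.original);

console.log(res);
```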

In any case, there’s not much sense in using .*$ in the regular expression since you don’t care what the rest of the string after the closing parenthesis is. But you still force the RE engine to continue its work. Also, .* is in many cases not a good idea because it might gobble up more than one wants.

chrillek. Thanks for looking at my thread and for the suggestion.

IMO, the use of .*$ is necessary in my third script. If it’s not included in the pattern, all of the string after the first instance of numbers in parentheses is included in the substring. It’s rare this would impact the sort order, but it seems best to set the substring to the desired substring and nothing else. I ran some timing tests with a list that contained 500 items, and there was no difference if I included .*$ or not.
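The effect described here can be reproduced with a regex replace in JavaScript, used only as an illustration because its $1 replacement behaves the same way for this pattern. The sample string with trailing text is invented for the example.

```javascript
const item = "item (22) extra text";

// Without .*$ the match stops at the closing parenthesis,
// so everything after it survives the replacement.
const withoutTail = item.replace(/^i.*?\((\d+)\)/, "$1");

// With .*$ the whole string is matched and replaced by the captured digits.
const withTail = item.replace(/^i.*?\((\d+)\).*$/, "$1");

console.log(withoutTail); // "22 extra text"
console.log(withTail);    // "22"
```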

You’re right of course. I was thinking along the lines of “use the capturing group for comparison”, whereas your code is relying on a modified string for that. My method is probably not possible with the ObjC frameworks.

chrillek. You make a good point. In testing with a 640-item list, an ASObjC script that filters then sorts is 36 percent faster (30 versus 47 milliseconds). The use of a capture group is still necessary to get the numbers within parentheses (without the parentheses), although including the parentheses with the substring probably wouldn’t impact the sort order.

use framework "Foundation"

set theList to {"skip item (33)", "item (22)", "item (01)", "item (11)", "item (skip item)"}
set sortedList to getSortedList(theList)

on getSortedList(theList)
	set theArray to current application's NSArray's arrayWithArray:theList
	set thePredicate to current application's NSPredicate's predicateWithFormat:"self MATCHES 'i.*?\\\\(\\\\d+\\\\).*$'"
	set theArray to theArray's filteredArrayUsingPredicate:thePredicate
	set sortingArray to current application's NSMutableArray's new()
	repeat with anItem in theArray
		set theSubstring to (anItem's stringByReplacingOccurrencesOfString:"^.*?\\((\\d+)\\).*$" withString:"$1" options:1024 range:{0, anItem's |length|()}) -- option 1024 is RegEx search
		(sortingArray's addObject:{originalString:anItem, sortString:theSubstring})
	end repeat
	set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"sortString" ascending:true selector:"localizedStandardCompare:"
	return ((sortingArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"originalString") as list
end getSortedList

Faster than the JavaScript variant?

I tested with Script Geek, and I don’t know how to test JavaScript with Script Geek. I created the test list with the following but didn’t include the creation of the list in the timing result:

set theList to {"skip item (33)", "item (22)", "item (01)", "item (11)", "item (skip item)"}
repeat 7 times
	set theList to theList & theList
end repeat
theList

FTR: I ran the JavaScript version in Scriptable on an iPad Pro 11 (M1) with this 640-element array in 6 ms (calling Date.now() after filling the array and again after sorting). To make the test more valuable, I increased the array size to 100000, for which the script ran in 882 ms.

Here’s the code:

const l = ["skip item (33)", "item (22)", "item (01)", "item (11)", "item (skip item)"];
const list = Array.from({length:20000}, () => l).flat(); /* 100000 elements, use 128 for 640 elements*/

const startTime = Date.now();
const res = list.filter(x => /^i.*\(\d+\)/.test(x)).sort((a,b) => {
  const anum = +a.match(/\((\d+)\)/)[1];
  const bnum = +b.match(/\((\d+)\)/)[1];
  return anum-bnum;
})
console.log(Date.now()-startTime);

Saving that in a file and running via osascript -l JavaScript <file> in the terminal should write the elapsed time in ms to the terminal.


And a final timing result, this time for the AppleScript script here:

I used a list with 10240 elements, i.e. the repeat loop ran 11 times. I then changed the call to the sort handler like so:

set startTime to current application's NSDate's now
set sortedList to getSortedList(theList)
set thetime to ((startTime's timeIntervalSinceNow()) * -1000) as integer
log thetime

Then I used osascript to run the JavaScript version posted earlier with the same array length (i.e. 10240) and the AppleScript version, both on a MacBook Pro from 2019 with the current version of Ventura installed.

| JavaScript | AppleScript |
|       96ms |      1605ms |

Obviously, the JavaScript version is not only a lot shorter, but it also runs about 17 times faster here. That is not to imply that this is generally the case, but in this example no ObjC framework and no marshalling between two languages is required.

chrillek. Thanks for the timing results.

I use AppleScript almost exclusively and wondered how JavaScript could be used in an AppleScript. Can the following be made to work?

set theList to {"skip item (33)", "item (22)", "item (01)", "item (11)", "item (skip item)"}

set sortedList to getSortedList(theList) --> {"item (01)", "item (11)", "item (22)"}

on getSortedList(theList)
const res = l.filter(x => /^i.*\(\d+\)/.test(x)).sort((a,b) => {
  const anum = +a.match(/\((\d+)\)/)[1];
  const bnum = +b.match(/\((\d+)\)/)[1];
  return anum-bnum;
})
console.log(res)
end getSortedList