Sort a list of strings based on a substring

That sounds challenging. First off, you can’t directly include JS in AS code since the script engine is „set“ (for lack of a better term) to AS.

Second, passing the list as a parameter to JS is not obvious. I’ll try to find out more about this.


Given that the items to be sorted are all strings, they can be concatenated to form part of the JS code string:

{"skip item (22)", "item (33)", "item (100)", "item (011)", "item (skip item)"}
sort_(result) --> {"item (011)", "item (33)", "item (100)"} 

to sort:(L as list)
        set my text item delimiters to character id 0
        run script "`" & L & "`.split('\\x00')
        .filter(x => x.match(/[0-9]/) && !(
                     x.match(/skip item/i)))
        .sort((a,b) => a.match(/[0-9]+/)
                     - b.match(/[0-9]+/));
        " in "JavaScript"
end sort:

The key aspect of this script is the joining together of a list of strings using a nul byte (character id 0), as this wouldn’t ordinarily be a character that features in conventional strings. Where nul bytes most often appear is as markers delimiting special segments of a string in which specific byte positions or ranges are allocated for specific pieces of information. This is precisely the function they’ll be serving here: delimiting the boundaries between distinct items in the original list, which will be passed into the JXA code as a single string.

Thus, after setting the text item delimiters to character id 0, it is safe to perform this operation with L (the list of string items):

"`" & L & "`.split(..."

This implicitly coerces L from a list into a string, performing serial concatenations of all of its items with the nul byte as described above. In the JavaScript code, this string is enclosed inside a pair of back-ticks, i.e. `․․․` , which allows for the possibility of multi-line string items that might feature in the original list. However, it would require that any occurrence of a back-tick character in any of the string items be encoded before being sent through. If multi-line strings are not a consideration, it would be better to do this instead:
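To make the back-tick caveat concrete, here is a minimal sketch of the kind of encoding that would be needed. It is written in JavaScript for illustration only; in the actual workflow the same replacements would have to be made on the AppleScript side before concatenation. The helper name escapeForTemplateLiteral is hypothetical, and new Function merely stands in for run script evaluating the generated source.

```javascript
// Hypothetical helper: escape a string so it can sit between back-ticks
// in generated JavaScript source without changing its meaning.
function escapeForTemplateLiteral(text) {
  return text
    .replace(/\\/g, "\\\\")     // backslashes first, so later escapes are not doubled
    .replace(/`/g, "\\`")       // a bare back-tick would end the literal early
    .replace(/\$\{/g, "\\${");  // "${" would otherwise start an interpolation
}

// Stand-in for what the AppleScript side does: build source text, then evaluate it.
const original = "multi-line\nitem with a ` and a ${placeholder} and a \\ backslash";
const source = "return `" + escapeForTemplateLiteral(original) + "`;";
const roundTripped = new Function(source)();

console.log(roundTripped === original); // true
```

Besides the back-tick itself, backslashes and the ${ sequence also carry meaning inside a template literal, so this sketch escapes all three.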

quoted form of (L as text) & ".split(..."

After passing the string representation of the original list through to the JXA environment, it needs to be converted from a string back into separate items housed within a native Array-prototype object. The items were originally joined using the nul byte (character id 0), so the string now needs to be split at every occurrence of a nul byte (which, in JavaScript, is expressed as the hexadecimal escaped form \x00—this will need to be double-escaped, of course), i.e.:

"`" & L & "`.split('\\x00')

or, if preferred:

quoted form of (L as text) & ".split('\\x00')"
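Seen purely from the JavaScript side, the round trip looks something like the sketch below. The join call here only simulates what AppleScript's list-to-text coercion produces once the text item delimiters are set to character id 0; the variable names are illustrative.

```javascript
// Simulate the AppleScript coercion: items joined by a nul byte.
const NUL = "\x00";
const joined = ["item (33)", "item (100)", "item (011)"].join(NUL);

// Inside the JXA source this is written '\x00'; inside the AppleScript
// string literal it has to be typed as '\\x00' so one backslash survives.
const items = joined.split("\x00");

console.log(items.length); // 3
console.log(items[1]);     // item (100)
```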

The remainder of the JXA code acts upon the array to sort it using the custom callback function. This is by no means the best example of such a function, but it sufficiently handles and correctly sorts the test items supplied. It returns a JavaScript array, which conveniently gets converted back into an AppleScript list upon completion of the run script command.
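For anyone wanting a slightly more explicit callback, one possible variant is sketched below. It is not the code from the script above: the helper names firstNumber and sortByFirstNumber are made up for illustration, and it parses the first run of digits into a number rather than relying on array-to-number coercion.

```javascript
// Hypothetical helper: the first run of digits as a number, or NaN if none.
function firstNumber(text) {
  const match = text.match(/[0-9]+/);
  return match === null ? NaN : parseInt(match[0], 10);
}

// Drop items without digits or containing "skip item", then sort numerically.
function sortByFirstNumber(items) {
  return items
    .filter(x => !Number.isNaN(firstNumber(x)) && !/skip item/i.test(x))
    .sort((a, b) => firstNumber(a) - firstNumber(b));
}

const sample = ["skip item (22)", "item (33)", "item (100)", "item (011)", "item (skip item)"];
console.log(sortByFirstNumber(sample));
// [ 'item (011)', 'item (33)', 'item (100)' ]
```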

I didn’t run that yet but am a bit puzzled: afaik, match returns an Array. What is the difference of two Arrays that is calculated in sort? And why do you use g in the regular expression there?

Edit To answer my own question: The - operator applied to two single-element arrays calculates the difference of the two elements and returns a number. But: You use g in the regular expression. So, if the string contains more than one match, the result is NaN:

var a = '123 456';
var b = '789 123';
a.match(/\d+/g)-b.match(/\d+/g); // [123,456]-[789,123] => NaN

Since we’re interested only in the first number, one could simply use /\d+/ without the g in the match.
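A side-by-side comparison of the two forms, as a small sketch using the same test strings:

```javascript
const a = "123 456";
const b = "789 123";

// With g, match returns every hit; each array coerces to a string such as
// "123,456", which is not a number.
const withG = a.match(/\d+/g) - b.match(/\d+/g);

// Without g, match returns a one-element result (plus index/input
// properties); it coerces to "123" and "789", so the subtraction works.
const withoutG = a.match(/\d+/) - b.match(/\d+/);

console.log(withG);    // NaN
console.log(withoutG); // -666
```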

Perhaps you could elaborate a bit on the working of your script? I gathered this

  • result contains the list defined in the first line
  • in sort, the first line establishes ASCII 0 as the new text item delimiter
  • In run script, using the string concatenation operator on L casts the list to a string, with the components separated by ASCII 0.

The rest is straightforward (except for the problem with the global flag in the match operator, see above).
But the result of the sort is a JavaScript array, and that seems to be returned as an AppleScript list to the caller – how/why does that happen? I’d expected that AS lists and JS Arrays are different beasts that can’t be simply passed between the two languages.

Quite right. That was a typo that resulted from my toying around with two different implementations, one of which utilised .replace(/^[0-9]/g, '') as a possibility for a more robust solution. But I decided not to focus on solving the sorting issue, which had already received a myriad solutions, for which I personally would elect to use Nigel’s solution. When I swapped the match() function back in, I left the g option flag in place by mistake.

What I had aimed to address specifically was the problem that I quoted from your earlier message in my response, namely:

"Second, passing the list as a parameter to JS is not obvious."

I’ll add the details to highlight the key aspects of what the script is doing in order to work around this specific issue.

There’s not a huge difference between lists and arrays, the main one being how memory is allocated and subsequently how data is stored (arrays use a single, contiguous block of memory, which is allocated before the array is created, fixing the size of the array until its destruction). JavaScript arrays are a bit weird, of course, allowing greater flexibility and ease-of-use for high-level programming. Conceptually, bridging between a JavaScript Array and an AppleScript list is not only achievable, but I imagine arguably essential as part of the remit of Apple’s Open Scripting Architecture endeavour.

Each of the core AppleScript data types partners with an equivalent JavaScript prototype:

AppleScript    JavaScript
string         String.prototype
text           String.prototype
real           Number.prototype
integer        Number.prototype
date           Date.prototype
list           Array.prototype
record         Object.prototype
file           Path.prototype
boolean        Boolean.prototype

There was a time when calling JXA code from within AppleScript was properly implemented by way of the run script "..." in "JavaScript" using parameters {...} command. The supplied parameters would be passed into the JXA’s run() function, which would be explicitly declared, allowing the sending of AppleScript data types cleanly into the JavaScript context, and the receipt of JavaScript data types back to the AppleScript context. The last time I remember this working, at least in a limited fashion, was Mojave, and in a much more substantial manner in High Sierra.

Sadly, it no longer seems possible to use run script with parameters in this way. (Actually, the journey out still works fine, but the run() function doesn’t seem able to yield a meaningful return value, instead returning error code -4960.) However, this seems to be a specific fatal flaw of the run() function, as it’s entirely possible to house code inside a named function, which, if called in the last line of a JXA script, will happily pass its result back through to the AppleScript context.
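As a rough sketch of that pattern, the JXA source handed to run script might look like the following. The function name sortItems is hypothetical, and the literal argument stands in for the nul-joined string that AppleScript would concatenate into the source. That the final expression's value is what comes back to AppleScript is taken from the description above, not something this snippet can demonstrate on its own.

```javascript
// Hypothetical named function; the point is only that it is not called run.
function sortItems(joined) {
  return joined
    .split("\x00")
    .sort((a, b) => a.match(/[0-9]+/) - b.match(/[0-9]+/));
}

// Called on the last line, so its return value is the script's result.
sortItems("item (33)\x00item (100)\x00item (011)");
```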

Thanks a lot for spelling out all the details. I was not aware that there’s a direct mapping between all the AS and JS types. And I was still thinking do shell script "osascript …" when I asked about passing the array back to AS.

I did a little fine-tuning of my scripts included above and thought I would post the final result. As written, the script finds and sorts on a substring that matches a regex pattern. I’ve included an alternative that finds and sorts on a regex capture group, which allows a more refined pattern match. The script contains rudimentary error checking that reports an error when a match is not found. The timing result with a list of 100 items was 12 milliseconds.

use framework "Foundation"
use scripting additions

set theList to {"/Users/Peavine/File 103.txt", "/Users/Peavine/File 101.txt", "/Users/Peavine/File 102.txt"}
set sortedList to getSortedList(theList)

on getSortedList(theList)
	set theArray to current application's NSArray's arrayWithArray:theList
	set sortingArray to current application's NSMutableArray's new()
	repeat with aString in theArray
		set aRange to (aString's rangeOfString:"\\d+\\." options:1024) -- change pattern as desired
		if aRange's |length|() is 0 then display dialog "Match not found" buttons {"OK"} cancel button 1 default button 1
		set aSubstring to (aString's substringWithRange:aRange)
		(sortingArray's addObject:{originalString:aString, sortString:aSubstring})
	end repeat
	set theDescriptor to current application's NSSortDescriptor's sortDescriptorWithKey:"sortString" ascending:true selector:"localizedStandardCompare:"
	return ((sortingArray's sortedArrayUsingDescriptors:{theDescriptor})'s valueForKey:"originalString") as list
end getSortedList

(* 
-- replace the first 3 lines of the repeat loop with the following to use regex capture group
set aSubstring to aString's mutableCopy()
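-- the leading .*? is lazy so that the capture group receives the whole number; a greedy .* would leave it only the last digit
set matchFound to (aSubstring's replaceOccurrencesOfString:".*?(\\d+)\\..*" withString:"$1" options:1024 range:{0, aString's |length|()})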
if matchFound is not 1 then display dialog "Match not found" buttons {"OK"} cancel button 1 default button 1 
*)
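For comparison with the JXA approach earlier in the thread, the same decorate, sort, undecorate idea can be sketched in JavaScript. This is an illustration, not a port: sortByCapture is a made-up name, and localeCompare with the numeric option only approximates what localizedStandardCompare: does. The last lines show why the quantifier in front of a capture group matters: a greedy .* swallows all but the final digit.

```javascript
const paths = [
  "/Users/Peavine/File 103.txt",
  "/Users/Peavine/File 101.txt",
  "/Users/Peavine/File 102.txt"
];

// Hypothetical helper: sort items on the first capture group of pattern.
function sortByCapture(items, pattern) {
  const decorated = items.map(item => {
    const match = item.match(pattern);
    if (match === null || match[1] === undefined) {
      throw new Error("Match not found: " + item);
    }
    return { originalString: item, sortString: match[1] };
  });
  // numeric: true compares digit runs by value rather than character by character
  decorated.sort((x, y) =>
    x.sortString.localeCompare(y.sortString, undefined, { numeric: true }));
  return decorated.map(entry => entry.originalString);
}

const sortedPaths = sortByCapture(paths, /(\d+)\./);
console.log(sortedPaths); // File 101, File 102, File 103

const greedy = "File 103.txt".match(/.*(\d+)\./)[1];
const lazy = "File 103.txt".match(/.*?(\d+)\./)[1];
console.log(greedy); // 3
console.log(lazy);   // 103
```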