A merge sort

OK. To merge sort “on” anything, you need a customisable merge sort. Here’s the customisable version of the above, which I don’t think I’ve posted before:

13th September 2010: I’ve uploaded a new version of this script with improved customisation and different names for the “slave move” handlers, as part of a collection, to ScriptBuilders.
27th November 2013: Since ScriptBuilders is now defunct, I’ve replaced the code originally posted here with the latest version. There are differences in both the code and the form of the customisation parameter, so I’ve updated the entire post.
14th June 2015: The amount of waste has been further reduced (after eight years!) by only extracting the left halves of the merge ranges to separate lists. Comments and variable names have been changed and the former mutually recursive sortHalf() handler has been replaced with a couple of bits of in-line code. A reposition() hander has been added to the slave script object to cover for the fact that the right half of each merge now comes directly from the original list instead of from an auxiliary.

(* Merge sort ” customisable version
Merge sort algorithm: John von Neumann, 1945.
AppleScript implementation: Nigel Garvey, 2007/2015.

Parameters: (list, range index 1, range index 2, record with optional 'comparer' and 'slave' properties). The 'comparer' value is a script object containing an isGreater(a, b) handler which determines if object a is "greater" than object b. The 'slave' value is a script object containing an extract(a, b) handler, which is called when the sort has extracted a copy of range a thru b of the list being sorted; a reposition(a, b) handler, called when item b in the main list has been moved to position b; a merge(a, b) handler, called when item a in an auxiliary list has been moved to position b in the main list; and a swap(a, b) handler, called when items a and b in the main list have been swapped.
Where the 'comparer' or 'slave' properties are omitted, or the customisation parameter isn't a record, the sort has default handlers which respectively compare items directly and do nothing.
*)

on CustomMergeSort(theList, l, r, customiser) -- Sort items l thru r of theList.
	script o
		property comparer : me
		property slave : me
		
		property lst : theList
		property auxList : missing value
		
		on msrt(l, r)
			set leftR to (l + r) div 2 -- Left half end index.
			set rx to leftR + 1 -- Right half start/tracking index.
			
			-- Sort the left half by recursion if it has more than two items or by swapping as necessary if there are two. If one, do nothing.
			if (leftR - l > 1) then
				msrt(l, leftR)
			else if (leftR > l) then
				set lv to item l of my lst
				set rv to item leftR of my lst
				if (comparer's isGreater(lv, rv)) then
					set item l of my lst to rv
					set item leftR of my lst to lv
					slave's swap(l, leftR)
				end if
			end if
			-- Sort the right half similarly.
			if (r - rx > 1) then
				msrt(rx, r)
			else if (r > rx) then
				set lv to item rx of my lst
				set rv to item r of my lst
				if (comparer's isGreater(lv, rv)) then
					set item rx of my lst to rv
					set item r of my lst to lv
					slave's swap(rx, r)
				end if
			end if
			
			-- Extract a separate list of the left half's items and set tracking and end indices for it.
			set auxList to items l thru leftR of my lst
			slave's extract(l, leftR)
			set lx to 1
			set leftR to rx - l
			
			-- Iterate through the range, merging the two halves by comparing the lowest unassigned value from auxList with that from the right half of the range and assigning the lower of the two (or the left if they're equal) to the current slot.
			
			-- Begin with the first (lowest) value from each half.
			set lv to beginning of my auxList
			set rv to item rx of my lst
			repeat with dx from l to r
				if (comparer's isGreater(lv, rv)) then
					-- The right value's less than the left. Assign it to this slot.
					set item dx of my lst to rv
					slave's reposition(rx, dx)
					-- If no more right-half values, assign the remaining left values to the remaining slots and exit this repeat and recursion level.
					if (rx is r) then
						repeat with dx from (dx + 1) to r
							set item dx of my lst to item lx of my auxList
							slave's merge(lx, dx)
							set lx to lx + 1
						end repeat
						exit repeat
					end if
					-- Otherwise get the next right-half value.
					set rx to rx + 1
					set rv to item rx of my lst
				else
					-- The left value's less than or equal to the right.
					set item dx of my lst to lv
					slave's merge(lx, dx)
					-- If no more left-half values, simply exit this repeat and recursion level as the remaining right values are in place anyway.
					if (lx is leftR) then exit repeat
					-- Otherwise get the next left-half value
					set lx to lx + 1
					set lv to item lx of my auxList
				end if
			end repeat
		end msrt
		
		-- Default comparison and slave handlers for an ordinary sort.
		on isGreater(a, b)
			(a > b)
		end isGreater
		
		on extract(a, b)
		end extract
		
		on reposition(a, b)
		end reposition
		
		on merge(a, b)
		end merge
		
		on swap(a, b)
		end swap
	end script
	
	-- Process the input parameters.
	set listLen to (count theList)
	if (listLen > 1) then
		-- Negative and/or transposed range indices.
		if (l < 0) then set l to listLen + l + 1
		if (r < 0) then set r to listLen + r + 1
		if (l > r) then set {l, r} to {r, l}
		
		-- Supplied or default customisation scripts.
		if (customiser's class is record) then set {comparer:o's comparer, slave:o's slave} to (customiser & {comparer:o, slave:o})
		
		-- Do the sort.
		o's msrt(l, r)
	end if
	
	return -- nothing
end CustomMergeSort

property sort : CustomMergeSort

-- (* Contrived demo:
-- Reverse sort a list of random integers, sorting a list of matching strings in parallel with it.

script backwards
	-- Returns the opposite of the truth, for a reversed sort.
	on isGreater(a, b)
		(a < b) -- (a ≤ b) to reverse the stability too!
	end isGreater
end script

script parallel
	-- Duplicates moves from a merge sort in its own list.
	property lst : missing value
	property auxList : missing value
	
	on extract(a, b)
		set auxList to items a thru b of my lst
	end extract
	
	on reposition(a, b)
		set item b of my lst to item a of my lst
	end reposition
	
	on merge(a, b)
		set item b of my lst to item a of my auxList
	end merge
	
	on swap(a, b)
		tell item a of my lst
			set item a of my lst to item b of my lst
			set item b of my lst to it
		end tell
	end swap
end script

set l to {}
set l2 to {}
repeat 1000 times
	set end of my l to (random number 1000)
	set end of my l2 to result as text
end repeat
set parallel's lst to l2

sort(l, 1, -1, {comparer:backwards, slave:parallel})
l2
-- *)

Like CustomQsort() (a combined QuickSort/insertion sort by Arthur Knapp and myself), CustomMergeSort() takes a fourth, customisation parameter. Originally, this was a script object containing both comparison and “slave” handlers (which I’ll explain in a moment). But the new version above is more flexible. Its customisation parameter is a record with ‘comparer’ and ‘slave’ properties which separately specify script objects for the comparison and slave actions. The two kinds of action can thus be in the same or different script objects, allowing a more mix-and-match approach. Furthermore, both script objects are now optional, the sort having internal defaults to use when they’re omitted.

Comparisons: Instead of comparing the list items itself, CustomMergeSort() passes each pair to an isGreater() handler in the ‘comparer’ script object. This handler is expected to return a boolean indicating whether or not the value of the first item is “greater” than that of the second by the criteria the handler’s designed to employ. For instance, if the items are lists, the handler may be written to see if item x of list a is greater than item x of list b. Or if list a is longer than list b. Or if list a contains a certain kind of value and list b doesn’t. etc. If the items are records, the values of certain properties may need to be compared, eg.:

script compareRecordsByAardvark
	on isGreater(a, b)
		return (a's aardvark > b's aardvark)
	end isGreater
end script

Slave actions: Every time the sort moves items in the list, it sends information about the moves to relevant handlers in the ‘slave’ script object. The slave handler(s) will normally either reproduce the moves in another list ” thus sorting it in parallel with the first ” or simply be blank and do nothing at all. Unfortunately, you have to know how the sort moves things around to be able to write functional slave handlers yourself. The merge sorts here have three kinds of action and CustomMergeSort() expects to be able to call extract(), merge(), and swap() handlers in the ‘slave’ script object. To sort another list in parallel with the first, a “slave” script object might look like this:

script parallel
	property lst : missing value -- Set this property to the list to be sorted in parallel with the main list.
	property auxList : missing value
	
	on extract(a, b)
		-- The range a thru b of the main list (the "left half" for a merge) has been extracted for merging back into it. Do the same in respect of the slave list.
		set auxList to items a thru b of my lst
	end extract
	
	on reposition(a, b)
		-- Item b of the main list has been set to item a of the main list.
		set item b of my lst to item a of my lst
	end reposition
	
	on merge(a, b)
		-- Item b of the main list has been set to item a of the extracted left half.
		set item b of my lst to item a of my auxList
	end merge
	
	on swap(a, b)
		-- Items a and b of the main list have been swapped over.
		tell item a of my lst
			set item a of my lst to item b of my lst
			set item b of my lst to it
		end tell
	end swap
end script

Here are some example calls using the above script objects:

-- Sort items 1 thru -1 of a list of records by their 'aardvark' properties, simultaneously sorting another list in parallel.
-- The "slave" (parallel) list must have at least as many items as the main list.
set sortInParallel's lst to myOtherList
CustomMergeSort(myListOfRecords, 1, -1, {comparer:compareRecordsByAardvark, slave:parallel})

-- Sort the list of records, but without a parallel sort.
CustomMergeSort(myListOfRecords, 1, -1, {comparer:compareRecordsByAardvark})

-- Sort a list values which don't need custom comparisons, simultaneously sorting another list in parallel.
set sortInParallel's slaveList to myOtherList
CustomMergeSort(myListOfDates, 1, -1, {slave:parallel})

-- Do a non-custom sort! ie. compare values directly and perform no slave action.
CustomMergeSort(myListOfDates, 1, -1, {})

McUsr pointed out above that Merge Sort is a “stable” sort:

For my own use, I sometimes need to sort the rows of a spreadsheet on column F, subsorting on columns B, C, and E in turn. (In the script, the “rows” are a list of lists whose items are the cell values at the intersections with the columns.)

I use CustomMergeSort() for this. But instead of sorting on one column at a time and exploiting merge sort’s stability to keep the rest in relative order, I sort on all four columns at once using an isGreater() handler which compares items 6, 2, 3, and 5 of the rows in that order each time it’s called. It only needs to work through the column sequence until the two rows are found to have different values in one of the columns, and the entire process is accomplished with one sort instead of four. It’s analogous to the way strings are compared when they’re sorted.

Here’s a demo of sorting on columns. You have to supply your own list of lists and your own path to the CustomMergeSort script.

-- Sort rows from a spreadsheet range, sorting and subsorting on four of its columns.
-- This code only demos the use of the custom sort.
-- It uses CustomMergeSort, but the sortOnCols script object works with any of my custom sorts.

-- This handler controls the sort.
on sortOnColumns(theRows, sortColumns)
	script sortOnCols
		property rows : theRows
		property sortCols : sortColumns
		property sortColumnCount : (count sortColumns)
		
		-- Compare the values at the given sort columns in row list a with those in row list b.
		on isGreater(a, b)
			script p
				property row_a : a
				property row_b : b
			end script
			
			-- Repeat with each of the sort columns, in the given order.
			repeat with i from 1 to sortColumnCount
				set col to item i of o's sortCols
				-- Get the cell values from the intersection of this sort column with these two rows.
				set val_a to p's row_a's item col
				set val_b to p's row_b's item col
				if (val_a = val_b) then
					-- Keep going if the cell values are equal.
				else
					-- Otherwise return whether or not cell value a is greater cell value b.
					return (val_a > val_b)
				end if
			end repeat
			
			return false -- a's sort-column values are all the same as b's, so a's not greater than b.
		end isGreater
	end script
	
	-- Assuming one's CustomMergeSort() script is kept in the following location, load it and do the sort.
	set sorter to (load script file ((path to scripts folder from user domain as Unicode text) & "Libraries:Sorts:Custom Merge Sort.scpt"))
	
	sorter's CustomMergeSort(theRows, 1, -1, {comparer:sortOnCols})
end sortOnColumns

set theRows to aListOfListsExtractedFromNumbers
set sortColumns to {6, 2, 3, 5} -- Columns F, B, C, & E.

sortOnColumns(theRows, sortColumns)

Nice Mr.G!.

I shall really try this :slight_smile: I have to create a database first. But that is underway in another context. You sure have your hands with sort algorithms also. :slight_smile:

It is really nice to see practical implementations over the text book’s theoretical versions. -Without having to make them myself and knowing that every thing is in order. :smiley:

I think merge sort is really applicable when the the data is suspected to be a retrieved in a worst case order for instance from a make shift database under a web server or such, where all the incoming data is in fairly random order.

What concerns me with merge sort is the alleged usage of stack space. I’m not sure wether this applies to your implementation or not, since you save some list handling. Which Apple Script is a little bit cheap on. I’ll come back to you on this in not too long time I hope.

It seems that, while qSort() and CustomQsort() have to do considerably less moving around than mergeSort() and CustomMergeSort() to get a list in order, they actually perform quite a few more comparisons of the items. (The actual numbers depend, of course, on the length of the list and what’s in it.) The greater amount of work involved in custom comparisons therefore affects CustomQsort() more than it does CustomMergeSort() and is enough to swing the speed advantage in the latter’s favour.

Later on, I’ll bung parallel sorting into the mix and see how things stand in that respect…

That will be interesting to see.

But won’t customMergeSort still move a lot more data around? (This is right from the text book, I haven’t actually looked at how you have implemented your version compared to the text book version of Merge Sort.)
If it is, it will require far more stack space if that is ever to be an issue.

Parallel sorting is something I’m looking forward to see. :slight_smile:

I will soon work on some tree structures, which enables me to reach the parent node from a child. Hopefully I get a way with a doubly linked list. I have implemented such a structure fairly well in java, and will now convert it, which should be really easy. I believe I have implemented a balanced tree as well, but that may just be a wish. I have to have a look in my tree. :slight_smile:

I want to present Script libraries in the dot language, and render it with graphwiz (But the renderings will not be more complex than the object diagrams in ScriptDebugger, they will be simpler as a matter of fact.

Maybe a late response but for the record: qsort is slower than mergesort. The name is confusing and by most considered therefore the fastest sort. The perfect (fastest) sort in qsort is equally fast as mergesort, the worse case sort in qsort is half as fast as the perfect sort meaning it’s about 39% slower mergesort. Because mergesort is stable, unlike qsort, the sort speed is consistent. Because mavericks has the new scirpt library, I was looking in my “old” library of handlers and found this:


mergeSort({9, 4, 2, 7, 5, 8, 6, 1, 3})
mergeSort({"mouse", "dog", "cat", "bird", "snake", "fish"})

on mergeSort(lst)
	set sortedList to {}
	if (count lst) < 2 then return lst
	set m to (count lst) div 2
	set l to mergeSort(items 1 thru m of lst)
	set r to mergeSort(items (m + 1) thru -1 of lst)
	repeat until ((count l) = 0 and (count r) = 0)
		if (count l) ≠ 0 and ((count r) = 0 or (item 1 of l) < (item 1 of r)) then
			set end of sortedList to item 1 of l
			set l to rest of l
		else
			set end of sortedList to item 1 of r
			set r to rest of r
		end if
	end repeat
	return sortedList
end mergeSort

In general mergesort performs better for linked lists instead of linear arrays. The problem for linear arrays is that mergesort needs to allocate sub arrays or as McUsr wrote “move a lot more data around”. But for linked lists, a list designed to insert and remove data quickly no matter how big the list is, mergesort comes really to it’s right because there is no movement of data, only pointer values. AppleScript uses linked lists but unfortunately they’re presented and accessible only as linear arrays. This means that in my example code I’m forced to create new sub lists and return a new sorted list. Still it’s more than 2 times faster than a qsort.

Hello.

I can’t get this out of my mind, so I have to reply! :slight_smile:

First of all, I can’t argue that mergesort will run out of stack space, because the limit of AppleScript lists are 2^14 = ca. 16384 elements, which will be a boundary long before we run out of stack frames.

But you could improve the quicksort if you calculate the median, so that who’s best is really a question of the data. This is a big discussion in between users of the median of medians algorithm at the moment by the way.

(I believe Nigel calculates the median for quicksort in the thread with Arthur Knapp.)

But. twice the speed and no work, makes for an easy choice. Personally I’d choose it for the stable properties.

Thanks for sharing, together with Nigel’s CustomSort my dose of sorts is almost complete. :slight_smile:

I agree:

on sortAList:aList
	set anArray to current application's NSArray's arrayWithArray:aList
	return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

:wink:

(Actually, the speed difference is considerably greater than that.)

It’s a pity that stable sorting isn’t directly accessible from AppleScriptObjC.

Hi DJ.

The qsort to which I was referring was the one I’d previously mentioned as having been written by Arthur Knapp and myself, which is a hybrid Quicksort/insertion sort with a single sweep of the insertion sort replacing the lower recursion levels of the Quicksort. While an insertion sort isn’t particularly efficient in itself, it’s absolutely brilliant when there’s not much sorting left to do ” such as after an incomplete Quicksort ” because then it doesn’t do much! The lower levels of a Quicksort, on the other hand, have to keep “dividing and conquering” ever smaller and more numerous subranges with very little actual sorting taking place by then. The hybrid is thus generally faster than a straight Quicksort and is faster with integers than my merge sort. Other methods which have been suggested for improving Quicksort involve using the best pivot value at each level without taking too much time to find out what that is, and/or by grouping all the values which are equal to the pivot so that they can all be eliminated together from the rest of the sort.

While it’s possible to talk theoretically about the comparative speeds of sorts using "O"s and “worse-case scenarii”, that’s not enough in a high-level language like AppleScript. The way algorithms are implemented in the language of choice also has a tremendous effect on what is actually fast, as evidenced by the fact that my merge sort in post #1 sorts a list of 10,000 random integers in around 0.85 seconds (on my machine) whereas yours takes around 41.8 seconds. (The qsort just mentioned takes 0.66 seconds, a Shell sort implementation of mine takes 1.28 seconds, and my ternary heap sort takes 1.11 seconds.) There are also factors such as how long it takes to compare particular types of items in the language (strings usually take longer than integers, for example) and how long it takes to move them, which could influence whether an algorithm with fewer moves is faster or one with fewer comparisons. Then there’s the number of items to be sorted, the number of different values in relation to the number of items, how much sorting needs to be done, and so on. You can’t just say “Merge sort is the fastest sort. Here is one.”

However, at 0.55 seconds for 10,000 random integers, the fastest vanilla sort in my collection is my as yet unpublished non-recursive ternary merge sort, just beating its recursive ternary sibling by a few hundredths of a second ” usually. :slight_smile:

OT: I note that having run these tests, the AppleScript Editor in Mavericks is claiming not to be able to revert my scripts to their original saved state. It’s bad enough having to revert them in the first place, but this is the pits. However, the files’ modification dates haven’t changed and the scripts seem to open in their original state. :confused:

Now that you’re on Mavericks, perhaps you can run the ASObjC version for comparison. I don’t know how our Macs compare, but I just ran it with a simple list created by:

set list1 to {}
repeat 10000 times
	set end of list1 to random number from 1 to 10000
end repeat

And I’m getting consistent times around 0.034 seconds.

(And most of that time is spent converting from AS list to array and back. it looks like the actual sorting takes in the order of 0.007 seconds.)

Sure. How do I do that?

Make a new document in ASE and add this:

on sortAList:aList
	set anArray to current application's NSArray's arrayWithArray:aList
	return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

Save it to ~/Library/Script Libraries/ (you’ll have to make that folder) as a .scptd script bundle.

Once it’s saved, the Bundle Contents button in the toolbar will be enabled. click, and in the drawer that appears, click on the AppleScript/Objective-C Library checkbox. Save again.

Then open new window and type:

use theLib : script "<name of first file>" 
set list1 to {}
repeat 10000 times
   set end of list1 to random number from 1 to 10000
end repeat
theLib's sortAList:list1

Add you timer of choice and run.

‘random number’ doesn’t compile while the ‘use’ command’s there.

Once you use a use(!), you need to include “use scripting additions” if you want to use them (which in reality means you do it always).

(sorry, I wrote the last email while being called to dinner…)

OK.

1.194 seconds on the first run, 0.081-ish subsequently.

There’s probably a bit of loading done for the first run, but settles down to be pretty consistent. I believe it uses a form of quicksort – or it did. It’s hard to be sure because NSArray is a cluster of classes, so the code used can vary depending on things like the number of items. It’s definitely unfair competition :frowning:

If you modify the script like this:

on sortAList:aList
   set anArray to current application's NSArray's arrayWithArray:aList
   anArray's sortedArrayUsingSelector:"compare:"
   anArray's sortedArrayUsingSelector:"compare:"
   ... <say ten extra sorts>
   return (anArray's sortedArrayUsingSelector:"compare:") as list
end sortAList:

you can calculate the extra time taken, and work out how long the actual sorting takes, and how much of the time is spent going from AppleScript to Cocoa and back.

@Nigel: I replied to your statement about qsort and mergesort and I replied to that in a more general way and not to your custom mergesort and qsort. Then I added that I had an true mergesort handler and compared it with a true qsort handler and unlike normal 39% average slower qsort, in AS the qsort is much slower. Again, no harm against your custom sorts but since they’re called qsort and mergesort like normal sort mechanisms I think it was useful to reply to the real qsort and mergesort mechanisms because it was lacking in the only topic on MacScripter about mergesort.

So I created a sorted list as described in the concept above, a true mergesort IMO.

p.s. Sorting a list of 10,000 unique integers took only 2 seconds on my machine, qsort took around 10 seconds.

I see I didn’t reply to this at the time.

It’s not so much the data which are moved, but the pointers in the lists. And there are different approaches to that. The text book merge sort creates three new arrays at each recursion: two containing the items from each half of the array passed down from above, which are themselves then passed down to lower levels, and a third which is cobbled together from two received back from the lower levels and then passed up to the level above. It’s very expensive in terms of array creation.

In my merge sorts, only the indices of ranges within the original list are passed down. Instead of receiving back sorted lists, each recursion receives a situation whereby those ranges in the original list have been sorted. It then creates one new list covering both ranges and merges the items from each half of that back into the original. There are thus far fewer temporary lists generated (though still a lot!) and the result’s an in-place sort of the original list. There’s the additional bonus in the merge stages that if all the left items are used up, there’s no need to bother with the remaining right items as they’re already in place in the range from which they were taken.

Hi DJ.

qsort is perhaps a little misleadingly named, since it’s a hybrid. On the other hand, it isn’t actually called “Quicksort”. :wink:

The merge sort, however, is algorithmically a true merge sort. It’s just an in-place implementation.

I can assure you it is quicksort Wirth was talking about.

By the way, thanks for your belated reply. Well, since it is is christmas today, I’ll tell you how I calculated that mergesort never will run out of stackspace:

I presume AppleScript crashes after 100 recursions, and that mergeseort splits its lists in two.

Then we have the inequality log2X>100, solving that by raising everything up to e, we are left with x*2 = e^100.
and (e^100)/2 is a number that surpasses 16340 quite a few times. :slight_smile:

Well, I’ll go back to the AS testing framework, while I am riding the waves.

And by the way, I think your CustomSort to be the most technical excellent handler I have seen in AppleScript, and most useful too, when you need it! :wink:

Hello.

Nigel’s script in post #6 here is really pivotal!

Thing is, that a stable sort handler, can solve a slew of problems, concerning structuring data, from say a graphics program, or any thing else for that matter, that is not made with structure in mind.

As such, a stable sort algorithm, and one that works well, is totally indispensable, the moment you need one. There is really no substitute for it.

Edit

You can of course write as complex a comparator as you want, to “emulate” the stable sort. But what you can’t do, is preparating the data (fields) inside the comparator, not in every case anyway. But that is something that you easily can do between runs of the merge sort. So it is a really “low-treshold” tool for turning possible nightmares into a breeze.

(I’d rather call it an instrument, than a tool.)