Improve your Applescripts: Sorting - The Big "O"

Finding What Sort Of Scripter You Are

In my previous columns in this series, I showed you a technique to simplify your script creation (pseudocode) and how to use Applescript to create three of the basic data structures that are used in computer science (stacks, queues, and priority queues). The next usual topic in most computer science courses is sorting. Sorting a list of items is one of the most common actions in any programming language, and it is often given sort shrift by most programming language creators. While some languages provide a built-in sort function, many (including Applescript) do not. But when you are showing the user of your script a long “choose from list” dialog it is much easier for the user to select something if the list is sorted.

A Bubbly Personality

Perhaps the oldest and easiest to understand sorting method is the bubble sort. Simply put, the bubble sort compares the first two items in the (unsorted) list. If the first item is greater than the second, the items are swapped. Then the second item (which was the first if there was a swap last time) and the third item are compared and swapped if necessary. The process continues until you have worked through all the times, finally placing the largest item in the last position.

However, we’re not nearly done! Now, we perform the same operation again, sorting the item next-to-last in the list, and then again for the item before next-to-last; you can see that with each pass, 1 item finds its proper place, but if the list has n items, you must loop through the list (n-1)+(n-2)+(n-3)…down to…1 times! So to sort 10 items, you have to do 9+8+7+6+5+4+3+2+1=45 comparisons. In AppleScript, the bubble sort looks like this:

on bubbleSort(theList)
	-- defining an internal script makes for faster run times!
	script bs
		property alist : theList
	end script
	set theCount to length of bs's alist
	if theCount < 2 then return bs's alist
	repeat with indexA from theCount to 1 by -1
		repeat with indexB from 1 to indexA - 1
			if item indexB of bs's alist > item (indexB + 1) of bs's alist then
				set temp to item indexB of bs's alist
				set item indexB of bs's alist to item (indexB + 1) of bs's alist
				set item (indexB + 1) of bs's alist to temp
			end if
		end repeat
	end repeat
	return bs's alist
end bubbleSort

While the bubble sort is okay for small lists (less than 20 items), it begins to bog down and get very slow for large numbers of items. The reason is that to sort the entire list (no matter how large it is) you have to do roughly (n*(n+1))/2 comparisons. To computer scientists and mathematicians, the important part of that equation is n*n or n squared.

Now, for 10 items we only did 45 comparisons, but 45 is “in the neighborhood” of 10 squared (or 100). For 100 items, we would do 99+98+97+…+1 comparisons, which works out to 5000 comparisons, which is “in the neighborhood” of 100 squared (or 10,000).

When comparing sorting algorithms or other processes, computer scientists look for the “order” of the operation, also called “the Big O” (for “order”). By this they mean, is the amount of processing required related to the number of inputs by simply n itself, n squared, or some other factor? In the case of the bubble sort, the operation is “big O of n squared.” We can do better. Although the bubble sort is good for learning to sort, it is basically useless for any real sorting. There is a variation of the bubble sort where you keep track of whether you’ve had to swap items during a loop through the data, and if not, you’re done, but even this doesn’t help the bubble sort much.

Any time you have a repeat loop inside another repeat loop, the total number of times the program will loop is the number of times through the outer loop MULTIPLIED BY the number of times through the inner loop. This is why bubble sort is Big O n squared. Keep this in mind any time you are writing nested repeats!

Insert Tab A into Slot B

I don’t know about you, but when I actually sort something by hand, I start a sorted stack (initially empty) and pull items from the unsorted bunch and insert the new item in its proper place in the sorted stack. If I’m short on workspace, I might even just arrange the items in order in the same stack of items. These are both examples of an insertion sort. For worst case (totally random sort), the bubble sort (n squared over 2) and the insertion sort (n squared over 4) are of about the same order. But in the situation where the data is basically in some order, then the insertion sort performs fewer swaps and is of the order (n+d) where d is the number of swaps performed. This is considerably better than n squared! Add to that the feature that you can add items spontaneously and they will be inserted in the proper place, and you have a much stronger sort than the bubble sort, which must operate on a fixed number of items.

Notice also that even though this sort has nested repeat loops, the inner loop only looks until it finds the spot for the sorted item, not through the entire list of items like bubble sort!

on insertSort(theList)
	--Insertion sort using the same list 
	--(as opposed to using a second list)
	script bs
		property alist : theList
	end script
	set theLength to length of bs's alist
	if theLength > 1 then
		set insertHere to 2
		repeat while insertHere ≤ theLength
			if last item of bs's alist ≤ first item of bs's alist then
				set bs's alist to {last item of bs's alist} & items 1 thru (theLength - 1) of bs's alist
				set lookfor to 1
				repeat while lookfor < insertHere and item lookfor of bs's alist ≤ last item of bs's alist
					set lookfor to lookfor + 1
				end repeat
				set bs's alist to items 1 thru (lookfor - 1) of bs's alist & last item of bs's alist & items (lookfor) thru (theLength - 1) of bs's alist
			end if
			set insertHere to insertHere + 1
		end repeat
	end if
	return bs's alist
end insertSort

The Urge to Merge

The merge sort is based on a totally different idea. You first compare item 1 and item 2 and put them in the correct order, forming a sublist of 2 items. You do the same for items 3 and 4, making another sublist, and so on until you reach the end, creating two-item sublists that are in order within. Then you begin comparing the first item of two adjacent sublists. So you compare item 1 to item 3 (the first item of list A to first item of list B) and if they need to swap, do so. If they don’t, then compare item 3 to item 2, swapping if necessary. When you are done with item 3, then do item 4 and now you’ve merged the two small two-item lists into a larger, four-item list. You continue this logic for all items, making larger sublists until you have two sublists containing half of the total items each, and then you merge those two lists.

The Applescript implementation of the merge sort requires us to create a handler inside the main sort handler, since this is a recursive algorithm. While the merge() handler has a repeat loop, it does NOT have any nested repeats. That alone should tell us that this sort is faster! In fact, the merge sort is of the order n*log(n), which is considerable better than n squared. Better yet, that n*log(n) performance holds true for both the average AND the worst cases!

on mergeSort(theList)
	--the public routine, to be called from your script
	script bs
		property alist : theList
		on merge(leftSide, rightSide)
			--private routine called by mergeSort. 
			--do not call from your script!
			set newList to {}
			set theLeft to leftSide
			set theRight to rightSide
			set leftLength to length of theLeft
			set rightLength to length of theRight
			repeat while leftLength > 0 and rightLength > 0
				if first item of theLeft ≤ first item of theRight then
					set newList to newList & first item of theLeft
					set theLeft to (rest of theLeft)
					set newList to newList & first item of theRight
					set theRight to rest of theRight
				end if
				set leftLength to length of theLeft
				set rightLength to length of theRight
			end repeat
			if leftLength > 0 then set newList to newList & theLeft
			if rightLength > 0 then set newList to newList & theRight
			return newList
		end merge
	end script
	set midList to 0
	set leftList to {}
	set rightList to {}
	set listLength to length of bs's alist
	if listLength ≤ 1 then
		return bs's alist
		set midList to listLength div 2
		repeat with pointer from 1 to midList
			set leftList to leftList & item pointer of bs's alist
		end repeat
		repeat with pointer from (midList + 1) to listLength
			set rightList to rightList & item pointer of bs's alist
		end repeat
		set leftList to mergeSort(leftList)
		set rightList to mergeSort(rightList)
	end if
	return bs's merge(leftList, rightList)
end mergeSort

The Quick and the Dead

In an amazing display of common sense, the fastest sort that is widely used is called quicksort. The quicksort algorithm is a bit more complex than the other sorts, but the results are nothing short of amazing. Using quicksort, large lists of items that are unsortable in an amount of time that a user is willing to wait are sortable in a fraction of the time.

The idea of the quicksort is to choose a midpoint item and all the items larger than that item (called the “pivot”) go to the right of it and the smaller items go to the left. These two sublists are then sorted the same way: choose a pivot, sort the items smaller to one side, larger to the other. It sounds simple enough, but there have been all kinds of methods for choosing the pivot item. The simplest example of quicksort just chooses the item in the middle of the list. The version I’ve listed below uses an internal handler to optimize the selection of the pivot value:

on quickSort(theList)
	--public routine, called from your script
	script bs
		property alist : theList
		on Qsort(leftIndex, rightIndex)
			--private routine called by quickSort. 
			--do not call from your script!
			if rightIndex > leftIndex then
				set pivot to ((rightIndex - leftIndex) div 2) + leftIndex
				set newPivot to Qpartition(leftIndex, rightIndex, pivot)
				set theList to Qsort(leftIndex, newPivot - 1)
				set theList to Qsort(newPivot + 1, rightIndex)
			end if
		end Qsort
		on Qpartition(leftIndex, rightIndex, pivot)
			--private routine called by quickSort. 
			--do not call from your script!
			set pivotValue to item pivot of bs's alist
			set temp to item pivot of bs's alist
			set item pivot of bs's alist to item rightIndex of bs's alist
			set item rightIndex of bs's alist to temp
			set tempIndex to leftIndex
			repeat with pointer from leftIndex to (rightIndex - 1)
				if item pointer of bs's alist ≤ pivotValue then
					set temp to item pointer of bs's alist
					set item pointer of bs's alist to item tempIndex of bs's alist
					set item tempIndex of bs's alist to temp
					set tempIndex to tempIndex + 1
				end if
			end repeat
			set temp to item rightIndex of bs's alist
			set item rightIndex of bs's alist to item tempIndex of bs's alist
			set item tempIndex of bs's alist to temp
			return tempIndex
		end Qpartition
	end script
	if length of bs's alist > 1 then bs's Qsort(1, length of bs's alist)
	return bs's alist
end quickSort

Some versions of quicksort just grab the first item for the pivot, but if the list is already somewhat sorted, then the benefits of quicksort evaporate and you’re stuck with a sort that is Big O n squared again! To get around this fault in quicksort, some prefer to choose the middle item in the list or the middle value of the first, last and middle items. This version starts with the middle value and adjusts it when creating the partitions to find an optimal value. By doing this, this sort is almost always on the order of n*log(n) even in the worst possible case.

Out of Sorts

Sorting is something all of us use (or want to use) at one time or another. I hope this has helped show the pitfalls of sorting so you can select a sort that is appropriate for your use. I originally started working on sorting because I had an iTunes script that I wanted to display a list of my albums and the initial list was an unsorted mess. In researching sorts, I found that some are so slow that the are completely unusable. Others were only slow because I’d coded them in such a way that Applescript was dragging its feet, and recoding them speeded them up greatly. The quicksort above is VERY usable, even on an old G3 iMac (like mine) with an iTunes library of about 1000 songs. The “choose from list” dialog pops up very fast!

There is also a great thread in the MacScripter bbs’ Code Exchange forum on sorting, and I’d like to thank everyone who participated in that discussion, as it not only helped me to write these sort routines, they also helped me improve these routines for speed! All the sort routines in this article, plus the versions of quicksort optimized by Adam Bell and by Nigel Garvey that are in the Code Exchange thread, are in my Nite Flite library’s sortLib on as well.

More Information

For more information about sorting:

Wikipedia - Sorting Algorithms
Wikipedia - Bubble Sort
Wikipedia - Insertion Sort
Wikipedia - Merge Sort
Wikipedia - Quicksort

1 Like