Multiple element sorting

I have a list of real estate sales that includes, for each property, the sale date, the city, the addresses, the buyer, the seller, and the selling price. I wish to sort this by ascending date, then ascending city, then descending price. The data is in tab-delimited files. A simple sample:

05/01/2006\tSan Francisco\tJohn Doe\tMary Roe\t$850,000
05/02/2006\tMonterey\tJane Joe\tJerry Coe\t$700,000

In this example, the second line should be sorted to the top, since Monterey comes before San Francisco in the sort.

Is there any way besides creating a dummy document with the columns reorganized, first for the ascending sorts and then for the descending one? I.e., any scripting additions that’ll handle this? I’m currently doing it manually in Excel . . .

You could use something as this:

set x to {{3, 6}, {1, 5}, {2, 4}}

sortListOfLists(x, 2) --> {{2, 4}, {1, 5}, {3, 6}}


to sortListOfLists(lst, fld)
	script a
		property l1 : lst
		property l2 : {l1's item 1}
	end script
	repeat with i from 1 to count lst
		set f to a's l1's item i's item fld
		considering case
			if f < a's l2's item 1's item fld then
				set beginning of a's l2 to a's l1's item i
			else if f > a's l2's item -1's item fld then
				set end of a's l2 to a's l1's item i
			else --> loop
				repeat with x from 1 to count lst
					try
						if f > a's l2's item x's item fld and f < a's l2's item (x + 1)'s item fld then
							set a's l2 to (a's l2's items 1 thru x) & {(a's l1's item i)} & (a's l2's items (x + 1) thru -1)
							exit repeat
						end if
					end try
				end repeat
			end if
		end considering
	end repeat
	return a's l2
end sortListOfLists

The first step would be transforming your tab-delimited data into a list of lists, so it can be sorted (including coercion of dates, and $ to numbers). To apply a descending sorting, just use the current ascending one, then read items from the tail (item -1, item -2…), or just use the statement “reverse of sortListOfLists(thelist, thefieldtosortby)”.

jj,

How do I generalize your example to my case of having, say, 5 columns in which I’m sorting on 3. In other words, let’s say I had the following data (with ‘|’ representing a tab)

A | 5/2/2006 | San Francisco | $300,000
B | 5/1/2006 | San Francisco | $100,000
C | 5/1/2006 | Monterey | $120,000
D | 5/1/2006 | Monterey | $150,000
E | 5/2/2006 | Monterey | $100,000

I want to sort ascending by date, then (for same date) ascending by city, then (for same date and city) descending by price, so I end up with:

D | 5/1/2006 | Monterey | $150,000
C | 5/1/2006 | Monterey | $120,000
B | 5/1/2006 | San Francisco | $100,000
E | 5/2/2006 | Monterey | $100,000
A | 5/2/2006 | San Francisco | $300,000

I don’t see how a simple binary sort can readily yield these results. Maybe I just don’t understand the “list of lists” concept. My thought was that I need to somehow record the initial record order, then generate a new set of records in which my three fields are somehow concatenated . . . but it seems I’d need to pad the field lenths to not have a numeral from the price field being matched against a letter from the city field.

Hi,

The trickey part about your text is that the dates don’t have leading zeros. For instance, the date 5/2/2006 can be written 05/02/2006. With this form you can sort alphabetically for months. Note that this won’t work for different years. After you have the leading zeros, you need to place the fileds in order for sorting with date, city, amount, and label adding the padding. Then you could just sort the whole thing alphabetically.

The other method would be to sort each field by grouping where each group with greater then 1 item needs to be sorted. There might be an easy way to do this with recursion.

gl,

What you want isn’t exactly a trivial task! You need a sort routine, a subsort routine to control it, and a way to control the direction of the individual subsorts. You also need to convert the short dates to AppleScript dates and the currency values to numbers first, so that they sort by value rather than lexically. If your text is from a spreadsheet, it would be easier if that could be scripted to do the job.

Failing that, here’s a script that’s adapted from something I wrote for my own use. It’s long, but it works and is quite fast. I’ve used a convention of negative column numbers in the column list for ‘sortOnColumns()’ to specify reverse sorts. (Please don’t ask me to explain how it works!)

set originalText to "A	5/2/2006	San Francisco	$300,000
B	5/1/2006	San Francisco	$100,000
C	5/1/2006	Monterey	$120,000
D	5/1/2006	Monterey	$150,000
E	5/2/2006	Monterey	$100,000"

-- Break up the text into a list of "rows".
set theRows to originalText's paragraphs

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
repeat with thisRow in theRows
	-- Break up each row into list of "column" values.
	set rowList to thisRow's text items
	-- Replace each short date with an AppleScript date.
	-- (Assumes the machine's short-date format matches the format supplied.)
	set item 2 of rowList to date (item 2 of rowList)
	-- Replace each currency value with an AppleScript number.
	set item 4 of rowList to currencyToNumber(item 4 of rowList)
	-- Append the "row" text to the list for a free ride during the sort.
	set end of rowList to thisRow's contents
	
	set thisRow's contents to rowList
end repeat

-- Sort successively on columns 2, 3, and 4, reverse-sorting on column 4.
sortOnColumns(theRows, {2, 3, -4})

-- Retrieve the orignal "row" strings from the sorted lists.
repeat with thisRow in theRows
	set thisRow's contents to end of thisRow
end repeat
-- Rejoin them into a single text.
set AppleScript's text item delimiters to return -- or ASCII character 10
set newText to theRows as text

set AppleScript's text item delimiters to astid
return newText


on currencyToNumber(d)
	set i to 1
	considering case
		repeat until character i of d is in "1234567890."
			set i to i + 1
		end repeat
	end considering
	set d to text i thru -1 of d
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ","
	set d to d's text items
	set AppleScript's text item delimiters to ""
	set d to d as text
	set AppleScript's text item delimiters to astid
	
	return d as number
end currencyToNumber

on sortOnColumns(theRows, sortColumns)
	-- theRows is a list of lists. The lists represent rows in a spreadsheet and their
	-- items are the values from the intersections with the corresponding columns.
	-- sortColumns is a list of the column numbers on which successive sorts and
	-- subsorts are to be based. Negative column numbers indicate reverse sorting.
	
	script o
		property rows : theRows
		property sortCols : sortColumns
		property maxRecursionLevel : count sortColumns
		
		script forwardSort
			-- x, isLess() and isGreater() are for the CustomQsort comparisons
			property x : missing value
			
			on isLess(a, b)
				script p
					property c : a
					property d : b
				end script
				
				(p's c's item x < p's d's item x)
			end isLess
			
			on isGreater(a, b)
				script p
					property c : a
					property d : b
				end script
				
				(p's c's item x > p's d's item x)
			end isGreater
		end script
		
		script reverseSort
			property x : missing value
			
			property isLess : forwardSort's isGreater
			property isGreater : forwardSort's isLess
		end script
		
		on soc(top, bottom, recursionLevel)
			-- Sort this batch of rows on this column
			set sortColumn to my sortCols's item recursionLevel
			-- If the supplied column number's positive, sort forwards; otherwise reverse sort.
			if (sortColumn > -1) then
				set forwardSort's x to sortColumn
				CustomQsort(my rows, top, bottom, forwardSort)
			else
				set reverseSort's x to -sortColumn
				CustomQsort(my rows, top, bottom, reverseSort)
			end if
			
			-- For every run of equal values in this section of this column
			-- recurse to subsort on the next specified column
			if (recursionLevel < maxRecursionLevel) then
				set nextRecursionLevel to recursionLevel + 1
				set i to top
				repeat with j from top + 1 to bottom
					if (my rows's item j's item sortColumn is my rows's item i's item sortColumn) then
					else
						if (j - i > 1) then soc(i, j - 1, nextRecursionLevel)
						set i to j
					end if
				end repeat
				if (j > i) then soc(i, j, nextRecursionLevel)
			end if
		end soc
	end script
	
	o's soc(1, count theRows, 1)
	
end sortOnColumns

--   A variation on the Garvey/Knapp qsort, 
on CustomQsort(a, l, r, compObj) -- compObj must have handlers isLess(a,b) and isGreater(a,b)
	script o
		property cutoff : 10
		property p : a
		on qsrt(l, r)
			set i to l
			set j to r
			set v to my p's item ((l + r) div 2)
			
			repeat while (j > i)
				
				set u to my p's item i
				repeat while (compObj's isLess(u, v))
					set i to i + 1
					set u to my p's item i
				end repeat
				
				set w to my p's item j
				repeat while (compObj's isGreater(w, v))
					set j to j - 1
					set w to my p's item j
				end repeat
				
				if (i > j) then
				else
					set my p's item i to w
					set my p's item j to u
					set i to i + 1
					set j to j - 1
				end if
			end repeat
			
			if (j - l < cutoff) then
			else
				qsrt(l, j)
			end if
			if (r - i < cutoff) then
			else
				qsrt(i, r)
			end if
		end qsrt
		
		on isrt(l, r)
			set x to l
			set z to l + cutoff - 1
			if (z > r) then set z to r
			
			set v to my p's item x
			repeat with y from (x + 1) to z
				if (compObj's isLess(my p's item y, v)) then
					set x to y
					set v to my p's item y
				end if
			end repeat
			
			tell my p's item l
				set my p's item l to v
				set my p's item x to it
			end tell
			
			set u to my p's item (l + 1)
			repeat with i from (l + 2) to r
				set v to my p's item i
				if (compObj's isLess(v, u)) then
					set my p's item i to u
					repeat with j from (i - 2) to l by -1
						if (compObj's isLess(v, my p's item j)) then
							set my p's item (j + 1) to my p's item j
						else
							set my p's item (j + 1) to v
							exit repeat
						end if
					end repeat
				else
					set u to v
				end if
			end repeat
		end isrt
	end script
	
	if r - l < o's cutoff then
	else
		o's qsrt(l, r)
	end if
	o's isrt(l, r)
end CustomQsort

Thanks, Nigel! I hadn’t expected it to be so complicated. I’ll give it a try early next week.

Dear Nigel

I use your sortOnColumns(theRows, sortColumns) scipt in post#5 for sorting a 800 x 10 list of lists and it is much faster than the quick sort handler (also get it from this forum) that I normally would use. Thank you for your 15 years-old work!

Thanks, ngan. It’s good to know the old stuff’s still useful. :slight_smile:

The sortOnColumns(theRows, sortColumns) handler would be much shorter if written today because:

  1. My custom sort handlers now only expect an isGreater(a, b) handler in the customising script objects. The effect of the old isLess(a, b) handler can be got simply by calling isGreater(a, b) with swapped parameters!
  2. I generally find a single sort with compound comparisons to be more efficient than sorting and subsorting on individual criteria. It’s certainly less code. :wink:

The parameters for my custom sorts are specified slightly differently nowadays too, but I haven’t changed that aspect here:

set originalText to "A	5/2/2006	San Francisco	$300,000
B	5/1/2006	San Francisco	$100,000
C	5/1/2006	Monterey	$120,000
D	5/1/2006	Monterey	$150,000
E	5/2/2006	Monterey	$100,000
F	5/2/2006	Monterey	$150,000"

-- Break up the text into a list of "rows".
set theRows to originalText's paragraphs

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
repeat with thisRow in theRows
	-- Break up each row into list of "column" values.
	set rowList to thisRow's text items
	-- Replace each short date with an AppleScript date.
	-- (Assumes the machine's short-date format matches the format supplied.)
	set item 2 of rowList to date (item 2 of rowList)
	-- Replace each currency value with an AppleScript number.
	set item 4 of rowList to currencyToNumber(item 4 of rowList)
	-- Append the "row" text to the list for a free ride during the sort.
	set end of rowList to thisRow's contents
	
	set thisRow's contents to rowList
end repeat

-- Sort successively on columns 2, 3, and 4, reverse-sorting on column 4.
sortOnColumns(theRows, {2, 3, -4})

-- Retrieve the orignal "row" strings from the sorted lists.
repeat with thisRow in theRows
	set thisRow's contents to end of thisRow
end repeat
-- Rejoin them into a single text.
set AppleScript's text item delimiters to return -- or ASCII character 10
set newText to theRows as text

set AppleScript's text item delimiters to astid
return newText


on currencyToNumber(d)
	set i to 1
	considering case
		repeat until character i of d is in "1234567890."
			set i to i + 1
		end repeat
	end considering
	set d to text i thru -1 of d
	
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ","
	set d to d's text items
	set AppleScript's text item delimiters to ""
	set d to d as text
	set AppleScript's text item delimiters to astid
	
	return d as number
end currencyToNumber

on sortOnColumns(theRows, sortColumns)
	-- theRows is a list of lists. The lists represent rows in a spreadsheet and their
	-- items are the values from the intersections with the corresponding columns.
	-- sortColumns is a list of the column numbers on which successive sorts and
	-- subsorts are to be based. Negative column numbers indicate reverse sorting.
	
	script o
		property sortCols : sortColumns
		property sortColCount : (count sortCols)
		
		on isGreater(a, b) -- Should list 'a' be moved to after list 'b'?.
			script p
				property c : a
				property d : b
			end script
			
			-- With each pair of lists, compare the items from each sort column in turn until a decision's made.
			repeat with i from 1 to sortColCount
				-- Get the next sort column index.
				set x to my sortCols's item i
				-- If it's negative, descending order's required and the index needs to be made positive.
				set descending to (x < 0)
				if (descending) then set x to -x
				-- Get the values from the two lists.
				set va to p's c's item x
				set vb to p's d's item x
				-- If they're equal, no decision can be made based on this column.
				-- Otherwise the first list has to go after the second if the first value's 
				-- relationship to the second doesn't match the required sort direction.
				if (va = vb) then
				else
					return ((descending) = (vb > va))
				end if
			end repeat
			-- If we get here, the values from all the sort columns are the same in both lists.
			-- List 'a' doesn't have to go after list 'b'.
			return false
		end isGreater
	end script
	
	CustomQSort(theRows, 1, (count theRows), o)
end sortOnColumns

-- A variation on the Garvey/Knapp qsort, 
on CustomQSort(a, l, r, compObj) -- compObj must have handler isGreater(a,b)
	script o
		property cutoff : 10
		property p : a
		on qsrt(l, r)
			set i to l
			set j to r
			set v to my p's item ((l + r) div 2)
			
			repeat while (j > i)
				
				set u to my p's item i
				repeat while (compObj's isGreater(v, u))
					set i to i + 1
					set u to my p's item i
				end repeat
				
				set w to my p's item j
				repeat while (compObj's isGreater(w, v))
					set j to j - 1
					set w to my p's item j
				end repeat
				
				if (i > j) then
				else
					set my p's item i to w
					set my p's item j to u
					set i to i + 1
					set j to j - 1
				end if
			end repeat
			
			if (j - l < cutoff) then
			else
				qsrt(l, j)
			end if
			if (r - i < cutoff) then
			else
				qsrt(i, r)
			end if
		end qsrt
		
		on isrt(l, r)
			set x to l
			set z to l + cutoff - 1
			if (z > r) then set z to r
			
			set v to my p's item x
			repeat with y from (x + 1) to z
				if (compObj's isGreater(v, my p's item y)) then
					set x to y
					set v to my p's item y
				end if
			end repeat
			
			tell my p's item l
				set my p's item l to v
				set my p's item x to it
			end tell
			
			set u to my p's item (l + 1)
			repeat with i from (l + 2) to r
				set v to my p's item i
				if (compObj's isGreater(u, v)) then
					set my p's item i to u
					repeat with j from (i - 2) to l by -1
						if (compObj's isGreater(my p's item j, v)) then
							set my p's item (j + 1) to my p's item j
						else
							set my p's item (j + 1) to v
							exit repeat
						end if
					end repeat
				else
					set u to v
				end if
			end repeat
		end isrt
	end script
	
	if r - l < o's cutoff then
	else
		o's qsrt(l, r)
	end if
	o's isrt(l, r)
end CustomQSort

Edit: Minor comment clarification.

I will invest time to understand the technique behind.

A brief test, without using Script Geek, on sorting a list of list (800 x 10) by the two versions of sortOnColumns(theRows, sortColumns) on the same process on SD shows that the newer version is even slightly faster!