Two FULLY in-place merge sorts

Nigel_Garvey · August 21, 2015, 11:13pm

One of the “disadvantages” often quoted for merge sorts is that they use a lot of memory. This is because they’re usually implemented recursively and because each recursion step involves the creation of an additional list into which (or from which) that particular merge is done.

But iterative merge sorts are very obvious and simple to implement and, since they perform all the merges of a particular size in the same pass, they can use just one additional list for an entire sort, merging back and forth between it and the original. (See this, for example.)

Here though, just for the hell of it, are two merge sorts which use no additional lists at all. Obviously the problem to be solved has been what to do with values currently sitting in slots where other values have to be inserted during the merge.

The first script uses the simple expedient of right-shifting all the “left” values between a “right” value and its target slot. In this respect, it resembles an insertion sort, but it has merge sort’s binary structure and comparison regime and is slightly faster than an insertion sort. (Faster than my own insertion sort, that is, and given an initially chaotic list).


(* Merge sort
Merge sort algorithm: John von Neumann, 1945.
AppleScript implementation: Nigel Garvey, 2007. Iterative version without auxiliary list: 2015.

Parameters: (list, range index 1, range index 2)
*)

on sort(theList, l, r)
	script o
		property lst : theList
	end script
	
	-- Process the input parmeters.
	set listLen to (count theList)
	if (listLen < 2) then return
	-- Negative and/or transposed range indices.
	if (l < 0) then set l to listLen + l + 1
	if (r < 0) then set r to listLen + r + 1
	if (l > r) then set {l, r} to {r, l}
	
	-- The sort.
	set sortRangeLen to r - l + 1
	set mergeSize to 1
	-- Repeat with the merge size doubling on each pass.
	repeat while (mergeSize < sortRangeLen)
		set leftLen to mergeSize
		set mergeSize to mergeSize * 2
		-- Iterate through the sort range's merge divisions. Odd items at the end are a division if there are enough for a full "left" component and at least one "right" item.
		repeat with leftL from l to (r - leftLen) by mergeSize
			-- Initialise tracking indices for this division's "left" and "right" components and an end index for the right.
			set i to leftL
			set j to i + leftLen
			set rightR to leftL + mergeSize - 1
			if (rightR > r) then set rightR to r
			
			set lv to item i of o's lst
			-- Repeat until the right index reaches the end of the right items.
			repeat with j from j to rightR
				set rv to item j of o's lst
				-- Advance through the left values until either the left index catches up with the right or a left value is found which is greater than the current right value.
				repeat until ((i = j) or (lv > rv))
					set i to i + 1
					set lv to item i of o's lst
				end repeat
				-- If the left index has caught up, we've run out of left values, so finish early.
				if (i = j) then exit repeat
				-- Otherwise shift all the intervening values one slot to the right and plonk the right value where the left value was.
				repeat with x from j to (i + 1) by -1
					set item x of o's lst to item (x - 1) of o's lst
				end repeat
				set item i of o's lst to rv
				-- Adjust the left index to the left value's new position. 
				set i to i + 1
			end repeat
		end repeat
	end repeat
	
	return -- nothing.
end sort

-- (* Demo:
set lst to {}
repeat 1000 times
	set end of my lst to (random number 1000)
end repeat
log lst

-- Sort items 1 thru -1 of lst.
sort(lst, 1, -1)
lst
-- *)

The second script is more complex, but quite a bit faster. It treats the area between the placed values on the left and the not-yet-used “right” values as a buffer containing the not-yet-used “left” values and two “vacant” slots whose contents have been assigned to variables for comparison. The buffer migrates to the right as the merge progresses and the queue of unused left values inside it loops round on itself, the leftmost value being moved to the other end where possible instead of the entire queue being moved aside. The complex dynamic of the two ends of the buffer and different parts of the queue advancing at different rates and at different times means there’s often no free space to the right of the queue and it’s still necessary to shift part of the queue aside. However, it’s only a part of the queue which gets moved ” not the whole thing as in the first script ” and the script always chooses the shorter part.

Although considerably faster than the first script, this one’s not particuarly exciting speedwise, although it’s certainly usable. I’ve had lots of great ideas for speeding it up, but this seems to be the best compromise between speed and the sort actually working. If you need a stable vanilla sort which only uses the same amount of memory as an insertion sort, but which is much faster given an initially chaotic list, this could be for you. Just don’t expect a ternary version any time soon.


(* Merge sort
Merge sort algorithm: John von Neumann, 1945.
AppleScript implementation: Nigel Garvey, 2007. Iterative version without auxiliary list: 2015.

Parameters: (list, range index 1, range index 2)
*)

on sort(theList, l, r)
	script o
		property lst : theList
	end script
	
	-- Process the input parmeters.
	set listLen to (count theList)
	if (listLen < 2) then return
	-- Negative and/or transposed range indices.
	if (l < 0) then set l to listLen + l + 1
	if (r < 0) then set r to listLen + r + 1
	if (l > r) then set {l, r} to {r, l}
	
	-- The sort.
	set sortRangeLen to r - l + 1
	set mergeSize to 1
	-- Repeat with the merge size doubling on each pass.
	repeat while (mergeSize < sortRangeLen)
		set leftLen to mergeSize
		set mergeSize to mergeSize * 2
		-- Iterate through the sort range's merge divisions. The items at the end are a division if there are enough for a full "left" component and at least one "right" item.
		repeat with leftL from l to (r - leftLen) by mergeSize
			-- Derive indices for the beginning of this division's "right" component and for the ends of both.
			set rightL to leftL + leftLen
			set leftR to rightL - 1
			set rightR to leftR + leftLen
			if (rightR > r) then set rightR to r
			
			-- Any values on the right greater than or equal to the last on the left are already in place for this merge. If they're all greater or equal, the merge isn't needed at all.
			set lv to item leftR of o's lst
			repeat until ((rightR = leftR) or (lv > item rightR of o's lst))
				set rightR to rightR - 1
			end repeat
			if (rightR > leftR) then
				-- A merge is happening. Any values on the left which are less than or equal to the first on the right are already in place too.
				set lv to item leftL of o's lst
				set rv to item rightL of o's lst
				repeat until (lv > rv)
					set leftL to leftL + 1
					set lv to item leftL of o's lst
				end repeat
				-- So now the first value actually placed in the merge will be the first from the right and the last will be the last from the left.
				
				-- Initialise the left queue indices.
				set ql to leftL + 1 -- Left of queue.
				set qe to rightL -- End of queue + 1.
				set qr to rightL -- Right of queue + 1.
				set i to ql -- Front of queue.
				-- And the right queue index.
				set j to rightL + 1
				-- The current lowest left and right values are in the variables lv and rv, leaving two buffer slots (at leftL and rightL) available for writing.
				
				set k to leftL -- k is the traversal index for the merge.
				-- Immediately insert the first right value into the first slot.
				set item k of o's lst to rv
				-- If more than one value remains to be inserted, do the business below. Otherwise skip to the placing of the remaining left value.
				if (rightR - leftL > 1) then
					-- Get the next right value. Are there any more?
					set rightsLeft to (j â‰¤ rightR)
					if (rightsLeft) then
						-- Yes. Get it from the right in the normal way.
						set rv to item j of o's lst
						set j to j + 1
					else
						-- No. Start to use up the remaining left values. Set rv to the current left value and get a new left value from the queue.
						set rv to lv
						set lv to item i of o's lst
						set ql to ql + 1
						set i to i + 1
					end if
					
					-- Traverse the merge division to at most two slots before the end. The last two items will be set from rv and lv, in that order.
					set kMax to rightR - 2
					repeat while (k < kMax)
						set k to k + 1
						
						-- Move the leftmost queued value out of harm's way if necessary.
						if (k = ql) then
							-- There are three possibilities:
							if (qr < j) then
								-- There's a vacant buffer slot to the right of the queue. Move the value there.
								set item qr of o's lst to item ql of o's lst
								if (k = i) then set i to qr -- Change front index if relevant.
								set ql to ql + 1
								set qr to qr + 1
							else if (qr - i < qe - ql) then
								-- No vacant slot to the right of the queue and the right queue segment's shorter than the left. Shift the entire right segment two places to the left .
								repeat with x from i to (qr - 1)
									set item (x - 2) of o's lst to item x of o's lst
								end repeat
								set i to i - 2
								set qr to qr - 2
								-- . then move the leftmost value to the first of the now-vacant slots on the right.
								set item qr of o's lst to item ql of o's lst
								set ql to ql + 1
								set qr to qr + 1
							else
								-- No vacant slot to the right and the left segment's shorter than or the same length as the right. Shift the entire left segment two places to the right, leaving two vacant spaces to the left.
								repeat with x from (qe - 1) to ql by -1
									set item (x + 2) of o's lst to item x of o's lst
								end repeat
								set ql to ql + 2
								set qe to qe + 2
							end if
						end if
						
						-- Select and insert the next value to be placed in the merge.
						if (rightsLeft) then
							if (lv > rv) then
								-- The right candidate's less than the left. Use it.
								set item k of o's lst to rv
								-- If it's the last right value, start to fast-track the remaining lefts through the queue-management process.
								set nextValueFromLeft to (j > rightR)
								if (nextValueFromLeft) then
									set rightsLeft to false
									set rv to lv
								end if
							else
								-- The left candidate's less than or equal to the right. Use it.
								set item k of o's lst to lv
								set nextValueFromLeft to true
							end if
						else
							-- All the right values were previously used up. Insert another left without comparing it and request another.
							set item k of o's lst to rv
							set rv to lv
							set nextValueFromLeft to true
						end if
						
						-- Fetch the next value from the side decided above.
						if (nextValueFromLeft) then
							-- From the buffered queue.
							set lv to item i of o's lst
							if (i = ql) then set ql to ql + 1
							set i to i + 1
							if (i = qr) then
								-- The right queue segment's now used up. The left's the entire queue.
								set i to ql
								set qr to qe
								if (ql = qr) then
									-- The entire queue's gone. Shift any unfetched right values one slot to the left and exit early to have rv (lowest remaining value) and lv (highest) inserted at either end of them.
									repeat with x from (k + 3) to rightR
										set item (x - 1) of o's lst to item x of o's lst
									end repeat
									exit repeat
								end if
							end if
						else
							-- From the right.
							set rv to item j of o's lst
							set j to j + 1
						end if
						
						-- If the left queue segment's been used up, the right's the entire queue.
						if (ql < qe) then
						else
							set ql to i
							set qe to qr
						end if
					end repeat
					
					-- After leaving the merge repeat, insert the remaining right value (or penultimate fast-tracked left) into the slot after k.
					set item (k + 1) of o's lst to rv
				end if
				-- Insert the remaining left value into the end slot.
				set item rightR of o's lst to lv
			end if
		end repeat
	end repeat
	
	return -- nothing.
end sort

-- (* Demo:
set lst to {}
repeat 1000 times
	set end of my lst to (random number 1000)
end repeat
log lst

-- Sort items 1 thru -1 of lst.
sort(lst, 1, -1)
lst
-- *)