Monday, December 11, 2017

You are not logged in.

• Index
•  » Code Exchange
•  » Quicksort with median-of-3 pivot selection, 3-way partitioning, and â€¦

## #1 2015-06-23 07:22:32 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4449

### Quicksort with median-of-3 pivot selection, 3-way partitioning, and â€¦

â€¦ tail call elimination. (Sorry. The subject field won't accept the full title!)

The Quicksort algorithm, devised by Tony Hoare in 1960, is elegant, simple (once you understand it!), and â€” as the name implies â€” fast (when competently implemented). But the very process which makes it so effective has a few efficiency issues which have interested algorithmicists over the years â€” notably the computer scientist Robert Sedgewick, whose Ph.D. thesis was a study of Quicksort.

Quicksort's recursive, divide-and-conquer action branches at each level to deal with ever smaller and more numerous subsets of items. Since smaller subsets have less scope for being out of order, Quicksort in effect does more and more dividing-and-conquering while doing less and less actual sorting. Sedgewick has suggested stopping the recursion at levels where fewer than eight or ten items would be handled. This would leave the sort incomplete, but with the values all within eight or ten places of where they had to be. A follow-up with an insertion sort would then take care of the remaining fine detail more quickly than the Quicksort would have done. This is the idea used in the qsort handlers Arthur Knapp and I wrote some years ago.

Another issue is the selection of the pivot values. (I'm assuming here that you know what's meant by these.) The best possible pivot value in each range is the median of the all values in the range, since it gives approximately equal numbers of greater and lesser values and maximises the differentiation achieved in that sweep. The worst possible pivot value is the highest or lowest in the range, because then only a single pivot instance is separated out and the rest of the values remain lumped together in one group. Unfortunately, the time it takes to identify the median value is usually more than the time saved by using it, so in practice you just have to grab one of the values and run with it. Sedgewick suggests median-of-3 pivot selection as a compromise. This uses the median of just the first, middle, and last values in the range, which obviously takes less time to find (except with very short ranges, of course). It doesn't guarantee the best pivot â€” or even a good one â€” but it does reduce the likelihood of the lowest or highest value being chosen and seems to be statistically effective, even in AppleScript. Also, as it turns out, the time taken for the pivot selection can be partly offset against certain esoteric consequences of the median-of-3 logic itself, whose upshot is that any swapping or partitioning of the two outer values which the sort would normally do can be performed incidentally and more economically during the pivot selection instead.

Quicksort normally divides the values in each range into those which are less than or equal to the pivot and those which are greater than or equal to it. The "or equal" consideration is a convenience which allows the pivot itself to be moved and helps pause the repeats. But if the split could be arranged three ways â€” values less than the pivot, instances of the pivot, and values greater than the pivot â€” the pivot instances would all be in their final positions and could be ignored for the rest of the sort. Bentley/McIlroy 3-way partitioning achieves this by diverting pivot instances during swaps to expandable "partitions" at either end of the range and then, at the end of the sweep, swapping them back in between the lesser and greater values. This scheme raises its own quandry, since it greatly speeds up sorts of lists containing lots of duplicate values, but slows things down when the values are unique. Sedgewick's philosophy, with which I think I agree, is that lists generally do contain duplicates and so 3-way partitioning is worth including. The "straight" version of the implementation below cheats for speed by feeding the pivots back in from a variable instead of actually fetching them from the partitions. The Custom version however does things the proper way to ensure that composite values are preserved. The complementary cheat â€” advancing the partition boundaries but not actually storing the pivots â€” doesn't work. Very often, it's a collision with a partitioned pivot which stops the sweep.

Finally (literally!), the very last thing in a Quicksort handler is the second of the two recursive calls to itself. When this second call eventually returns, the current call itself immediately returns to the call which called it, and so on. One can imagine a whole lotta returnin' going on up the line of second calls. It would be better if the lowest second call could return straight to the top of the line â€” or better still, if the second calls weren't made at all but were replaced with repeats. That's what tail call elimination does. In this case, instead of the fixed left-then-right recursion plan of traditional Quicksorts, Sedgewick recommends treating whichever group is smaller first, using recursion, and then the larger, using a repeat. This presumably ensures that there's more repeating than recursing.

Only the median-of-3, 3-way-partitioning, and tail-call-elimination optimisations are included in the scripts below. There doesn't appear to be any advantage in combining them with the insertion-sort-finish idea. Quite the opposite, in fact. The choices seem to be between them or it.

#### Applescript:

(* Quicksort with median-of-3 pivot selection, 3-way partitioning, and tail call elimination.

Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Refinements suggested by Robert Sedgewick at various times.
AppleScript Quicksort implementation: Arthur J. Knapp and Nigel Garvey, 2003.
Median-of-3 pivot selection and Bentley/McIlroy 3-way partitioning implementation: Nigel Garvey, 2010.
Tail call elimination implementation: Nigel Garvey, 2015.

Parameters: (list, range index 1, range index 2)
*)

on Quicksort_mo3_3wp_tce(theList, l, r)
script o
property lst : theList

on qsrt(l, r)
repeat -- Tail call elimination repeat.

(* MEDIAN-OF-3 PIVOT SELECTION. *)
-- Set the median of the first, middle, and last values in the current range as the pivot for this recursion or iteration. In the process, initialise partition indices for the 3-way partitioning and sort the first and last items as necessary to save having to fetch and compare them again later.
set lv to my lst's item l -- First value.
set mv to my lst's item ((l + r) div 2) -- Middle value.
set rv to my lst's item r -- Last value.
set pivot to mv -- 'pivot' is the middle value until changed.

set lpb to l - 1 -- Boundary indices for the partitions at either end of
set rpb to r + 1 -- the range, initialised immediately outside the range.
if (lv < rv) then
if (lv < mv) then
if (mv < rv) then
else -- (lv < rv <= mv) The last value is (co-)median.
set pivot to rv
set rpb to r -- Partition it in situ.
end if
else -- (mv <= lv < rv) The first value is (co-)median.
set pivot to lv
set lpb to l -- Partition it in situ.
end if
else if (rv < lv) then
-- Whatever the pivot after a median-of-3 selection, the first and last values in the range will be swapped if the first is greater than the last. Do it now to save fetching and comparing them again in the sort.
set my lst's item l to rv
set my lst's item r to lv
if (rv < mv) then
if (mv < lv) then
else -- (rv < lv <= mv) The first (now last) value is (co-)median
set pivot to lv
set rpb to r -- Partition it in situ.
end if
else -- (mv <= rv < lv) The last (now first) value is (co-)median.
set pivot to rv
set lpb to l -- Partition it in situ.
end if
else -- (rv = lv) -- The first and last values are co-median. Partition them both. Quicksort would normally swap them too, but the logic here avoids that.
set pivot to lv
set lpb to l
set rpb to r
end if

-- The first and last items in the range have been dealt with above. Continue from the second and penultimate items.
set lx to l + 1
set rx to r - 1
(* END OF MEDIAN-OF-3 PIVOT SELECTION. *)

repeat until (lx > rx)
set lv to my lst's item lx
repeat while (lv < pivot)
set lx to lx + 1
set lv to my lst's item lx
end repeat

set rv to my lst's item rx
repeat while (rv > pivot)
set rx to rx - 1
set rv to my lst's item rx
end repeat

if (lx < rx) then
(* THREE-WAY PARTITIONING. *)
-- Where a swap involves a pivot instance, extend the partition at the destination end of the range, move the value displaced thereby to the swap destination, and move the pivot instance to the vacated slot. At the end of the pass, the partitioned pivots will be swapped back into the area between the lesser and greater values.
if (lv > pivot) then
-- The left value's not a pivot instance. Move it to the normal swap destination.
set my lst's item rx to lv
else
-- Otherwise partition it as described above.
set rpb to rpb - 1
set my lst's item rx to my lst's item rpb
set my lst's item rpb to lv
end if

-- Similarly with the right value in the other direction.
if (rv < pivot) then
set my lst's item lx to rv
else
set lpb to lpb + 1
set my lst's item lx to my lst's item lpb
set my lst's item lpb to rv
end if

set lx to lx + 1
set rx to rx - 1
else if (lx = rx) then
-- lx and rx index the same item: a pivot value in the right place.
-- It doesn't need to be swapped or partitioned, let alone at both ends!
set lx to lx + 1
set rx to rx - 1
end if
end repeat

-- Swap any pivot values in the left partition with values from rx leftwards. (The pivots really would have to come from the partition in a custom sort, not just cheated from the 'pivot' variable!)
repeat with k from l to lpb
set my lst's item k to my lst's item rx
set my lst's item rx to pivot
set rx to rx - 1
end repeat
-- Swap any pivot values in the right partition with values from lx rightwards. (Ditto.)
repeat with k from r to rpb by -1
set my lst's item k to my lst's item lx
set my lst's item lx to pivot
set lx to lx + 1
end repeat
-- All instances of the current pivot are now placed and can be excluded from the rest of the sort.
(* END OF THREE-WAY PARTITIONING *)

(* TAIL CALL ELIMINATION *)
-- Subsort the smaller group of remaining values using recursion, then the larger group using a repeat.
set shortDiff to rx - l -- The veracity of these assignments â€¦
set longDiff to r - lx -- â€¦ will be checked immediately!
if (longDiff > shortDiff) and (longDiff > 0) then
-- The right group is indeed larger than the left and has more than one item. Set special indices for the left range and a new left index for the right.
set shortL to l
set shortR to rx
set l to lx
else if (shortDiff > 0) then
-- The left group has more than one item and the right either doesn't have more or doesn't have more than one. Treat the left group as the larger. Swap the shortDiff/longDiff assignments.
tell shortDiff
set shortDiff to longDiff
set longDiff to it
end tell
-- Set special indices for the right range and a new right index for the left.
set shortL to lx
set shortR to r
set r to rx
else
-- Neither group contains more than one item.
exit repeat
end if
-- Deal with the smaller group by recursion if it has more than two items or by swapping as necessary if it has two. Skip it if it has fewer than that.
if (shortDiff > 1) then
qsrt(shortL, shortR)
else if (shortDiff is 1) then
set lv to my lst's item shortL
set rv to my lst's item shortR
if (lv > rv) then
set my lst's item shortL to rv
set my lst's item shortR to lv
end if
end if
-- Deal with the larger group by swapping if necessary and exiting if it only has two items or by going back to the top of the repeat if it has more.
if (longDiff is 1) then
set lv to my lst's item l
set rv to my lst's item r
if (lv > rv) then
set my lst's item l to rv
set my lst's item r to lv
end if
exit repeat
end if
end repeat (* END OF TAIL CALL ELIMINATION *)
end qsrt
end script

-- Process the input parameters.
set listLen to (count theList)
if (listLen > 1) then
-- Negative and/or transposed range indices.
if (l < 0) then set l to listLen + l + 1
if (r < 0) then set r to listLen + r + 1
if (l > r) then set {l, r} to {r, l}

-- Do the sort.
o's qsrt(l, r)
end if

return -- nothing.
end Quicksort_mo3_3wp_tce

property sort : Quicksort_mo3_3wp_tce

--(* Demo:
set l to {}
repeat 1000 times
set end of my l to (random number 1000)
end repeat

sort(l, 1, -1)
l
--*)

And here's the customisable version:

#### Applescript:

(* Custom Quicksort with median-of-3 pivot selection, 3-way partitioning, and tail call elimination.

Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Refinements suggested by Robert Sedgewick at various times.
AppleScript Custom Quicksort implementation: Arthur J. Knapp and Nigel Garvey, 2003.
Median-of-3 pivot selection and Bentley/McIlroy 3-way partitioning implementation: Nigel Garvey, 2010.
Tail call elimination implementation: Nigel Garvey, 2015.

Parameters: (list, range index 1, range index 2, customisation record)
The customisation record has two recognised properties â€” 'comparer' and 'slave' â€” which are both optional. The 'comparer' value must be a script object containing an isGreater(a, b) handler which returns whether or not value 'a' is greater than value 'b' according to the customisation criteria. The 'slave' value is a script object with a swap(a, b) handler which is called whenever a swap takes place in the sort. This might be used, say, to swap items with the same indices in another list. Where the 'comparer' and/or 'slave' properties are omitted, or the customisation parameter isn't a record, the sort respectively compares values directly and does nothing outside the main process.
*)

on CustomQuicksort_mo3_3wp_tce(theList, l, r, customiser)
script o
property comparer : me
property slave : me

property lst : theList

on qsrt(l, r)
repeat -- Tail call elimination repeat.

(* MEDIAN-OF-3 PIVOT SELECTION. *)
-- Set the median of the first, middle, and last values in the current range as the pivot for this recursion or iteration. In the process, initialise partition indices for the 3-way partitioning and sort the first and last items as necessary to save having to fetch and compare them again later.
set lv to my lst's item l -- First value.
set mv to my lst's item ((l + r) div 2) -- Middle value.
set rv to my lst's item r -- Last value.
set pivot to mv -- 'pivot' is the middle value until changed.

set lpb to l - 1 -- Boundary indices for the partitions at either end of
set rpb to r + 1 -- the range, initialised immediately outside the range.
if (comparer's isGreater(rv, lv)) then
if (comparer's isGreater(mv, lv)) then
if (comparer's isGreater(rv, mv)) then
else -- (lv < rv <= mv) The last value is (co-)median.
set pivot to rv
set rpb to r -- Partition it in situ.
end if
else -- (mv <= lv < rv) The first value is (co-)median.
set pivot to lv
set lpb to l -- Partition it in situ.
end if
else if (comparer's isGreater(lv, rv)) then
-- Whatever the pivot after a median-of-3 selection, the first and last values in the range will be swapped if the first is greater than the last. Do it now to save fetching and comparing them again in the sort.
set my lst's item l to rv
set my lst's item r to lv
slave's swap(l, r)
if (comparer's isGreater(mv, rv)) then
if (comparer's isGreater(lv, mv)) then
else -- (rv < lv <= mv) The first (now last) value is (co-)median
set pivot to lv
set rpb to r -- Partition it in situ.
end if
else -- (mv <= rv < lv) The last (now first) value is (co-)median.
set pivot to rv
set lpb to l -- Partition it in situ.
end if
else -- (rv = lv) -- The first and last values are co-median. Partition them both. Quicksort would normally swap them too, but the logic here avoids that.
set pivot to lv
set lpb to l
set rpb to r
end if

-- The first and last items in the range have been dealt with during the pivot selection. Continue from the second and penultimate items.
set lx to l + 1
set rx to r - 1
(* END OF MEDIAN-OF-3 PIVOT SELECTION. *)

repeat until (lx > rx)
set lv to my lst's item lx
repeat while (comparer's isGreater(pivot, lv))
set lx to lx + 1
set lv to my lst's item lx
end repeat

set rv to my lst's item rx
repeat while (comparer's isGreater(rv, pivot))
set rx to rx - 1
set rv to my lst's item rx
end repeat

if (lx < rx) then
(* THREE-WAY PARTITIONING. *)
-- Where a swap involves a pivot instance, extend the partition at the destination end of the range, move the value displaced thereby to the swap destination, and move the pivot instance to the vacated slot. At the end of the pass, the partitioned pivots will be swapped back into the area between the lesser and greater values.
-- With the slave swap, it's easier to swap items lx and rx directly anyway and then swap them into the partitions if necesary.
slave's swap(lx, rx)
if (comparer's isGreater(lv, pivot)) then
-- The left value's not a pivot instance. Move it to the normal swap destination.
set my lst's item rx to lv
else
-- Otherwise partition it as described above.
set rpb to rpb - 1
set my lst's item rx to my lst's item rpb
set my lst's item rpb to lv
slave's swap(rx, rpb)
end if

-- Similarly with the right value in the other direction.
if (comparer's isGreater(pivot, rv)) then
set my lst's item lx to rv
else
set lpb to lpb + 1
set my lst's item lx to my lst's item lpb
set my lst's item lpb to rv
slave's swap(lx, lpb)
end if

set lx to lx + 1
set rx to rx - 1
else if (lx = rx) then
-- lx and rx index the same item: a pivot value in the right place.
-- It doesn't need to be swapped or partitioned, let alone at both ends!
set lx to lx + 1
set rx to rx - 1
end if
end repeat

-- Swap any pivot values in the left partition with values from rx leftwards.
repeat with k from l to lpb
tell my lst's item k
set my lst's item k to my lst's item rx
set my lst's item rx to it
end tell
slave's swap(k, rx)
set rx to rx - 1
end repeat
-- Swap any pivot values in the right partition with values from lx rightwards.
repeat with k from r to rpb by -1
tell my lst's item k
set my lst's item k to my lst's item lx
set my lst's item lx to it
end tell
slave's swap(k, lx)
set lx to lx + 1
end repeat
-- All instances of the current pivot are now placed and can be excluded from the rest of the sort.
(* END OF THREE-WAY PARTITIONING *)

(* TAIL CALL ELIMINATION *)
-- Subsort the smaller group of remaining values first using recursion, then the larger group using a repeat.
set shortDiff to rx - l -- The veracity of these assignments â€¦
set longDiff to r - lx -- â€¦ will be checked immediately!
if (longDiff > shortDiff) and (longDiff > 0) then
-- The right group is indeed larger than the left and has more than one item. Set special indices for the left range and a new left index for the right.
set shortL to l
set shortR to rx
set l to lx
else if (shortDiff > 0) then
-- The left group has more than one item and the right either doesn't have more or doesn't have more than one. Treat the left group as the larger. Swap the shortDiff/longDiff assignments.
tell shortDiff
set shortDiff to longDiff
set longDiff to it
end tell
-- Set special indices for the right range and a new right index for the left.
set shortL to lx
set shortR to r
set r to rx
else
-- Neither group contains more than one item.
exit repeat
end if
-- Deal with the smaller group by recursion if it has more than two items or by swapping as necessary if it has two. Skip it if it has fewer than that.
if (shortDiff > 1) then
qsrt(shortL, shortR)
else if (shortDiff is 1) then
set lv to my lst's item shortL
set rv to my lst's item shortR
if (comparer's isGreater(lv, rv)) then
set my lst's item shortL to rv
set my lst's item shortR to lv
slave's swap(shortL, shortR)
end if
end if
-- Deal with the larger group by swapping if necessary and exiting if it only has two items or by going back to the top of the repeat if it has more.
if (longDiff is 1) then
set lv to my lst's item l
set rv to my lst's item r
if (comparer's isGreater(lv, rv)) then
set my lst's item l to rv
set my lst's item r to lv
slave's swap(l, r)
end if
exit repeat
end if
end repeat (* END OF TAIL CALL ELIMINATION *)
end qsrt

-- Default comparison and slave handlers for an ordinary sort.
on isGreater(a, b)
(a > b)
end isGreater

on swap(a, b)
end swap
end script

-- Process the input parameters.
set listLen to (count theList)
if (listLen > 1) then
-- Negative and/or transposed range indices.
if (l < 0) then set l to listLen + l + 1
if (r < 0) then set r to listLen + r + 1
if (l > r) then set {l, r} to {r, l}

-- Supplied or default customisation scripts.
if (customiser's class is record) then set {comparer:o's comparer, slave:o's slave} to (customiser & {comparer:o, slave:o})

-- Do the sort.
o's qsrt(l, r)
end if

return -- nothing.
end CustomQuicksort_mo3_3wp_tce

property sort : CustomQuicksort_mo3_3wp_tce

--(* Contrived demo:
-- Reverse sort a list of random integers, sorting a list of matching strings in parallel with it.

script backwards -- 'comparer' script object. Does 'less than' comparisons instead to reverse the sort.
on isGreater(a, b)
(a < b)
end isGreater
end script

script parallel -- 'slave' script object. Duplicates moves from an ongoing Quicksort in its own list.
property lst : missing value

-- Swap items a and b in the slave list.
on swap(a, b)
tell my lst's item a
set my lst's item a to my lst's item b
set my lst's item b to it
end tell
end swap
end script

set l to {}
set l2 to {}
repeat 1000 times
set end of my l to (random number 1000)
set end of my l2 to result as text
end repeat
set parallel's lst to l2

sort(l, 1, -1, {comparer:backwards, slave:parallel})
l2
--*)

NG

Online

• Index
•  » Code Exchange
•  » Quicksort with median-of-3 pivot selection, 3-way partitioning, and â€¦