Monday, December 11, 2017
  • Index
  •  » Code Exchange
  •  » Quicksort with median-of-3 pivot selection, 3-way partitioning, and …

#1 2015-06-23 07:22:32 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4448

Quicksort with median-of-3 pivot selection, 3-way partitioning, and …

… tail call elimination. (Sorry. The subject field won't accept the full title!)

The Quicksort algorithm, devised by Tony Hoare in 1960, is elegant, simple (once you understand it!), and — as the name implies — fast (when competently implemented). But the very process which makes it so effective has a few efficiency issues which have interested algorithmicists over the years — notably the computer scientist Robert Sedgewick, whose Ph.D. thesis was a study of Quicksort.

Quicksort's recursive, divide-and-conquer action branches at each level to deal with ever smaller and more numerous subsets of items. Since smaller subsets have less scope for being out of order, Quicksort in effect does more and more dividing-and-conquering while doing less and less actual sorting. Sedgewick has suggested stopping the recursion at levels where fewer than eight or ten items would be handled. This would leave the sort incomplete, but with the values all within eight or ten places of where they had to be. A follow-up with an insertion sort would then take care of the remaining fine detail more quickly than the Quicksort would have done. This is the idea used in the qsort handlers Arthur Knapp and I wrote some years ago.

Another issue is the selection of the pivot values. (I'm assuming here that you know what's meant by these.) The best possible pivot value in each range is the median of the all values in the range, since it gives approximately equal numbers of greater and lesser values and maximises the differentiation achieved in that sweep. The worst possible pivot value is the highest or lowest in the range, because then only a single pivot instance is separated out and the rest of the values remain lumped together in one group. Unfortunately, the time it takes to identify the median value is usually more than the time saved by using it, so in practice you just have to grab one of the values and run with it. Sedgewick suggests median-of-3 pivot selection as a compromise. This uses the median of just the first, middle, and last values in the range, which obviously takes less time to find (except with very short ranges, of course). It doesn't guarantee the best pivot — or even a good one — but it does reduce the likelihood of the lowest or highest value being chosen and seems to be statistically effective, even in AppleScript. Also, as it turns out, the time taken for the pivot selection can be partly offset against certain esoteric consequences of the median-of-3 logic itself, whose upshot is that any swapping or partitioning of the two outer values which the sort would normally do can be performed incidentally and more economically during the pivot selection instead.

Quicksort normally divides the values in each range into those which are less than or equal to the pivot and those which are greater than or equal to it. The "or equal" consideration is a convenience which allows the pivot itself to be moved and helps pause the repeats. But if the split could be arranged three ways — values less than the pivot, instances of the pivot, and values greater than the pivot — the pivot instances would all be in their final positions and could be ignored for the rest of the sort. Bentley/McIlroy 3-way partitioning achieves this by diverting pivot instances during swaps to expandable "partitions" at either end of the range and then, at the end of the sweep, swapping them back in between the lesser and greater values. This scheme raises its own quandry, since it greatly speeds up sorts of lists containing lots of duplicate values, but slows things down when the values are unique. Sedgewick's philosophy, with which I think I agree, is that lists generally do contain duplicates and so 3-way partitioning is worth including. The "straight" version of the implementation below cheats for speed by feeding the pivots back in from a variable instead of actually fetching them from the partitions. The Custom version however does things the proper way to ensure that composite values are preserved. The complementary cheat — advancing the partition boundaries but not actually storing the pivots — doesn't work. Very often, it's a collision with a partitioned pivot which stops the sweep.

Finally (literally!), the very last thing in a Quicksort handler is the second of the two recursive calls to itself. When this second call eventually returns, the current call itself immediately returns to the call which called it, and so on. One can imagine a whole lotta returnin' going on up the line of second calls. It would be better if the lowest second call could return straight to the top of the line — or better still, if the second calls weren't made at all but were replaced with repeats. That's what tail call elimination does. In this case, instead of the fixed left-then-right recursion plan of traditional Quicksorts, Sedgewick recommends treating whichever group is smaller first, using recursion, and then the larger, using a repeat. This presumably ensures that there's more repeating than recursing.

Only the median-of-3, 3-way-partitioning, and tail-call-elimination optimisations are included in the scripts below. There doesn't appear to be any advantage in combining them with the insertion-sort-finish idea. Quite the opposite, in fact. The choices seem to be between them or it.

Applescript:

(* Quicksort with median-of-3 pivot selection, 3-way partitioning, and tail call elimination.

Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Refinements suggested by Robert Sedgewick at various times.
AppleScript Quicksort implementation: Arthur J. Knapp and Nigel Garvey, 2003.
Median-of-3 pivot selection and Bentley/McIlroy 3-way partitioning implementation: Nigel Garvey, 2010.
Tail call elimination implementation: Nigel Garvey, 2015.

Parameters: (list, range index 1, range index 2)
*)


on Quicksort_mo3_3wp_tce(theList, l, r)
   script o
       property lst : theList
       
       on qsrt(l, r)
           repeat -- Tail call elimination repeat.
               
               (* MEDIAN-OF-3 PIVOT SELECTION. *)
               -- Set the median of the first, middle, and last values in the current range as the pivot for this recursion or iteration. In the process, initialise partition indices for the 3-way partitioning and sort the first and last items as necessary to save having to fetch and compare them again later.
               set lv to my lst's item l -- First value.
               set mv to my lst's item ((l + r) div 2) -- Middle value.
               set rv to my lst's item r -- Last value.
               set pivot to mv -- 'pivot' is the middle value until changed.
               
               set lpb to l - 1 -- Boundary indices for the partitions at either end of
               set rpb to r + 1 -- the range, initialised immediately outside the range.
               if (lv < rv) then
                   if (lv < mv) then
                       if (mv < rv) then
                       else -- (lv < rv <= mv) The last value is (co-)median.
                           set pivot to rv
                           set rpb to r -- Partition it in situ.
                       end if
                   else -- (mv <= lv < rv) The first value is (co-)median.
                       set pivot to lv
                       set lpb to l -- Partition it in situ.
                   end if
               else if (rv < lv) then
                   -- Whatever the pivot after a median-of-3 selection, the first and last values in the range will be swapped if the first is greater than the last. Do it now to save fetching and comparing them again in the sort.
                   set my lst's item l to rv
                   set my lst's item r to lv
                   if (rv < mv) then
                       if (mv < lv) then
                       else -- (rv < lv <= mv) The first (now last) value is (co-)median
                           set pivot to lv
                           set rpb to r -- Partition it in situ.
                       end if
                   else -- (mv <= rv < lv) The last (now first) value is (co-)median.
                       set pivot to rv
                       set lpb to l -- Partition it in situ.
                   end if
               else -- (rv = lv) -- The first and last values are co-median. Partition them both. Quicksort would normally swap them too, but the logic here avoids that.
                   set pivot to lv
                   set lpb to l
                   set rpb to r
               end if
               
               -- The first and last items in the range have been dealt with above. Continue from the second and penultimate items.
               set lx to l + 1
               set rx to r - 1
               (* END OF MEDIAN-OF-3 PIVOT SELECTION. *)
               
               repeat until (lx > rx)
                   set lv to my lst's item lx
                   repeat while (lv < pivot)
                       set lx to lx + 1
                       set lv to my lst's item lx
                   end repeat
                   
                   set rv to my lst's item rx
                   repeat while (rv > pivot)
                       set rx to rx - 1
                       set rv to my lst's item rx
                   end repeat
                   
                   if (lx < rx) then
                       (* THREE-WAY PARTITIONING. *)
                       -- Where a swap involves a pivot instance, extend the partition at the destination end of the range, move the value displaced thereby to the swap destination, and move the pivot instance to the vacated slot. At the end of the pass, the partitioned pivots will be swapped back into the area between the lesser and greater values.
                       if (lv > pivot) then
                           -- The left value's not a pivot instance. Move it to the normal swap destination.
                           set my lst's item rx to lv
                       else
                           -- Otherwise partition it as described above.
                           set rpb to rpb - 1
                           set my lst's item rx to my lst's item rpb
                           set my lst's item rpb to lv
                       end if
                       
                       -- Similarly with the right value in the other direction.
                       if (rv < pivot) then
                           set my lst's item lx to rv
                       else
                           set lpb to lpb + 1
                           set my lst's item lx to my lst's item lpb
                           set my lst's item lpb to rv
                       end if
                       
                       set lx to lx + 1
                       set rx to rx - 1
                   else if (lx = rx) then
                       -- lx and rx index the same item: a pivot value in the right place.
                       -- It doesn't need to be swapped or partitioned, let alone at both ends!
                       set lx to lx + 1
                       set rx to rx - 1
                   end if
               end repeat
               
               -- Swap any pivot values in the left partition with values from rx leftwards. (The pivots really would have to come from the partition in a custom sort, not just cheated from the 'pivot' variable!)
               repeat with k from l to lpb
                   set my lst's item k to my lst's item rx
                   set my lst's item rx to pivot
                   set rx to rx - 1
               end repeat
               -- Swap any pivot values in the right partition with values from lx rightwards. (Ditto.)
               repeat with k from r to rpb by -1
                   set my lst's item k to my lst's item lx
                   set my lst's item lx to pivot
                   set lx to lx + 1
               end repeat
               -- All instances of the current pivot are now placed and can be excluded from the rest of the sort.
               (* END OF THREE-WAY PARTITIONING *)
               
               (* TAIL CALL ELIMINATION *)
               -- Subsort the smaller group of remaining values using recursion, then the larger group using a repeat.
               set shortDiff to rx - l -- The veracity of these assignments …
               set longDiff to r - lx -- … will be checked immediately!
               if (longDiff > shortDiff) and (longDiff > 0) then
                   -- The right group is indeed larger than the left and has more than one item. Set special indices for the left range and a new left index for the right.
                   set shortL to l
                   set shortR to rx
                   set l to lx
               else if (shortDiff > 0) then
                   -- The left group has more than one item and the right either doesn't have more or doesn't have more than one. Treat the left group as the larger. Swap the shortDiff/longDiff assignments.
                   tell shortDiff
                       set shortDiff to longDiff
                       set longDiff to it
                   end tell
                   -- Set special indices for the right range and a new right index for the left.
                   set shortL to lx
                   set shortR to r
                   set r to rx
               else
                   -- Neither group contains more than one item.
                   exit repeat
               end if
               -- Deal with the smaller group by recursion if it has more than two items or by swapping as necessary if it has two. Skip it if it has fewer than that.
               if (shortDiff > 1) then
                   qsrt(shortL, shortR)
               else if (shortDiff is 1) then
                   set lv to my lst's item shortL
                   set rv to my lst's item shortR
                   if (lv > rv) then
                       set my lst's item shortL to rv
                       set my lst's item shortR to lv
                   end if
               end if
               -- Deal with the larger group by swapping if necessary and exiting if it only has two items or by going back to the top of the repeat if it has more.
               if (longDiff is 1) then
                   set lv to my lst's item l
                   set rv to my lst's item r
                   if (lv > rv) then
                       set my lst's item l to rv
                       set my lst's item r to lv
                   end if
                   exit repeat
               end if
           end repeat (* END OF TAIL CALL ELIMINATION *)
       end qsrt
   end script
   
   -- Process the input parameters.
   set listLen to (count theList)
   if (listLen > 1) then
       -- Negative and/or transposed range indices.
       if (l < 0) then set l to listLen + l + 1
       if (r < 0) then set r to listLen + r + 1
       if (l > r) then set {l, r} to {r, l}
       
       -- Do the sort.
       o's qsrt(l, r)
   end if
   
   return -- nothing.
end Quicksort_mo3_3wp_tce

property sort : Quicksort_mo3_3wp_tce

--(* Demo:
set l to {}
repeat 1000 times
   set end of my l to (random number 1000)
end repeat

sort(l, 1, -1)
l
--*)

And here's the customisable version:

Applescript:

(* Custom Quicksort with median-of-3 pivot selection, 3-way partitioning, and tail call elimination.

Quicksort algorithm: S.A.R. (Tony) Hoare, 1960.
Refinements suggested by Robert Sedgewick at various times.
AppleScript Custom Quicksort implementation: Arthur J. Knapp and Nigel Garvey, 2003.
Median-of-3 pivot selection and Bentley/McIlroy 3-way partitioning implementation: Nigel Garvey, 2010.
Tail call elimination implementation: Nigel Garvey, 2015.

Parameters: (list, range index 1, range index 2, customisation record)
The customisation record has two recognised properties — 'comparer' and 'slave' — which are both optional. The 'comparer' value must be a script object containing an isGreater(a, b) handler which returns whether or not value 'a' is greater than value 'b' according to the customisation criteria. The 'slave' value is a script object with a swap(a, b) handler which is called whenever a swap takes place in the sort. This might be used, say, to swap items with the same indices in another list. Where the 'comparer' and/or 'slave' properties are omitted, or the customisation parameter isn't a record, the sort respectively compares values directly and does nothing outside the main process.
*)

on CustomQuicksort_mo3_3wp_tce(theList, l, r, customiser)
   script o
       property comparer : me
       property slave : me
       
       property lst : theList
       
       on qsrt(l, r)
           repeat -- Tail call elimination repeat.
               
               (* MEDIAN-OF-3 PIVOT SELECTION. *)
               -- Set the median of the first, middle, and last values in the current range as the pivot for this recursion or iteration. In the process, initialise partition indices for the 3-way partitioning and sort the first and last items as necessary to save having to fetch and compare them again later.
               set lv to my lst's item l -- First value.
               set mv to my lst's item ((l + r) div 2) -- Middle value.
               set rv to my lst's item r -- Last value.
               set pivot to mv -- 'pivot' is the middle value until changed.
               
               set lpb to l - 1 -- Boundary indices for the partitions at either end of
               set rpb to r + 1 -- the range, initialised immediately outside the range.
               if (comparer's isGreater(rv, lv)) then
                   if (comparer's isGreater(mv, lv)) then
                       if (comparer's isGreater(rv, mv)) then
                       else -- (lv < rv <= mv) The last value is (co-)median.
                           set pivot to rv
                           set rpb to r -- Partition it in situ.
                       end if
                   else -- (mv <= lv < rv) The first value is (co-)median.
                       set pivot to lv
                       set lpb to l -- Partition it in situ.
                   end if
               else if (comparer's isGreater(lv, rv)) then
                   -- Whatever the pivot after a median-of-3 selection, the first and last values in the range will be swapped if the first is greater than the last. Do it now to save fetching and comparing them again in the sort.
                   set my lst's item l to rv
                   set my lst's item r to lv
                   slave's swap(l, r)
                   if (comparer's isGreater(mv, rv)) then
                       if (comparer's isGreater(lv, mv)) then
                       else -- (rv < lv <= mv) The first (now last) value is (co-)median
                           set pivot to lv
                           set rpb to r -- Partition it in situ.
                       end if
                   else -- (mv <= rv < lv) The last (now first) value is (co-)median.
                       set pivot to rv
                       set lpb to l -- Partition it in situ.
                   end if
               else -- (rv = lv) -- The first and last values are co-median. Partition them both. Quicksort would normally swap them too, but the logic here avoids that.
                   set pivot to lv
                   set lpb to l
                   set rpb to r
               end if
               
               -- The first and last items in the range have been dealt with during the pivot selection. Continue from the second and penultimate items.
               set lx to l + 1
               set rx to r - 1
               (* END OF MEDIAN-OF-3 PIVOT SELECTION. *)
               
               repeat until (lx > rx)
                   set lv to my lst's item lx
                   repeat while (comparer's isGreater(pivot, lv))
                       set lx to lx + 1
                       set lv to my lst's item lx
                   end repeat
                   
                   set rv to my lst's item rx
                   repeat while (comparer's isGreater(rv, pivot))
                       set rx to rx - 1
                       set rv to my lst's item rx
                   end repeat
                   
                   if (lx < rx) then
                       (* THREE-WAY PARTITIONING. *)
                       -- Where a swap involves a pivot instance, extend the partition at the destination end of the range, move the value displaced thereby to the swap destination, and move the pivot instance to the vacated slot. At the end of the pass, the partitioned pivots will be swapped back into the area between the lesser and greater values.
                       -- With the slave swap, it's easier to swap items lx and rx directly anyway and then swap them into the partitions if necesary.
                       slave's swap(lx, rx)
                       if (comparer's isGreater(lv, pivot)) then
                           -- The left value's not a pivot instance. Move it to the normal swap destination.
                           set my lst's item rx to lv
                       else
                           -- Otherwise partition it as described above.
                           set rpb to rpb - 1
                           set my lst's item rx to my lst's item rpb
                           set my lst's item rpb to lv
                           slave's swap(rx, rpb)
                       end if
                       
                       -- Similarly with the right value in the other direction.
                       if (comparer's isGreater(pivot, rv)) then
                           set my lst's item lx to rv
                       else
                           set lpb to lpb + 1
                           set my lst's item lx to my lst's item lpb
                           set my lst's item lpb to rv
                           slave's swap(lx, lpb)
                       end if
                       
                       set lx to lx + 1
                       set rx to rx - 1
                   else if (lx = rx) then
                       -- lx and rx index the same item: a pivot value in the right place.
                       -- It doesn't need to be swapped or partitioned, let alone at both ends!
                       set lx to lx + 1
                       set rx to rx - 1
                   end if
               end repeat
               
               -- Swap any pivot values in the left partition with values from rx leftwards.
               repeat with k from l to lpb
                   tell my lst's item k
                       set my lst's item k to my lst's item rx
                       set my lst's item rx to it
                   end tell
                   slave's swap(k, rx)
                   set rx to rx - 1
               end repeat
               -- Swap any pivot values in the right partition with values from lx rightwards.
               repeat with k from r to rpb by -1
                   tell my lst's item k
                       set my lst's item k to my lst's item lx
                       set my lst's item lx to it
                   end tell
                   slave's swap(k, lx)
                   set lx to lx + 1
               end repeat
               -- All instances of the current pivot are now placed and can be excluded from the rest of the sort.
               (* END OF THREE-WAY PARTITIONING *)
               
               (* TAIL CALL ELIMINATION *)
               -- Subsort the smaller group of remaining values first using recursion, then the larger group using a repeat.
               set shortDiff to rx - l -- The veracity of these assignments …
               set longDiff to r - lx -- … will be checked immediately!
               if (longDiff > shortDiff) and (longDiff > 0) then
                   -- The right group is indeed larger than the left and has more than one item. Set special indices for the left range and a new left index for the right.
                   set shortL to l
                   set shortR to rx
                   set l to lx
               else if (shortDiff > 0) then
                   -- The left group has more than one item and the right either doesn't have more or doesn't have more than one. Treat the left group as the larger. Swap the shortDiff/longDiff assignments.
                   tell shortDiff
                       set shortDiff to longDiff
                       set longDiff to it
                   end tell
                   -- Set special indices for the right range and a new right index for the left.
                   set shortL to lx
                   set shortR to r
                   set r to rx
               else
                   -- Neither group contains more than one item.
                   exit repeat
               end if
               -- Deal with the smaller group by recursion if it has more than two items or by swapping as necessary if it has two. Skip it if it has fewer than that.
               if (shortDiff > 1) then
                   qsrt(shortL, shortR)
               else if (shortDiff is 1) then
                   set lv to my lst's item shortL
                   set rv to my lst's item shortR
                   if (comparer's isGreater(lv, rv)) then
                       set my lst's item shortL to rv
                       set my lst's item shortR to lv
                       slave's swap(shortL, shortR)
                   end if
               end if
               -- Deal with the larger group by swapping if necessary and exiting if it only has two items or by going back to the top of the repeat if it has more.
               if (longDiff is 1) then
                   set lv to my lst's item l
                   set rv to my lst's item r
                   if (comparer's isGreater(lv, rv)) then
                       set my lst's item l to rv
                       set my lst's item r to lv
                       slave's swap(l, r)
                   end if
                   exit repeat
               end if
           end repeat (* END OF TAIL CALL ELIMINATION *)
       end qsrt
       
       -- Default comparison and slave handlers for an ordinary sort.
       on isGreater(a, b)
           (a > b)
       end isGreater
       
       on swap(a, b)
       end swap
   end script
   
   -- Process the input parameters.
   set listLen to (count theList)
   if (listLen > 1) then
       -- Negative and/or transposed range indices.
       if (l < 0) then set l to listLen + l + 1
       if (r < 0) then set r to listLen + r + 1
       if (l > r) then set {l, r} to {r, l}
       
       -- Supplied or default customisation scripts.
       if (customiser's class is record) then set {comparer:o's comparer, slave:o's slave} to (customiser & {comparer:o, slave:o})
       
       -- Do the sort.
       o's qsrt(l, r)
   end if
   
   return -- nothing.
end CustomQuicksort_mo3_3wp_tce

property sort : CustomQuicksort_mo3_3wp_tce

--(* Contrived demo:
-- Reverse sort a list of random integers, sorting a list of matching strings in parallel with it.

script backwards -- 'comparer' script object. Does 'less than' comparisons instead to reverse the sort.
   on isGreater(a, b)
       (a < b)
   end isGreater
end script

script parallel -- 'slave' script object. Duplicates moves from an ongoing Quicksort in its own list.
   property lst : missing value
   
   -- Swap items a and b in the slave list.
   on swap(a, b)
       tell my lst's item a
           set my lst's item a to my lst's item b
           set my lst's item b to it
       end tell
   end swap
end script

set l to {}
set l2 to {}
repeat 1000 times
   set end of my l to (random number 1000)
   set end of my l2 to result as text
end repeat
set parallel's lst to l2

sort(l, 1, -1, {comparer:backwards, slave:parallel})
l2
--*)


NG

Offline

 
  • Index
  •  » Code Exchange
  •  » Quicksort with median-of-3 pivot selection, 3-way partitioning, and …

Board footer

Powered by FluxBB

RSS (new topics) RSS (active topics)