Saturday, December 16, 2017

#1 2015-06-25 02:05:34 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Effective way of finding the median in a list of numbers

Hello.

The handlers below are based on the quicksort algorithm, where we do take advantage of the fact that we don't need to perform a full sort in order to find the median of a list of numbers. The median is defined to be floor (n+1/ 2) if n is odd, and ceil(n+1/2) if n is even, where n is the length of the list of numbers.

|median| has a best/average case of O(n) , and a worst case performance of O(n^2) (sorted list, or reverse ordered).

There is a better way, to find the median, which works quite differently, the median of medians algorithm, that I may come back to. There is an easy way to get the median by Satimage.osax/smile, so I am not sure if it is worth the bother. smile

Edit

New version that works correctly.

New version that more than halves the time used, due to using a "local" random number generator.

New version that halves the time again for 1000 elements, I have changed the partitioning algorithm.

Applescript:

(* Demo
set maxitems to 1000
set L to {}
repeat maxitems times
   set end of L to rand(1, maxitems)
end repeat
set max to 0
repeat with i from 1 to maxitems
   if item i of L > max then set max to item i of L
end repeat
log "max: " & max

set min to (maxitems + 1)
repeat with i from 1 to maxitems
   if item i of L < min then set min to item i of L
end repeat
log "min: " & min
set t0 to (current date)
repeat 1000 times
   set {md, k} to |median|(L)
end repeat
set t1 to ((current date) - t0) / 1000
log "time T : " & t1
log "median val: " & md
set m to sortlist L -- Satimage needed!

log "low: " & item (k - 1) of m
log "median: " & item k of m
log "high: " & item (k + 1) of m
*)


script pseudo
   -- pure multiplicative random generator.
   -- Its faster most of the time than Standard Additions random number (10000 runs).
   -- it has been proved that the sequence doesn't repeat itself before 2^31 -2 calls have been made.
   -- yet: No warranties about anything.
   -- Rosen "Discrete Mathematics" p. 207
   property x0 : 3
   property rmod : 2 ^ 31 - 1
   property multiplier : 7 ^ 5
   
   on rand()
       set x0 to (multiplier * x0) mod rmod
       return (x0 / 1.0E+9)
   end rand
   on init()
       rand()
   end init
end script

on partition(|L|, low, high, pivotIndex)
   -- N. Wirth: Algorithms and Datastructures p. 93
   -- Implemented in AppleScript by McUsr
   (*
       Excerpt:
       Let us try the following algorithm:
       Pick any item at random (and call it x ):
       scan the array from the left until an item a_i > x is found
       then scan from the right until an item a_j < x is found.
       Now exchange the two items and continue this scan and swap process until
       the two scans meet somewhere in the middle of the array.
       The result is that the array is now partitioned into a left part with keys
       â‰¤ x, and a right part with keys ≥ x.
   *)

   script o
       property L : |L|
   end script
   -- select an item x at random
   set x to item pivotIndex of o's L
   set i to low
   set j to high
   repeat while i ≤ j
       repeat while item i of o's L < x
           set i to i + 1
       end repeat
       repeat while x < item j of o's L
           set j to j - 1
       end repeat
       
       if i < j then
           set w to item i of o's L
           set item i of o's L to item j of o's L
           set item j of o's L to w
       end if
       if i ≤ j then
           set i to i + 1
           set j to j - 1
       end if
   end repeat
   return i - 1 -- last element of the "left" partition
end partition



on randSelect(|L|, low, high, k)
   script o
       property L : |L|
   end script
   -- selects the k smallest integer (used for the median).
   -- It uses the basics of quicksort, to puth the k least integer
   -- into the kth position, which is then returned. It does so
   -- by partionting the list of elements, and returns the kth element
   -- when the list have been partioned on k.
   -- Implemented in AppleScript by McUsr: Standard quick select as fouund on:
   -- [url]http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-four/[/url]
   
   if low = high then return item low of o's L
   set r to high - low + 1
   
   -- set r to ((random number) * r mod r div 1 + low)
   set r to ((pseudo's rand()) * r mod r div 1 + low)
   set r to partition(o's L, low, high, r)
   set n to r - low + 1
   if k = n then
       return item r of o's L
   else if k < n then
       return randSelect(o's L, low, r - 1, k)
   else
       randSelect(o's L, r + 1, high, k - n)
   end if
end randSelect

on |median|(L)
   -- returns the median in a list of numbers
   -- it alters L, save a copy before invoking this
   -- handler, if you need the original list.
   -- Thanks to Nigel Garvey for spotting the problems
   -- with the first version.
   set ll to length of L
   if ll mod 2 = 0 then
       set k to ll div 2
   else
       set k to (ll + 1) div 2
   end if
   pseudo's init()
   return {randSelect(L, 1, ll, k), k}
end |median|

on rand(low, high)
   -- returns a random integer, within low and high inclusive
   -- Made by McUsr
   set k to high - low + 1
   return ((random number) * k mod k div 1 + low)
end rand

Last edited by McUsrII (2015-06-27 10:05:46 am)

Offline

 

#2 2015-06-25 03:07:07 am

Shane Stanley
Member
From:: Australia
Registered: 2002-12-07
Posts: 5200

Re: Effective way of finding the median in a list of numbers

McUsrII wrote:

There is an easy way to get the median by Satimage.osax/smile, so I am not sure if it is worth the bother.


There's also a vanilla way:

Applescript:

use framework "Foundation"

on medianOfList:theList
   set anNSExpression to current application's NSExpression's expressionForConstantValue:theList
   set newNSExpression to current application's NSExpression's expressionForFunction:"median:" arguments:{anNSExpression}
   return (newNSExpression's expressionValueWithObject:(missing value) context:(missing value)) as real
end medianOfList:

Just don't ask me to explain it...


Shane Stanley <sstanley@myriad-com.com.au>
www.macosxautomation.com/applescript/apps/

Offline

 

#3 2015-06-25 03:41:28 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Thanks Shane.

I'll have to look into NSExpression, that reminds me of the Cocoa Text System.  However, Smile is the full package now adays for free, so you can get nice graphs and such; when you graph a dataset, then a record is returned, with median, standard devation, mean, and such. smile

That is not a cure all, but it is a nice environment to use AppleScript interactively in. Not so much for writing apps in, but you can script it at least. So, it is good to have many alternatives, like always.

I guess the median in NSExpression, uses the linear time algorithm by the way.

Last edited by McUsrII (2015-06-25 03:43:55 am)

Offline

 

#4 2015-06-25 04:32:22 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Hi McUsrII.

Interesting. But it doesn't always return the right results. Often, but not always. A few examples with this test script, which I used to compare the results with those from the median-of-3 pivot selection in my recently posted Quicksort:

Applescript:

set maxitems to 3
set L to {}
repeat maxitems times
   set end of L to rand(1, 1000)
end repeat

return {L, |median|(L)}
--> {{361, 719, 524}, 719} -- Should be 524.
--> {{256, 643, 515}, 643} -- Should be 515.
--> {{425, 405, 928}, 405} -- Should be 425.
--> {{241, 4, 862}, 4} -- Should be 241.


NG

Offline

 

#5 2015-06-25 06:19:05 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello.

Well, maybe I should leave this with the idea. It seems to me that there are several issues here, I have fixed some errors, but it didn't help much. This is what you get, when you test something on just a few numbers.  I'll come back with something.

Offline

 

#6 2015-06-25 06:57:03 pm

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Phew! It's quite tricky to get right, isn't it? I think this works:

Applescript:

(* Find the median of the values (presumed to be integers) in a range of a list.
By Nigel Garvey 2015, based on a Quicksort-based idea gleaned from McUsrII and others.
With odd numbers of items, the median value's the one which would be in the middle if the range were sorted. With even numbers of items, it's the average of the two middle values.

Parameters: (list, range index 1, range index 2)
*)

on medianValue(theList, l, r)
   script o
       property extract : missing value
   end script
   
   -- Process the range parameters.
   set listLen to (count theList)
   if (l < 0) then set l to listLen + l + 1
   if (r < 0) then set r to listLen + r + 1
   if (l > r) then set {l, r} to {r, l}
   if ((l < 1) or (r > listLen)) then error "Duff range parameters!" -- Compose as required.
   
   -- Extract a duplicate of the specified range (to preserve it, as some sorting will be done) and set some useful variables.
   set o's extract to theList's items l thru r
   set rangeLen to r - l + 1
   set oddLen to (rangeLen mod 2 is 1) -- Is an odd number of items involved?
   set k to (1 + rangeLen) div 2 -- (First) median position in the range.    
   set searchNum to 1 -- Indication of which result's being sought.
   
   -- The search process is like a Quicksort, but with recursion only occurring on the side containing the median position. It stops when the inner repeat indices cross at that position. When the number of items is even, a second search is done to the right of the found median. Since recursion would involve tail calls, a repeat's used here instead.
   set l to 1 -- Left index of extracted range.
   set r to rangeLen -- Right index ditto.
   repeat -- Tail call elimination repeat.
       set pivot to o's extract's item ((l + r) div 2)
       set i to l
       set j to r
       repeat until (i > j)
           set u to o's extract's item i
           repeat while (u < pivot)
               set i to i + 1
               set u to o's extract's item i
           end repeat
           
           set w to o's extract's item j
           repeat while (w > pivot)
               set j to j - 1
               set w to o's extract's item j
           end repeat
           
           if (i > j) then
           else
               set o's extract's item i to w
               set o's extract's item j to u
               set i to i + 1
               set j to j - 1
           end if
       end repeat
       
       if (j < k) then
           if (i > k) then
               -- The indices have crossed at the median position, which now contains a median value.
               set thisResult to item k of o's extract
               if (oddLen) then
                   -- That's it if the range has an odd number of values.
                   return thisResult
               else if (searchNum is 1) then
                   -- Otherwise, if this is the first value we need, store it and prepare to find the second, which is by now somewhere to the right of k.
                   set result1 to thisResult
                   set searchNum to 2
                   set k to k + 1
                   set l to k
                   set r to rangeLen
               else
                   -- Otherwise this is the second value we need. Return the average of it and the first, as an integer if possible.
                   set |median| to (result1 + thisResult) / 2
                   tell (|median| as integer) to if (it = |median|) then set |median| to it
                   return |median|
               end if
           else -- ((j < k) and (i ≤ k))
               -- k's to the right of where the indices crossed. Reset for the right "recursion".
               set l to i
           end if
       else -- (j ≥ k)
           -- k's to the left of where the indices crossed. Reset for the left "recursion".
           set r to j
       end if
   end repeat
end medianValue

--(* Demo:
set l to {}
repeat with i from 1 to 6
   set end of my l to (random number 100)
end repeat
log l

-- Find the median of values 1 thru -1 of l.
set m to medianValue(l, 1, -1)
--*)

Edit: General improvements and tidying-up: only the specified range is now extracted, the handler and properties in the script object are now in-line code, other unnecessary vestiges of the script's Quicksort origins have been removed, some comments and variable names have been changed.
Edit 2: The search for the second value, when needed, is now restricted to the right of the first, since it must already be on that side by then!

Last edited by Nigel Garvey (2015-07-08 12:37:05 pm)


NG

Offline

 

#7 2015-06-25 09:44:04 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

Yes, it was, but it is always developing with a challenge!  Your code looks awesome, -as allways.  I have updated my post #1 with a version that works. It wasn't that tricky once I found the correct method of  partitioning the list. That is, one that could handle several equal elements. That is partioning the list into elements that are lesser or equal to the pivot element, and greater or equal to the pivot element. I had forgotten how the standard partitioning of quicksort works, where it leverages on the  sublists to get everything in order, alas, such a partioniong scheme just wouldn't hack it here.

My version works, as far as I know after thoroughly testing, I'm not going to test my version against yours for speed. big_smile  I use three handlers, nor am I going to optimize it, as there is a better algorithm out there that is probably easier to implement (median of medians), should I ever need something faster than yours.

For some reason I can't make your script compile ( I get the message: "An end of line can't come after this", highlighting the line: "set pivot to my lst's item ((subl + subr) div 2)" (first line below the repeat).)

Edit

Pivot turns out to be reserved word on my machine.

Last edited by McUsrII (2015-06-25 11:27:21 pm)

Offline

 

#8 2015-06-26 03:43:13 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Hi McUsrII.

Yes. The results from your script seem to be more consistent now. I see your |median|() handler now returns a list containing both the median value and the median position. My handler just returns the value.

Our interpretations of "median" agree where there are odd numbers of items, but my handler uses the "average of two middle values" convention when the number of items is even. (When putting it together, I wasted a lot of time trying to exploit the fact that the average of items k and -k works for both even and odd list sizes. But in the end, it proved simpler and faster to use different code for the two situations!)

I see your script uses randomised pivot choosing, presumably following the theory that this is likely to produce better pivots more often than always using, say, the values from the middle of the ranges — and that therefore it'll make the process faster. But in fact, as Arthur and I discovered when writing our original Quicksort implementation, any slight theoretical advantage is totally obliterated by the time it takes to make random choices in AppleScript! That's why stuck with the middle values.


NG

Offline

 

#9 2015-06-26 03:59:18 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

I agree to everything you have written. The random number generator, is very hard to replicate in AppleScript preserving almost the same quality. So, I have given up on making one to replace it in pure AS, with regards to saving any speed, retaining somewhat the quality. (But at least I figured out, what is doable with regards to speed. I realized the overhead of using a random number generator, when I performed the times, regarding the partitioning of quick sort. )

I might slack on quality at a later point, however, and maybe make something that can beat random number, still  give sufficient quality with regards to the distribution of randomness. -I may also want to port this to "C", so therefore I stuck with the random number.

I figured both the item, and the median was reasonable to return, all the time I had computed the median. It works as it should now, and it isn't really meant to be used as a statistic median, more of a "computerese" median, for creating or manipulating partitions of datasets, or datasets, and such.

Thank you for finding the  error in my first version. smile

Edit

It may be feasible to make a simple random number generator, that just gives you a random distance to one of the sides in the current range. The hard part, is to prove that it will work out feasibly.

The precursor to this algorithm, was invented in order to find the k'th runner up in a sports tournament.

Last edited by McUsrII (2015-06-26 04:31:42 am)

Offline

 

#10 2015-06-26 07:10:36 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Just to note I've tidied up my script a bit now.


NG

Offline

 

#11 2015-06-26 11:39:13 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

I don't think it is doable to outmatch the random number generator of standard additions, with regard to speed, and I'm not sure that it in any case will be faster in a quick sort handler. However, it can be advantageous with an alternative sequence somewhere, for instance for use in  a secondary hash function.

Here is a little nifty generator by the way, a litle bit faster than random number, not fast enough for quicksort. smile

Applescript:

script pseudo
   -- pure multiplicative random generator.
   -- Its faster most of the time than Standard Additions random number (10000 runs).
   -- it has been proved that the sequence doesn't repeat itself before 2^31 -2 calls have been made.
   -- yet: No warranties about anything.
   -- Rosen "Discrete Mathematics" p. 207
   property x0 : 3    
   property rmod : 2 ^ 31 - 1
   property multiplier : 7 ^ 5
   
   on rand()
       set x0 to (multiplier * x0) mod rmod
       return (x0 / 1.0E+9)
   end rand
   on init()
       rand()
   end init
end script
set tstlist to {}

set t0 to (current date)
pseudo's init()
repeat 10000 times
   if 1 = 1 then
       set end of tstlist to (((pseudo's rand()) * 100) mod 100 div 1 + 1)
   else
       set end of tstlist to (random number) * 100 mod 100 div 1 + 1
   end if
end repeat
set res to ((current date) - t0) / 1000

set newlist to sortlist tstlist with ascending
-- Satimage.osax

log "res :" & res

set f to 1
set min to 100
set max to 0
set ctr to 1
repeat with i from 1 to 10000
   if item i of newlist ≠ f then
       log "num " & f & " f: " & ctr
       if ctr < min then set min to ctr
       if ctr > max then set max to ctr
       set ctr to 1
       set f to item i of newlist
   else
       set ctr to ctr + 1
   end if
end repeat
log "min: " & min & " max:" & max

Offline

 

#12 2015-06-27 12:51:49 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello.

So I mulled a little bit over the random number generator, thinking I would have to inline it by removing the handler and everything, and adding parameters, to keep the seed. I timed it in a script first, and found that there where no differences in execution time, when creating 15000 random numbers, so I implemented it directly into my median and parition handler in post #1. The speed gain was over 50%! smile

By the way, I have figured out that the main utility of this handler, is to pick the right item to start with, in order to have the best vantage point for building a balanced tree.

Offline

 

#13 2015-06-27 03:13:18 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Hi McUsrII.

Sorry for the delay in responding to your random number generator. I hope to knock together some quicksort versions for comparative testing in the next day or two. It looks as if it would be very fast in the actual selection of pivots, so the big question would then be the effectiveness of the pivots. Thanks for the code.  smile


NG

Offline

 

#14 2015-06-27 03:26:22 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

I'm not sure it will work in your case, nor if it is worth trying, because I think your algorithm is pretty optimized. And don't thank me, thank Kenneth Rosen for providing it, I'm glad if you find it usable, as I  owe you a whole lot. smile

Offline

 

#15 2015-06-27 07:44:33 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Well. After some initial tests with my median value script, it's not looking good for the random pivot theory at the moment.

The random pivot version of my script is just like the one in post #6, but with this section …

Applescript:

   -- The search process is like a Quicksort, but with recursion only occurring on the side containing the median position. It stops when the inner repeat indices cross at that position. Two searches are done when the number of items is even. Since recursion would involve tail calls, a repeat's used here instead.
   set l to extractL
   set r to extractR
   repeat -- Tail call elimination repeat.
       set pivot to o's extract's item ((l + r) div 2)

… modified like this:

Applescript:

   set rmod to 2 ^ 31 - 1
   set multiplier to 7 ^ 5
   set x0 to 3 * multiplier mod rmod -- Equivalent to init() with x0 starting at 3.
   
   -- The search process is like a Quicksort, but with recursion only occurring on the side containing the median position. It stops when the inner repeat indices cross at that position. Two searches are done when the number of items is even. Since recursion would involve tail calls, a repeat's used here instead.
   set l to extractL
   set r to extractR
   repeat -- Tail call elimination repeat.
       set x0 to x0 * multiplier mod rmod
       tell (r - l + 1) to set pivot to o's extract's item ((x0 / 1.0E+9) * it mod it div 1 + l)

The entire random-number process is in-line to eliminate the overhead of handler calls. The difference inside the repeat (equivalent to a recursion in a recursive version) is only seven very simple math operations and a variable setting. And yet the random pivot version generally tends to be slightly slower than the middle-item pivot version. This means that the advantages of randomly chosen pivots must be somewhere between negative and not enough to repay even this paltry investment. I'd expect difference to be even more apparent in a full Quicksort, which does more recursions and, in particular, more at lower levels, where the shorter range lengths would increase the chances of the same pivots being chosen anyway. I will try random pivots with full Quicksorts before dismissing the theory entirely as an old wives' tale. But not now as I have to go and mow the lawn. wink

Last edited by Nigel Garvey (2015-06-27 07:46:07 am)


NG

Offline

 

#16 2015-06-27 09:21:51 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

Shales are so much more practical than a lawn, but, a lawn has its charm. smile

It's not a wifes tale, although N. Wirth just uses it to illustrate how quicksort operates. He uses the low + high div 2 like you do. However, a Professor Eric D. Demaine of MIT, highly praised the random partioning of quicks sort, then remarked he didn't show it, but the students were supposed to have seen it in the book:  Cormen "Introduction To Algorithms". I have read through the handout for the "quick sort" lecture, and it wasn't there either. The philosophy behind it is described by Wirth however, and that is  that things fall into place faster when they are swapped over a longer distance.

I really shouldn't speculate too much on the advantages, with random partiontiong, the elements are put faster into the right partition I guess, due to the longer swapping distance. The call tree will be more unbalanced, because the differing sizes of the partions, but maybe some middle operations are saved for all I know.

There is of course also that,  hower small the random number generator is, it takes something to outperform (low+high) div 2. smile

Have a nice evening.

Edit

I changed the partitioning algorithm to one that swaps over longer distances, (Wirths), and now it has decreased yet another 50% in time (for 1000 elements). smile

It may also be, that the had a "tutorial-version" of quicksort, using the partitioning of a dataset, using a random pivot as an introduction to the subject, I don't know that before I have the "Cormen-book" in front of me.

Last edited by McUsrII (2015-06-27 10:07:22 am)

Offline

 

#17 2015-06-27 02:31:01 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello.

Niklaus Wirth also writes that  a random pivot, that just deviates by 1 or so from the middle, is used for breaking a "worst case pattern of input, where the pivot value with cause one of the partitions to be of length 1, again, again and again, due to that median recurrently hits the largest value in the current partition.

Other than that, reading passed the quicksort theme, I found the original Find algorithm by C. A. R. Hoare, which has partioning inlined, without any recursion, or random number generated pivot. smile I'll be back with that one later, at least the timing result. (Almost like the last partition implementation I posted).

Last edited by McUsrII (2015-06-27 02:38:53 pm)

Offline

 

#18 2015-06-27 07:23:58 pm

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

"McUsrII wrote:

Niklaus Wirth also writes that  a random pivot, that just deviates by 1 or so from the middle, is used for breaking a "worst case pattern of input


Ah. That does seem to work slightly quicker than a pivot chosen randomly from the entire range, not least of all because it requires fewer instructions.  smile

Applescript:

-- Random from entire range.
tell (r - l + 1) to set pivot to o's extract's item ((x0 / 1.0E+9) * it mod it div 1 + l)

-- Randomly either the middle item or the one after it.
set pivot to o's extract's item ((l + r) div 2 + x0 div 5.0E+8 mod 2)

However, on my machine, it's still averaging marginally slower than simply going for the middle item.

Median-of-3 still produces the fastest sorts.  smile

Last edited by Nigel Garvey (2015-06-27 07:34:55 pm)


NG

Offline

 

#19 2015-06-28 02:30:52 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

I have no doubt that the median of 3 is the best approach, it is in interesting thought though, in itself, to skew the median slightly, to avoid always hitting upon the largest element in a partition, if the input is organized that way.

Here is my take on that, this approach doesn't consider usage of insertion sort, or other means when the partitions gets small, but it doesn't create any partitions of a single element, if it is avoidable. I use mod 3, since the median is divided by two, and makes a "pendulum", that points to the left of the median, the original, and the item right of the median, to avoid hitting the exact same element. The right happenstance, occurs quite seldom, but it should vary between  the element to the left of the median, and the original median for each halving of the partition.

Applescript:

(*
   Scheme for skewing a median with -1, 0 or 1 , if applicable.
   
   Idea: a mod 3 will never repeat itself, when the medians are divisible by two.
   
   So the median mod 3 will vary with values 0 - 2. the 0 result will be more seldom
   so most of the time, the median will be skewed 1 to the left, or kept at its
   original position, but it should really vary with each call .

   We don't bother to skew however, if that creates a partition of the first element.

*)

to switch(low, high)
   set aMedian to (low + high) div 2
   return aMedian - 1 * ((aMedian - low > 2) as integer) * ((aMedian) mod 3 - 1)
end switch

Last edited by McUsrII (2015-06-28 02:31:38 am)

Offline

 

#20 2015-06-28 08:42:10 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Hi McUsrII.

I think you're confusing a few ideas here:

1. Median. The value sought in a median value search. At each recursion, it's known to be one of the values in the current range, but not which. It has no relevance in Quicksort per se, apart from being the optimum pivot in the first round of partitioning.
2. Median position. My term for the position the median value would occupy if the list were sorted. This is a known entity. It's the middle position (or one of the two middle positions) in the entire range, but not necessarily so in the subranges handled by the recursions.
3. Pivot value. The value in the current subrange around which partitioning is based.
4. Position from which the current pivot value is taken. Depends on the pivot-choice method used.
5. Positions where instances of the current pivot end up after partitioning. Depends on the pivot value relative to the others in the current subrange.

It very often happens during partitioning that the left and right indices cross and stop two positions apart. This is because they've met on an instance of the pivot value and have caused it to be "swapped" with itself. They've now stopped either side of the pivot instance and it's in its final position in the sort.

This is what's exploited in a median value search. When the left and right indices stop either side of the median position, whatever's in that position must be an instance of the pivot equal to the median value. And because of the way recursion's applied up till that point (ie. to the subranges containing the median position, even when down to only one item), the situation must eventually occur.

So unlike in a full Quicksort, where the optimum pivot value is the median value of the current subrange, the optimum pivot in a median value search is the median of the entire range. We don't necessary want to avoid extreme pivot values in the subranges because one of them's likely to be the median value anyway. Obviously it's still a worst-case scenario if you consistently pick the wrong one, but hitting the right one greatly speeds up homing in on the final result. Perhaps a good system would be to use the higher of two (or three) values when subsorting a left partition or the lower of two (or three) when subsorting a right.

Last edited by Nigel Garvey (2015-06-28 08:45:19 am)


NG

Offline

 

#21 2015-06-28 12:43:05 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

My idea was solely, to avoid picking the one that lead to the "wrong" result, over and over, due to some freak pattern in the input, that coincides with the halving. And the code as such, was just meant to illustrate what I meant, as I grasp that sometimes something like that is used, as an "insurance" against the worst case scenario. I really meant to median position, and not the median, as you describe it. The two terms are often used interchangeably though. Maybe I should look over my use of pivot, and pivotvalue, I seem to use pivot for both of those as well. I do reckognize  the importance, to communicate such subjects in an unambigous way, and will take greater care for the future.

It's been a busy day, I'll post the next  version of  the median handler: find/quickselect by the afternoon.


By the way, now that a large part of the world that doesn't write from left to right parttake in programming, wouldn't it be better to name the the start and end of a range, consistently with "start" and "end" instead of "left" and "right". This is no critique of you of course, I came to think of it the other day, and felt  just that the general convention of using "left" and "right" to denote the start and end of a range, is a bit outdated by now. smile

Last edited by McUsrII (2015-06-28 12:55:24 pm)

Offline

 

#22 2015-06-28 02:19:45 pm

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

McUsrII wrote:

By the way, now that a large part of the world that doesn't write from left to right parttake in programming, wouldn't it be better to name the the start and end of a range, consistently with "start" and "end" instead of "left" and "right". This is no critique of you of course, I came to think of it the other day, and felt  just that the general convention of using "left" and "right" to denote the start and end of a range, is a bit outdated by now. smile


Hah! Having been under the impression that the only dialect of AppleScript since Mac OS 9.0 has been "International English", I switched my system over to Arabic just now to see what things would look like in Script Editor. I was greeted with the ridiculous spectacle of script texts being in English and written from left to right, but aligned hard against the right margins without indentation. Lists were indeed ordered from right to left instead of vica versa — but only in the Result pane. Typing and running 'set l to {1, 2, 3, 4, 5}' returned '{5, 4, 3, 2, 1}'.

But in any case, since I'm unlikely at my age to write about scripts in anything other than a western European language — specifically British English — I think I'll stick with "left" and "right" where convenient to avoid the linguistic acrobatics that would be required otherwise.  smile

Last edited by Nigel Garvey (2015-06-28 02:23:02 pm)


NG

Offline

 

#23 2015-06-28 02:32:03 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

I didn't mean you, nor just AppleScript, I meant that the whole landscape for using languages has changed, since left or right was introduced, probably sometime in the 1950's. And I wasn't after how things are written, but the "mental model" of it. An Arabic, or Chinese person, people from other regions, may start from the right side, and therefore think of the left, as the end of it. Hopefully code is written in English, since that is the Latin of our day. But sticking with a western language, doesn't mean that we shouldn't adapt global mental-models we all can share.

Left and right is pretty established by now, and I think it is at least impractical to change it by now. But new algorithms and api's should probably consider such issues, along with the fact that we now support writing and reading from right to left, in order to provide as much of  "wholeness" as possible, making the environment we live and breathe in as friendly as possible on a global scale.

I mentioned it, in general, because it touched the theme of naming things unambiguosly. I actually pondered writing a blog post about it, but dismissed it, as being too "nerdy". smile

Offline

 

#24 2015-06-28 05:00:36 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello.

Here is the last and best version of the "find k'th least element", that can be used for finding the median of a list, probably in order to pick the root element when building a binary tree in order to keep it balanced.

Hoare's original version, beat  the previous version with almost 50% -again. smile

Applescript:


property scriptTitle : "Median Hoare Version"
set maxitems to 1000

script pseudo
   -- pure multiplicative random generator.
   -- Its faster most of the time than Standard Additions random number (10000 runs).
   -- it has been proved that the sequence doesn't repeat itself before 2^31 -2 calls have been made.
   -- yet: No warranties about anything.
   -- Rosen "Discrete Mathematics" p. 207
   property x0 : 3
   property rmod : 2 ^ 31 - 1
   property multiplier : 7 ^ 5
   
   on rand()
       set x0 to (multiplier * x0) mod rmod
       return (x0 / 1.0E+9)
   end rand
   on init()
       rand()
   end init
end script

pseudo's init()
set l to {}
repeat maxitems times
   set end of l to rand(1, maxitems)
end repeat
set max to 0
repeat with i from 1 to maxitems
   if item i of l > max then set max to item i of l
end repeat
log "max: " & max

set min to (maxitems + 1)
repeat with i from 1 to maxitems
   if item i of l < min then set min to item i of l
end repeat
log "min: " & min

set ll to length of l
set k to ll div 2 + 1 * (((ll mod 2) > 0) as integer)

set t0 to (current date)
if 1 = 1 then
   repeat 1000 times
       set md to find(l, k)
   end repeat
else
   set md to find(l, k)
end if
set t1 to ((current date) - t0) / 1000
log "time T : " & t1
log "median val: " & md
set t0 to (current date)
set m to countingsort(l, max)
set t1 to ((current date) - t0) / 1000
log "time T : " & t1

log "low: " & item (k - 1) of m
log "median: " & item k of m
log "high: " & item (k + 1) of m


(*
   This version uses at most 2n comparisions in the average case ( O(n) )
   and O(n^2) comparisions in worst case. Probability of a worst case is
   greater than ( 1 / n! ) and *much* lesser than 1/n.
*)


on find(|L|, k)
   -- returns the k'th least element of a list.
   -- By C.A.R Hoare in an article. Implemented in Pascal by N. Wirth
   -- Adapted to AppleScript by McUsr.
   script o
       property A : |L|
   end script
   
   set low to 1
   set high to length of |L|
   
   repeat while high > low
       set x to item k of o's A
       set i to low
       set j to high
       repeat while i ≤ j
           repeat while x > item i of o's A
               set i to i + 1
           end repeat
           repeat while item j of o's A > x
               set j to j - 1
           end repeat
           if i ≤ j then
               set w to item i of o's A
               set item i of o's A to item j of o's A
               set item j of o's A to w
               set i to i + 1
               set j to j - 1
           end if
       end repeat
       if j < k then set low to i
       if k < i then set high to j
   end repeat
   return item k of o's A
end find

on rand(low, high)
   -- returns a random integer, within low and high inclusive
   -- Made by McUsr
   set k to high - low + 1
   if 0 = 1 then
       return ((random number) * k mod k div 1 + low)
   else
       return ((pseudo's rand()) * k mod k div 1 + low)
   end if
end rand

on countingsort(|L|, k)
   -- countingsort: origin unknown. found it in an algorithms lecture from MIT
   -- Implemented in AppleScript by McUsr 2015/6/24
   script o
       property l : |L|
       property C : missing value
       property Cp : missing value
       property B : missing value
   end script
   set ll to length of o's l
   copy |L| to o's C
   repeat with i from 1 to ll
       set item i of o's C to 0
   end repeat
   
   repeat with i from 1 to ll
       set item (item i of o's l) of o's C to (item (item i of o's l) of o's C) + 1
   end repeat
   copy o's C to o's Cp
   
   repeat with i from 2 to k
       set item i of o's Cp to (item (i - 1) of o's Cp) + (item i of o's C)
   end repeat
   copy o's C to o's B
   
   repeat with j from ll to 1 by -1
       tell item j of o's l
           set item (item it of o's Cp) of o's B to it
           set item it of o's Cp to (item it of o's Cp) - 1
       end tell
   end repeat
   return o's B
end countingsort

Edit
Removed some debugging info.
Incorporated the fast random number generator, just for the hell of it. wink

Last edited by McUsrII (2015-06-28 05:28:34 pm)

Offline

 

#25 2015-06-29 05:01:04 pm

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4457

Re: Effective way of finding the median in a list of numbers

Here's a different approach by way of experiment. It counts how many values are less than and greater than each value in the range. When neither count is more than the half the length of the range, the current value is either the median value or one of the two which have to be averaged to get it. The huge cost of comparing every value with all the others is mitigated by only doing it with values lying between contracting limits which have proved too high or too low earlier in the process.

The script here actually takes two or three times as long as my other one, but the difference only becomes noticeable with about three thousand or so values.  smile

Edit: Previous bug fix replaced with a more satisfactory compromise. Also some reworking of comments and a couple of variable name changes.
Further Edit: It turns out that a simpler fix for the bug was simply to leave out the aspect of the "limits" idea which led to it! roll  The preliminary sample of lowest and highest values is no longer carried out. There's no discernable effect on performance.

Applescript:

(* Find the median of the values (presumed to be integers) in a range of a list.
By Nigel Garvey 2015.
With odd numbers of items, the median value's the one which would be in the middle if the range were sorted. With even numbers of items, it's the average of the two middle values.

Parameters: (list, range index 1, range index 2)
*)


on medianValue(theList, l, r)
   script o
       property lst : theList
   end script
   
   -- Process the range parameters.
   set listLen to (count theList)
   if (l < 0) then set l to listLen + l + 1
   if (r < 0) then set r to listLen + r + 1
   if (l > r) then set {l, r} to {r, l}
   if ((l < 1) or (r > listLen)) then error "Duff range parameters!" -- Compose as required.
   
   set rangeLen to r - l + 1
   set halfRangeLen to rangeLen div 2
   set oddLen to (rangeLen mod 2 is 1)
   -- Limits to be set and tightened as the search discovers what's too low or too high.
   set tooLow to missing value
   set tooHigh to missing value
   -- Result variables.
   set result1 to missing value
   set result2 to missing value
   
   -- The search takes each value which isn't known to be too low or too high and checks how many of the other values are higher than it and how many are lower. When neither count is more than half the length of the range, the current value is either the median or one of the two values which must be averaged to get it.
   repeat with i from l to r
       set iVal to item i of o's lst
       if (((tooLow is missing value) or (iVal > tooLow)) and ((tooHigh is missing value) or (iVal < tooHigh))) then
           -- This value's within the current limits. Compare it with all the other values in the range.
           set lowerCount to 0
           set higherCount to 0
           repeat with j from l to r
               if (j is i) then
                   -- Don't compare it with itself!
               else
                   set jVal to item j of o's lst
                   -- Increment the lower or higher count (or do nothing) accordingly.
                   -- (It's slightly faster to complete the counts and check them afterwards than it is to check after each increment in the hope of finishing early.)
                   if (jVal < iVal) then
                       set lowerCount to lowerCount + 1
                   else if (jVal > iVal) then
                       set higherCount to higherCount + 1
                   end if
               end if
           end repeat
           
           -- Now act on the count results.
           if (lowerCount > halfRangeLen) then
               -- Over half the other values are lower than this one, so it's too high.
               set tooHigh to iVal
           else if (higherCount > halfRangeLen) then
               -- Over half the other values are higher, so it's too low.
               set tooLow to iVal
           else if (result1 is missing value) then
               -- This value's the first hit. Log it and, if the range is an odd length, end the search.
               set result1 to iVal
               if (oddLen) then exit repeat
           else if (iVal is result1) then
               -- This value's the same as the first hit. Ignore it.
           else
               -- This value's the second hit. Log it and end the search.
               set result2 to iVal
               exit repeat
           end if
       end if
   end repeat
   
   -- Return the first hit if there's only one; otherwise the average of the two, as an integer if possible.
   if (result2 is missing value) then return result1
   set |median| to (result1 + result2) / 2
   tell |median| as integer to if (it is |median|) then set |median| to it
   return |median|
end medianValue

--(* Demo:
set l to {}
repeat with i from 1 to 6
   set end of my l to (random number 1000)
end repeat

log l
-- Find the median of values 1 thru -1 of l.
set m to medianValue(l, 1, -1)

Last edited by Nigel Garvey (2015-07-01 03:26:10 am)


NG

Offline

 

Board footer

Powered by FluxBB

RSS (new topics) RSS (active topics)