Monday, December 17, 2018

#26 2015-06-30 02:38:41 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

Your last version, looks interesting too.  I think that there is something proportional to log n, that makes the difference kick in noticably at 3000 items.

Well, I didn't come around to it today, but I'm going to present two O(n) versions, at least over some days, so it works properly when I post it. The first is a cheat, and based on CountingSort above, the other is the median of medians algorithm, which is the real deal, that also has a worstcase of O(n). The problem with that solution as I see it, is that you have to calibrate the algorithm with the number of items, as I think you need at least 25 elements for the median of medians to work. Well, N.Wirths says that if you have less than or equal to 10 items  then you should sort and pick the median from the sorted elements. I am also a bit eager to use the median for building a complete binary tree, but I haven't figured yet, if that is possible or a feasible way to do it. (But then I really need the list to be sorted anyway.) smile

Offline

 

#27 2015-07-01 01:31:40 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello.

I came by this little random number generator, that promises to generate random numbers, perceived as random by humans, (not totally uniformly distributed). Which may be a technique suitable sometimes, when needing random number s for something. This random number generator, have been suitable for doom, so it should be suitable for small AppleScript games. You can read all about it in the links in the code.

Applescript:

script perlinsRand
   -- [url]https://github.com/id-Software/DOOM/blob/master/linuxdoom-1.10/m_random.c[/url]
   -- [url]https://news.ycombinator.com/item?id=9809998[/url]
   property randInts : {0, 8, 109, 220, 222, 241, 149, 107, 75, 248, 254, 140, 16, 66, ¬
       74, 21, 211, 47, 80, 242, 154, 27, 205, 128, 161, 89, 77, 36, ¬
       95, 110, 85, 48, 212, 140, 211, 249, 22, 79, 200, 50, 28, 188, ¬
       52, 140, 202, 120, 68, 145, 62, 70, 184, 190, 91, 197, 152, 224, ¬
       149, 104, 25, 178, 252, 182, 202, 182, 141, 197, 4, 81, 181, 242, ¬
       145, 42, 39, 227, 156, 198, 225, 193, 219, 93, 122, 175, 249, 0, ¬
       175, 143, 70, 239, 46, 246, 163, 53, 163, 109, 168, 135, 2, 235, ¬
       25, 92, 20, 145, 138, 77, 69, 166, 78, 176, 173, 212, 166, 113, ¬
       94, 161, 41, 50, 239, 49, 111, 164, 70, 60, 2, 37, 171, 75, ¬
       136, 156, 11, 56, 42, 146, 138, 229, 73, 146, 77, 61, 98, 196, ¬
       135, 106, 63, 197, 195, 86, 96, 203, 113, 101, 170, 247, 181, 113, ¬
       80, 250, 108, 7, 255, 237, 129, 226, 79, 107, 112, 166, 103, 241, ¬
       24, 223, 239, 120, 198, 58, 60, 82, 128, 3, 184, 66, 143, 224, ¬
       145, 224, 81, 206, 163, 45, 63, 90, 168, 114, 59, 33, 159, 95, ¬
       28, 139, 123, 98, 125, 196, 15, 70, 194, 253, 54, 14, 109, 226, ¬
       71, 17, 161, 93, 186, 87, 244, 138, 20, 52, 123, 251, 26, 36, ¬
       17, 46, 52, 231, 232, 76, 31, 221, 84, 37, 216, 165, 212, 106, ¬
       197, 242, 98, 43, 39, 175, 254, 145, 190, 84, 118, 222, 187, 136, ¬
       120, 163, 236, 249}
   property randEntries : length of randInts
   property curRand : 0
   on clearRnd()
       set curRand to 0
   end clearRnd
   on next()
       if curRand < randEntries then
           set curRand to curRand + 1
       else
           set curRand to 1
       end if
       return item curRand of randInts
   end next
end script


perlinsRand's next()

Offline

 

#28 2015-07-01 03:01:05 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4741

Re: Effective way of finding the median in a list of numbers

McUsrII wrote:

Hello Nigel.

Your last version, looks interesting too.  I think that there is something proportional to log n, that makes the difference kick in noticably at 3000 items.


Thanks, McUsrII. I must admit that, despite having implemented many sorts in AppleScript over the years, I've never got the hang of expressions like "log n" and "O(n)". I think it's because they didn't convey (to me) why things behaved they way they did — especially in a high-level language like AppleScript, where as much depends on individual command implementations and data characteristics as on the algorithms themselves. But I think it's about time I looked into "n" notation more thoroughly!  smile


NG

Online

 

#29 2015-07-01 03:47:38 am

DJ Bazzie Wazzie
Member
From:: the Netherlands
Registered: 2004-10-20
Posts: 2808
Website

Re: Effective way of finding the median in a list of numbers

Nigel Garvey wrote:

I think it's because they didn't convey (to me) why things behaved they way they did — especially in a high-level language like AppleScript, where as much depends on individual command implementations and data characteristics as on the algorithms themselves.


True, something that keeps AppleScript reminding me every day. It's never the obvious solution which in some sort of way AppleScript completely misses its targets.

Offline

 

#30 2015-07-01 06:31:33 am

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello Nigel.

I don't really ever saw Big O, or Big Theta or Omega as anything but as  a proportion as to how the processing time increases, as the dataset evolves. It is a very coarse measure in itself, just giving the order of magnitude, and it really isn't anything to worry that much about. It is really just one of the trade-off factors. Readability, maintainability, sheer size of the code and so on, being others. Correctness is of course not a trade-off. I seldom do work on so big datasets in AppleScript, that I have to worry about it, since it doesn't make sense, unless you vary wildly in the size of the datasets.

In this particular case, it may slow down because the handlers has a complexity of  n log n  that is, that the processing time increases more than the number of elements (linearly). Then the second handler may be slower  in a part of the algorithm, which maybe has a log N complexity by itself, (the inner loop), that makes up for the slow increase in time. I guess this since the time doesn't increase noticably before 3000 items. Your last algorithm is just a fraction slower, than your previous, and that fraction adds up over time, I actually don't think it is even a log N  increase that is involved, but something that adds up slower, maybe log 1/N, as just a wild guess, not having really calculated anything, not even timed it.  Just timing both for say 1000, 2000, and 3000 items, and comparing the differences, should give a good clue as to how they develop in propertions, as the number of items increases.

@DJ Bazzie Wazzie. That is also a good point, But big O notation, and complexity analysis, still gives a little hint about thing may progress, or how they should have progressed, hadn't the number of items in the list started to degenerate performance by itself for instance, due to number of items in a list larger than some treshold. At least we are more informed when we look it up, than not. -Big Oh notation, provides some guidance, nothing more, but some guidance, is better than none. The more info, the more educated the guess.

Complexity analysis is a huge subject, I personally think it is ok, to be familiar with, but not dwell to deep into, because not before long, you are reading up on finite fields, and the law of total probability, (besides starting to remember arithmetic and geometric progressions, and so on). But well, it has it's charm as well I guess.

Edit

They say that the longer you write about something, the less you really know about it. At least I know that MIT has an open courseware class on the net in Discrete Mathematics, which is a precursor to their open courseware class "Introduction to Algorithms". You really need to know the Discrete Mathematics, to get something out of that Algorithms course. Lots of video lectures, and readings online, and even notes at least from the recitations. You can pick and choose from whatever lectures you want to watch. The lectureres, are really aces, but I tend to shrink a little in my chair, when they enthusiastically talks about "very cool mathematics". wink

Last edited by McUsrII (2015-07-01 07:43:29 am)

Offline

 

#31 2015-07-01 12:54:52 pm

McUsrII
Member
Registered: 2012-11-21
Posts: 3046
Website

Re: Effective way of finding the median in a list of numbers

Hello.

Here is the "cheat" version for finding the median, the reason for this, is that I actually work with a lot of lists those day, where the max element is bounded by the length of the  list. That justifies this very special version, -that performs in linear time. O(N). It also sorts the list, which is a necessity, when finding the median with this version.

Applescript:

set thel to {4, 1, 3, 4, 3, 5, 1}
set soughtK to 4 -- median for 7 items is ceil (7 / 2 ) = 4
countingsortAndMedian(thel, 5, soughtK)
on countingsortAndMedian(|L|, maxval, k)
   -- countingsort: origin unknown. found it in an algorithms lecture from MIT
   -- Implemented in AppleScript by McUsr 2015/6/24
   script o
       property L : |L|
       property C : missing value
       property Cp : missing value
       property B : missing value
   end script
   set ll to length of o's L
   copy |L| to o's C
   repeat with i from 1 to ll
       set item i of o's C to 0
   end repeat
   
   repeat with i from 1 to ll
       set item (item i of o's L) of o's C to (item (item i of o's L) of o's C) + 1
   end repeat
   copy o's C to o's Cp
   
   repeat with i from 2 to maxval
       set item i of o's Cp to (item (i - 1) of o's Cp) + (item i of o's C)
   end repeat
   copy o's C to o's B
   
   repeat with j from ll to 1 by -1
       tell item j of o's L
           set item (item it of o's Cp) of o's B to it
           set item it of o's Cp to (item it of o's Cp) - 1
       end tell
   end repeat
   return {o's B, item k of o's B}
end countingsortAndMedian

I should also be able to build a complete binary tree, while looping over length/2 elements of the sorted array, after having inserted the root element as its root, and populated it with the elements on each side of the median as its children so that I can end up with a complete binary tree, should I wish to. (I see no use for this at the moment, but it may come in handy). The binaray tree, would however be a "read-only" binary tree, in that it would be quite costly to maintain, compared to the cost of re-building it, which should be almost as fast with a small number of items.

Last edited by McUsrII (2015-07-01 12:59:41 pm)

Offline

 

#32 2018-07-14 07:52:48 pm

technomorph
Member
Registered: 2017-12-14
Posts: 81

Re: Effective way of finding the median in a list of numbers

There's also a vanilla way:

Applescript:

use framework "Foundation"

on medianOfList:theList
   set anNSExpression to current application's NSExpression's expressionForConstantValue:theList
   set newNSExpression to current application's NSExpression's expressionForFunction:"median:" arguments:{anNSExpression}
   return (newNSExpression's expressionValueWithObject:(missing value) context:(missing value)) as real
end medianOfList:

Just don't ask me to explain it...



is there a way to use this to find Maximum and Minimum values in a list?

thanks

Offline

 

#33 2018-07-15 02:36:05 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4741

Re: Effective way of finding the median in a list of numbers

technomorph wrote:

is there a way to use this to find Maximum and Minimum values in a list?


Hi.

Substituting "max:" or "min:" for "median:" seems to work:

Applescript:

use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"
use scripting additions

on minAndMaxFromList:theList
   set anNSExpression to current application's NSExpression's expressionForConstantValue:theList
   
   set newNSExpression to current application's NSExpression's expressionForFunction:"min:" arguments:{anNSExpression}
   set minVal to (newNSExpression's expressionValueWithObject:(missing value) context:(missing value))
   
   set newNSExpression to current application's NSExpression's expressionForFunction:"max:" arguments:{anNSExpression}
   set maxVal to (newNSExpression's expressionValueWithObject:(missing value) context:(missing value))
   
   return (current application's NSDictionary's dictionaryWithObjects:{minVal, maxVal} forKeys:{"min", "max"}) as record
end minAndMaxFromList:

set theList to {}
repeat 10 times
   set end of theList to (random number 20)
end repeat

{theList, my minAndMaxFromList:theList}

But this is slightly simpler. It doesn't work for medians:

Applescript:

use AppleScript version "2.4" -- Mac OS 10.10 (Yosemite) or later.
use framework "Foundation"
use scripting additions

on minAndMaxFromList:theList
   set anNSArray to current application's NSArray's arrayWithArray:theList
   
   set minVal to (anNSArray's valueForKeyPath:"@min.self")
   set maxval to (anNSArray's valueForKeyPath:"@max.self")
   
   return (current application's NSDictionary's dictionaryWithObjects:{minVal, maxval} forKeys:{"min", "max"}) as record
end minAndMaxFromList:

set theList to {}
repeat 10 times
   set end of theList to (random number 20)
end repeat

{theList, my minAndMaxFromList:theList}


NG

Online

 

#34 2018-07-15 09:30:04 pm

Marc Anthony
Member
From:: Dallas, TX
Registered: 2006-04-27
Posts: 843

Re: Effective way of finding the median in a list of numbers

I wrote this script in April 2015 to help me with some homework in a statistics class. Perhaps it's a relevant option.


set valueList to {59, 63, 59, 59, 60, 59, 63}
set valueList to sort(valueList)
set counted to count valueList

--Mean (average)
set mean to {}
set endvalue to 0
repeat with aVal in valueList
    set endvalue to endvalue + aVal
end repeat
set mean to endvalue / (counted)

--Median (middle value)
set median to {}
if counted mod 2 = 1 then
    set median's end to valueList's middle item --odd
else
    set median's end to ((valueList's middle item) + (valueList's reverse's middle item)) / 2 --even
end if

--Mode (most frequent occurence(s))
set isMode to {}
set traversalPast to {}
repeat with anItem from 1 to count valueList
    set counter to 0
    repeat with another in valueList
        if valueList's item anItem = another's contents then set counter to counter + 1
    end repeat
    if counter > 1 and traversalPast does not contain valueList's item anItem then set isMode's end to valueList's item anItem & ":" & counter & "x, "
    set traversalPast's end to valueList's item anItem
end repeat
if isMode is {} then set isMode to "None"

--Midrange (difference between 1st and last sorted values)
set midrange to ((valueList's item -1) + (valueList's item 1)) / 2

"Min: " & valueList's item 1 & " | Max: " & valueList's item -1 & return & ¬
    "Mean: " & mean & return & ¬
    "Median: " & median & return & ¬
    "Mode: " & isMode & return & ¬
    "Midrange: " & midrange & return & ¬
    "RANGE: " & ((valueList's item -1) - (valueList's item 1))




on sort(thelist)
    set AppleScript's text item delimiters to linefeed
    set new_string to (do shell script "echo " & (thelist as text)'s quoted form & " | sort -g")'s paragraphs
end sort

Offline

 

#35 2018-07-16 06:06:58 am

Nigel Garvey
Moderator
From:: Warwickshire, England
Registered: 2002-11-20
Posts: 4741

Re: Effective way of finding the median in a list of numbers

Marc Anthony wrote:

I wrote this script in April 2015 to help me with some homework in a statistics class. Perhaps it's a relevant option.


Hi Marc.

It seems to do the job, despite the implicit coercion of isMode to text at the end being influenced by the TIDs set in the sort handler and the sorted list containing text!  wink  I wasn't familiar with "mode" in the statistical sense before. From what I've been reading about it this morning, it should be either the most frequently occurring value in the data or the co-most frequently occurring values. The interpretation in your script isn't quite the same.

Once the list is sorted, of course, it's only necessary to have one repeat, and there are a few other optimisations which can be made.

Applescript:

set valueList to {}
repeat 10 times
   set end of valueList to random number 15
end repeat

if (valueList is {}) then
   set {minValue, maxValue, |mean|, |median|, isMode, midrange, range} to {"None", "None", "None", "None", "None", "None", "None"}
else
   set sortedList to sort(valueList) -- The result's a list of numeric texts, but it doesn't matter here.
   log sortedList
   set listLength to (count sortedList)
   
   -- Initialise various variables to the first item in the sorted list.
   set minValue to sortedList's beginning -- Minimum value.
   set thisValue to minValue -- Preset in case there's only one value and the repeat below isn't executed.
   set sumOfValues to minValue -- Sum for the calculation of the average.
   set valueBeingCounted to minValue -- The value whose occurrences are currently being counted.
   set counter to 1 -- The number of times it's occurred so far.
   set currentMode to {valueBeingCounted} -- The value(s) with the highest occurrence count so far.
   set highestOccurrenceCount to 1 -- The number of times they've occurred.
   -- Work through the rest of the sorted list, updating the sum of the values and the mode data.
   repeat with i from 2 to listLength
       set thisValue to item i of sortedList
       set sumOfValues to sumOfValues + thisValue
       if (thisValue is valueBeingCounted) then
           set counter to counter + 1
       else
           if (counter > highestOccurrenceCount) then
               set currentMode to {valueBeingCounted}
               set highestOccurrenceCount to counter
           else if (counter = highestOccurrenceCount) then
               set end of currentMode to valueBeingCounted
           end if
           set valueBeingCounted to thisValue
           set counter to 1
       end if
   end repeat
   -- If necessary, update the mode from the count in progress at the end of the repeat
   if (counter > highestOccurrenceCount) then
       set currentMode to {valueBeingCounted}
       set highestOccurrenceCount to counter
   else if (counter = highestOccurrenceCount) then
       set end of currentMode to valueBeingCounted
   end if
   
   -- The maximum value is the last that was fetched from the list.
   set maxValue to thisValue
   --Mean (average)
   set |mean| to sumOfValues / listLength
   --Median (middle value or average of two middle values)
   set m to (1 + listLength) div 2
   set |median| to item m of sortedList
   if (listLength - m = m) then set |median| to (|median| + (item (m + 1) of sortedList)) / 2 -- even number of items.
   --Mode (most frequently occurring value(s))
   if (highestOccurrenceCount = 1) then set currentMode to minValue
   set astid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to ", "
   set isMode to "{" & currentMode & "}:" & highestOccurrenceCount & "x"
   set AppleScript's text item delimiters to astid
   --Midrange (average of 1st and last sorted values)
   set midrange to (maxValue + minValue) / 2
   --Range (difference between 1st and last sorted values)
   set range to maxValue - minValue
end if

"Min: " & minValue & " | Max: " & maxValue & return & ¬
   "Mean: " & |mean| & return & ¬
   "Median: " & |median| & return & ¬
   "Mode: " & isMode & return & ¬
   "Midrange: " & midrange & return & ¬
   "RANGE: " & range


on sort(thelist)
   set astid to AppleScript's text item delimiters
   set AppleScript's text item delimiters to linefeed
   set sortedList to (do shell script "echo " & (thelist as text)'s quoted form & " | sort -g")'s paragraphs
   set AppleScript's text item delimiters to astid
   return sortedList -- List of text.
end sort

Shane's ASObjC code quoted in post #32 can be adapted to return the mode of a list by simply replacing "median:" with "mode:" and changing the coercion at the end to 'as list'. In this case, the "mode" returned is just a list of the most frequently occurring value(s), with no indication of how often they occur. If there's only one instance of each value in a list, the result is a list containing just the lowest value. The script above also returns the lowest value instead of "None", but I've no opinion about which is better.

Applescript:

use framework "Foundation"
use scripting additions

on modeOfList:theList
   set anNSExpression to current application's NSExpression's expressionForConstantValue:theList
   set newNSExpression to current application's NSExpression's expressionForFunction:"mode:" arguments:{anNSExpression}
   return (newNSExpression's expressionValueWithObject:(missing value) context:(missing value)) as list
end modeOfList:

set theList to {}
repeat 10 times
   set end of theList to (random number 15)
end repeat

{theList, my modeOfList:theList}


NG

Online

 

Board footer

Powered by FluxBB

RSS (new topics) RSS (active topics)