Most common item in a list

I find myself in need of a handler that will find the most common item in a list. The list happens to be pages in an InDesign document, but I would think that the best approach would be to write something generic that will handle any kind of list.

Here’s the only way my brain can break down the steps:

  1. Get the unique items of the list

  2. Count the instances of each item in the list. Store this information in record form {item:x,frequency:1}

  3. Sort the records according to frequency. Return the item of the last (most frequent) record.

That seems overly complicated, so before I waded in I thought I check here first to see if someone already has a nifty, three-line solution.

This will handle a list of numbers or strings. If you need support for a list of lists or records, it could be adaptated. It will return the last item more frequent in the list. For example, if it finds 4 “x” and 4 “y”, it will return “y”. It could be also adaptated to return the first more-frequent, or a list of more-frequents (modify “greaterInteger()”).

set theList to {}
repeat 50 times
	set theList to theList & items of "This is a simple test and I'll guess which is the most popular character in this sentence."
end repeat
--> theList contains now 4500 items

mostPopularItemInList(theList)

to mostPopularItemInList(x)
	script accellerator
		property theList : x
		property originalItems : {}
		property originalItemsCount : {}
		on indexof(theItem, theList) -- credits Emmanuel Levy
			set text item delimiters to return
			set theList to return & theList & return
			set text item delimiters to {""}
			try
				-1 + (count (paragraphs of (text 1 thru (offset of (return & theItem & return) in theList) of theList)))
			on error
				0
			end try
		end indexof
		to greaterInteger()
			set largerItem to 0
			set largerItemIndex to 0
			repeat with i from 1 to count originalItemsCount
				if originalItemsCount's item i > largerItem then
					set largerItem to originalItemsCount's item i
					set largerItemIndex to i
				end if
			end repeat
			originalItems's item largerItemIndex
		end greaterInteger
	end script
	
	repeat with i from 1 to count accellerator's theList
		considering case
			if accellerator's theList's item i is not in accellerator's originalItems then
				set accellerator's originalItems's end to accellerator's theList's item i
				set accellerator's originalItemsCount's end to 1
			else
				set ind to accellerator's indexof(accellerator's theList's item i, accellerator's originalItems)
				set accellerator's originalItemsCount's item ind to ((accellerator's originalItemsCount's item ind) + 1)
			end if
		end considering
	end repeat
	
	return accellerator's greaterInteger()
end mostPopularItemInList

Here’s a an attempt at a generic one:

on mostCommonItemIn(theList)
  if theList is {} then return -- make your own arrangements for empty lists
  script o
    property mainList : theList
    property remainderList : {}
  end script
  
  set highestScore to 0
  repeat until (o's mainList's length < highestScore)
    set currentItem to item 1 of o's mainList
    set currentScore to 1
    repeat with i from 2 to (count o's mainList)
      if (item i of o's mainList is currentItem) then
        set currentScore to currentScore + 1
      else
        set end of o's remainderList to item i of o's mainList
      end if
    end repeat
    
    if (currentScore < highestScore) then
    else
      set highestScore to currentScore
      set mostCommonItem to currentItem
    end if
    
    set o's mainList to o's remainderList
    set o's remainderList to {}
  end repeat
  
  return mostCommonItem
end mostCommonItemIn


set theList to {1, 2, {a:"Hello", b:2}, {"I am a list"}, "Hello", {b:2, a:"Hello"}, Wednesday, true}
mostCommonItemIn(theList)
--> {a:"Hello", b:2}

That assumed, of course, that there would would always be more of one item in the list than of any of the others. Here’s an improved version that allows for the fact that there may be two or more “most common” items:

on mostCommonItemsIn(theList)
  if theList is {} then return {}
  
  script o
    property mainList : theList
    property remainderList : {}
  end script
  
  set highestScore to 0
  set listLen to o's mainList's length
  repeat until (listLen < highestScore)
    set currentItem to item 1 of o's mainList
    set currentScore to 1
    repeat with i from 2 to listLen
      set x to item i of o's mainList
      if (x is currentItem) then
        set currentScore to currentScore + 1
      else
        set end of o's remainderList to x
      end if
    end repeat
    
    if (currentScore > highestScore) then
      set highestScore to currentScore
      set mostCommonItems to {currentItem}
    else if (currentScore = highestScore) then
      set end of mostCommonItems to currentItem
    end if

    set o's mainList to o's remainderList
    set o's remainderList to {}
    set listLen to listLen - currentScore
  end repeat
  
  return mostCommonItems
end mostCommonItemsIn


set theList to {false, true, 1, Wednesday, 2, {a:"Hello", b:2}, {"I am a list"}, "Hello", {b:2, a:"Hello"}, Wednesday, true, false}

mostCommonItemsIn(theList)
--> {false, true, Wednesday, {a:"Hello", b:2}}

Frequency count is a standard CS101 problem (i.e. Google is your friend). e.g. A typical solution using Python:

alist = ['a', 'b', 'c', 'd', 'b', 'c', 'a', 'c', 'c', 'd']

# count number of times an item appears (this implementation assumes items are immutable, e.g. strings)
freqs = {}
for item in alist:
    # item = item.lower() # uncomment this line to do case-insensitive word counts
    freqs[item] = freqs.get(item, 0) + 1
# find item(s) with highest frequency
high = max(freqs.values())
found = [item for item, freq in freqs.items() if freq == high]

print high, found # -> 4 ['c']

You can do it in AppleScript, but it’ll be deadly slow compared to languages like Perl and Python so probably best avoided if you can help it.

thanks, all. you’ve saved me much time and aggravation.

i’ve read about the trick of using a reference to a script’s list, as a way of speeding things up, but it never stuck in my brain before. maybe this time.

Python? NG? :evil:
Next time I’ll wait at least 24 hrs. before posting my answer. :rolleyes:

Here’s another one, using do shell script, which I’ve modified from a file that was posted on a FileMaker list by Bruce Robertson:
(the 1 can be changed to however many you want to see, 5, etc.)

set theData to {“red”, “blue”, “purple”, “green”, “red”, “red”, “red”, “yellow”}
my getTop(theData)

on getTop(theData)
set AppleScript’s text item delimiters to ASCII character 10
try
set shellStuff to " | sort | uniq -c | sort -rn | head -1"
set theTop to do shell script "echo " & quoted form of (theData as text) & shellStuff
–set theTop to do shell script "echo " & (theData as text) & shellStuff
–just gets the top thing, not the count
on error errMsg number errNum
set AppleScript’s text item delimiters to “”
end try
set AppleScript’s text item delimiters to “”
return theTop
end getTop