Hi,

Is there a fast way to return the item number of a certain item in a very long list?

For short list it is possible to loop through the code but for a list of 1000 items this is not very fast.

many thanks,

gecko

Hi,

Is there a fast way to return the item number of a certain item in a very long list?

For short list it is possible to loop through the code but for a list of 1000 items this is not very fast.

many thanks,

gecko

Hi gecko,

unfortunately ApplScript can’t search a list by its index number without parsing the whole list,

a binary search algorithm are a few lines more but should be much faster.

I’m not a mathematician at all and this is only a quick amateurish attempt of such an algorithm,

but it works

```
set value to "g"
set theList to {"a", "b", "c", "d", "e", "f", "g", "h"}
display dialog ((BinarySearch(theList, value)) as string)
on BinarySearch(theList, value)
set {low, high} to {1, count theList}
if {value} is not in theList then return false
copy high to lasth
set high to low + ((high - low) div 2)
repeat until high < low
if {value} is in items low thru high of theList then
if high = low then exit repeat
copy high to lasth
set high to low + ((high - low) div 2)
else
set low to high + 1
set high to lasth
end if
end repeat
return low
end BinarySearch
```

Hi gecko,

For quick access to list items, store your list in a script object. e.g.

```
-- just getting a big list
script S
property l : {}
-- just getting a list of words from dictionary
set l to paragraphs of (do shell script "look a")
end script
run S
set search_string to "anteater"
--set t1 to the ticks
set c to count l of S
set item_index to 0
repeat with i from 1 to c
set this_item to item i of l of S
if this_item is search_string then
set item_index to i
exit repeat
end if
end repeat
--set t2 to the ticks
--display dialog (t2 - t1)
{item_index, item item_index of l of S}
```

With this method, search_string can be any type. You can speed it up even faster if the list contains only strings of certain length, I think.

gl,

wow! very fast!!

thanks for the help,

gecko

Hi kel,

with Nigel Garvey’s “Lotsa” routine the binary search is approx. 8 times faster than parsing the list

(“Lotsa” requires GetMilliSec)

```
timeLotsa(20) -- Nigel Garvey's "Lotsa" routine for timing.
-- The Lotsa handler for comparison
on timeLotsa(lotsa)
script S
property l : {}
-- just getting a list of words from dictionary
set l to paragraphs of (do shell script "look a")
end script
run S
set search_string to "anteater"
repeat lotsa times
end repeat
-- Test 1.
set t to GetMilliSec
repeat lotsa times
set {low, high} to {1, count S's l}
copy high to lasth
set high to low + ((high - low) div 2)
repeat until high < low
if {search_string} is in items low thru high of S's l then
if high = low then exit repeat
copy high to lasth
set high to low + ((high - low) div 2)
else
set low to high + 1
set high to lasth
end if
end repeat
low
end repeat
set t1 to ((GetMilliSec) - t) / 1000
-- Test 2.
set t to GetMilliSec
repeat lotsa times
repeat with i from 1 to count S's l
if search_string is item i of S's l then exit repeat
end repeat
i
end repeat
set t2 to ((GetMilliSec) - t) / 1000
-- Timings.
return {t1, t2, t1 / t2, t2 / t1} -- both ratios if you don't know which is faster
end timeLotsa
```

Hi Stefan,

Isn’t a lot of the speed because the example uses a sorted list of strings? I don’t know a lot about sorting, but think I read that insertion sort is very fast on a sorted list. Wonder if this is true. Personally, with big list I think I would keep a lookup table for getting index to list items, but big lists are dangerous to work with anyway. I try looking at the binary sort.

Thanks,

No, a binary search routine in a sorted list is still faster because you compare only with “<” and “>” operators

for example this is a common binary search routine if the list is sorted

```
on BinarySearch(theList, value)
set {low, high} to {1, count theList}
set p to low + ((high - low) div 2)
repeat while low is less than or equal to high
if item p of theList > value then
set high to p - 1
else if item p of theList < value then
set low to p + 1
else
return p
end if
set p to low + ((high - low) div 2)
end repeat
return false
end BinarySearch
```

Hi Stefan,

Yeah I got search and sort mixed up. I remember binary search now. It’s simplicity made it easy to remember.

Thanks,

Hi, both.

The binary search is only about three times as fast on my machine, but that’s still pretty good! Thanks, Stefan.

Here’s a slightly optimised version. Besides the obvious script object, the ‘high’ and ‘low’ variables are handled more economically. Also, on the theory that the item’s more likely to be in the list than not, that test’s left to the end, when there’s only one item to test:

```
BinarySearch(get paragraphs of (do shell script "look a"), "anteater" as Unicode text)
on BinarySearch(TheList, value)
script o
property l : TheList
end script
set low to 1
set high to (count TheList)
repeat until (low = high)
set mid to (low + high) div 2
if ({value} is in items low thru mid of o's l) then
set high to mid
else
set low to mid + 1
end if
end repeat
if (item low of o's l is value) then return low
return false
end BinarySearch
```

Very neat, Nigel.

I think, you can often save an extra comparison loop, if you decrement the high value like incrementing the low one.

For this change the repeat condition has also to be adapted

```
on BinarySearch(theList, value)
local low, mid, high
script o
property l : theList
end script
set low to 1
set high to (count theList)
repeat while (low â‰¤ high)
set mid to (low + high) div 2
if ({value} is in items low thru mid of o's l) then
set high to mid - 1
else
set low to mid + 1
end if
end repeat
if (item low of o's l is value) then return low
return false
end BinarySearch
```

Hi, Stefan.

Just looking at this quickly, I suspect that it might not find the item if the item’s exactly at a ‘mid’ position. I have to go out now, but I hope to look at this subject again tonight. As with a linear seach, the time taken by a binary search depends on whereabouts in the list the item actually is. For instance, although the binary search is much faster at finding “anteater” in Kel’s list, the linear search seems to have a slight advantage when looking for “aardvark”.

This is obvious, as long as the position of the item to find is less than the number of loops plus the code overhead of the binary search.

By contrast the binary search is incredibly faster if the position is near to the end of the list.

On average a binary search is always faster than a linear search

Hi, Stefan.

I think I agree with that. And when the linear search is faster, it’s not by very much, so the binary search is generally a safe bet.

Contrary to my intial guess, your effort to save the extra comparison loop does find items in ‘mid’ positions. But it doesn’t, on average, in my tests, save any time. Sometimes it does fewer loops than my optimisation, sometimes more, sometimes the same number. The totals for the two methods over the first (list length - 1) items seem to be exactly the same, but finding the very last item in the list always takes one more loop with your optimisation than with mine.

Test method: log {low, mid, high} on each loop and call the handler with a logged loop that finds every item in the list. (The items must be unique, of course.) Use similar scripts for the two optimisations, vary the length of the input list, compare the event logs.

```
on BinarySearch(theList, value)
script o
property l : theList
end script
set low to 1
set high to (count theList)
repeat until (low = high)
set mid to (low + high) div 2
log {low, mid, high}
if ({value} is in items low thru mid of o's l) then
set high to mid
else
set low to mid + 1
end if
end repeat
if (item low of o's l is value) then return low
return false
end BinarySearch
set theList to {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "m", "o", "p"}
repeat with i from 1 to (count theList)
log BinarySearch(theList, item i of theList)
end repeat
```

You’re right, Nigel, your script is a bit faster.

Comparing directly both scripts with Lotsa (1000 loops and a list of 1000 items)

your version takes 49.066 seconds and mine 49.566 seconds

For multiple occurences in the list this modification should work:

```
set theList to {"a", "b", "c", "d", "e", "f", "a", "g", "h", "i", "j", "k", "a", "l", "m", "m", "o", "p"}
MultiBinarySearch(theList, "a", 0)
on MultiBinarySearch(theList, value, start)
script theScript
property refList : theList
end script
if {value} is not in theScript's refList then return false
set indexes to {}
set low to 1
set high to (count theScript's refList)
repeat until (low = high)
set mid to (low + high) div 2
--log {low, mid, high}
if ({value} is in items low thru mid of theScript's refList) and ({value} is in items (mid + 1) thru high of theScript's refList) then
set indexes to indexes & MultiBinarySearch(items (mid + 1) thru high of theScript's refList, value, (mid + 1))
set high to mid
else if ({value} is in items low thru mid of theScript's refList) then
set high to mid
else
set low to mid + 1
end if
end repeat
if (item low of theScript's refList is value) then
set beginning of indexes to (low + start)
return indexes
end if
end MultiBinarySearch
--result: {1, 14, 8}
```

(Results are not in chronicle order, though.

Some routine like QuickSort would have to do the rest:

http://bbs.applescript.net/viewtopic.php?id=17340)

To search for multiple occurrences of one value in a list, use this one, rather:

```
set theValue to "a"
set theList to {"a", "b", "c", "d", "e", "f", "a", "g", "h", "i", "j", "k", "a", "l", "m", "m", "o", "p"}
Qsearch(theList, 1, count theList, theValue)
to Qsearch(array, leftEnd, rightEnd, theValue)
script A
property l : array
end script
set theIndexes to {}
set {i, j} to {leftEnd, rightEnd}
set v to ((leftEnd + rightEnd) div 2) -- pivot in the middle
repeat while (j > i)
repeat while (i is not greater than v)
if item i of A's l = theValue then
set theIndexes to theIndexes & {i}
exit repeat
else
set i to i + 1
end if
end repeat
repeat while (j is greater than v)
if item j of A's l = theValue then
set theIndexes to theIndexes & {j}
exit repeat
else
set j to j - 1
end if
end repeat
if (i is not greater than j) then
set {i, j} to {i + 1, j - 1}
end if
end repeat
if (leftEnd < j) then Qsearch(A's l, leftEnd, j, theValue)
if (rightEnd > i) then Qsearch(A's l, i, rightEnd, theValue)
theIndexes
end Qsearch
-- The Hoare's QuickSort algorithm will put the result in sequential order.
```

– Result: {1, 13, 7}

Thanks to Vincent and mghilissen for the idea! This version returns the indices in the right order:

```
on MultiBinarySearch(theList, value, l, r)
script o
property lst : theList
property indices : {}
on mbsrch(l, r)
if (item l of my lst is value) then set end of my indices to l
set l to l + 1
set m to (l + r) div 2
if (m â‰¥ l) and ({value} is in items l thru m of my lst) then mbsrch(l, m)
set m to m + 1
if (r â‰¥ m) and ({value} is in items m thru r of my lst) then mbsrch(m, r)
end mbsrch
end script
o's mbsrch(l, r)
return o's indices
end MultiBinarySearch
set theList to {"a", "b", "c", "d", "e", "f", "a", "g", "h", "i", "j", "k", "a", "l", "m", "m", "o", "p", "a", "e"}
MultiBinarySearch(theList, "a", 1, (count theList))
--> {1, 7, 13, 19}
```

[i]Edit: Simplified the code to clarify its working and to remove an “optimisation” that was actually having slightly the opposite effect!

I’ve had a chance to study Vincent’s and mghilissen’s scripts this morning. I hope they won’t mind if I comment here.

Vincent’s script returns the right number of indices, but those returned from lower recursion levels aren’t correct. The value passed to ‘start’ in the recursive calls should be ‘(mid + start)’, not ‘(mid + 1)’.

mghilissen’s script is based on the Quicksort algorithm and is actually a series of converging linear searches that find the first instance of the value, then the last, then the second, then the second-to-last, and so on. For this reason, it’s slightly slower than a single linear search straight through the list. The two recursive calls at the end contribute nothing to the final result but add a terrific amount to the running time.

Binary searches are only effective “ and are only necessary “ because AppleScript’s a “high level” language. The ‘is in’ command actually carries out a linear search itself, but at a lower coding level. This is much faster than using high-level AppleScript commands to carry out the individual details of a linear search. (In much the same way, a person will walk across the room much faster if you tell them to walk across the room than if you give them individual instructions for lifting each leg, shifting their weight forward, and falling in a controlled manner back onto the foot of the current leg. There’s probably a similar allegory to do with Government interference and the National Health Service, but I won’t go into that here. )[/i]

I am absolutely delighted to take lessons from you, Nigel. Only perfect practice makes perfect. Thanks for your insight.

Michael

This is awesome, Nigel

Ditto!:D:cool: