Search for an list item in another list item

Hi Everybody,

I like to perform a search/compare on two list items and result the same existing list elements alone.
Please see my code below, and correct me how to get the desired result without using repeat loop.


set xx to {"a", "b", "f"}
set yy to {"a", "b", "c", "d", "e"}
set zz to every item of xx that is in yy
-- result would be a is {"a", "b"}


set xx to {"a", "b", "f"}
set yy to {"a", "b", "c", "d", "e"}

set zz to CommonElements(xx, yy)

on CommonElements(aList, bList)
	set commonList to {}
	try
		set commonList to CommonElements(rest of aList, bList)
		if (item 1 of aList) is in bList then
			copy item 1 of aList to beginning of commonList
		end if
	end try
	return commonList
end CommonElements

That’s clever mickerson. I was curious to see if it was actually faster too, so I ran a speed test of your code versus this code…

on CommonElements(aList, bList)
	set commonList to {}
	repeat with a in aList
		if a is in bList then set end of commonList to contents of a
	end repeat
	return commonList
end CommonElements

It was 7 times slower. I ran 5000 tests on each and here’s the results. Note, your code was test #2 and the test times are in seconds.

“First Test Time: 0.358
Second Test Time: 2.459
Times Faster: 6.9”

I suspected it would be slower than the repeat loop. The overhead of the repeated calls costs.

BTW, do you know how deep the stack is in AppleScript or is that machine dependent?

Apparently, it’s faster to avoid an error than to trap it.

on CommonElementsBeta(aList, bList)
	set commonList to {}
	try
		if 0 < (count of aList) then
			set commonList to CommonElementsBeta(rest of aList, bList)
			if (item 1 of aList) is in bList then
				copy item 1 of aList to beginning of commonList
			end if
		end if
	end try
	return commonList
end CommonElementsBeta

On 80,000 repetitions, this took 6 seconds. The method in Post #2 took 15 and the repeat loop took 4.

Hey, mikerickson (Mike?). I still have questions about this approach/your construction, but how would this variant fare in your speed test? I suspect that purging the code of “rest” may increase the speed.

set xx to {"a", "b", "f"}
set yy to {"a", "b", "c", "d", "e"}
CE3(xx, yy, 0)

on CE3(aList, bList, counter)
	set commonList to {}
	set counter to counter + 1
	if counter < (count aList) then try
		set commonList to CE3(aList, bList, counter)
		tell aList's item counter to if it is in bList then set beginning of commonList to it
	end try
	commonList
end CE3

**I edited this to include a conditional consideration on my counter. The reason that I thought this might be faster is that I seem to remember reading that “rest” retains its iterations in memory. Can anyone verify this one way or the other?

I’m at work (Windows) and can’t test your version, but I don’t see how it will terminate.

In my version, the method does not call CommonElements when aList = {}
When it does call CommonElements, it passes an list that is shorter than the aList that was passed to it.
Every time CommonElements is called, the aList argument gets shorter until it is empty and the recursion stops.

I see no shortening feature of the method you posted, hence the stack error.

This is a (perhaps) useful excersize in recursive functions, but to return the list of elements common to two lists, the a repeat loop is faster. Best Practices rules like “don’t loop” are useful guides, but when there are always exceptions to those rules.

I ran my own tests on three of these arrangements, and I’ve come up with significantly different results. In my testing, the recursive method performed 19 times faster. I’d be interested to learn the testing environment and parameters of the other testers. The details of my test were as follows: I created two text files containing numeric lists of 15,000 integers; list one was 1-15,000, and list two was 1 to 29,999, in multiples of 2. Before putting the respective code snippets through their paces, each text file was read into memory and assigned to the variables {xx, yy}.

RESULTS (in seconds):

  1. standard repeat loop (Hank’s): 134

  2. CommonElementsBeta: 8

  3. CE3: 7

Testing Environment:
iMac 2.9GHz
OS 10.5.6

I perform speed tests using the method outlined in post #4 here:
http://bbs.applescript.net/viewtopic.php?id=17015

I certainly didn’t use such large lists in my tests, so that could make a difference.

I just put the three methods into a script and ran this.

set testSize to 90000

set startTime to current date
repeat with i from 1 to testSize
	set zz to CommonElementsAlpha(xx, yy)
end repeat
set alphatime to (current date) - startTime

set startTime to current date
repeat with i from 1 to testSize
	set zz to CommonElementsBeta(xx, yy)
end repeat
set BetaTime to (current date) - startTime

set startTime to current date
repeat with i from 1 to testSize
	set zz to CELoop(xx, yy)
end repeat
set LoopTime to (current date) - startTime

return {alphatime, BetaTime, LoopTime} -- returns {18, 6, 4}

This technique isn’t to find out how fast (in an absolute sense) a particular method is, but it will compare different routines to find which is faster.
Your technique found that looping is 7 times faster than recusion on CommaonElementsAlpha.
My routine found a factor of 18/4.
For a quick and dirty “which is faster”, I’d say our results match.
If I was looking for a more absolute number, I’d vary the length and degree that the two lists match.

result => 0.279033

I just ran Mike’s supplied test with the 90000 iterations, using the small list of original values supplied by the OP.
xx = {“a”, “b”, “f”}
yy = {“a”, “b”, “c”, “d”, “e”}

I wasn’t sure if CELoop or CommonElementsAlpha referred to my code, so I relabeled the routines to the original names. This test took notably more time, as it was performed on my home machine, a G4 Mac Mini with OS 10.4.11; RAM was also limited to 512MB, where I have 4GB on my work setup. Comparing this against the other’s results, I’m not sure it proves anything definitively about the performance of the routines other than “it depends”. :confused:

RESULTS

  1. CommonElements (Hank’s): 24
  2. CommonElementsBeta (Mike’s): 28
  3. CE3 (Marc’s): 15

That’s a breakneck result time, Craig, but I don’t know the language. How fast would it go with dual lists of 15,000 items?

It’s Ruby.

Are you asking how fast it will go through dual lists of 15K over 90,000 iterations?
I don’t know. If I were doing that much processing I would use C or Objective-C.

Hello again, Craig. If you’d be willing to do the test, one iteration using that size list would work. I was just curious to see how the performance scales. For my initial test, I generated a list of integers 1-15,000 and another that was 1-30,000 by 2, for parity. Thanks for your time.

This script will create the two test files.

This will test those files.

An easy way to test code in many languages is TextMate.
It is by far my favorite text editor!