Referring to list as a script property for faster list access

adayzdone · May 25, 2012, 11:38pm

I am tweaking an old script to speed things up and would like to verify that by referencing a list as a script property, I would shave time off each of the instances below.

repeat with aDate in my datelist

if not (my dateTest contains short date string of aDate) then
	set end of my dateTest to short date string of aDate

if isoImageDate is not in my projList then make new project with properties {name:isoImageDate}

I would love to hear any other ideas on how to speed things up!
Here is the script in its entirety:

property delimiter : "-"
property datelist : {}
property dateTest : {}
property projList : {}

tell application "Aperture"
	-- Get the selected project ID
	tell item 1 of (selection as list) to set projId to parent's id
	
	-- Get date of every image in the selected project
	tell project id projId to set datelist to every image version's (value of EXIF tag "ImageDate")
	
	tell library 1
		-- Create "Imported by Date" folder if it does not exist
		if not (exists folder "Imported by Date") then make new folder with properties {name:"Imported by Date"}
		
		-- Get name of every project in "Imported by Date" folder. 
		--This is time consuming and should not be included in the loop below.
		tell folder "Imported by Date" to set projList to name of every project
		
		repeat with aDate in my datelist
			-- Test each date to avoid processing duplicates
			if not (my dateTest contains short date string of aDate) then
				set end of my dateTest to short date string of aDate
				
				-- Convert the image date to YYYY-MM-DD format 
				set projectYear to year of aDate
				set projectMonth to (month of aDate as integer) as string
				if length of projectMonth is 1 then set projectMonth to "0" & projectMonth
				set projectDay to (day of aDate as integer) as string
				if length of projectDay is 1 then set projectDay to "0" & projectDay
				set isoImageDate to projectYear & delimiter & projectMonth & delimiter & projectDay as string
				
				tell folder "Imported by Date"
					--Create  the project if it does not exist
					if isoImageDate is not in my projList then make new project with properties {name:isoImageDate}
					
					-- Move the images into the project
					move (every image version of project id projId whose value of EXIF tag "CaptureYear" is year of aDate and value of EXIF tag "CaptureMonthOfYear" is month of aDate as integer and value of EXIF tag "CaptureDayOfMonth" is day of aDate) to project isoImageDate
				end tell
			end if
		end repeat
		-- Move the initial project to the Trash if no images remain
		if (count of image versions of project id projId) is 0 then delete project id projId
	end tell
end tell

DJ_Bazzie_Wazzie · May 26, 2012, 12:04am

A “repeat with var in list” loop doesn’t make a script much faster per se. It only does when the list is a reference.

A small example

set bigList to {}
--this only takes a second now
repeat with x from 1 to 12500
	set end of bigList to x
end repeat
set sum to 0
--this takes very long
repeat with theNumber in bigList
	set sum to sum + theNumber
end repeat

This took around 10 seconds on my machine (i7 quad core) because “repeat with var in list” is very slow this way. But what if I make it a reference?

set a to {}
set bigList to a reference to a
repeat with x from 1 to 12500
	set end of bigList to x
end repeat
set sum to 0
repeat with theNumber in bigList
	set sum to sum + theNumber
end repeat

Now it is more than 10 times faster. A small change with a huge improvement!

adayzdone · May 26, 2012, 12:18am

Try this DJ:

set bigList to {}
--this only takes a second now
repeat with x from 1 to 12500
	set end of bigList to x
end repeat
set sum to 0
--this loop is fast now that the list is referenced with "my"
repeat with theNumber in my bigList
	set sum to sum + theNumber
end repeat

I am referring to the list as a script property by including “my” in the repeat statement.

DJ_Bazzie_Wazzie · May 26, 2012, 12:59am

Thanks! I was still under the assumption that accessing script object properties was slower than using references.

regulus6633 · May 26, 2012, 10:03am

Here’s a comparison of some different techniques to speed up access to large lists. Each technique is in a “repeat 50 times” loop so we can get some time measurements in seconds. The times are from my MacBook Pro with a 2 GHz Core i7 and 4 GB RAM.

Here are the scripts.

Base script

set inTime to current date

repeat 50 times
	set a to {}
	repeat with x from 1 to 12500
		set end of a to x
	end repeat
	
	set sum to 0
	repeat with theNumber in a
		set sum to sum + theNumber
	end repeat
end repeat

set totalTime to (current date) - inTime
return {totalTime, sum}
--> {300, 78131250}

Using “a reference to”

set inTime to current date

repeat 50 times
	set a to {}
	set bigList to a reference to a
	repeat with x from 1 to 12500
		set end of bigList to x
	end repeat
	
	set sum to 0
	repeat with theNumber in bigList
		set sum to sum + theNumber
	end repeat
end repeat

set totalTime to (current date) - inTime
return {totalTime, sum}
--{7, 78131250}

Using “my” to make a script object

set inTime to current date

repeat 50 times
	set a to {}
	repeat with x from 1 to 12500
		set end of my a to x
	end repeat
	
	set sum to 0
	repeat with theNumber in my a
		set sum to sum + theNumber
	end repeat
end repeat

set totalTime to (current date) - inTime
return {totalTime, sum}
--{3, 78131250}

Using a distinct script object

set inTime to current date

repeat 50 times
	script s
		property bigList : {}
	end script
	set s's bigList to {} -- needed to make sure the list is emptied during each loop
	
	repeat with x from 1 to 12500
		set end of s's bigList to x
	end repeat
	
	set sum to 0
	repeat with theNumber in s's bigList
		set sum to sum + theNumber
	end repeat
end repeat

set totalTime to (current date) - inTime
return {totalTime, sum}
--{3, 78131250}

adayzdone · May 26, 2012, 12:31pm

Thanks for setting up the test Hank.

Examples 2 and 3 are testing the same thing because the script itself is a top-level script object, right? The reason to declare a script object for this purpose would be if the variable existed inside a handler (it was local), and therefore you could not use “my”.
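A minimal sketch of the handler case described above, not taken from the original scripts. The handler name sumList, the script object name wrapper, and the property name lst are all hypothetical. Because theList is local to the handler, “my theList” is not available, so the list is stored in a property of a small script object and iterated through that property instead:

```applescript
-- Hypothetical handler: sums a list that exists only as a local variable.
on sumList(theList)
	-- The script object is created when the handler runs, so its
	-- property can be initialised from the handler's parameter.
	script wrapper
		property lst : theList
	end script
	
	set total to 0
	-- Iterate through the property rather than the local variable.
	repeat with n in wrapper's lst
		set total to total + n
	end repeat
	return total
end sumList

sumList({1, 2, 3, 4})
--> 10
```

At the top level of a script, “my” should serve the same purpose without the extra script object.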

regulus6633 · May 26, 2012, 1:18pm

Actually, I had never thought of using “my” to create a script object. I always used the “distinct” script object approach. That’s what interested me about this thread. I set up the tests to see if “my” did the same thing as the distinct script object. It seems both approaches work equally well. Your explanation sounds like a reasonable account of why you might use one option versus the other. I now have another tool in my arsenal, so good thread.

Nigel_Garvey · May 26, 2012, 1:31pm

adayzdone:

I am tweaking an old script to speed things up and would like to verify that by referencing a list as a script property, I would shave time off each of the instances below.
repeat with aDate in my datelist

Interesting. Either things have changed recently or I missed this particular variation when testing for myself a few years ago. I didn’t think using a reference to the list variable made any difference in a ‘repeat with … in …’ context. But the above does indeed seem to be slightly faster than:

repeat with i from 1 to (count dateList)
	set aDate to item i of my dateList

However, one important caveat is that the ‘repeat with … in my …’ form can only be used in a getting context. Changing the loop variable’s contents is a complete non-starter:

repeat with aDate in my dateList
	set aDate's contents to aDate + days
	--> error "Can't set item 1 of dateList of «script» to date \"Sunday 27 May 2012 09:22:28\"." number -10006 from item 1 of dateList
end repeat
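One way around this limitation, sketched here as an assumption rather than as part of the original reply, is to use an index-based loop and set each item through the referenced list directly. The property name dateList follows the snippet above; the sample values are made up for illustration:

```applescript
property dateList : {}

-- Sample data for illustration only.
set dateList to {current date, (current date) + 2 * days}

-- Set each item by index through the "my" reference instead of
-- changing the contents of a loop variable.
repeat with i from 1 to (count my dateList)
	set item i of my dateList to (item i of my dateList) + days
end repeat

return dateList
```

Each date in the list is advanced by one day, with both the reads and the writes going through “my dateList”.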

With commands affecting the list itself (‘count’, ‘contains’, ‘is in’), using a reference to the list variable used to be a slight handicap rather than an advantage. Testing today in Snow Leopard, I find that ‘my dateTest contains …’ is slightly faster than plain ‘dateTest contains …’ only if the list is quite short or the item occurs near the beginning of it. Otherwise, the non-referenced form is faster.

An obvious minor efficiency in the code immediately above would be to extract each aDate’s ‘short date string’ once only:

set sdString to short date string of aDate
if not (my dateTest contains sdString) then
	set end of my dateTest to sdString

Or use a value that’s faster to obtain and faster to compare:

set dayNum to (adate - (date "Saturday 1 January 1583 00:00:00")) div days -- The date format should be adjusted, if necessary, to suit the compiling machine.
if not (my dateTest contains dayNum) then
	set end of my dateTest to dayNum

But most efficiencies in your script (including referencing the list variables if the lists aren’t very long) will be insignificant against the time it takes to communicate with the application.

adayzdone · May 26, 2012, 2:00pm

Great insights, Thanks Nigel.

DJ_Bazzie_Wazzie · May 26, 2012, 5:25pm

You’re not the only one :D. I’m also sure that these tests by Hank would have had a different result in AppleScript 1.x. Anyway, I’m glad this topic came up; otherwise I still wouldn’t have known.

Marc_Anthony · May 26, 2012, 6:27pm

regulus6633:

Using “my” to make a script object…


	...repeat with x from 1 to 12500
		set end of my a to x
	end repeat
	
	set sum to 0
	repeat with theNumber in my a
		set sum to sum + theNumber
	end repeat...

Just FYI: The second loop’s “my” is the only one that’s responsible for the performance improvement. I tested this on my machine (a G4 running 10.4.11) and actually had a 2-second performance increase by removing the “my” from the first loop (“set end of my a”).

regulus6633 · May 26, 2012, 11:10pm

Good catch, Marc. I tested this with the two scripts below. I get the same results, though, not an improvement (granted, my method of measuring the time isn’t the most accurate, and I’m running 10.7). I guess “set end of bigList…” is a very optimized command, so there is no performance gain. We only see the gain when iterating the big list, not when filling it. Here are the scripts I ran:

set inTime to current date

repeat 50 times
	set a to {}
	repeat with x from 1 to 12500
		set end of a to x
	end repeat
	
	set sum to 0
	repeat with theNumber in my a
		set sum to sum + theNumber
	end repeat
end repeat

set totalTime to (current date) - inTime
return {totalTime, sum}
--{3, 78131250}

set inTime to current date

repeat 50 times
	set a to {}
	repeat with x from 1 to 12500
		set end of a to x
	end repeat
	
	script s
		property bigList : {}
	end script
	set s's bigList to a
	
	set sum to 0
	repeat with theNumber in s's bigList
		set sum to sum + theNumber
	end repeat
	
	set s's bigList to {} -- needed to make sure the list is emptied during each loop
end repeat

set totalTime to (current date) - inTime
return {totalTime, sum}

adayzdone · May 27, 2012, 11:09pm

I did a speed test for several methods of accessing lists. Anyone interested can find the worksheet here:
http://goo.gl/QdCcK

DJ_Bazzie_Wazzie · May 28, 2012, 11:37am

Just for the curious ones:

Using a script object instead of “my” can be faster. As the script that “my” refers to gets bigger, resolving a variable in it seems to take more time. When I added 25 handlers and 10 properties to that script, it went from 10 ms to 25 ms on a clean run. In that case a script object (which is very small) seems to be faster. A clean run means recompiling between runs, because the script gets cached and reused on the second run. With a cached script it only took 5 ms. But those numbers are all relative: I’m running an ‘old’ MBP i7, and it should be faster on my newer MBP or my newest iMac. I think the differences will be the same percentage-wise on every machine, though.

So my conclusion is that in large scripts (or script objects), “my” is much less reliable performance-wise than references or an extra script object.

p.s. I subtracted 2 ms from my actual measured time because that’s the time it takes to execute the perl script.