Embedded Script Objects for Speed in Processing

Adam_Bell · May 20, 2006, 5:51pm

I have seen it stated several times in several places that the manipulation of large lists is faster if the lists are properties of embedded script objects within the script than if they are simply part of the script. For example this simple ASCII sort should run faster with the internal script than it does without:

script BL
	property Big_List : { a list of 1500 words }
	property index_list : {}
	property sorted_list : {}
end script

set t1 to GetMilliSec
repeat (count of BL's Big_List) times
	set the low_item to ""
	repeat with i from 1 to (count of BL's Big_List)
		if i is not in the BL's index_list then
			set this_item to item i of BL's Big_List as text
			if the low_item is "" then
				set the low_item to this_item
				set the low_item_index to i
			else if this_item comes before the low_item then
				set the low_item to this_item
				set the low_item_index to i
			end if
		end if
	end repeat
	set the end of BL's sorted_list to the low_item
	set the end of BL's index_list to the low_item_index
end repeat
set t2 to GetMilliSec
t2 - t1

After testing this a bit with a 1500-word list, I haven’t found that to be the case for every run; and have found the results to be quite variable, in fact.

Am I doing something wrong, or is the original assertion incorrect?

Adam Bell

Nigel_Garvey · May 20, 2006, 11:04pm

Hi, Adam.

Access to the items or properties of a list is faster if you use a reference to identify the list rather than a simple variable. Assigning the list to a property of a script object allows you to use a reference expression like ‘BL’s Big_List’ rather than just, say, ‘Big_List’. The script object itself doesn’t make access faster, it simply allows a reference expression to be written into the script. This is particularly useful inside a handler, where the list variable may be local and thus unreferenceable. At the top level of a script, though, properties and variables that aren’t explicitly declared local belong to the script itself and so you can use the keyword ‘my’ to set up the reference:

set Big_List to { 1500-item list } -- A list assigned to a top-level variable.

repeat with i from 1 to 1500
	item i of my Big_List -- 'my Big_List' is a reference.
end repeat

The speed advantage with a reference only comes when accessing things belonging to the list. Commands acting on the list itself ” such as ‘count’, ‘contains’, ‘is in’, etc. ” are often a trifle slower if the list is referenced.

In your script, there are other things too that can be done to improve the speed ” principally, counting Big_List only once and assigning the result to a variable before the repeats. (Otherwise it’ll be counted 1501 times, always with the same result!)

In extreme cases, you can sometimes squeeze out an extra fraction of a second or so by doing positive tests instead of negative ones.

-- Negative test:
if (i is not in the BL's index_list) then
	-- Blah blah blah
end if

-- Positive test:
if (i is in the BL's index_list) then
else
	-- Blah blah blah
end if

I would think you could leave out the ‘as text’ coercion too.

If you know that all your text is the same case, or that the characters involved only have one case, putting the whole thing in a ‘considering case’ block can also give quite a boost. It saves the internal comparison routines having to check and allow for case and casedness.

Your central 'if . else if ’ block is a bit perplexing as the action is the same in both parts. As the script works correctly, I supposed you’re looking for an ‘or’ condition. It shouldn’t make any difference to the speed, but it’ll make the script a little shorter.

The above is all quoted from past experience. I haven’t tested it today to see what actual improvement it makes to your script’s speed!

set Big_List to { a list of 1500 words }
set index_list to {}
set sorted_list to {}

set t1 to GetMilliSec
set BL_count to (count Big_List) -- Count Big_List just once, before the repeats.
repeat BL_count times
	set the low_item to ""
	repeat with i from 1 to BL_count
		-- Positive test, and no reference when accessing the list itself.
		if (i is in the index_list) then 
		else
			 -- A reference when accessing an item in the list.
			set this_item to item i of my Big_List
			if (the low_item is "") or (this_item comes before the low_item) then
				set the low_item to this_item
				set the low_item_index to i
			end if
		end if
	end repeat
	-- References while setting the lists' ends.
	set the end of my sorted_list to the low_item
	set the end of my index_list to the low_item_index
end repeat
set t2 to GetMilliSec
t2 - t1

Bruce_Phillips · May 20, 2006, 11:26pm

This would work, right?

set Big_List to {"a list of 1500 words"}

count Big_List
repeat with i from 1 to result
	-- whatever
end

Adam_Bell · May 20, 2006, 11:38pm

Nigel;

Thanks for the reply and cleaned up code. I note that it runs at exactly the same speed with or without the internal script object and refererences to it. I wasn’t trying to be elegant with the sort, btw - I was primarily intererested in whether the script object made a difference, and it is still my observation that it doesn’t - the timing with “bl’s” in place is the same as with the “my” you used.

In my dual-core G5/2.3, the unreferenced forms take 39 seconds and the referenced forms takes 26 (a ratio of 1.444). Adding a ‘my’ here and there is obviously the way to go in problems like this one.

AB

Nigel_Garvey · May 21, 2006, 1:43am

Bruce Phillips:

Nigel Garvey:

counting Big_List only once and assigning the result to a variable before the repeats. (Otherwise it’ll be counted 1501 times, always with the same result!)

This would work, right?
set Big_List to {"a list of 1500 words"}

count Big_List
repeat with i from 1 to result
	-- whatever
end

Hi, Bruce.

Yeah. That’s fine if there’s just one repeat block. The result goes straight into the target for the repeat. Adam’s script, though, has an inner repeat that’s set up and cycled through in each of the 1500 iterations of the outer repeat. The value of ‘result’ is different every time the inner repeat’s set up, so a variable’s the only way to preserve the target value for that ” apart from recounting the list every time as in Adam’s original.

Bruce_Phillips · May 23, 2006, 12:41am

Would something like this get ˜entire contents’ every time?

set auto_backup to "path:to:auto backup:" (* modify as required *)
tell application "Finder" to repeat with f in (get files of entire contents ¬
	of folder auto_backup whose name does not start with "bu_")
	tell f to set name to "bu_" & name
end repeat

Adam_Bell · May 23, 2006, 2:03pm

Did you mean this response to go here, Bruce, or in this thread: http://bbs.applescript.net/viewtopic.php?id=17272?

Bruce_Phillips · May 23, 2006, 2:45pm

Here.

kai · June 1, 2006, 1:04am

Bruce Phillips:

Would something like this get ˜entire contents’ every time?

set auto_backup to "path:to:auto backup:" (* modify as required *)
tell application "Finder" to repeat with f in (get files of entire contents ¬
	of folder auto_backup whose name does not start with "bu_")
	tell f to set name to "bu_" & name
end repeat

No, that wouldn’t be very efficient, Bruce. The ‘get’ evaluates the target and creates a list of Finder file references - much the same as assigning a value to a variable using the ‘set’ command. (BTW, that code sure looks familiar… ;))

Incidentally, if anyone still has any lingering doubts about the efficacy of using references to access lists, perhaps I can just add my 2¢ to the discussion.

If you have the latest super-duper whizzy machine, don’t wish to distribute scripts or advice for more general use, process only relatively short lists, aren’t dealing with repeat-intensive situations and/or haven’t otherwise optimised your script for performance - then don’t worry about it. Otherwise, referencing might (among other optimisation techniques) be worth considering.

The following demo script, which perhaps illustrates the point better than anything I might say, could take some time (up to a few minutes) to execute on some machines.

Requires GetMilliSec:

property rtf1 : "{\\rtf1\\mac\\ansicpg10000\\cocoartf824\\cocoasubrtf330
\\readonlydoc1{\\fonttbl\\f0\\fswiss\\fcharset77 Helvetica;\\f1\\fswiss\\fcharset77 Helvetica-Bold;\\f2\\fswiss\\fcharset77 Helvetica-Oblique;
}
{\\colortbl;\\red255\\green255\\blue255;\\red0\\green51\\blue204;\\red204\\green0\\blue0;\\red204\\green204\\blue204;
}
\\paperw11900\\paperh16840\\margl1440\\margr1440\\vieww12740\\viewh"

property rtf2 : "\\viewkind0
\\pard\\tx280\\tqr\\tx2260\\tqr\\tx4540\\tqr\\tx6240\\tqr\\tx7940\\tqr\\tx9640\\tqr\\tx12480\\ql\\qnatural\\pardirnatural

\\f0\\fs24 \\cf0 \\up4 \\
\\pard\\tqc\\tx1280\\tqc\\tx5220\\tqc\\tx8600\\tqr\\tx12480\\ql\\qnatural\\pardirnatural
\\cf0 \\up0 \\super 	
\\f1\\b List Length	\\cf2 Direct Access\\cf0 	\\cf3 Script Object\\cf0 	Speed Comparison
\\f0\\b0 \\
\\pard\\tx280\\tqr\\tx2260\\tqr\\tx4540\\tqr\\tx6240\\tqr\\tx7940\\tqr\\tx9640\\tqr\\tx12480\\ql\\qnatural\\pardirnatural

\\f2\\i \\cf0 \\dn4 \\ul \\ulc0 	items	growth	\\cf2 \\ulc2 ms	growth\\cf0 \\ulc0 	\\cf3 \\ulc3 ms	growth\\cf0 \\ulc0 	factor
\\f0\\i0 \\
"
property base0 : missing value
property base1 : missing value
property base2 : missing value

on access_times from current_list for list_count against curr_test
	set m1 to GetMilliSec
	repeat with i from 1 to list_count
		set current_list's item i to current_list's item i
	end repeat
	set m2 to GetMilliSec
	script o
		property l : current_list
	end script
	repeat with i from 1 to list_count
		set o's l's item i to o's l's item i
	end repeat
	set m3 to GetMilliSec
	set t1 to m2 - m1
	set t2 to m3 - m2
	if t1 is 0 or t2 is 0 then return "\\f2\\i \\cf4 results too small for comparison" & return & "\\f0\\i0 \\cf0 \\'ca	\\'ca	\\'ca	\\'ca	\\'ca	\\"
	if base0 is missing value then
		set base0 to curr_test
		set base1 to t1
		set base2 to t2
	end if
	list_count & tab & (2 ^ (curr_test - base0) div 1) & tab & "\\cf2 \\ulc2 " & t1 div 1 & tab & (round (t1 / base1)) & "\\cf0 \\ulc0 " & tab & "\\cf3 \\ulc3 " & t2 div 1 & tab & (round (t2 / base2)) & "\\cf0 \\ulc0 " & tab & "x " & (round (t1 / t2)) & "\\"
end access_times

to compare_access_times at result_file from list_count for test_count
	set current_list to {"test"}
	repeat while (count current_list) < list_count
		set current_list to current_list & current_list
	end repeat
	set current_list to current_list's items 1 thru list_count
	set file_ref to open for access result_file with write permission
	try
		set eof file_ref to 0
		write rtf1 & 2500 - 1.2 mod (application "TextEdit"'s version) div 1.2 * 1120 + 360 * test_count & rtf2 to file_ref
		access_times from current_list for list_count against 1 (* a first run can be less reliable - so don't record initial results *)
		set base0 to missing value
		repeat with curr_test from 1 to test_count
			write return & tab & (access_times from current_list for list_count against curr_test) to file_ref
			set list_count to list_count * 2
			set current_list to current_list & current_list
		end repeat
		set eof file_ref to (get eof file_ref) - 1
		write "}" to file_ref
	end try
	close access file_ref
end compare_access_times

launch application "TextEdit"
set result_file to ((path to desktop as Unicode text) & "List Access Times.rtf") as file specification

compare_access_times at result_file from 250 for 6
(* 'from' [initial number of listed items] 'for' [number of rows] (in which list length is doubled each time) *)

tell application "TextEdit"
	activate
	open result_file
end tell

Edit notes (see discussion below):
¢ view height calculation modified to take account of TextEdit’s behaviour in version 1.2 (Mac OS X Jaguar)
¢ as integer expressions replaced by div or round - also to allow testing in Jaguar

Bruce_Phillips · June 1, 2006, 1:08am

Just wanted to make sure.

Nigel_Garvey · June 1, 2006, 12:39pm

That’s a lovely bit of RTF work, kai.

kai · June 10, 2006, 2:50am

Ooh - thanks, Nigel. (Missed that most generous compliment earlier.)

I’m sure the little feature that changes the window height didn’t escape your attention - but, just in case anyone missed it, I’ll mention it anyway. The following statement in the script should produce a list in TextEdit with six rows of data - which the window height should accommodate:

compare_access_times at result_file from 250 for 6

If the ‘for’ parameter is then amended to, say, 4 - the window height should be automatically adjusted to fit the resulting four rows of data:

compare_access_times at result_file from 250 for 4

(The ‘at’ parameter may also be adjusted. Just don’t set the value of either parameter too high - unless you have a very fast machine and/or a great deal of patience! ;))

Adam_Bell · June 10, 2006, 4:04pm

kai:

I’m sure the little feature that changes the window height didn’t escape your attention - but, just in case anyone missed it, I’ll mention it anyway. The following statement in the script should produce a list in TextEdit with six rows of data - which the window height should accommodate:
compare_access_times at result_file from 250 for 6
If the ‘for’ parameter is then amended to, say, 4 - the window height should be automatically adjusted to fit the resulting four rows of data:
compare_access_times at result_file from 250 for 4
(The ‘at’ parameter may also be adjusted. Just don’t set the value of either parameter too high - unless you have a very fast machine and/or a great deal of patience! ;))

I not only missed it, but I’m still missing it, Kai. Too fiendishly clever for this plodder.

Nigel_Garvey · June 11, 2006, 9:50am

Hi, kai.

As a matter of historical interest, TextEdit 1.2 (which came with Jaguar) doesn’t hide its ruler and tool bar when displaying the document produced by your script, so the window isn’t quite big enough to display all the results without scrolling. The ‘viewh’ setting appears to apply to the whole window rather than just the viewing area. Changing the base figure in ‘compare_access_times’ from 1380 to 2500 produces a satisfactory result on my Jaguar machine. (Any Jaguar users thinking of trying this will also need to change every instance of ‘(expression as integer)’ in the last line of ‘access_times’ to ‘(round (expression))’.)

kai · June 11, 2006, 5:41pm

Thanks for the heads-up on that, Nigel.

I’m not sure where that leaves us in terms of TextEdit version 1.3 (presumably released with Panther) but, for the moment, I’ll assume that (like version 1.4 in Tiger) it opens read-only documents without showing the ruler/tool bar. (If any obliging Panther user knows different, perhaps they could chip in.)

I’ve modified the above script to take account of this information. (No doubt you’ll let us know if the adjustment doesn’t work in Jaguar. ;))

To complement the change (and thus to cover your other point), the real-to-integer coercions (introduced in AppleScript 1.9.2/Mac OS X 10.3) have been replaced by div or round functions as appropriate. I’m sure Jaguar users will appreciate your continued efforts to represent their interests.

kai · June 11, 2006, 5:46pm

You mean the mechanics of it, Adam?

As you probably know already, an RTF header holds various formatting data, including stuff like font and color tables. Apple has also extended Microsoft’s standard RTF specification with custom commands - to support formatting constructs available in the Cocoa text system. These include document attribute extensions to specify (amongst other things) the size of a document’s display area (measured in twips).

So I basically split the RTF header into two parts (represented by the properties rtf1 and rtf2) - just at the point where the value for the document’s view height would normally appear. (Note that the property rtf1 ends with the RTF extension label \viewh.)

As mentioned earlier, the number of rows in the document is set near the beginning of the (implicit) run handler - by amending the compare_access_times handler’s for parameter. This value, when introduced to the subroutine, is represented by the variable test_count.

While the width of the final RTF document is fixed, I calculated that the view height would need to be 69 points (1380 twips) - to accommodate the table headers - with a further 18 points (360 twips) for each row. So the document’s overall view height can be derived by multiplying the row height (360) by the number of rows (test_count) - and adding the product to the initial height (1380).

This calculation is performed in the second line of the subroutine’s try statement:

write rtf1 & 1380 + 360 * test_count & rtf2 to file_ref

As you’ll see, the view height calculation is placed between the properties rtf1 and rtf2. So the expression evaluates the math calculation, coerces it to text (implicitly, through concatenation with the value of rtf1), concatenates it with rtf2 - and then writes the result to file.

That describes the statement in the original script - which has just been modified. So I suppose I should add this:

To take account of Nigel’s info about TextEdit version 1.2, the calculation referred to above has now been changed to:
write rtf1 & 2500 - 1.2 mod (application "TextEdit"'s version) div 1.2 * 1120 + 360 * test_count & rtf2 to file_ref
This replaces the initial height from a fixed value of 1380 to a calculated one:
2500 - 1.2 mod (application "TextEdit"'s version) div 1.2 * 1120
The expressions should evaluate like this:

expression version â‰¤ 1.2 version > 1.2

application “TextEdit”'s version “1.2” “1.4”
1.2 mod result 0.0 1.2
result div 1.2 0 1
result * 1120 0 1120
2500 - result 2500 1380

So to sum up briefly, the document’s view height is calculated at runtime and then inserted, at the appropriate position, in the RTF header. Programmatically (considering the mix of AppleScript and RTF code), perhaps somewhat abstruse - though conceptually, quite simple.

Adam_Bell · June 11, 2006, 6:35pm

As you point out, “conceptually quite simple”; but mighty sophisticated and very elegant just the same. Thanks for the explanation, Kai. It must have been a major project to discover and assemble all of that.

Nigel_Garvey · June 11, 2006, 9:26pm

That’s a lovely bit of BBCode work, kai.

kai · June 11, 2006, 10:37pm