Unable to use 'reference to operator' to speed up data parsing

When I attempt to use the ‘a reference to’ operator in a loop within an Xcode AppleScript, I get the error message ‘Can’t make insertion point of bigList into type reference (-1700)’.

According to the AppleScript Language Guide, it is more efficient to use the ‘a reference to’ operator when inserting a large number of items into a list, rather than to access the list directly. For example, using direct access, the following script takes about 10 seconds to create a list of 10,000 integers:

set bigList to {}
set numItems to 10000
set t to (time of (current date)) --Start timing operations
repeat with n from 1 to numItems
    copy n to the end of bigList
    -- DON'T DO THE FOLLOWING--it's even slower!
    -- set bigList to bigList & n
end
set total to (time of (current date)) - t --End timing

But the following script, which uses the ‘a reference to’ operator, creates a list of 100,000 integers (ten times the size) in just a couple of seconds:

set bigList to {}
set bigListRef to a reference to bigList
set numItems to 100000
set t to (time of (current date)) --Start timing operations
repeat with n from 1 to numItems
    copy n to the end of bigListRef
end
set total to (time of (current date)) - t --End timing

This is all wonderful, except that I cannot get the second example to run inside an Xcode AppleScript, though it runs as stated in AppleScript Editor. The Xcode version generates the error message ‘Can’t make insertion point of bigList into type reference (-1700)’. Why is this? Is there a technique to create a large list quickly within an Xcode AppleScript?

I always use the term ‘my’ as a referencing identifier; it works great in AS Studio:

set bigList to {}
set numItems to 100000
set t to (time of (current date)) --Start timing operations
repeat with n from 1 to numItems
	copy n to the end of my bigList --Use my as the referencing term here
end repeat
set total to (time of (current date)) - t --End timing

Thanks for the suggestion, Craig, but I can’t get your version to work either. I get that same error message: ‘Can’t make insertion point of bigList into type reference (-1700)’. Could it have to do with the fact that I am using OS X 10.4?

(I am not a user of AS Studio, ignore me as appropriate.)

Is your bigList a local variable? I get the same -1700 error in plain AppleScript (Script Editor) when I put your sample code in a handler and declare local bigList at the top. The error happens whether I use the original ‘a reference to’ technique or the ‘my’ technique. If you do not want to introduce a global or property in your code, you should be able to use a script object to hold the list and make the reference through it:

to test()
	local bigList, bigListRef, numItems, t, n, total -- bigList refers to this handler's bigList (no longer present), not the bigList property in the following script object
	script s
		property bigList : {}
	end script
	set bigListRef to a reference to s's bigList -- or omit this and use "copy n to the end of s's bigList" later
	set numItems to 40000
	set t to (time of (current date)) --Start timing operations
	repeat with n from 1 to numItems
		copy n to the end of bigListRef
	end repeat
	set total to (time of (current date)) - t --End timing
end test
on run
	try
		test()
	on error m number n
		return {n, m}
	end try
end run

Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari Version 3.2.1 (4525.27.1)
Operating System: Mac OS X (10.4)


I don’t think the OS X version is the issue; I did a ton of AS Studio work using Tiger. Make sure you are using the proper variable names; I edited your script before I posted it, perhaps you did not notice that? Here is another version, using ‘set the end of’ instead of ‘copy to’:

set bigList to {}
set numItems to 100000
set t to (time of (current date)) --Start timing operations
repeat with n from 1 to numItems
	set the end of my bigList to n
end repeat
set total to (time of (current date)) - t --End timing

Honestly, this is precisely the code pattern I have used in all my AS Studio projects, so I cannot imagine why you are having difficulty, unless it is a global vs. local variable issue. Remember that in AS Studio, everything happens inside of a handler, so if a variable is going to be referenced in multiple handlers, it must be declared as a global variable at the beginning of your code. If I need a big list of data to be initialized, I would do this:


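global honkingList
my MakeHonkingList()

to MakeHonkingList()
	-- code to create the list, for example:
	set honkingList to {}
	repeat with n from 1 to 100000
		set the end of my honkingList to n
	end repeat
end MakeHonkingList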

Hope this helps,

Just to try to clarify the rôle of ‘a reference to’ in all this: it’s simply a way of getting a reference (as opposed to the referenced object) into a variable.

set bigList to {}
set bigListRef to a reference to bigList
--> bigList of «script»

script s
	property bigList : {}
end script

set bigListRef to a reference to s's bigList
--> bigList of «script s»

In the first result above, «script» is the script in which the code appears. This can be referred to in the code as ‘me’, so the reference ‘bigList of «script»’ is the same as writing ‘bigList of me’ or ‘my bigList’ directly into the script code.

In the second result, the stored reference is more obviously the same as the reference in the script code that was used to set it.
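To make that concrete, here is a minimal sketch in plain AppleScript (Script Editor), assuming the list lives in a top-level property; the handler name ‘showReference’ is made up for illustration. The variable holds only the reference, and commands such as ‘count’ and ‘contents of’ look the list up through it at the moment they run.

```applescript
-- (handler name is illustrative only)
property bigList : {}

on showReference()
	set bigList to {1, 2, 3} -- reset so repeated runs give the same result
	-- bigListRef now holds the reference "bigList of «script»", not a copy of the list
	set bigListRef to a reference to bigList
	-- appending through the reference changes the property itself
	set end of bigListRef to 4
	-- count and contents of evaluate the reference when they run
	return {count of bigListRef, contents of bigListRef, count of bigList}
end showReference

showReference() --> {4, {1, 2, 3, 4}, 4}
```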

In practice, although accessing the items or the end of ‘bigListRef’ is very much faster than simply accessing the items or the end of ‘bigList’, it’s actually slightly faster still to have the reference in the script code itself rather than in a variable, i.e. ‘my bigList’ or ‘s’s bigList’, as appropriate.
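If you want to compare the two styles on your own machine, a rough timing sketch follows. The handler names are hypothetical, and timing uses whole seconds from ‘time of (current date)’ as elsewhere in this thread, so it only shows coarse differences (and would misbehave across midnight); no particular figures are implied.

```applescript
-- (handler names are hypothetical, for illustration only)
property bigList : {}

-- Form 1: the reference is stored in a variable
on fillViaVariable(numItems)
	set bigList to {}
	set bigListRef to a reference to bigList
	repeat with n from 1 to numItems
		set end of bigListRef to n
	end repeat
	return count of bigList
end fillViaVariable

-- Form 2: the reference is written directly into the code as "my bigList"
on fillViaMy(numItems)
	set bigList to {}
	repeat with n from 1 to numItems
		set end of my bigList to n
	end repeat
	return count of bigList
end fillViaMy

set numItems to 100000
set t to time of (current date)
fillViaVariable(numItems)
set variableSeconds to (time of (current date)) - t
set t to time of (current date)
fillViaMy(numItems)
set mySeconds to (time of (current date)) - t
{variableSeconds, mySeconds} -- elapsed whole seconds for each form
```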

Thanks for all the posts. I see that it is impossible to make a reference to a local variable inside a subroutine. Can anyone explain why? I believe the reason for using a reference was to access the large data set without copying it. Apparently a loop construct copies the data set outside the loop to the same variable used inside the loop, so that at each pass the entire data set is re-copied. The pointer allows the loop variable to point to the same address where the data is first held outside the loop. I don’t see why this would require a globally defined reference, however. What is it about the global declaration inside a subroutine that allows it to work?

After trying different schemes with and without the use of references, I found great differences in performance. Below is the actual subroutine I use to output 7000 records of my tab-separated database. Without using references, the following subroutine takes 170 seconds to process the data:


subParseData()
on subParseData()
	set bigList to {}
	set spellRuleFilePath to POSIX path of (((path to scripts folder) as text) & "SpellAwareAuxilaryScripts") as text
	set x to do shell script "cat " & quoted form of (spellRuleFilePath & "/SpellDictWordBitsCopy.txt")
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab 
	set myparagraphs to every paragraph of x
	set paraCount to count of myparagraphs
	set time1 to (time of (current date))
	repeat with eachparagraph in myparagraphs
		set sublist to every text item of eachparagraph as list
		copy sublist to end of bigList
	end repeat
	set time2 to (time of (current date))
	set AppleScript's text item delimiters to tid
	return (time2 - time1)
end subParseData

When I run the subroutine after declaring one of the loop variables, ‘bigList’, global and creating a reference to it, the process takes only 10 seconds:


subParseData()
on subParseData()

	global bigList (* declare global for referent *)

	set bigList to {}
	set spellRuleFilePath to POSIX path of (((path to scripts folder) as text) & "SpellAwareAuxilaryScripts") as text
	set x to do shell script "cat " & quoted form of (spellRuleFilePath & "/SpellDictWordBitsCopy.txt")
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab 
	set bigListRef to a reference to bigList (* assign reference here *)

	set myparagraphs to every paragraph of x
	set paraCount to count of myparagraphs
	set time1 to (time of (current date))
	repeat with eachparagraph in myparagraphs
		set sublist to every text item of eachparagraph as list

		copy sublist to end of bigListRef (* use reference within loop here *)

	end repeat
	set time2 to (time of (current date))
	set AppleScript's text item delimiters to tid
	return (time2 - time1)
end subParseData

Best of all, if I provide both loop variables with references, the process takes only 1 second!


subParseData()
on subParseData()

	global bigList (* declare global for referents here *)
	global myparagraphs

	set bigList to {}
	set spellRuleFilePath to POSIX path of (((path to scripts folder) as text) & "SpellAwareAuxilaryScripts") as text
	set x to do shell script "cat " & quoted form of (spellRuleFilePath & "/SpellDictWordBitsCopy.txt")
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab
	set myparagraphs to every paragraph of x
	set paraCount to count of myparagraphs

	set bigListRef to a reference to bigList (* assign reference #1 here *)
	set myparagraphsRef to a reference to myparagraphs (* assign reference #2 here *)

	set time1 to (time of (current date))

	repeat with eachparagraph in myparagraphsRef (* use reference #2 within loop here *)
		set sublist to every text item of eachparagraph as list

		copy sublist to end of bigListRef (* use reference #1 within loop here *)

	end repeat
	set time2 to (time of (current date))
	set AppleScript's text item delimiters to tid
	return (time2 - time1)
end subParseData
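For comparison, a sketch of the same parsing loop with no global declarations, using the script-object approach shown earlier in the thread. The handler name and sample data are hypothetical, and it takes the text as a parameter rather than reading the file, so it can be tried on its own; the file’s text (for example the result of the ‘do shell script "cat …"’ line) could be passed in instead.

```applescript
-- (handler name and sample data are hypothetical)
on parseTabData(theText)
	-- the lists live in a script object's properties, so no globals are needed
	script o
		property rows : {}
		property parsed : {}
	end script
	set o's rows to paragraphs of theText
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to tab
	try
		repeat with i from 1 to (count o's rows)
			-- "set end of" appends the whole list of fields as a single item
			set end of o's parsed to text items of (item i of o's rows)
		end repeat
	on error errMsg number errNum
		set AppleScript's text item delimiters to savedTIDs -- restore before re-raising
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedTIDs
	return o's parsed
end parseTabData

parseTabData("one" & tab & "two" & return & "three" & tab & "four")
--> {{"one", "two"}, {"three", "four"}}
```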

I believe this pretty well demonstrates the power of pointers in AppleScript when translating large text data to lists.

[All examples are plain AppleScript (Script Editor).]

One rationale is that AppleScript references are a way to make a ‘delayed evaluation reference’ to value(s) in other script objects or applications. The local variables in your handlers are inaccessible to other, independent script objects and applications, so they are also inaccessible via AppleScript references. What looks like a reference to a local variable is actually a reference to a global variable with the same name (such a situation is probably an error unless a shadowing local has been declared).

property a : "a"
on run
	local a, b -- local a shadows global a
	set a to "A"
	set b to a reference to a
	b & "" --> "a", the global value
end run

The best description I have found of the behind the scenes details of using this technique is “According to Chris Nebel, this referential approach sidesteps certain time-consuming safety checks that are built into list accesses.” from Nigel Garvey. I have not bothered to dig up the referenced communication from Chris Nebel though.
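The effect described in that quote is usually easiest to see with indexed access inside a loop. A comparison sketch, with hypothetical handler names: the second handler keeps the list in a script object defined inside the handler (the same idea as the ‘script s’ example earlier in the thread), so each ‘item i of o’s l’ goes through a reference and no global is needed. This is only a harness for your own measurements; no timings are claimed.

```applescript
-- (handler names are hypothetical, for illustration only)
-- Indexed access to a list held in a plain variable
on sumDirect(theList)
	set total to 0
	repeat with i from 1 to (count theList)
		set total to total + (item i of theList)
	end repeat
	return total
end sumDirect

-- Same loop, but the list is reached through a script object property,
-- so every access ("item i of o's l") goes through a reference
on sumViaScriptObject(theList)
	script o
		property l : theList
	end script
	set total to 0
	repeat with i from 1 to (count o's l)
		set total to total + (item i of o's l)
	end repeat
	return total
end sumViaScriptObject

set testList to {}
repeat with n from 1 to 20000
	set end of testList to n
end repeat

set t to time of (current date)
sumDirect(testList)
set directSeconds to (time of (current date)) - t
set t to time of (current date)
sumViaScriptObject(testList)
set scriptObjectSeconds to (time of (current date)) - t
{directSeconds, scriptObjectSeconds} -- elapsed whole seconds for each style
```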

Try very hard to forget the idea that AppleScript references are even remotely similar to pointers (or even references) in other languages (C, Java). In my experience they “taste” more like a limited form of quoted expressions in Lisp (the target is not evaluated when the reference is created, but it can be later if necessary).

Here is a small annotated example to try to demonstrate what I mean by quoting:

tell application "TextEdit"
	-- AppleEvent information is from the Event Log window/tab
	log "no reference"
	set a to paragraphs of first document --> AppleEvent: get every paragraph of document 1
	count of a --> no AppleEvent, we are counting the elements of an AppleScript list
	log "reference"
	set a to a reference to paragraphs of first document --> no AppleEvent, the target of the reference is quoted here and only evaluated later, if necessary.
	count of a --> AppleEvent: count every paragraph of document 1
end tell
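The same ‘quoting’ can be observed without any application. A minimal sketch using a script property (the names ‘counter’ and ‘demonstrateDelayedEvaluation’ are only illustrative): a plain assignment takes the value at that moment, while the reference is not evaluated until ‘contents of’ asks for it.

```applescript
-- (names are illustrative only)
property counter : 0

on demonstrateDelayedEvaluation()
	set counter to 1
	set plainValue to counter -- takes the value 1 right now
	set r to a reference to counter -- stores "counter of «script»"; nothing is evaluated yet
	set counter to 42
	-- only now is the reference evaluated, so it sees the current value
	return {plainValue, contents of r}
end demonstrateDelayedEvaluation

demonstrateDelayedEvaluation() --> {1, 42}
```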

Thank you, Chris, for the light you shine on the nature of AppleScript ‘references’. You show that a reference is not a pointer but a device that suspends the evaluation (a ‘delayed evaluation reference’) of the referent until it is needed at run time (for example, as an Apple event). Moreover, the speed benefit of ‘references’ occurs because they side-step (due to this delayed evaluation) the safety checks that are built into list accesses. Finally, you suggest that ‘references’ need to refer to global variables because they “are a way to make a ‘delayed evaluation reference’ to value(s) in other script objects or applications”. Apparently other scripts or applications can only operate on global variables and things that refer to them (for reasons which I take on faith are good). Perhaps this topic is thorny enough that I would be better off thinking of ‘references’ in a pragmatic way: they are what speeds up building lists, provided two details are observed: 1) the referent variable is declared global or is a script object property, and 2) the referencing syntax is used (e.g. ‘my’ or ‘of’).
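As a sketch of those two rules together (names such as ‘globalList’, ‘holder’ and ‘fillBoth’ are hypothetical, and this was written for plain AppleScript rather than tested in AS Studio): each list satisfies rule 1 in one of its two forms, and every append uses the rule 2 syntax.

```applescript
-- (all names here are hypothetical, for illustration only)
global globalList -- rule 1, first option: a global variable

script holder -- rule 1, second option: a property of a script object
	property propList : {}
end script

on fillBoth(numItems)
	set globalList to {}
	set holder's propList to {}
	repeat with n from 1 to numItems
		set end of my globalList to n -- rule 2: "my" makes this a reference
		set end of holder's propList to n -- rule 2: "holder's" (same as "of holder") does too
	end repeat
	return {count of globalList, count of holder's propList}
end fillBoth

fillBoth(10000) --> {10000, 10000}
```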

As a side note, I found that the ‘shadow’ variable ‘a’ does not have to be declared ‘local’ for the reference to work without invoking an error message. The following works fine:


property a : "a"
on run
	local b
	(* local declaration of a removed *)
	set a to "A" -- with no local a, this sets the property itself
	set b to a reference to a
	b & "" --> "A", the property's new value
end run

Therefore it seems to me you are showing how a reference always refers to the global or property variable and can never refer to a local variable, though I never suspected it would, understanding its proper use. Perhaps I am missing the point of this demonstration.

I think you got the point. It was just that references can only refer to global variables, even if there is a local variable with the same name. Such a shadowing local variable is not required, but it could cause confusion if present (incorrect thoughts to avoid while reading code: “Hmm, there is a local variable named ‘a’ in this handler and it also uses ‘a reference to a’, so that is a reference to the local variable.”).

OK–I see your point. Thanks.

One more thing about using references as a trick to quickly populate an ASStudio table. I found that in order to follow the Model-View-Controller (MVC) architecture (see http://www.mactech.com/articles/mactech/Vol.18/18.07/July02AppleScriptandCocoa//) for ASStudio, it’s necessary to encapsulate my table-populating scripts within a script object so I can load and call it from another ASStudio script file whenever a control is activated, say a ‘load data file’ button. In this case, you must declare your referent variables inside yet another script object (as shown in Nigel Garvey’s examples, above) and not declare such variables as global variables (as in my previous examples).

For example the following script can be loaded and run from a remote AppleScript:

script testScript
	
	script n
		property dataRecordsFromLongFile : ""
		property thisItem : {}
		property listToPopulateView : {}
	end script
	
	on thisScript()
		--global dataRecordsFromLongFile
		--global thisItem
		--global listToPopulateView

		set n's listToPopulateView to {} -- start fresh on each call, since script properties keep their values
		set dataRecordsFromLongFileRef to a reference to n's dataRecordsFromLongFile
		set thisItemRef to a reference to n's thisItem
		set listToPopulateViewRef to a reference to n's listToPopulateView
		set dataRecordsFromLongFile to "one	two	three"
		set savedTIDs to AppleScript's text item delimiters
		set AppleScript's text item delimiters to tab

		(* note that the data list must be assigned to an external property for rapid reference to work *)
		set n's dataRecordsFromLongFile to every text item of dataRecordsFromLongFile

		set AppleScript's text item delimiters to savedTIDs
		repeat with thisItemRef in dataRecordsFromLongFileRef
			set end of listToPopulateViewRef to (contents of thisItemRef) as list -- dereference the loop variable before coercing
		end repeat
		return contents of listToPopulateViewRef
	end thisScript

end script

Note that I do not use global variable declarations for ‘thisItem’, ‘listToPopulateView’, or ‘dataRecordsFromLongFile’; instead I place these within their own script object as properties and set up references to these external properties. Also note that I must assign my data text to one of these external properties: ‘set n’s dataRecordsFromLongFile to every text item of dataRecordsFromLongFile’.

The following will serve as the remote script (you’ll have to save the above script somewhere and supply the path to ‘myScriptFilePath’):


load script myScriptFilePath --path to the script above (POSIX path if in ASStudio)
set loadedTestScript to testScript of result
thisScript() of loadedTestScript -- call the handler directly; "run" is not needed here

I hope this is helpful.