Limitations on saving a list in a script object

Hi all,

I’ve got a long text file and have split its elements into a single list using AppleScript’s text item delimiters. The list is stored in a script object property to allow quick searches. The process takes 9 seconds.

I would like to save the script object so I can load it next time without repeating the 9-second process. The list contains 152,325 items, and storing the script object gives me a stack overflow error. Is this simply a technical limitation?

To test the script, download my long text file and put it on your desktop:
https://www.dropbox.com/s/s0c8ofufwl94od5/longtext.txt?dl=0


--Make a script object with a property
script TestScript
	property List1 : {}
end script

--Input text file
set InputFile to ((path to desktop) as text) & "LongText.txt"
set textdata to read file InputFile

--Split the text into a list inside the script object
set AppleScript's text item delimiters to {return, tab}
set TestScript's List1 to text items of textdata
set AppleScript's text item delimiters to ""

--Save the script object
set ListCount to count TestScript's List1 --This is for info only
set SavedFile to ((path to desktop) as text) & "LongText.scpt"
store script TestScript in file SavedFile replacing yes

Yes. And dealing with lists of that size in AppleScript is always going to be very slow.

This will create and save the list as an array in a property list file in about a tenth of a second:
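A minimal AppleScriptObjC sketch of that approach, assuming the source file is UTF-8 and is split on the same return and tab characters as the original text item delimiters. The output name LongText.plist is hypothetical:

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set inputFile to ((path to desktop) as text) & "LongText.txt"
set outputFile to ((path to desktop) as text) & "LongText.plist" -- hypothetical output name

-- Read the whole file into an NSString (assumes the file is UTF-8)
set theText to current application's NSString's stringWithContentsOfFile:(POSIX path of inputFile) encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
if theText is missing value then error "Could not read the file as UTF-8."

-- Split on return and tab, mirroring the text item delimiters {return, tab}
set theSeparators to current application's NSCharacterSet's characterSetWithCharactersInString:(return & tab)
set theValues to theText's componentsSeparatedByCharactersInSet:theSeparators

-- Write the NSArray out as a property list file
theValues's writeToFile:(POSIX path of outputFile) atomically:true
```

Because the splitting happens in Foundation, the data stays an NSArray and never becomes an AppleScript list.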

You can read it back in like this:

set theValues to current application's NSArray's arrayWithContentsOfFile:(POSIX path of outputFile)

Keeping it as an array rather than as a list will make everything much faster.

Thanks Shane, that’s amazing.

My first mistake after loading the plist as an NSArray was to try turning it back into a list. Script Debugger crashed.

Realising I need to do the searching operations in AppleScriptObjC, I returned to chapter 5 of your book and found the “indexOfObject:” method. I hazarded a guess and discovered the corresponding “objectAtIndex:” method, and that is all I need!

As you may have guessed, this task involves getting the corresponding EAN from a given item number. The EANs come right after each item number in the list, errr… array. For example:

set thisIndex to theValues's indexOfObject:"978847"
set thisValue to theValues's objectAtIndex:(thisIndex + 1)

This example comes from near the end of the text file. The search took 0.3 seconds and that includes the initial reading of the file into an NSArray – a lot faster than >9 seconds!
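One caveat with the two-line lookup: indexOfObject: returns NSNotFound when the string isn't in the array, and objectAtIndex: raises an error if the index is past the end. A hedged sketch of a guarded lookup, assuming the array was previously saved to a property list file (the name LongText.plist is hypothetical):

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical path to the previously saved property list
set outputFile to ((path to desktop) as text) & "LongText.plist"
set theValues to current application's NSArray's arrayWithContentsOfFile:(POSIX path of outputFile)

set itemNumber to "978847"
set thisIndex to theValues's indexOfObject:itemNumber
if thisIndex = current application's NSNotFound then
	set thisValue to missing value -- item number is not in the array
else if thisIndex + 1 < (theValues's |count|()) then
	-- The EAN is the element right after the item number
	set thisValue to (theValues's objectAtIndex:(thisIndex + 1)) as text
else
	set thisValue to missing value -- item number was the last element, so no EAN follows it
end if
```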

I just noticed that with some text files I always get an error (the result is missing value) when the script tries to convert the file into an NSString:


set theText to current application's NSString's stringWithContentsOfFile:(POSIX path of inputFile) encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)

I was able to get around it by removing the encoding and error handling parts, like this:


set theText to current application's NSString's stringWithContentsOfFile:(POSIX path of inputFile) --encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)

It seems to work but have I done something bad…?

Hello,

I came across a similar case when I had a list of strings larger than the maximum size. I solved it by putting the text in a text file and loading that file onto a RAM disk created in advance. Operations immediately became hundreds of times faster.

That is, I recommend that you do not keep the text in a script object variable, but read the data directly from a text file.

My text file was about 1 GB, and it worked nicely.
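A sketch of the RAM disk setup using the standard hdiutil and diskutil tools, assuming a 1 GB disk (2097152 sectors of 512 bytes) named RAMDisk. The name, size, and file name are illustrative, not the poster's exact setup:

```applescript
use AppleScript version "2.4"
use scripting additions

-- Create and format a 1 GB RAM disk named "RAMDisk" (mounted at /Volumes/RAMDisk)
do shell script "diskutil erasevolume HFS+ RAMDisk $(hdiutil attach -nomount ram://2097152)"

-- Copy the text file onto the RAM disk
set srcPath to POSIX path of (((path to desktop) as text) & "LongText.txt")
do shell script "cp " & quoted form of srcPath & " /Volumes/RAMDisk/"

-- Read the data straight from the RAM disk copy instead of a stored script object
set textdata to read (POSIX file "/Volumes/RAMDisk/LongText.txt")

-- When finished, release the memory with:
-- do shell script "hdiutil detach /Volumes/RAMDisk"
```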

Unwise might be a better word. If you’re getting an error, it probably means they’re not UTF-8 files – better to find out what they are, and use that. Otherwise you have a couple of options:

  • Try UTF-8, and fall back to another encoding on error; or

  • Use stringWithContentsOfFile:usedEncoding:error: instead. This does the guessing for you.
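A sketch of the first option, assuming MacRoman is the right fallback (use NSISOLatin1StringEncoding instead if the files come from another platform). MacRoman assigns a character to every byte value, so the fallback read shouldn't fail, but it will produce wrong characters if the guess is wrong:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

set inputFile to ((path to desktop) as text) & "LongText.txt"
set thePath to POSIX path of inputFile

-- First try UTF-8; this returns missing value if the bytes aren't valid UTF-8
set theText to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
if theText is missing value then
	-- Assumed fallback: MacRoman (swap for NSISOLatin1StringEncoding if appropriate)
	set theText to current application's NSString's stringWithContentsOfFile:thePath encoding:(current application's NSMacOSRomanStringEncoding) |error|:(missing value)
end if
```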

Here is a pure AppleScript script that saves a large list to a .dat file using the subroutine “set_Prefs()”.
It then reads the list back in using the subroutine “get_Prefs()”.

I also used a script object to speed up work with large lists, but I only save the list, not the script object.


use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

script M
	property clist : {}
end script
repeat with i from 1 to 200000 -- generate a list with 200K strings
	set end of M's clist to "item " & i
end repeat
display alert (count of M's clist)
set_Prefs("testprefs.dat", M's clist)
activate me
display alert "List saved..."
set M's clist to {}
set M's clist to get_Prefs("testprefs.dat")
display alert (count of M's clist)

on get_Prefs(pFile as string)
	local cfile, show_list
	set cfile to (path to preferences as text) & pFile
	try
		set cfile to open for access cfile with write permission
	on error
		display alert "Uh-oh! Error opening file…" giving up after 10
		return false
	end try
	try
		set show_list to read cfile from 1 as list
	on error
		display alert "File Empty!" giving up after 10
		write {} to cfile as list
		set show_list to {}
	end try
	close access cfile
	return show_list
end get_Prefs

on set_Prefs(pFile as string, pList as list)
	local cfile
	set cfile to (path to preferences as text) & pFile
	try
		set cfile to open for access cfile with write permission
	on error
		display alert "Uh-oh! Error opening file…" giving up after 10
		return false
	end try
	try
		set eof cfile to 0
		write pList to cfile as list
	on error
		display alert "Error! Can't write to preference file…" giving up after 10
	end try
	close access cfile
	return true
end set_Prefs

It writes and reads the list to and from the prefs file pretty fast, for AppleScript.

I looked at the documentation and there are about 24 different NSStringEncodings I could fall back on. I couldn’t see how to use the stringWithContentsOfFile:usedEncoding:error: method – what do I use for the usedEncoding? How does it do the guessing for me…?

But I did find another way to do it – get AppleScript to read the file and then use stringWithString:


set readText to read (POSIX path of (inputFile as alias))
set theText to current application's NSString's stringWithString:readText

But I’d like to know how to use stringWithContentsOfFile:usedEncoding:error: if you could please explain that to me.

In practice it’s likely to be MacRoman or Latin1, depending on the source platform. The UTF encodings are the only ones likely to error; most of the others are just simple, straight translations.

The usedEncoding parameter is an out value: pass reference if you want the detected encoding returned, or missing value if you don't need it. The exact heuristics it uses aren’t documented, but it’s basically what something like TextEdit uses. So:

set theText to current application's NSString's stringWithContentsOfFile:(POSIX path of inputFile) usedEncoding:(missing value) |error|:(missing value)
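If you do want the detected encoding, pass reference for the out parameters. In AppleScriptObjC this should return a list holding the method's result followed by each out value in order. A sketch, assuming the same desktop file as earlier:

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

set inputFile to ((path to desktop) as text) & "LongText.txt"

-- With reference for both out parameters, the result is a list:
-- {the string (or missing value), the detected NSStringEncoding, the NSError (or missing value)}
set {theText, theEncoding, theError} to current application's NSString's stringWithContentsOfFile:(POSIX path of inputFile) usedEncoding:(reference) |error|:(reference)
if theText is missing value then error (theError's localizedDescription() as text)
```

theEncoding is a plain number you can compare with constants such as current application's NSUTF8StringEncoding or current application's NSMacOSRomanStringEncoding.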