Well, I made slow progress today, mainly because of a couple of surprises that took me straight back to the performance overhead issue mentioned above.
I used the AS Editor for sanity while working this out. First, I realized that working from my sample file was not a good idea: when I checked the live files, they ranged in size from the 1500 records (actually lines) that I was basing my project on up to a maximum of just over 44,000 lines. Also, because it was a Unix file with linefeeds and had thousands of extra spaces in the records, space-delimited importing was almost impossible, so I had previously read the entire file into a variable and then used a repeat to clean up the records using “words of”, which worked well.
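For anyone unfamiliar with the “words of” clean-up mentioned above, here is a minimal sketch of the idea. The sample text and the variable names (sampleText, cleanedLines, oneLine) are made up for illustration and are not taken from the real file.

```applescript
-- Hypothetical sample data: two linefeed-separated lines padded with extra spaces.
set sampleText to "cat    dog   bird" & linefeed & "ant  bee     fox"

set cleanedLines to {}
repeat with oneLine in paragraphs of sampleText
	-- "words of" returns a list of the words in the line,
	-- so runs of spaces between fields simply disappear.
	set end of cleanedLines to words of (contents of oneLine)
end repeat

cleanedLines -- a list of lists, one inner list of words per line
```

Keep in mind that “words of” splits on more than spaces (punctuation is generally treated as a separator too), so it only suits data whose fields are plain words or numbers.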
So the repeat code now looked like this:
repeat with i from 1 to count of fileContents
set oneRecord to words of text item i of fileContents
set end of listOfrecords to {col1:text item 1 of oneRecord, col2:text item 2 of oneRecord, col3:text item 3 of oneRecord, col4:text item 4 of oneRecord, col5:text item 5 of oneRecord, col6:text item 6 of oneRecord, col7:text item 7 of oneRecord, col8:text item 8 of oneRecord, col9:text item 9 of oneRecord, col10:text item 10 of oneRecord, col11:text item 11 of oneRecord, col12:text item 12 of oneRecord, col13:text item 13 of oneRecord, col14:text item 14 of oneRecord, col15:text item 15 of oneRecord, col16:text item 16 of oneRecord, col17:text item 17 of oneRecord}
end repeat
This worked extremely well on the 1500 record files, processing them in about 12 seconds on a 2.4 c2d iMac, and even at 10,000 records it was taking around 2 minutes. But once I started using the 44,000 line file, I soon realized that the project could not go ahead as it was: processing that huge list took more than 30 minutes (the point at which I force quit the script).
Just as I was about to give up on it, tonight, after many Google searches, I found this page on referencing lists in AS. The relevant part is halfway down the page; search for “bigList”: http://developer.apple.com/mac/library/documentation/AppleScript/Conceptual/AppleScriptLangGuide/reference/ASLR_classes.html
This changed everything! First, I have posted a sample of the file here, with the data converted to animal names: http://www.wikiupload.com/download_page.php?id=192398 (if there is a way to host it on this site, please let me know, as I don’t know how long the sample will be available from a file host site and I think this is a great example for others in the same situation).
The above file takes me 33 minutes 32 seconds (2012 seconds) to read using the above code.
Now, after a straight read of the file into variable fileContents, I set up the references and the repeat code looks like this, a variant of Shane’s great advice above.
set listOfrecordsref to a reference to listOfrecords
set fileContentsref to a reference to fileContents
repeat with i from 1 to count of fileContents
set oneRecord to words of text item i of fileContentsref
copy {col1:text item 1 of oneRecord, col2:text item 2 of oneRecord, col3:text item 3 of oneRecord, col4:text item 4 of oneRecord, col5:text item 5 of oneRecord, col6:text item 6 of oneRecord, col7:text item 7 of oneRecord, col8:text item 8 of oneRecord, col9:text item 9 of oneRecord, col10:text item 10 of oneRecord, col11:text item 11 of oneRecord, col12:text item 12 of oneRecord, col13:text item 13 of oneRecord, col14:text item 14 of oneRecord, col15:text item 15 of oneRecord, col16:text item 16 of oneRecord, col17:text item 17 of oneRecord} to the end of listOfrecordsref
end repeat
This processes the same file in THREE (3) seconds on my system. To check that the records had actually been processed (because I didn’t believe it for a number of retries!), I simply set another variable to the list at the end of the script so that it showed in the editor as the result (which took a very, very long time!). Logging does the same thing much faster, but you won’t see the quote characters that are in the records. This threw me off for a while until I figured it out.
Note that I also returned to the “copy ... to the end of listOfrecordsref” method, together with “set x to a reference to the list”, as shown in the developer page examples. Applying a reference to oneRecord had no impact on performance because it is so small, but using references for both fileContents and listOfrecords (the latter growing by one record on every pass through the repeat) cut processing time from 2012 seconds to about 3, a factor of several hundred. That is absolutely remarkable.
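To try the reference technique in isolation, here is a small timing sketch modelled on the bigList example from the developer page. The names (bigList, bigListRef, numItems) and the item count are only illustrative, and it assumes the script is run at the top level in the AS editor, where bigList is an implicit global that “a reference to” can resolve.

```applescript
-- Assumption: run at the top level of a script in the AS editor,
-- not inside a handler, so "a reference to bigList" can be resolved.
set bigList to {}
set bigListRef to a reference to bigList
set numItems to 44000 -- roughly the line count of the largest live file

set startTime to current date

-- Build the list through the reference.
repeat with n from 1 to numItems
	copy n to the end of bigListRef
end repeat

-- Read every item back through the reference.
set total to 0
repeat with n from 1 to numItems
	set total to total + (item n of bigListRef)
end repeat

set elapsedSeconds to (current date) - startTime
{count of bigListRef, elapsedSeconds}
```

Swapping bigListRef for bigList in the second loop and running it again is a quick way to see how much of the difference comes from item access on your own system.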
So, now that I’m back in business I will get this ported into the ASOC project and continue tomorrow to hopefully wire everything into the TableView. Hope this helps others. And of course please feel free to add your advice to make the code better or point out any issues.
EDIT: the reference optimization above only worked in the AS editor and not in Xcode, with the exception of the listOfrecordsref code in the repeat. See my post below.
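One commonly used alternative when the list is created inside a handler, where “a reference to” a local variable cannot be resolved, is to hold the list in a script object property. This is only a sketch: the handler name buildList and the script object listHolder are hypothetical, and whether it behaves the same inside the ASOC project is an assumption, not something measured here.

```applescript
on buildList(numItems)
	-- Hypothetical handler: the script object gives the list a property,
	-- which can be addressed by reference even though it lives in a handler.
	script listHolder
		property listOfrecords : {}
	end script
	
	repeat with n from 1 to numItems
		-- Placeholder record; the real code would build col1 to col17 here.
		set end of listHolder's listOfrecords to {col1:n}
	end repeat
	
	return listHolder's listOfrecords
end buildList

count of buildList(1000)
```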