How can I speed up AppleScript when using a repeat loop to check whether an item is in a list? The list has about 25,000 items in it. As AppleScript goes through the list it gets slower and slower.
Why? How can I improve on this?
Here’s some of the code:
repeat with theFile in theCaptureLocationFileList
    ---at this point theFile is a string
    ---at this point theCaptureLocationFileList is a list of strings
    ---at this point theVerifiedList_in_Capture_LOG is a list of strings (about 25 000 items to process)
    if ABort_Copy is true then
        exit repeat
    end if
    if theFile is in theVerifiedList_in_Capture_LOG then
        my Info_Display("Verified: " & (theFile as string))
    else --not verified
        set theFile to theFile as alias
        set FileinCaptureLog to GetFileinCaptureLog(theFile)
    end if
end repeat
Without knowing more of what you’re doing, I can’t be specific, but huge increases in speed are possible using a script object.
to DoSomething(args)
    script S
        property A : missing value
        property B : missing value
    end script
    set S's A to args
    set S's B to paragraphs of S's A
    -- do stuff
    return something
end DoSomething
Well, here’s what’s going on. I am building an app that copies files to a network location. It then generates an md5 checksum of both the source and destination files and saves them in a log. Basically, the purpose is to make sure the files copied correctly and, if not, to copy them again or alert the user. It should streamline the workflow by eliminating most of the user interaction.
I built the log system with sqlite.
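A rough sketch of that copy-then-verify step in shell, which AppleScript can reach through do shell script. This is only an illustration under assumptions, not the app's actual code: the function names file_md5 and copy_and_verify are made up here, and the sketch picks md5sum where that command exists and falls back to the BSD-style md5 -q found on Mac OS X.

```shell
# Print the MD5 hex digest of the file named by $1.
# Uses md5sum where it exists, otherwise the BSD-style md5 -q.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | cut -d ' ' -f 1
    else
        md5 -q "$1"
    fi
}

# Copy $1 to $2, then succeed only if both digests are non-empty and equal.
copy_and_verify() {
    cp "$1" "$2" || return 1
    src_sum=$(file_md5 "$1")
    dst_sum=$(file_md5 "$2")
    [ -n "$src_sum" ] && [ "$src_sum" = "$dst_sum" ]
}
```

A caller would treat a non-zero exit status from copy_and_verify as the signal to copy again or alert the user.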
So here’s the thing. I don’t want to generate md5’s for files already in the log unless there was a copy or md5 error.
So I get a list from sqlite with the paths of each file in the log with correctly generated md5 values. Then I get a new path list for newly added files to the source folder. Next I’m comparing the file list with the log list (repeat loop) to see if the file path is in the log list. This is the process that takes really long to complete.
This system will generally not have more than about 25000 files “ONLINE” so I need to speed through the “Verified” files and get to the job of copying and generating md5’s for the newly added files. Currently this process is taking over 30 minutes and is WAY TOO SLOW.
Then to speed up your loop, you use a script object for its variables.
The idea with a script object (V in the sample) in simplest terms is that all of its data is read into memory.
-- way up at the beginning of your script:
script V
    property VerCapInLog : missing value
    property CapLocList : missing value
end script
-- read the data in, then
set V's VerCapInLog to theVerifiedList_in_Capture_LOG
set V's CapLocList to theCaptureLocationFileList
-- set things up as necessary, then
repeat with theFile in V's CapLocList
    ---at this point theFile is a string
    ---at this point theCaptureLocationFileList is a list of strings
    ---at this point theVerifiedList_in_Capture_LOG is a list of strings (about 25 000 items to process)
    if ABort_Copy is true then exit repeat
    if theFile is in V's VerCapInLog then
        my Info_Display("Verified: " & (theFile as string)) --- very slow.
    else --not verified
        set theFile to theFile as alias
        set FileinCaptureLog to GetFileinCaptureLog(theFile)
    end if
end repeat
Without getting into gory details, it’s because a list or string in a script object exists as a separate entity, and a much faster algorithm is used for finding its parts than is used when you ask for “item j of Blah” or compare as you are doing. It can be five or six times faster with long lists.
Well it certainly is faster. Thanks for the tip on Script objects Adam. I’ll be using it a lot more.
However, my original problem still remains. It still slows down halfway through. It gets to the point where it does only 2 loops per second. With almost 15,000 left to do, you can see where this is going.
I would suggest running a sed or awk shell script from within AppleScript (sed and awk are both pattern-matching programs). They are lower-level tools, and my impression is that they’re faster. That being said, I don’t know too much about them myself. However, there is a lot of information available with a little Googling.
A sed script to copy every line that contains “aa” from file1.txt to file2.txt would be:
(run in Terminal; the -n option keeps sed from also printing the non-matching lines)
sed -n '/aa/p' file1.txt > file2.txt
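For the problem in this thread, the job is less about pattern matching than about finding which paths in one list are missing from the other. Here is a minimal sketch of that in awk, assuming both lists have first been written out one path per line. The function name unverified_paths is made up for illustration; nothing here comes from the original app.

```shell
# Print every line of the capture list ($2) that is absent from the
# verified list ($1). Both files are assumed to hold one path per line.
unverified_paths() {
    awk '
        # While reading the first file, remember each verified path.
        FILENAME == ARGV[1] { seen[$0] = 1; next }
        # While reading the second file, print paths never seen.
        !($0 in seen)
    ' "$1" "$2"
}
```

From AppleScript this could be called with do shell script after writing the two lists to temporary files; the paragraphs of the result would then be the unverified paths. Because awk keeps the verified paths in an associative array, each lookup should take roughly the same time however long the list is, instead of scanning the whole list for every file.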
I expect that it’s slowing down because you’re using so much memory that it’s beginning to page it to disk. Disk swapping is extremely slow, so you’ll have to segment the task.
What would be the best approach to doing this? I suspect you are right about the disk paging.
I have tried creating temp files, each holding 1000 paths saved as a list (a sort of cache, if you wish). Then, using another loop, I read each file’s contents, so in theory I am looping through only 1000 paths at a time, yet the main “meat” loop still runs slowly.
Is there a way to free the memory after it’s used?
This is odd. The machine is not nearly out of memory. My Mac has 3.5 GB of RAM. When my app runs it uses about 20 MB of RAM and between 420MB and 440MB of virtual memory.
AppleScript Release Notes for Mac OS X version 10.5
http://developer.apple.com/releasenotes/AppleScript/RN-AppleScript/index.html
… Bug Fixes
AppleScript
* The delay command uses less CPU. [3178086]
* Impossible object specifiers in math expressions, such as 1 + character 2 of “9”, produce an error instead of a random result. [4029175]
* The numerical limits for repeat loops accept real numbers; they will be rounded to integers. [4215670]
* Object specifiers other than application and date specifiers, in particular alias “…” and POSIX file “…”, are not evaluated at compile time, and will be left exactly as originally typed. [4444698]
→ * AppleScript no longer limits script memory usage to 32MB. [4511477]
* Counting the paragraphs of an empty string gives a result of zero. [4588706]
* Raw data literals («data …») are no longer limited to 127 bytes. [4986420]
I am on Tiger. If this is my problem, how can I clear the 32 MB to make space for the next round in the loop?
I tried to delete the variables (which I set at the start of the repeat loop) right at the end of the repeat block, just before it goes back for the next loop. Does deleting a variable actually do anything?
The slowdown still happens, though. I have abandoned the cache file idea because the effects are the same: as the loop continues, it still slows down. I am starting to think that it definitely has something to do with memory and, as you discovered, it might be an AppleScript limitation.
There MUST be a way around this. I will try to optimize the “main” handler and then post it here later for others to view.
I am sure there are faster ways of doing what I am trying to achieve.
OK, here is what I have so far. Thanks to whoever sits through all this!
on Begin_CopySecure() --current handler
    set START_TIME to current date
    set END_TIME to "Not Set"
    ControlProgressIndeterminate(true)
    DisableCopySecureButton()
    set theGraphicOn to my Working_In_Graphic("Capture")
    set theState to my Check_CaptureLocation_BackupLocation()
    if theState is true then
        my Info_Display("Gathering files to process ...")
        delay 0.5
        set theCaptureLocationFileList to my Get_Location_Files(CaptureLocation)
        --set Files_Buildt to my BuildCacheFiles(theCaptureLocationFileList)
        my Info_Display("Gathering Verified files from Log ...")
        set theVerifiedList_in_Capture_LOG to Get_Verified_CaptureList_From_SQL()
        --set UnVerifiedList to {}
        set MaxCount to count items in theCaptureLocationFileList
        my SETControlProgress(0, MaxCount)
        --my Info_Display("Finding Un-Verified files...")
        set x to 0
        my Info_Display("Allocating memory...") ---remove after testing
        script Verified_Items
            property VerCapInLog : missing value
            property CapLocList : missing value
        end script
        my Info_Display("Setup Verified list...") ---remove after testing
        set Verified_Items's VerCapInLog to theVerifiedList_in_Capture_LOG
        my Info_Display("Setup Capture list...") ---remove after testing
        set Verified_Items's CapLocList to theCaptureLocationFileList
        my Info_Display("Start process...")
        set x to 0
        set MaxCount to count items in theCaptureLocationFileList
        my SETControlProgress(0, MaxCount)
        repeat with theFile in Verified_Items's CapLocList
            if ABort_Copy is true then
                -- set ABort_Copy to false
                exit repeat
            end if
            if theFile is in Verified_Items's VerCapInLog then
                my Info_Display("Verified: " & (theFile as string))
            else --not verified
                set theFile to theFile as alias
                set FileinCaptureLog to GetFileinCaptureLog(theFile)
                if FileinCaptureLog = "" then
                    set theGraphicOn to my Working_In_Graphic("Capture")
                    set CaptureFile_MD5_Value to my Get_MD5_CHECKSUM_VALUE(theFile)
                    my SaveFilein_Capture_LOG(theFile, CaptureFile_MD5_Value)
                    set FileinCaptureLog to GetFileinCaptureLog(theFile)
                    --set FileinCaptureLog to Set_File_to_ONLINE(FileinCaptureLog)
                else
                    --set FileinCaptureLog to Set_File_to_ONLINE(FileinCaptureLog)
                end if
                --set FileinCaptureLog to GetFileinCaptureLog(theFile)
                try
                    set HasVerified to item 6 of FileinCaptureLog
                on error
                    set HasVerified to "NO"
                end try
                if HasVerified = "YES" then
                    my Info_Display("Verified: " & (theFile as string))
                    --Scan next file
                else
                    set LoopCount to 0
                    repeat
                        if ABort_Copy is true then
                            -- set ABort_Copy to false
                            exit repeat
                        end if
                        set LoopCount to LoopCount + 1
                        set theGraphicOn to my Working_In_Graphic("Backup")
                        set FileinBackupLog to GetFileinBackupLog(theFile)
                        --display dialog "File in Backup LOG: " & FileinBackupLog
                        if FileinBackupLog = "" then -- file is not in log
                            --display dialog "file not in log look for in file system"
                            set theFileinBackup_FileSystem to my GetFileinBackup_FileSystem(theFile)
                            --display dialog "found " & (item 2 of theFileinBackup_FileSystem)
                            if item 2 of theFileinBackup_FileSystem = "File Path does not exist-error 1" then
                                try
                                    set ParentDirinBackup to my GetParentDirinBackup_FileSystem(theFile)
                                on error
                                    set ParentDirinBackup to my CreateParentDirinBackup_FileSystem(theFile)
                                end try
                                set theGraphicOn to my Working_In_Graphic("All_OFF")
                                my progressWheel_Run(true)
                                set File_Has_Copied_with_MD5Value_For_Capture_File to my Copy_File_from_Capture_to_Backup(theFile, ParentDirinBackup)
                                my progressWheel_Run(false)
                                --my SaveFilein_Backup_LOG(theFile, File_Has_Copied_with_MD5Value_For_Capture_File)
                            else
                                --try
                                if item 1 of theFileinBackup_FileSystem is true then
                                    set theGraphicOn to my Working_In_Graphic("BackupGraphic")
                                    set Backup_File_MD5_Value to Get_MD5_CHECKSUM_VALUE(item 2 of theFileinBackup_FileSystem)
                                    --display dialog Backup_File_MD5_Value
                                    my SaveFilein_Backup_LOG(item 2 of theFileinBackup_FileSystem, Backup_File_MD5_Value)
                                    --display dialog "Saved in Backup Log"
                                end if
                                --end try
                            end if
                        else -- file is in log
                            set theGraphicOn to my Working_In_Graphic("BackupGraphic")
                            set theFileinBackup_FileSystem to my GetFileinBackup_FileSystem(theFile)
                            try
                                if item 1 of theFileinBackup_FileSystem is true then
                                    set CaptureFile_MD5 to my GetFileinCaptureLog(theFile)
                                    set BackupFile_MD5 to my GetFileinBackupLog(item 2 of theFileinBackup_FileSystem)
                                    set AppleScript's text item delimiters to "|"
                                    set CaptureFile_MD5 to every text item in CaptureFile_MD5
                                    try
                                        set CaptureFile_MD5 to item 8 of CaptureFile_MD5
                                    on error
                                        set CaptureFile_MD5 to 1
                                    end try
                                    set BackupFile_MD5 to every text item in BackupFile_MD5
                                    try
                                        set BackupFile_MD5 to item 8 of BackupFile_MD5
                                    on error
                                        set BackupFile_MD5 to 0
                                    end try
                                    set AppleScript's text item delimiters to ""
                                    set theGraphicOn to my Working_In_Graphic("Compare")
                                    set MD5_Compares to my Compare_MD5_Checksum(CaptureFile_MD5, BackupFile_MD5)
                                    if MD5_Compares = true then
                                        my Save_File_as_VERIFIED(theFile, CaptureFile_MD5)
                                    else
                                        my DeleteFile_from_BackupLOG_AND_FileSYSTEM(item 2 of theFileinBackup_FileSystem)
                                    end if
                                else -- file not in file system
                                    if item 2 of theFileinBackup_FileSystem = "File Path does not exist-error 1" then
                                        try
                                            set ParentDirinBackup to my GetParentDirinBackup_FileSystem(theFile)
                                        on error
                                            set ParentDirinBackup to my CreateParentDirinBackup_FileSystem(theFile)
                                        end try
                                        my progressWheel_Run(true)
                                        set theGraphicOn to my Working_In_Graphic("All_OFF")
                                        set File_Has_Copied to my Copy_File_from_Capture_to_Backup(theFile, ParentDirinBackup)
                                        my progressWheel_Run(false)
                                    end if
                                end if
                            end try
                        end if
                        if LoopCount is 3 then
                            exit repeat
                        end if
                        set theGraphicOn to my Working_In_Graphic("All_OFF")
                    end repeat
                end if
            end if --file verified
            ---scan next file
            my IncrementControlProgress(1)
            set x to x + 1
            RemainingInfo("Remaining: " & (MaxCount - x))
        end repeat
        ControlProgressIndeterminate(true)
        delay 0.5
        if ABort_Copy is true then
            my Info_Display("Finishing last command... Waiting to Abort ")
        else
            my Info_Display("Get Files in Backup Location...")
            set theGraphicOn to my Working_In_Graphic("Backup")
            set theBackupLocationFileList to my Get_Location_Files(BackupLocation)
            ---Is file in dest log? if it is generate md5
            my Info_Display("Get Files in Backup Log...")
            set Files_in_Backup_Log to my Get_Files_In_Backup_Log()
            script Backup_Items
                property FileinLog : missing value
                property BackLocList : missing value
            end script
            my Info_Display("Loading into memory...") ---remove after testing
            set Backup_Items's FileinLog to Files_in_Backup_Log
            set Backup_Items's BackLocList to theBackupLocationFileList
            set x to 0
            set MaxCount to count items in theBackupLocationFileList
            my SETControlProgress(0, MaxCount)
            repeat with someFile in Backup_Items's BackLocList
                if ABort_Copy is true then
                    --set ABort_Copy to false
                    exit repeat
                end if
                if someFile is in Backup_Items's FileinLog then
                    --Set_File_in_Backup_to_ONLINE(someFile)
                    --Set_File_in_Backup_to_Verified(someFile)
                    try
                        my Info_Display("Files in Backup ONLINE: " & someFile)
                    end try
                else
                    try
                        set BackupMD5 to my Get_MD5_CHECKSUM_VALUE(someFile)
                    on error
                        set BackupMD5 to "FAILED"
                    end try
                    my SaveFilein_Backup_LOG(someFile, BackupMD5)
                    (*if BackupMD5 is "FAILED" then
                    --
                    else
                    --Set_File_in_Backup_to_Verified(someFile)
                    --Set_File_in_Backup_to_ONLINE(someFile)
                    end if*)
                end if
                try
                    my IncrementControlProgress(1)
                    set x to x + 1
                    my RemainingInfo("Remaining: " & (MaxCount - x))
                end try
            end repeat
        end if
        ---SCAN SOURCE LOG FOR OFFLINE FILES
        --set DELETE_OFFLINE_Capture_Files_In_Log to my Delete_Offline_Files_in_Log("Capture")
        --my Set_ALL_BackupLogfiles_to_OFFLINE()
        ---Read next file in dest dir
        (*if ABort_Copy is true then
        --
        else
        set DELETE_OFFLINE_Capture_Files_In_Log to my Delete_Offline_Files_in_Log("Backup")
        end if*)
    else
        my DisplayAlerttoUSER("Capture & destination error")
    end if
    my Info_Display("Run Complete !")
    EnableCopySecureButton()
    RemainingInfo("")
    set AppleScript's text item delimiters to ""
    set END_TIME to current date ---remove time stuff after testing start , remember current date at top of handler
    set FINAL_TIME_TAKEN to (END_TIME - START_TIME) / 60
    set FINAL_TIME_TAKEN to FINAL_TIME_TAKEN as string
    if (count of characters in FINAL_TIME_TAKEN) is greater than 4 then
        set FINAL_TIME_TAKEN to characters 1 thru 4 of FINAL_TIME_TAKEN as string
    end if
    my Info_Display("Run Started on " & (START_TIME as string))
    my Info_Display("Run Ended on " & (END_TIME as string))
    my Info_Display("Run Took " & (FINAL_TIME_TAKEN as string) & " minutes to complete.") ---remove time stuff after testing end
end Begin_CopySecure
So hopefully I got it right and the verified items cause an early jump to the next repeat. This is the part that I hope to speed up, and it is also where AppleScript gets slower and slower as I iterate through the list of items.
This might be worth a try. The entire operation of creating a list of 25,000 strings and then searching it for one specific string took less than one second on my machine.
-- time the following operation:
set start_time to (time of (current date)) -- start timing
set my_list to {}
-- use the "a reference to" operator:
set my_list_ref to a reference to my_list
-- create a list of 25,000 strings:
set number_of_items to 25000
repeat with x from 1 to number_of_items
    set this_string to "String_" & x
    copy this_string to the end of my_list_ref
end repeat
-- look for a specific string (start from false so the final log line always has a value):
set string_does_exist to false
if my_list_ref contains "String_24000" then
    set string_does_exist to true
end if
set end_time to (time of (current date)) -- stop timing.
set elapsed_time to end_time - start_time
log "1. my_list is shown below:"
log my_list
log return
log "2. elapsed_time is " & elapsed_time & " seconds."
log return
log "3. string_does_exist is: " & string_does_exist & "."