How to speed up AppleScript?

Hi,

How can I speed up AppleScript when a repeat loop checks whether each item is in a list? The list has about 25,000 items in it. As AppleScript goes through the list it gets slower and slower.

Why? How can I improve on this?

Here’s some of the code:

repeat with theFile in theCaptureLocationFileList
	---at this point theFile is a string
	---at this point theCaptureLocationFileList is a list of strings
	---at this point theVerifiedList_in_Capture_LOG is a list of strings (about 25 000 items to process)
	if ABort_Copy is true then
		exit repeat
	end if
	if theFile is in theVerifiedList_in_Capture_LOG then
		my Info_Display("Verified: " & (theFile as string))
	else --not verified
		set theFile to theFile as alias
		set FileinCaptureLog to GetFileinCaptureLog(theFile)
	end if
end repeat

Thanks for having a look!

EM

Hi

Would it be better to break the list up into smaller lists, say 1000 items per list?

kind regards

EM

Without knowing more of what you’re doing, I can’t be specific, but huge increases in speed are possible using a script object.

to DoSomething(args)
	script S
		property A : missing value
		property B : missing value
	end script
	set S's A to args
	set S's B to paragraphs of S's A
	-- do stuff
	return something
end DoSomething

Well, here’s what’s going on. I am building an app that copies files to a network location. It then generates an MD5 checksum of both the source and destination files and saves them into a log. The purpose is to make sure the files copied correctly and, if not, to copy them again or alert the user. It should streamline the workflow by cutting down on user interaction.

I built the log system with sqlite.

So here’s the thing. I don’t want to generate md5’s for files already in the log unless there was a copy or md5 error.

So I get a list from sqlite with the paths of each file in the log with correctly generated md5 values. Then I get a new path list for newly added files to the source folder. Next I’m comparing the file list with the log list (repeat loop) to see if the file path is in the log list. This is the process that takes really long to complete.

This system will generally not have more than about 25000 files “ONLINE” so I need to speed through the “Verified” files and get to the job of copying and generating md5’s for the newly added files. Currently this process is taking over 30 minutes and is WAY TOO SLOW.

Hope this makes sense

kind regards

EM

Then to speed up your loop, you use a script object for its variables.

The idea with a script object (V in the sample) in simplest terms is that all of its data is read into memory.

-- way up at the beginning of your script:
script V
	property VerCapInLog : missing value
	property CapLocList : missing value
end script
-- read the data in, then
set V's VerCapInLog to theVerifiedList_in_Capture_LOG
set V's CapLocList to theCaptureLocationFileList
-- set things up as necessary, then

repeat with theFile in V's CapLocList
	---at this point theFile is a string
	---at this point theCaptureLocationFileList is a list of strings
	---at this point theVerifiedList_in_Capture_LOG is a list of strings (about 25 000 items to process)
	if ABort_Copy is true then exit repeat
	if theFile is in V's VerCapInLog then
		my Info_Display("Verified: " & (theFile as string)) --- very slow.
	else --not verified
		set theFile to theFile as alias
		set FileinCaptureLog to GetFileinCaptureLog(theFile)
	end if
end repeat

Thanks Adam

I’ll give it a bash tomorrow and let you know what happens.

Just out of curiosity, is the list not loaded into memory already by assigning it to a variable?

kind regards

EM

Without getting into gory details, it’s because a list or string in a script object exists as a separate entity, and a much faster algorithm is used for finding its parts than when you ask for “item j of Blah” or compare as you are doing. It can be five or six times faster with long lists.

Well it certainly is faster. Thanks for the tip on Script objects Adam. I’ll be using it a lot more.

However, my original problem remains. It still slows down halfway through, to the point where it does only 2 loops per second. With almost 15,000 left to do, you can see where this is going.

Any ideas?

Kind regards

EM

I would suggest running a sed or awk shell script from within AppleScript. (sed and awk are both pattern-matching programs.) I know those are lower-level programs and have the impression that they’re faster. That said, I don’t know much about them myself. However, there is a lot available with a little googling.

A sed script to copy every line that contains “aa” from file1.txt to file2.txt would be:
(run in Terminal)
sed -n '/aa/p' file1.txt > file2.txt

(better sed scripters may correct me on this)
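
On the sed one-liner: sed prints every input line by default, and the p command prints matching lines on top of that, so the -n flag is what restricts the output to the matching lines only. Here is a small sketch of the difference. The sample file contents are made up for illustration, and plain grep 'aa' should give the same result as the -n form.

```shell
# Build a small sample file (hypothetical contents, for illustration only).
workdir=$(mktemp -d)
printf '%s\n' 'aardvark' 'bee' 'baa' > "$workdir/file1.txt"

# Without -n, sed echoes every line and prints each matching line a second time.
with_dupes=$(sed '/aa/p' "$workdir/file1.txt")

# With -n, only the lines matched by /aa/ are written to file2.txt.
sed -n '/aa/p' "$workdir/file1.txt" > "$workdir/file2.txt"
only_matches=$(cat "$workdir/file2.txt")

# grep selects the same lines as the -n form.
grep_matches=$(grep 'aa' "$workdir/file1.txt")

echo "$only_matches"
```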

I expect that it’s slowing down because you’re using so much memory that it’s beginning to page it to disk. Disk swapping is extremely slow, so you’ll have to segment the task.

I agree with even_dana. I might write the data to a text file and use a grep shell script to check for the existence of the items in question.
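
A minimal sketch of that grep idea, assuming both lists have been written out as plain text files with one path per line. The file names verified.txt and capture.txt and the sample paths are hypothetical. Together, grep's -F (fixed strings), -x (whole-line match), -v (invert) and -f (read patterns from a file) print the capture paths that do not appear in the verified list, in one pass rather than one list search per file.

```shell
workdir=$(mktemp -d)

# verified.txt: paths already logged with a good MD5, one per line (sample data).
printf '%s\n' '/capture/a.mov' '/capture/b.mov' > "$workdir/verified.txt"

# capture.txt: paths currently in the capture folder, one per line (sample data).
printf '%s\n' '/capture/a.mov' '/capture/b.mov' '/capture/c.mov' > "$workdir/capture.txt"

# Print every capture path that is NOT an exact whole-line match
# for some line of verified.txt.
unverified=$(grep -F -x -v -f "$workdir/verified.txt" "$workdir/capture.txt")

echo "$unverified"
```

From AppleScript the same command could be run with do shell script and the result split with paragraphs of. Note that grep exits with status 1 when it prints nothing (every file already verified). do shell script reports that as an error, so that case needs a try block or a trailing `|| true`.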

Adam

What would be the best approach for doing this? I suspect you are right about the disk paging.

I have tried creating temp files with 1000 paths in each, saved as a list (a sort of cache, if you wish). Then, using another loop, I read each file’s contents and in theory loop through only 1000 paths at a time, yet the main “meat” loop still runs slowly.

Is there a way to free the memory after it’s used?

regards

EM

This is odd. The machine is not nearly out of memory. My Mac has 3.5 GB of RAM. When my app runs it uses about 20 MB of RAM and between 420MB and 440MB of virtual memory.

I also run a few other apps at the same time.

The machine always has 1.5 GB RAM free.

What gives?

regards
EM

Hi EM

… are you running Tiger or Leopard ?

AppleScript Release Notes for Mac OS X version 10.5
http://developer.apple.com/releasenotes/AppleScript/RN-AppleScript/index.html

Bug Fixes
AppleScript
* The delay command uses less CPU. [3178086]
* Impossible object specifiers in math expressions, such as 1 + character 2 of “9”, produce an error instead of a random result. [4029175]
* The numerical limits for repeat loops accept real numbers; they will be rounded to integers. [4215670]
* Object specifiers other than application and date specifiers, in particular alias “…” and POSIX file “…”, are not evaluated at compile time, and will be left exactly as originally typed. [4444698]
→ * AppleScript no longer limits script memory usage to 32MB. [4511477]
* Counting the paragraphs of an empty string gives a result of zero. [4588706]
* Raw data literals («data …») are no longer limited to 127 bytes. [4986420]

Thanks for your reply clemhoff

I am on Tiger. If this is my problem, how can I clear the 32 MB to make space for the next round in the loop?

I tried to delete the variables (which I set at the start of the repeat loop) right at the end of the repeat block, just before it goes back for the next iteration. Does deleting a variable actually do anything?

The slowdown still happens though. I have abandoned the cache file idea because the effects are the same: as the loop continues it still slows down. I am starting to think that it definitely has something to do with memory and, as you discovered, it might be an AppleScript limitation.

There MUST be a way around this. I will try to optimize the “main” handler and then post it here later for others to view.

I am sure there are faster ways of doing what I am trying to achieve.

cheers

EM

OK here is what I have so far. Thanks to whoever sits through all this !

on Begin_CopySecure() --current handler
	set START_TIME to current date
	set END_TIME to "Not Set"
	ControlProgressIndeterminate(true)
	DisableCopySecureButton()
	set theGraphicOn to my Working_In_Graphic("Capture")
	set theState to my Check_CaptureLocation_BackupLocation()
	if theState is true then
		my Info_Display("Gathering files to process ...")
		delay 0.5
		set theCaptureLocationFileList to my Get_Location_Files(CaptureLocation)
		--set Files_Buildt to my BuildCacheFiles(theCaptureLocationFileList)
		my Info_Display("Gathering Verified files from Log ...")
		set theVerifiedList_in_Capture_LOG to Get_Verified_CaptureList_From_SQL()
		--set UnVerifiedList to {}
		set MaxCount to count items in theCaptureLocationFileList
		my SETControlProgress(0, MaxCount)
		--my Info_Display("Finding Un-Verified files...")
		set x to 0
		
		my Info_Display("Allocating memory...") ---remove after testing
		
		script Verified_Items
			property VerCapInLog : missing value
			property CapLocList : missing value
		end script
		
		my Info_Display("Setup Verified list...") ---remove after testing
		set Verified_Items's VerCapInLog to theVerifiedList_in_Capture_LOG
		my Info_Display("Setup Capture list...") ---remove after testing
		set Verified_Items's CapLocList to theCaptureLocationFileList
		
		my Info_Display("Start process...")
		
		set x to 0
		set MaxCount to count items in theCaptureLocationFileList
		my SETControlProgress(0, MaxCount)
		repeat with theFile in Verified_Items's CapLocList
			if ABort_Copy is true then
				--	set ABort_Copy to false
				exit repeat
			end if
			if theFile is in Verified_Items's VerCapInLog then
				my Info_Display("Verified: " & (theFile as string))
				
			else --not verified
				
				set theFile to theFile as alias
				set FileinCaptureLog to GetFileinCaptureLog(theFile)
				
				if FileinCaptureLog = "" then
					set theGraphicOn to my Working_In_Graphic("Capture")
					set CaptureFile_MD5_Value to my Get_MD5_CHECKSUM_VALUE(theFile)
					my SaveFilein_Capture_LOG(theFile, CaptureFile_MD5_Value)
					set FileinCaptureLog to GetFileinCaptureLog(theFile)
					--set FileinCaptureLog to Set_File_to_ONLINE(FileinCaptureLog)
				else
					--set FileinCaptureLog to Set_File_to_ONLINE(FileinCaptureLog)
				end if
				--set FileinCaptureLog to GetFileinCaptureLog(theFile)
				try
					set HasVerified to item 6 of FileinCaptureLog
				on error
					set HasVerified to "NO"
				end try
				if HasVerified = "YES" then
					my Info_Display("Verified: " & (theFile as string))
					--Scan next file
				else
					set LoopCount to 0
					repeat
						if ABort_Copy is true then
							--	set ABort_Copy to false
							exit repeat
						end if
						set LoopCount to LoopCount + 1
						set theGraphicOn to my Working_In_Graphic("Backup")
						set FileinBackupLog to GetFileinBackupLog(theFile)
						--display dialog "File in Backup LOG: " & FileinBackupLog
						if FileinBackupLog = "" then -- file is not in log
							--display dialog "file not in log look for in file system"
							set theFileinBackup_FileSystem to my GetFileinBackup_FileSystem(theFile)
							
							--display dialog "found " & (item 2 of theFileinBackup_FileSystem)
							if item 2 of theFileinBackup_FileSystem = "File Path does not exist-error 1" then
								try
									set ParentDirinBackup to my GetParentDirinBackup_FileSystem(theFile)
								on error
									set ParentDirinBackup to my CreateParentDirinBackup_FileSystem(theFile)
								end try
								set theGraphicOn to my Working_In_Graphic("All_OFF")
								my progressWheel_Run(true)
								set File_Has_Copied_with_MD5Value_For_Capture_File to my Copy_File_from_Capture_to_Backup(theFile, ParentDirinBackup)
								my progressWheel_Run(false)
								--my SaveFilein_Backup_LOG(theFile, File_Has_Copied_with_MD5Value_For_Capture_File)
							else
								--try
								if item 1 of theFileinBackup_FileSystem is true then
									set theGraphicOn to my Working_In_Graphic("BackupGraphic")
									set Backup_File_MD5_Value to Get_MD5_CHECKSUM_VALUE(item 2 of theFileinBackup_FileSystem)
									--display dialog Backup_File_MD5_Value
									
									my SaveFilein_Backup_LOG(item 2 of theFileinBackup_FileSystem, Backup_File_MD5_Value)
									--display dialog "Saved in Backup Log"
								end if
								--end try
							end if
							
						else -- file is in log 
							set theGraphicOn to my Working_In_Graphic("BackupGraphic")
							set theFileinBackup_FileSystem to my GetFileinBackup_FileSystem(theFile)
							try
								if item 1 of theFileinBackup_FileSystem is true then
									set CaptureFile_MD5 to my GetFileinCaptureLog(theFile)
									set BackupFile_MD5 to my GetFileinBackupLog(item 2 of theFileinBackup_FileSystem)
									set AppleScript's text item delimiters to "|"
									set CaptureFile_MD5 to every text item in CaptureFile_MD5
									try
										set CaptureFile_MD5 to item 8 of CaptureFile_MD5
									on error
										set CaptureFile_MD5 to 1
									end try
									set BackupFile_MD5 to every text item in BackupFile_MD5
									try
										set BackupFile_MD5 to item 8 of BackupFile_MD5
									on error
										set BackupFile_MD5 to 0
									end try
									set AppleScript's text item delimiters to ""
									set theGraphicOn to my Working_In_Graphic("Compare")
									set MD5_Compares to my Compare_MD5_Checksum(CaptureFile_MD5, BackupFile_MD5)
									if MD5_Compares = true then
										my Save_File_as_VERIFIED(theFile, CaptureFile_MD5)
									else
										my DeleteFile_from_BackupLOG_AND_FileSYSTEM(item 2 of theFileinBackup_FileSystem)
									end if
								else -- file not in file system
									if item 2 of theFileinBackup_FileSystem = "File Path does not exist-error 1" then
										try
											set ParentDirinBackup to my GetParentDirinBackup_FileSystem(theFile)
										on error
											set ParentDirinBackup to my CreateParentDirinBackup_FileSystem(theFile)
										end try
										my progressWheel_Run(true)
										set theGraphicOn to my Working_In_Graphic("All_OFF")
										set File_Has_Copied to my Copy_File_from_Capture_to_Backup(theFile, ParentDirinBackup)
										my progressWheel_Run(false)
									end if
									
								end if
							end try
						end if
						
						if LoopCount is 3 then
							exit repeat
						end if
						set theGraphicOn to my Working_In_Graphic("All_OFF")
					end repeat
				end if
				
			end if --file verified
			---scan next file
			my IncrementControlProgress(1)
			set x to x + 1
			RemainingInfo("Remaining: " & (MaxCount - x))
		end repeat
		ControlProgressIndeterminate(true)
		delay 0.5
		if ABort_Copy is true then
			my Info_Display("Finishing last command... Waiting to Abort ")
		else
			my Info_Display("Get Files in Backup Location...")
			set theGraphicOn to my Working_In_Graphic("Backup")
			
			set theBackupLocationFileList to my Get_Location_Files(BackupLocation)
			---Is file in dest log? if it is generate md5
			my Info_Display("Get Files in Backup Log...")
			
			set Files_in_Backup_Log to my Get_Files_In_Backup_Log()
			
			script Backup_Items
				property FileinLog : missing value
				property BackLocList : missing value
			end script
			
			my Info_Display("Loading into memory...") ---remove after testing
			set Backup_Items's FileinLog to Files_in_Backup_Log
			set Backup_Items's BackLocList to theBackupLocationFileList
			
			set x to 0
			set MaxCount to count items in theBackupLocationFileList
			my SETControlProgress(0, MaxCount)
			
			repeat with someFile in Backup_Items's BackLocList
				if ABort_Copy is true then
					--set ABort_Copy to false
					exit repeat
				end if
				
				if someFile is in Backup_Items's FileinLog then
					--Set_File_in_Backup_to_ONLINE(someFile) 
					--Set_File_in_Backup_to_Verified(someFile)
					try
						my Info_Display("Files in Backup ONLINE: " & someFile)
					end try
				else
					try
						set BackupMD5 to my Get_MD5_CHECKSUM_VALUE(someFile)
					on error
						set BackupMD5 to "FAILED"
					end try
					my SaveFilein_Backup_LOG(someFile, BackupMD5)
					(*if BackupMD5 is "FAILED" then
					--
				else
					
					--Set_File_in_Backup_to_Verified(someFile)
					--Set_File_in_Backup_to_ONLINE(someFile)
				end if*)
				end if
				try
					my IncrementControlProgress(1)
					set x to x + 1
					my RemainingInfo("Remaining: " & (MaxCount - x))
				end try
			end repeat
		end if
		---SCAN SOURCE LOG FOR OFFLINE FILES
		
		--set DELETE_OFFLINE_Capture_Files_In_Log to my Delete_Offline_Files_in_Log("Capture")
		--my Set_ALL_BackupLogfiles_to_OFFLINE()
		---Read next file in dest dir
		(*if ABort_Copy is true then
			--
		else
			set DELETE_OFFLINE_Capture_Files_In_Log to my Delete_Offline_Files_in_Log("Backup")
		end if*)
	else
		my DisplayAlerttoUSER("Capture & destination error")
		
	end if
	
	my Info_Display("Run Complete !")
	EnableCopySecureButton()
	RemainingInfo("")
	set AppleScript's text item delimiters to ""
	set END_TIME to current date ---remove time stuff after testing start , remember current date at top of handler
	set FINAL_TIME_TAKEN to (END_TIME - START_TIME) / 60
	set FINAL_TIME_TAKEN to FINAL_TIME_TAKEN as string
	if (count of characters in FINAL_TIME_TAKEN) is greater than 4 then
		set FINAL_TIME_TAKEN to characters 1 thru 4 of FINAL_TIME_TAKEN as string
	end if
	my Info_Display("Run Started on " & (START_TIME as string))
	my Info_Display("Run Ended on " & (END_TIME as string))
	my Info_Display("Run Took " & (FINAL_TIME_TAKEN as string) & " minutes to complete.") ---remove time stuff after testing end
end Begin_CopySecure

So hopefully I got it right and the verified items should cause an early jump to the next repeat. This is the part that I hope to speed up, and it is where AppleScript gets slower and slower as I iterate through the list of items.

Thanks again to anyone who made it to this point

Kind regards

EM

Any ideas anyone?

Hello EM,

This might be worth a try. The entire operation of creating a list of 25,000 strings and then searching it for one specific string took less than one second on my machine.

icta



-- time the following operation:
set start_time to (time of (current date)) -- start timing

set my_list to {}
-- use the "a reference to" operator: 
set my_list_ref to a reference to my_list

-- create a list of 25,000 strings:
set number_of_items to 25000
repeat with x from 1 to number_of_items
	set this_string to "String_" & x
	copy this_string to the end of my_list_ref
end repeat

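-- look for a specific string (default to false so the log line below never reads an undefined variable):
set string_does_exit to false
if my_list_ref contains "String_24000" then
	set string_does_exit to true
end if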

set end_time to (time of (current date)) -- stop timing.
set elapsed_time to end_time - start_time

log "1. my_list is shown below:"
log my_list
log return
log "2. elapsed_time is " & elapsed_time & " seconds."
log return
log "3. string_does_exit is: " & string_does_exit & "."


Thanks for the reply, I’ll give it a go and report back.

What does the “reference to” operator do?

kind regards

EM

So I tried the suggested “a reference to” operator.

There is no significant speed increase and the process still slows down as the loop runs.

Thanks anyway.

What does this “a reference to” operator do?

Adam suggested that I segment the task. How do I do that?
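
On segmenting: one possible reading, sketched in shell, assumes the path list has been saved as a text file with one path per line. The idea is to cut that file into fixed-size chunks with split and handle one chunk at a time. The 2500 generated paths and the chunk_ prefix are placeholders. Chunking on its own does not reduce the total number of comparisons against the verified list, so it may not cure a slowdown that comes from the list search itself.

```shell
workdir=$(mktemp -d)

# Hypothetical input: 2500 capture paths, one per line.
awk 'BEGIN { for (i = 1; i <= 2500; i++) print "/capture/file_" i ".mov" }' > "$workdir/capture.txt"

# Cut the list into files of at most 1000 lines each: chunk_aa, chunk_ab, ...
split -l 1000 "$workdir/capture.txt" "$workdir/chunk_"

# Walk the chunks one at a time; real per-chunk work would replace the echo.
chunk_count=0
total_lines=0
for chunk in "$workdir"/chunk_*; do
    lines=$(wc -l < "$chunk" | tr -d ' ')
    chunk_count=$((chunk_count + 1))
    total_lines=$((total_lines + lines))
    echo "$(basename "$chunk"): $lines paths"
done
```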

regards

EM