general speed / optimization question

Hi

I’ve got a behemoth of a script (my first big one) that I’m not going to post because I know it is really sloppy, plus I want hints, not answers. Anyway, it automates the naming of image files placed in a directory tree: it first validates the format of each filename, then checks locally, and then in a remote mirror directory tree, for the next available serial number.

e.g.
file names are like 100x200_023.tif

The first segment is a category, the second segment is a subcategory, and the part after the underscore is an arbitrary serial number. Anyway, there is a lot of looping going on, and the script is recursive. Once I run it, it goes through the whole tree and names all the files correctly.

It works, but it slows down almost to the point of not being worth running. If I split the job into parts, i.e. run the script on sub-directories rather than starting at the root, it is much faster.

My question is this: what kinds of guidelines should I follow to avoid this slowdown? I have to think it has something to do with the looping, a memory leak or something, but I’ve never dealt with that sort of thing before, so I’m not too familiar with it. Any advice would be greatly appreciated.

thanks,
Andy S

Do not loop over the items by index; loop over references to the items:

The following is very quick:


repeat with x in myList
    DoWhatEverYouWantOnYOurItem(x)
end repeat

This version is slow:


repeat with x from 1 to count myList
    set myItem to item x of myList
    DoWhatEverYouWantOnYOurItem(myItem)
end repeat

As a first step, this should speed up your script a lot.

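This part scares me:

"checking first locally then in a remote mirror directory tree for the next available serial number"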

Does this mean for every file you’re scanning the local and remote directories to find the next serial number?

If so you can realize a tremendous speedup by performing this check once and incrementing it as you go through your loop. In general file system searches are slow and you should avoid them as much as possible. Running one check to determine the ‘next serial number’ at the beginning of your script will make a big difference.
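A rough sketch of the idea, in case it helps. The handler scanForNextSerial is a made-up stand-in for whatever slow local and remote search your script does now, and the file list is dummy data; only the shape of the loop matters:

```applescript
-- Hypothetical stand-in for the existing slow local + remote search.
on scanForNextSerial(prefix)
	return 24
end scanForNextSerial

set prefix to "100x200_"
set filesToName to {"a.tif", "b.tif", "c.tif"} -- dummy input (assumption)
set nextSerial to scanForNextSerial(prefix) -- the only disk/network lookup

set assigned to {}
repeat with f in filesToName
	-- pair each file with the serial it should get
	set end of assigned to {contents of f, nextSerial}
	set nextSerial to nextSerial + 1 -- counted in memory, no disk access
end repeat
return assigned
```

If you have several category/subcategory prefixes, you would keep one such counter per prefix rather than a single one.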

Try to think of ways to keep the number of disk accesses to a minimum. Don’t loop through each individual item in a folder to see if it’s a file or a folder, or to get its name. Ask the Finder specifically for the ‘name of every file’ in the folder. When you’ve dealt with the file names, then get ‘every folder’ of the folder and run the recursion on each item in the resulting list.
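Roughly like this. It is only a sketch: processFolder is a name I made up, and the comment in the first loop marks where your own validation and renaming would go:

```applescript
-- Sketch: two Finder calls per folder instead of one or more per item.
on processFolder(theFolder)
	tell application "Finder"
		set fileNames to name of every file of theFolder
		set subFolders to every folder of theFolder
	end tell
	repeat with aName in fileNames
		-- validate / rename here, working on the name string only
	end repeat
	repeat with subFolder in subFolders
		-- recurse into each subfolder
		processFolder(contents of subFolder)
	end repeat
end processFolder

processFolder(choose folder)
```

The point is that the Finder hands back the whole list of names in one go, and the loops then run over plain AppleScript lists.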

I did not understand this example:

"The following is very quick :
repeat with x in myList
DoWhatEverYouWantOnYOurItem(x)
end repeat

This version is slow:
repeat with x from 1 to count myList
set myItem to item x of myList
DoWhatEverYouWantOnYOurItem(myItem)
end repeat"

What would this look like in my script here:

repeat with CurrentFolderNummer from 0 to NumberOfFoldersInFolderList
		
		if not CurrentFolderNummer = 0 then
			set derOrdner to item CurrentFolderNummer of FolderList as alias
		end if
		
		my gehtlos(derOrdner, wasTun, dieLogfileLocation, dieMovefileFolderLocation)
		
	end repeat

Like this?

repeat with CurrentFolderNummer from 0 to FolderList
	
	if not CurrentFolderNummer = 0 then
		set derOrdner to CurrentFolderNummer
	end if
	
	my gehtlos(derOrdner, wasTun, dieLogfileLocation, dieMovefileFolderLocation)
	
end repeat

Hi Airbuff,

Your script can be optimized like this:

repeat with CurrentFolderNummer in FolderList
	my gehtlos((CurrentFolderNummer as alias), wasTun, dieLogfileLocation, dieMovefileFolderLocation)
end repeat

Hi Trash Man,

This is a good naming convention if you know that there will be no more than 999 files, because you can easily find the last index by getting the last file with:

tell application "Finder"
	set last_file to last file of somefolder whose name begins with "100x200_"
end tell

You then increment that index (“023” as integer) to get the next index.
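For the increment itself, something along these lines should do. The file name is hard-coded here purely for illustration, and the character positions assume the 8-character "100x200_" prefix from the example:

```applescript
set lastName to "100x200_023.tif" -- illustrative: name of the last file found
set lastIndex to (text 9 thru 11 of lastName) as integer -- "023" becomes 23
set nextIndex to lastIndex + 1
-- pad back to three digits: "000" & 24 is "00024", keep the last three characters
set padded to text -3 thru -1 of ("000" & nextIndex)
set nextName to "100x200_" & padded & ".tif"
```

With a prefix of a different length you would have to adjust the positions, or split the name on the underscore instead.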

gl,

Here’s another great tip I read about in Bill Cheeseman’s article.

When you search several folders, instead of looping through each file in a bunch of folders, you can use the alias reference.


to search for itemName at specialFolders
    repeat with aFolder in specialFolders
       try
          return alias ((path to aFolder as string) & itemName) as string
       on error number -43 --item not there, go to next folder
       end try
    end repeat
    return "" --nothing found, return empty string
 end search
 

If the file with itemName is not in the folder aFolder, it will error and go on to the next folder. So, instead of searching all items for one with the same name, just use the name and try coercing to alias.

I haven’t tested the speed difference, but I believe him. Maybe I’ll test it today.

gl,

Hi Steffan,
Thanks for the answer. It’s working, but I don’t get it. Could you explain why this is faster?

repeat with CurrentFolderNummer in FolderList
   my gehtlos((CurrentFolderNummer as alias), wasTun, dieLogfileLocation, dieMovefileFolderLocation)
end repeat

repeat with i in theList

refers directly to the list.

repeat with i from 1 to count theList
	set a to item i of theList

picks one item from the list on every loop, which takes time.

But sometimes you need access to the index variable; then you must use version 2.

Is the “index variable” e.g. the parent folder?

The index variable (i in the script) is the number used to index the items of the list; it is incremented automatically on each pass.
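A small made-up example of a case where you need the index, here to write back into the list and to number the entries:

```applescript
set theList to {"a", "b", "c"}
repeat with i from 1 to count theList
	-- i is a plain integer: 1, 2, 3
	set item i of theList to (i as text) & ": " & item i of theList
end repeat
theList
```

One thing to keep in mind with the other form: in repeat with i in theList, the loop variable is a reference to the item (item 1 of theList and so on), so when you compare it with a value you usually want contents of i.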