I need to speed up this handler please

I’ve got an AppleScriptObjC project that prepares reports on up to a year’s activity.

There are 365 files with up to 350 paragraphs each, and every paragraph has an email address embedded in it.

Getting the email address is no hassle, but preparing the yearly count of how many times each address occurs is slow.

Can this handler be replaced by a shell script? I’m not proficient in writing shell scripts by any means.
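To answer the shell-script question, here is a minimal sketch of the whole count as one pipeline. It rests on assumptions the thread doesn't confirm: the data files are plain text in a single folder, and every address is wrapped in angle brackets, as in Jane Doe <jane@example.com> (the ClientReport handler later in the thread searches for ">", which suggests that layout). The function name count_clients is hypothetical, the month filtering is left out, and lower-casing stands in for case-insensitive matching.

```shell
#!/bin/sh
# count_clients DIR   (hypothetical helper name)
# Prints "count address" for every e-mail address found in the files of DIR,
# most frequent first. Assumes each address is written as <user@host>.
count_clients() {
  # -o: print only the matched text; -h: don't prefix matches with file names
  grep -oh '<[^<>@]*@[^<>]*>' "$1"/* |
    # drop the angle brackets, then fold case so Bob@X.com and bob@x.com merge
    tr -d '<>' |
    tr '[:upper:]' '[:lower:]' |
    # group identical addresses, count each group, sort by count (highest first)
    sort | uniq -c | sort -k1,1nr -k2,2
}
```

Usage, with a made-up year: count_clients "$HOME/Desktop/Mail Manager Folder/Mail Data 2010". Each output line is a count followed by an address. Because grep, sort and uniq stream the data, the work grows roughly with the total size of the files rather than with files × clients.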

Regards

Santa



if tempClientString is in items of theClientList then
	set x to count of items of theClientList
	repeat with tempCycle from 1 to x -- now check through existing clients
		if tempClientString = item tempCycle of theClientList as text then
			set item tempCycle of theClientList2 to (item tempCycle of theClientList2) + 1
			exit repeat
		end if
	end repeat
else
	if tempClientString ≠ "" and tempClientString is not in {","} then
		set end of theClientList to tempClientString
		set end of theClientList2 to 1
	end if
end if

Model: 27" iMac
Browser: Safari 533.19.4
Operating System: Mac OS X (10.6)

Well, from a lot of experience and testing, I’ve found repeat loops to be about 8x faster if you write them like this:

if tempClientString is in items of theClientList then
	--set x to count of items of theClientList
	set counterVar to 1
	repeat with tempCycle in theClientList  -- now check through existing clients
		if tempClientString = ((tempCycle) as text) then
			set item (counterVar) of theClientList2 to (item (counterVar) of theClientList2) + 1
			exit repeat
		end if
		set counterVar to counterVar + 1
	end repeat
else
	if tempClientString ≠ "" and tempClientString is not in {","} then
		set end of theClientList to tempClientString
		set end of theClientList2 to 1
	end if
end if

I have no idea why, but cycling through a 2000+ item list this way took 30 seconds instead of 4 minutes. I also think it’s generally faster to loop through a list of items or records than through, say, the paragraphs of a text. So setting a variable to a text’s paragraphs (as a list) and looping over that list should be faster than cycling through the text’s paragraphs directly.

Also, although I haven’t tested it, since you are using ASOC you could use an NSArray and cycle through it. A repeat loop works on an array (or an NSMutableArray, if you need to change its contents) just fine. You can access an element of a nested array like this:

repeat with arrayItem in theBigArray
	set anItem to arrayItem's objectAtIndex_(1) -- Cocoa indexes start at 0, so this gets the second element
	...
end repeat

Also, I would advise a different way of addressing the items in your arrays: use a list of records, like this:

set anArray to {{itemName:"The name", itemNumber:2, itemSubject:"a subject"}} -- a list of records

You can later access any field of each record in the list like this:

repeat with anItem in anArray
	set theNameOfTheItem to (itemName of anItem) as string
	...
end repeat

Some or all of this could help improve the speed of your script I believe. If you need more info, let me know.

Thanks Leon

I’ve re-written my script to deal with 200 paragraph ‘chunks’ of the main list of clients, and used your suggested method. It does speed it up, but it still takes 20 minutes to deal with a test folder of 28,000 entries, and the main folder has over 120,000.

I’ll try an array next, to see if that speeds it up.

How do you start an array off? I’m presuming you declare it as a property, but in any particular form?

Regards

Santa

Wow, that’s a lot of data to go through! Let’s see what can be done to help get this done faster.

There are two kinds of arrays, depending on whether you need to make modifications to them or not. NSArray and NSMutableArray objects are created like this:

tell current application's NSArray to set theArray to arrayWithArray_(((paragraphs of theText) as list))
tell current application's NSMutableArray to set theMutableArray to arrayWithCapacity_(255) --don't worry about the number, it will automatically expand as needed, it's only a start point.

theMutableArray's addObjectsFromArray_(((paragraphs of theText) as list))

For finding things in arrays, there is something much more powerful than a repeat loop: NSPredicate.

You first need to create a new NSPredicate object that will define what to find in the array. Like this (with NSArray):

tell current application's NSPredicate to set thePredicate to predicateWithFormat_("SELF contains[cd] %@", stringToFind) --there are different matching options; this one should do the trick for your needs
set theFilteredArray to theArray's filteredArrayUsingPredicate_(thePredicate)

and with NSMutableArray:

tell current application's NSPredicate to set thePredicate to predicateWithFormat_("SELF contains[cd] %@", stringToFind)
theMutableArray's filterUsingPredicate_(thePredicate) --filters the mutable array in place and returns nothing (void), so don't assign its result

In both scenarios, stringToFind is the string you need to look for, so have it ready in advance. With NSArray, the result is a new array containing only the matching items; with NSMutableArray, the array itself is reduced to the matching items. You can then count them, sort them, etc. The Xcode documentation for both classes lists everything you can do with them. NSMutableArray is a subclass of NSArray, so any method that works on the latter also works on the former.

There is also a method in NSArray, containsObject:, that only checks whether a certain object exists in the array and returns true or false. It’s a bit more tricky because you need to create an NSString object first. So it could go like this (untested):

set myStringToFind to "I want to find this phrase in the array."
tell current application's NSString to set stringToFind to stringWithString_(myStringToFind)
set theResult to theArray's containsObject_(stringToFind)
log theResult

Of course, it has to match an array item exactly for this to work. The previous method was more flexible, ignoring case and diacritics.
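The choice of predicate operator also changes what gets counted. CONTAINS is a substring test, so an address that is part of a longer one (bob@... inside jimbob@...) is counted for both. NSPredicate also offers MATCHES, which treats the right-hand string as a regular expression that must match the whole string, so the dots in an address match any character; LIKE compares the whole string with * and ? as wildcards; and == tests equality. As a rough command-line analogy only (grep is not NSPredicate, and it does not fold diacritics the way the [d] option does), here is the same needle counted three ways against made-up sample data:

```shell
#!/bin/sh
# Made-up sample data: one address per line.
sample=$(mktemp)
printf '%s\n' 'bob@example.com' 'jimbob@example.com' 'Bob@Example.com' 'bob@exampleXcom' > "$sample"
needle='bob@example.com'

# Roughly "SELF contains[c] needle": fixed-string substring, ignoring case.
contains_count=$(grep -Fic -- "$needle" "$sample")   # 3: also counts jimbob@...
# Roughly "SELF matches[c] needle": whole-line regular expression, ignoring case.
matches_count=$(grep -ixc -- "$needle" "$sample")    # 3: '.' also matches the X
# Roughly "SELF ==[c] needle": fixed string, whole line, ignoring case.
equals_count=$(grep -Fixc -- "$needle" "$sample")    # 2: only the real address

rm -f "$sample"
echo "contains: $contains_count  matches: $matches_count  equals: $equals_count"
```

For counting email addresses, an exact comparison (==[c], or LIKE on addresses that contain no * or ?) is usually what you want.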

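If need be, you could also sort the arrays’ contents first, if you think having them in sorted order would help (either the original arrays or the result arrays). Since both arrays here are immutable NSArrays, use sortedArrayUsingSelector:, which returns a new sorted array (sortUsingSelector: only exists on NSMutableArray):

set theSortedResults to theFilteredArray's sortedArrayUsingSelector_("compare:") --simple sort, also useful for numbers
set theSortedArray to theArray's sortedArrayUsingSelector_("caseInsensitiveCompare:") --ignores case, usually a more natural order for text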

But I’m stating all this without knowing the contents of your files, what you need to match against, or what the desired result is. Those details could change everything; this is only meant to give you some directions.

But 120,000 files is a huge amount of data to go through. It’s not surprising that it takes time, given the amount of data and AppleScript’s nature. I don’t think you’ll be able to bring it down to 30 seconds even with methods like the ones I just showed you, but it should be faster.

Also, there are methods to load the contents of files straight into arrays and strings. Reading the files from disk takes time too, so maybe these methods would help:

tell current application's NSString to set theFileContents to stringWithContentsOfFile_encoding_error_("/Users/me/Desktop/theFile.txt", current application's NSUTF8StringEncoding, missing value) --Encoding should change depending on the file, but UTF8 is a good default I think
tell current application's NSArray to set theFileContentsAsArray to arrayWithContentsOfFile_("/Users/me/Desktop/theFile.plist") --of course, the file must already be a property list (XML) whose root is an array for this to work.

All I have posted here is untested and needs to be adapted to your app and your needs.

Let me know if this helps, and have fun!

Browser: Safari 531.22.7
Operating System: Mac OS X (10.6)

If you really need to speed this up, I think you will have to go to Objective C. I had a test program that had a table with 1 million rows of random numbers (the table was populated by an NSArray of one million entries). My program added a second column that kept a running total of the numbers in the first column. Using the following ASOC code to loop through the array and add the second column took 9 minutes:

on doTotalsAS_(sender)
	set theTotal to 0
	repeat with anEntry in theData
		set theTotal to theTotal + (anEntry's valueForKey_("amountA") as real)
		anEntry's setValue_forKey_(theTotal, "amountB") -- dictionaries have no addObject:forKey:; setValue:forKey: matches the Objective-C version
	end repeat
	setTheData_(theData)
end doTotalsAS_

Using the following Objective-C code, the same program took 2 seconds (270 times faster):

-(void)makeTotals:(NSMutableArray *)myArray {
    float total = 0;
    for (id obj in myArray) {
        total = total + [[obj valueForKey:@"amountA"] floatValue];
        [obj setValue:[NSNumber numberWithFloat:total] forKey:@"amountB"];
    }
}

Ric

Of course ObjC will beat ASOC in speed and power. But ObjC is out of reach for many people, and that’s why ASOC beats ObjC in ease of use. Unless he can find someone to code that part of his app in ObjC, I think he can greatly speed up his app with the examples I have given him.

I know from experience now that things that used to take 30 or 60 seconds in plain old AS take less than a second with ASOC, provided, of course, that I manipulate Cocoa objects.

G’day, and thanks fellas, looks like I’ve got some learning to do!

G’day again

I’m in deep do-do here.

I simply can’t get the following to work. I thought I’d start simply, but I’m tearing my hair out.

For a start, this line

theClientListArray’s addObjectsFromArray_({tempClientString}) --< Make VERY long list

only makes a list of 23 items out of 83 clients, all the same name and email address, and equal to item 1 of the list of names in the files.

and this line

set item xx of theClientList2 to theFilteredArray's |count|()

gives me an error that the variable ‘theFilteredArray’ is not defined.

Heeeeeelp please!

Regards

Santa


on ClientReport()
		set DataClientYear to theYear's titleOfSelectedItem() as integer
		set TallyName2 to ((path to desktop) & "Mail Manager Folder:Mail Data " & DataClientYear) as text
		set TestClientDate to current date
		set EndClientDate to current date
		set year of TestClientDate to DataClientYear
		set month of TestClientDate to (startingMonth's indexOfSelectedItem()) + 1
		set day of TestClientDate to 1
		set the hours of TestClientDate to 0
		set the minutes of TestClientDate to 0
		set the seconds of TestClientDate to 0
		set year of EndClientDate to DataClientYear
		set month of EndClientDate to (endingMonth's indexOfSelectedItem()) + 1
		set day of EndClientDate to 1
		set the hours of EndClientDate to 0
		set the minutes of EndClientDate to 0
		set the seconds of EndClientDate to 0
		set HourlyString to {}
		set tempStartingMonth to month of TestClientDate as integer
		set tempendingmonth to month of EndClientDate as integer
		set theClientList to {}
		set theClientList2 to {}
		tell application "Finder"
			set theFiles to files of folder TallyName2 as alias list
			set xx to 0
			set TheBarIncrement to 100 / (count of theFiles)
		end tell
		Progress's setMaxValue_(100)
		Progress's setDoubleValue_(0)
		set BarCount to 0
		tell current application's NSMutableArray to set theClientListArray to arrayWithCapacity_(10)
		repeat with individualFile in theFiles
			set BarCount to BarCount + 0.5
			Progress's setDoubleValue_(TheBarIncrement * BarCount)
			tell application "Finder"
				set theCreationDate to name of individualFile
			end tell
			set theCMonth to word 2 of theCreationDate as integer
			if theCMonth ≥ tempStartingMonth and theCMonth ≤ tempendingmonth then
				set tempWholeList to my ReadFile2(individualFile as text) as list
				repeat with xxx from 1 to count of paragraphs of item 1 of tempWholeList -- run through the clients in .dat
					try
						set paragraphCycle to paragraph xxx of item 1 of tempWholeList as text
						set x to offset of "," in paragraphCycle
						set y to offset of ">" in paragraphCycle
						if x = 0 then set x to -1
						if y = 0 then set y to count of paragraphCycle
						if y > 10 then
							set tempClientString to characters (x + 1) thru y of paragraphCycle as text
							--if character 1 of tempClientString = " " then set tempClientString to characters 2 thru -1 of tempClientString
							set y to offset of ">" in tempClientString
							if y = 0 then set y to (offset of "," in tempClientString) - 1
							if y ≠ 0 then
								set tempClientString to characters 1 thru y of tempClientString as text
								repeat
									if character 1 of tempClientString is in {" ", "\""} then
										try
											set tempClientString to characters 2 thru -1 of tempClientString as text
										end try
									else
										exit repeat
									end if
								end repeat
								if tempClientString ≠ "" and tempClientString is not in {","} then
									theClientListArray's addObjectsFromArray_({tempClientString}) --< Make VERY long list
									if tempClientString is not in items of theClientList then
										set end of theClientList to tempClientString
										set end of theClientList2 to 0
									end if
								end if
							end if
						end if
					on error errmsg number errnum
						display dialog errmsg & " " & errnum
					end try
				end repeat
			end if
		end repeat
		set yy to count of theClientList
		repeat with xx from 1 to yy
			set stringToFind to item xx of theClientList
			tell current application's NSPredicate to set thePredicate to predicateWithFormat_("SELF contains[cd] %@", stringToFind)
			set theFilteredArray to theClientListArray's filterUsingPredicate_(thePredicate)
			say theClientListArray's |count|()
			try
				set item xx of theClientList2 to theFilteredArray's |count|()
			on error errmsg number errnum
				display dialog errmsg & " " & errnum
			end try
		end repeat
		
		
		set thecount to count of theClientList
		set DisplayString to {"There were no clients for this period."}
		set DisplayAlphabetString to {"There were no clients for this period."}
		set temp1 to {}
		try
			set theincrement to 20 / thecount --<-- This will crap itself on empty month
			repeat with x from 1 to thecount
				set t to item x of theClientList
				repeat
					if character 1 of t is in {"\"", " ", "?"} then
						set tt to characters 2 thru (count of t) of t as text
						set item x of theClientList to tt
						set t to tt
					else
						exit repeat
					end if
				end repeat
				if character 1 of t = "<" then
					try
						set t to characters 2 thru -1 of t & "<" as text
					end try
					set item x of theClientList to t
				end if
			end repeat
			set {temp4, temp3} to my sort2Lists(theClientList, theClientList2)
			set temp1 to {}
			set temp2 to {}
			repeat with x from (count of temp4) to 1 by -1
				set end of temp1 to item x of temp3
				set t to item x of temp4
				try
					if character -1 of t = "<" then
						set t to "<" & characters 1 thru -2 of t as text
					end if
				end try
				set end of temp2 to t -- ClientList
			end repeat
			set DisplayAlphabetString to {}
			repeat with x from (count of temp1) to 1 by -1
				set tempstring to "        "
				set tempNum to item x of temp1
				if tempNum > 9 then set tempstring to "      "
				if tempNum > 99 then set tempstring to "    "
				if tempNum > 999 then set tempstring to "  "
				set end of DisplayAlphabetString to (item x of temp1 as text) & tempstring & item x of temp2
			end repeat
			set {temp1, temp2} to my sort2Lists(temp1, temp2)
			set DisplayString to {}
			repeat with x from (count of temp1) to 1 by -1
				set tempstring to "        "
				set tempNum to item x of temp1
				if tempNum > 9 then set tempstring to "      "
				if tempNum > 99 then set tempstring to "    "
				if tempNum > 999 then set tempstring to "  "
				
				--Set email
				set end of DisplayString to (item x of temp1 as text) & tempstring & item x of temp2
			end repeat
		on error errmsg
			display dialog errmsg
		end try
		set GraphData to {}
		--ClientToGraph's removeAllItems()
		ClientListAlpha's removeAllItems()
		ClientListAlpha's addItemsWithTitles_(DisplayAlphabetString)
		ClientListNumer's removeAllItems()
		ClientListNumer's addItemsWithTitles_(DisplayString)
		Progress's setDoubleValue_(0)
	end ClientReport

The method filterUsingPredicate: doesn’t return a value (notice in the docs that its return type is void); it filters the receiver in place. If you want to leave theClientListArray unaltered and get the matches back as a new array, use filteredArrayUsingPredicate: instead.

Ric

Thanks Ric, and it’s incredibly fast.

Now to see what I can do with loading files.

The format is {datestring, Email Client’s name & ’ ', and other stuff as paragraphs.

I need to pull out Email Client’s name & ’ ',

Should be fun!

Regards

Santa

G’day
Ric, (or whoever can help)

This part, whilst much, much faster than Applescript, is bogging down.

How would I go about replacing it with ObjC (seeing as you mentioned how fast it was, grin)

Regards

Santa


set yy to count of theClientList
repeat with xx from 1 to yy
	set stringToFind to item xx of theClientList
	tell current application's NSPredicate to set thePredicate to predicateWithFormat_("SELF matches[cd] %@", stringToFind)
	set theFilteredArray to theClientListArray's filteredArrayUsingPredicate_(thePredicate)
	set item xx of theClientList2 to theFilteredArray's |count|()
end repeat

Nobody has mentioned AppleScript’s own way to loop through a list much faster: put the list in a script object and iterate through it there. See the script below.

script speedUp
	property publicList : missing value
end script

--first we need to make a list 
set aList to {}
repeat with x from 1 to 125000
	set end of aList to x
end repeat
--getting here takes a few seconds
--now we're putting the list in a script object
display dialog "We've created a list and now we're going to sum all values"
set speedUp's publicList to aList
--now I'm going to sum all values
set sumOfAllValues to 0
repeat with aNum in speedUp's publicList
	set sumOfAllValues to sumOfAllValues + aNum
end repeat
display dialog "repeat loop is finished, Click OK to do it the old fashioned way (takes like forever)"

--here is the old way to compare the differences. This takes several minutes on an i7 machine. 
set sumOfAllValues to 0
repeat with aNum in aList
	set sumOfAllValues to sumOfAllValues + aNum
end repeat
return sumOfAllValues

Yes, it is indeed much faster. But this syntax won’t work as-is in ASOC. In ASOC a script object is the equivalent of an ObjC class, and I believe there can’t be more than one in each file.

Maybe you could create another AS class file, put your variable in there, and connect it to the main app delegate via an Object cube in IB. I don’t understand why it would be faster this way, but hey, as long as it works!

No matter what, Santa, never use a numeric repeat loop (like repeat with anItem from 1 to 20000) unless you have no other choice. In my tests they were about 8 times slower than going through a list with something like “repeat with anItem in listOfItems”. If you need a counter, set a variable to 1 before the loop begins and increment it by 1 just before the end repeat line.

Thus, this code would improve in speed:

set yy to count of theClientList
repeat with xx from 1 to yy
	set stringToFind to item xx of theClientList
	tell current application's NSPredicate to set thePredicate to predicateWithFormat_("SELF matches[cd] %@", stringToFind)
	set theFilteredArray to theClientListArray's filteredArrayUsingPredicate_(thePredicate)
	set item xx of theClientList2 to theFilteredArray's |count|()
end repeat

if you were to change it to this:

set xx to 1
repeat with stringToFind in theClientList -- assuming the variable is a list of items
	tell current application's NSPredicate to set thePredicate to predicateWithFormat_("SELF matches[cd] %@", stringToFind)
	set theFilteredArray to theClientListArray's filteredArrayUsingPredicate_(thePredicate)
	set item xx of theClientList2 to ((theFilteredArray's |count|()) as integer)
	set xx to xx + 1
end repeat

Keep it up, Santa! Your project is entirely feasible; I’ve seen worse! :slight_smile:

Thanks Leon.

I’ve amended my script to not use a numbered repeat loop, but to be honest, it takes exactly the same time, 202 seconds for 28,067 email addresses, with 1200+ Clients.

However, I’m now using the ‘like’ predicate. There seemed to be a problem with ‘matches’, as it never finished. I can’t find anywhere that tells me whether the terminology should be different.

DJ, I’ve tried your method, and it certainly has sped up parts of my script, but using it on the attached code only cut 2 seconds off.

Regards

Santa.


set y to 1
repeat with xx in theClientList
	set BarCount to BarCount + MinorIncrement
	Progress's setDoubleValue_(TheBarIncrement * BarCount)
	tell current application's NSPredicate to set thePredicate to predicateWithFormat_("SELF like[cd] %@", xx)
	set theFilteredArray to theClientListArray's filteredArrayUsingPredicate_(thePredicate)
	set item y of theClientList2 to (theFilteredArray's |count|()) as integer
	set y to y + 1
end repeat
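Why switching loop styles barely helped: this loop builds one predicate per client, and each filteredArrayUsingPredicate: call tests every element of theClientListArray, so 1,200 clients against 28,067 addresses means roughly 1,200 × 28,067 ≈ 34 million string comparisons. A tally that walks the addresses once and keeps a running count per address in a hash table needs only about 28,000 lookups. In Cocoa, NSCountedSet (initWithArray:, countForObject:) is built for this kind of counting. The shell sketch below shows the same single-pass idea with an awk associative array; the sample addresses are made up, and lower-casing only approximates the [c] option (diacritics are not folded).

```shell
#!/bin/sh
# Made-up sample: one extracted address per line, as in theClientListArray.
addresses=$(mktemp)
printf '%s\n' 'ann@example.com' 'Bob@Example.com' 'ann@example.com' \
  'bob@example.com' 'cy@example.com' 'ann@example.com' > "$addresses"

# Single pass: awk keeps a hash table keyed by the lower-cased address and
# bumps its counter, so the work grows with the number of addresses only.
tally=$(awk '{ count[tolower($0)]++ } END { for (a in count) print count[a], a }' "$addresses" |
  sort -k1,1nr -k2,2)   # awk's for-in order is unspecified, so sort afterwards

rm -f "$addresses"
printf '%s\n' "$tally"
```

The output lists 3 ann@example.com, then 2 bob@example.com, then 1 cy@example.com. Adding more clients does not add more passes over the data, which is where the nested predicate loop loses its time.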
		

Thanks Leon, I forgot about the limit of one script object per file in ASOC. I didn’t test my script in ASOC, only in Script Editor. That leads me to another question: if I make a new script file containing only the line property publicList : {} and load it with load script, does it still have the same problem?

Another thing, Leonard:

I’ve tested it and it worked properly even in an ASOC project, both with a script object inside the script file and with the property in a separate file. But I ran into a limitation in ASOC that I don’t have in AppleScript.

The following lines run into a memory allocation error:

set aList to {}
repeat with x from 1 to 125000
	set end of aList to x
end repeat

The error says something about reaching the maximum number of bytes. When I change my previous script to a loop of 95,750, the speedUp script object works as expected and is still very fast compared with a normal list. The weirdest thing is that with 96,000 iterations it sometimes runs and sometimes doesn’t, so it seems my list isn’t the same size in bytes on every run.

I didn’t know about this list limitation in ASOC. I can’t figure out why an AppleScript list has limitations in ASOC but not in AppleScript Studio or in a plain script.

Well, you don’t need load script in ASOC. Just create a new NSObject in the main NIB file of your app in IB (by dragging the blue Object cube from the library into the NIB file), then set its class, in the Identity section of the Inspector, to the name of your new class file, and that’s it.

Then create a property set to missing value in your main app delegate and connect it in IB to your shiny new blue cube. From the main app delegate you can then set the other class’s property like this:

set theOtherClass's theProperty to {}

and read its content like this:

set theOtherClassPropertyContent to theOtherClass's theProperty

I’m certainly no Objective-C expert, but I think the following Objective-C code will do what you want; I can’t test it since I don’t have the lists to test it with. Both theClientList and theClientListArray will have to be NSArrays, not AppleScript lists, for this to work. To add the new file, go to Xcode’s File menu and choose New File. Choose Objective-C class, click Next, and name the new file ObjCFile. You should get a .h and a .m file. Copy and paste so the .h file looks like this:

#import <Cocoa/Cocoa.h>

@interface ObjCFile : NSObject {

}

+(id)countStuffFrom:(NSArray *)theClientList in:(NSArray *)theClientListArray;
@end

Then go to the .m and make it look like this:

#import "ObjCFile.h"

@implementation ObjCFile

+(id)countStuffFrom:(NSArray *)theClientList in:(NSArray *)theClientListArray {
    NSMutableArray *theClientList2 = [NSMutableArray array];
    for (id stringToFind in theClientList) {
        // No parentheses around the arguments: (@"...", stringToFind) would be a C comma
        // expression that passes only stringToFind as the format string.
        NSPredicate *pred = [NSPredicate predicateWithFormat:@"SELF matches[cd] %@", stringToFind];
        NSArray *theFilteredArray = [theClientListArray filteredArrayUsingPredicate:pred];
        [theClientList2 addObject:[NSNumber numberWithInt:[theFilteredArray count]]];
    }
    return theClientList2;
}
@end

In your ASOC code you would call this method by passing it both theClientList and theClientListArray like so:

set theClientList2 to current application's ObjCFile's countStuffFrom_in_(theClientList, theClientListArray)

I hope this helps.

Ric

I’m not sure what the limitations are either, but in this case I prefer to work with Cocoa objects. So this might do the trick (untested):

set aList to current application's NSMutableArray's arrayWithCapacity_(255) --it will expand as needed
repeat with x from 1 to 125000
	aList's addObject_(x)
end repeat

Then you can use NSArray’s querying methods to get values out of the array, like objectAtIndex:, and check if it contains an object with containsObject:.

Hi,

be careful with this line


[theClientList2 addObject:[NSNumber numberWithInt:[theFilteredArray count]]];

The count method of NSArray returns an NSUInteger, which is an unsigned int (32 bits) on a 32-bit system and an unsigned long (64 bits) on a 64-bit system.
The numberWithInt: method of NSNumber always expects a 32-bit int.
Calling numberWithUnsignedInteger: instead avoids this rare but possible problem.


[theClientList2 addObject:[NSNumber numberWithUnsignedInteger:[theFilteredArray count]]];