I’m posting this here, but if it would be better posted in the AppleScript Studio forum, please advise.
For those joining the program in progress:
I’ve written an AppleScript program that parses FrameMaker MIF files; it worked perfectly (albeit slowly) as long as the MIFs were relatively small (e.g. ~100,000 lines of text).
Larger MIFs have started arriving (e.g. ~1,000,000 lines of text, roughly 20+ MB), and because my program worked by loading the whole file into memory before parsing the strings, AppleScript crashes with an out-of-memory error.
I wrote this small line-by-line copy program with an eye to rewriting and optimizing my marker-moving algorithms to work on small chunks of the MIF at a time:
-- A quick (?) line-by-line text file copy program
-- Get and set the filenames
set CurrentDoc to (choose file)
set DestinationFolder to (choose folder with prompt "Choose destination folder for cleaned files:")
set finalFile to (DestinationFolder as text) & "Duped_File"
-- open 'em, keeping the reference numbers that read, write and close access need
set sourceRef to open for access CurrentDoc
set targetRef to open for access file finalFile with write permission
-- empty the destination in case a file by that name already exists
set eof targetRef to 0
-- loop through line-by-line, reading once and writing once,
-- using a Mac carriage return as the delimiter;
-- read raises error -39 (end of file) when the source runs out
try
	repeat
		set tempstring to read sourceRef until return
		write tempstring to targetRef
	end repeat
on error errMsg number errNum
	close access sourceRef
	close access targetRef
	-- re-raise anything other than the expected end-of-file error
	if errNum is not -39 then error errMsg number errNum
end try
It works, but it takes roughly 30 minutes to copy the 1,000,000-line file, and this is without actually doing any of the string parsing.
I rewrote just this simple program (hard-coding the filenames) in C and in FreePascal, and ran both from inside Xcode.
The C and Pascal versions both took less than 10 seconds.
10 SECONDS, versus more than 30 MINUTES.
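For comparison, here is a minimal sketch of a line-by-line copy in C. It is not the program timed above; it assumes a POSIX system (macOS or Linux) so it can use `getdelim()`, which splits on any delimiter, including the classic Mac carriage return. The function name `copy_lines` is hypothetical. Part of the speed gap is probably that stdio buffers reads and writes in large blocks, while each AppleScript `read`/`write` is a separate Standard Additions command.

```c
#define _POSIX_C_SOURCE 200809L   /* for getdelim() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Copy in_path to out_path one line at a time, where a line ends with
 * `delim` ('\r' for classic Mac text). Each line arrives whole, so a
 * parsing step could go where marked. Returns the number of lines
 * copied (a last line without a delimiter still counts), or -1 on error. */
long copy_lines(const char *in_path, const char *out_path, int delim)
{
    FILE *in = fopen(in_path, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(out_path, "wb");      /* truncates any old copy */
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char *line = NULL;     /* buffer that getdelim grows as needed */
    size_t cap = 0;
    ssize_t len;
    long count = 0;

    while ((len = getdelim(&line, &cap, delim, in)) != -1) {
        /* ... per-line parsing would go here ... */
        if (fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            count = -1;
            break;
        }
        count++;
    }

    free(line);
    fclose(in);
    if (fclose(out) != 0)
        count = -1;
    return count;
}
```

Called as `copy_lines("in.mif", "out.mif", '\r')`, it reproduces the file byte for byte, delimiters included, just as `read ... until return` followed by `write` does.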
I know that I can rewrite the AppleScript routine to read larger chunks at a time, but I’m limited in how much I can read at once and still keep my parsing routines effective.
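Reading in bigger chunks doesn't have to break line-based parsing. Read a fixed-size block, hand off every complete line in it, and carry the unfinished tail over to the front of the next read. The sketch below shows the idea in C; the chunk size, the maximum line length, and the names `for_each_line_chunked` and `sum_lengths` are assumptions for illustration. The same carry-over trick could in principle be applied in AppleScript by reading `for` a fixed number of bytes and holding back the last, incomplete line for the next pass.

```c
#include <stdio.h>
#include <string.h>

#define CHUNK_SIZE 65536   /* assumed read size; tune as needed */
#define MAX_LINE   65536   /* assumed upper bound on one line's length */

typedef void (*line_fn)(const char *line, size_t len, void *ctx);

/* Read `in` in CHUNK_SIZE blocks and call `fn` once per complete line
 * (delimiter excluded). A partial line at the end of a block is moved to
 * the front of the buffer and completed by the next read. Stops at end
 * of file or on a read error. Returns the number of lines delivered, or
 * -1 if one line exceeds MAX_LINE. Uses a static buffer: not thread-safe. */
long for_each_line_chunked(FILE *in, int delim, line_fn fn, void *ctx)
{
    static char buf[MAX_LINE + CHUNK_SIZE];
    size_t have = 0;   /* bytes in buf, including any carried-over tail */
    long count = 0;
    size_t got;

    while ((got = fread(buf + have, 1, CHUNK_SIZE, in)) > 0) {
        have += got;
        size_t start = 0;
        char *p;
        while ((p = memchr(buf + start, delim, have - start)) != NULL) {
            size_t end = (size_t)(p - buf);
            fn(buf + start, end - start, ctx);
            count++;
            start = end + 1;
        }
        /* carry the unfinished line over to the next read */
        have -= start;
        memmove(buf, buf + start, have);
        if (have > MAX_LINE)
            return -1;
    }
    if (have > 0) {            /* last line had no trailing delimiter */
        fn(buf, have, ctx);
        count++;
    }
    return count;
}

/* Example callback: adds each line's length to a running total. */
static void sum_lengths(const char *line, size_t len, void *ctx)
{
    (void)line;
    *(size_t *)ctx += len;
}
```

The memory used stays fixed no matter how large the MIF is, which is the property the whole-file approach lacked.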
If someone could point out a way to speed it up so that it’s more in line with the C or Pascal, I’d be very appreciative.
I know I can rewrite my simple string-parsing algorithms in either C or Pascal fairly easily (he said, without ever having written a real C program, or touched Pascal in a decade and a half), but I’ve never built a Cocoa interface, and I don’t know how hard it would be to duplicate AppleScript’s drag-and-drop droplets or the Get/Put/Choose dialogs. I want to learn Cocoa anyway, but right now’s not a good time, not when I need to get this thing finished and working efficiently ASAP.
My options seem to be: staying 100% in AppleScript and building a program that will take hours to run instead of minutes; dumping AppleScript and learning enough Cocoa to use with either Objective-C or FreePascal; or writing the actual data-handling and string-parsing parts in C or Pascal and calling them from the AppleScript droplet (which, after the marker-moving routines, calls TextWrangler to do a mess of search-and-replaces).
How hard would it be to convert just the simple C or Pascal programs above into a compiled routine that could be called from AppleScript? Ideally, AppleScript would take the dropped file and pass the input and output file names as parameters to the compiled routine.
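One common way to call compiled code from an AppleScript droplet without writing any Cocoa is to build it as a command-line tool and run it with `do shell script`, passing the file paths as arguments. The droplet, drag-and-drop, and the choose dialogs all stay in AppleScript. The sketch below is in C rather than FreePascal; it assumes a hypothetical tool named `mifcopy` that takes the input and output paths as its two arguments, and it does only a plain block copy where the parsing would go. `main()` is left out so the function can be exercised directly; a real build would add `int main(int argc, char *argv[]) { return run_tool(argc, argv); }`.

```c
#include <stdio.h>

/* Hypothetical entry point for a tool invoked as:
 *     mifcopy <input-path> <output-path>
 * Returns 0 on success, 1 on a file error, 2 on bad arguments.
 * Error messages go to stderr. */
int run_tool(int argc, char *argv[])
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s input-file output-file\n", argv[0]);
        return 2;
    }
    FILE *in = fopen(argv[1], "rb");
    if (in == NULL) {
        perror(argv[1]);
        return 1;
    }
    FILE *out = fopen(argv[2], "wb");
    if (out == NULL) {
        perror(argv[2]);
        fclose(in);
        return 1;
    }

    int status = 0;
    char buf[65536];
    size_t n;
    /* the marker-moving/parsing work would replace this plain copy */
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            perror(argv[2]);
            status = 1;
            break;
        }
    }
    if (ferror(in)) {
        perror(argv[1]);
        status = 1;
    }
    fclose(in);
    if (fclose(out) != 0) {
        perror(argv[2]);
        status = 1;
    }
    return status;
}
```

On the AppleScript side, the call might look like `do shell script "/usr/local/bin/mifcopy " & quoted form of POSIX path of CurrentDoc & " " & quoted form of POSIX path of finalFile` (the install location is an assumption). `do shell script` turns a non-zero exit status into an AppleScript error carrying the stderr text, so the droplet can catch failures with `try` before handing off to TextWrangler.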
If someone could step me through converting one of 'em (FreePascal would be my first choice), I’m confident I could add in the parsing routines without too much trouble. I’d be very grateful for the help.
Thanks,
Walt Sterdan