Reading and writing files

Juerg · December 9, 2002, 4:27pm

Hi all!

I have a query regarding the reading and writing of files.

I’m currently writing a script which needs to insert a potentially large chunk of data into an existing file. The original file’s size could be 100K or it could be 100 MB.

What’s the best way of inserting the data?

Should I read the first chunk of data and save that as a new file (File 1), then write my new data to another separate file (File 2), and write the remaining data from the first file to a third file (File 3)? I could then read the three files in the correct order (File 1, File 2, File 3) and overwrite the original file? Is this a good idea or is there a better way of achieving this? Maybe there are a few tricks to do with moving the EOF marker?

Also, I’m guessing that I should limit the amount of data read in at a time to small chunks, such as 8192 characters. So in fact files 1, 2 and 3 in the above example might be split into smaller chunks which will be read in and written to the new file immediately.

I’m not after a complete solution but a few pointers would be appreciated.

Many thanks,

Juerg

Rob · December 9, 2002, 4:34pm

I don’t know what the best method is but if you decide to break the text into chunks, you might want to take a look at chunkTextLib 1.1, from the infamous ‘has’.

Mytzlscript · December 9, 2002, 5:23pm

Hi Juergen,

I have a couple of questions about your existing file:

Are you having a script go back and read this file for any reason or are you just writing to it for an activity log?

Does it matter what order the entries are in? Do newest entries need to be logged at the top of the file, or does it even matter?

I’ve had good luck with Tanaka’s appendtofile command:

set MyData to "Whatever new data you are adding to the existing file"
set ExistingFile to "Mac HD:Some Folder:Your File Name"
appendtofile return & MyData to ExistingFile
-- this adds the new text to the bottom of the text file; add return if you want a line break before the new entry

Or, if you need the data to flow newest to oldest use Tanaka’s readfromfile and writetofile commands

set MyData to "Whatever new data you are adding to the existing file"
set ExistingFile to "Mac HD:Some Folder:Your File Name"
set ExistingText to readfromfile ExistingFile
set UpdatedFile to writetofile MyData & return & ExistingText to ExistingFile
-- add a return or as many as you like between your new entry and the existing text

The readfromfile command reads a 20,000 line text file pretty efficiently and the writetofile doesn’t require you to open the file with write permission or mess around with the EOF.

You can get Tanaka’s OSAX at http://osaxen.com/tanaka.html

Hope this was beneficial,

Juerg · December 9, 2002, 5:57pm

Hi Mytzlescript,

My script will embed font information into existing EPS files. The code works fine for now as I’m only testing on 100K files but I’m worried that the whole thing will come crashing down on me when I start processing larger files!

There’s a fair bit of data manipulation occurring before I piece the chunks back together again, but essentially there will be three parts which need to be put back in the correct order.

Also, I should’ve pointed out that I’m after a Vanilla solution if possible (does Standard Additions come under Vanilla?)

I’ll still give your ideas a try though.

Thanks very much,

Juerg

Juerg · December 9, 2002, 6:00pm

Hey Rob,

Yeah, should’ve checked there first. I thought I had all the libraries I’d need from has’ site but that one must’ve escaped my notice!

I’ll take a look - see what it can do!

Cheers,

Juerg

Mytzlscript · December 9, 2002, 6:51pm

Juergen,

I understand. You are editing the text of EPS files and want to keep the script using Standard commands only, right?

I’ve worked with altering EPS files but have never messed with the fonts.

You could use the “offset” command in Standard Additions.

like:

set EPSFile to "Some File Path"
set EPSFileText to read file EPSFile from 1 to 2000 -- file positions start at 1
set EPSHeader to offset of "%%DocumentFonts: AmericanTypewriter-Bold" in EPSFileText

That should give you an idea of where your EPS Font info is
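A slightly fuller sketch of the same idea, using only Standard Additions commands. The path is hypothetical and the 2000-byte header length is just the figure used above. Since offset returns 0 when the text is not found, the result is checked before it is used. Untested against real EPS files.

```applescript
set EPSFile to "Mac HD:Some Folder:sample.eps" -- hypothetical path

set fileRef to open for access file EPSFile
try
	-- read at most the first 2000 bytes, or the whole file if it is shorter
	set headerLength to get eof fileRef
	if headerLength > 2000 then set headerLength to 2000
	set EPSFileText to read fileRef from 1 for headerLength
	close access fileRef
on error errMsg number errNum
	-- make sure the file is closed, then pass the error on
	try
		close access fileRef
	end try
	error errMsg number errNum
end try

set fontsOffset to offset of "%%DocumentFonts:" in EPSFileText
if fontsOffset is 0 then
	display dialog "No %%DocumentFonts: comment in the first " & headerLength & " bytes."
else
	display dialog "%%DocumentFonts: starts at character " & fontsOffset
end if
```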

As for breaking larger files up into chunks to make the read stable you could try reading it in chunks of 2000 characters - kind of like…

set EPSFile to "Mac HD:Desktop Folder:2676890.eps"

set newtext to "" -- set to nothing for now, will be our finished eps file
set startread to 0 -- start at 0
set endread to startread + 2000 -- change this to whatever works, make sure you change below too
set endreached to false
set EOFmarker to get eof of EPSFile
if EOFmarker <= endread then set endread to EOFmarker
-- just in case the end of the file is less than the endread point

repeat until endreached = true
	set ThisRead to read EPSFile from startread to endread
	-- read the EPS file for 2000 characters. You can adjust this to whatever works for you
	
	set newtext to newtext & ThisRead as string
	set startread to startread + 2000
	
	set endread to endread + 2000
	
	if endread >= EOFmarker then
		set endread to EOFmarker
		set endreached to true -- this ends the repeat
	end if
	
end repeat

I hope this is what you were looking for.

Best,

Juerg · December 9, 2002, 7:06pm

This is more in line with what I had in mind!

Unfortunately, what I’m trying to avoid is loading the whole file into memory, which is just what your code (and mine!) still does!

My idea was to read chunks of data then write them immediately to a new file, so the string which holds the data will only ever be a certain length - 2000 characters if I use your example. I’ll read 2000 characters from my EPS, storing the data in a string named ‘tempStr’, write that info to a file, then read the next 2000 characters into the same string (tempStr), thus replacing the previous text and then write that to the end of the file. Because the data that is being read is stored in the same variable each time, I’m assuming that I’m cutting down on my memory overheads. Is this correct?

If I use my current script - or yours - I end up with a string which holds the entire file in memory. It’s this that I’m curious about. Is it OK to do this, or should it be avoided?

Thanks,

Juerg

Rob · December 9, 2002, 7:34pm

I think if the chance exists that the script could encounter 100 MB files, working in chunks is the safest way to approach it. With proper error checking and logging, this would allow the script to pick up where it left off if something bad happened during the process. I’m speaking from a purely hypothetical perspective (never tried to do what you are doing), but I’d have to bet that large files would eventually cause problems.

Just 2 cents from a 1 cent scripter

Mytzlscript · December 9, 2002, 7:48pm

Actually the script I included only reads in 2000 character increments. If you wanted to write those 2000 characters you could do so in each repeat pass. Just add a line near the end of the repeat to write the new text to the file of your choice. Just be sure to get the eof each time and when you write to the file include the text “starting at” and then the EOF so it continually tacks the newly read text on the end. I haven’t tested it but it should produce a script that reads 2000 characters, then writes those characters to the end of a file, never actually reading all 100MB of the file at a time.
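The read-then-append loop described here could be sketched roughly as follows. This is a minimal, untested sketch using only Standard Additions; the two paths and the variable names are hypothetical. It assumes file positions start at 1 and that the data survives being read and written as plain text. Writing with "starting at eof" appends without having to fetch the EOF separately on each pass.

```applescript
set sourcePath to "Mac HD:Some Folder:source.eps" -- hypothetical path
set destPath to "Mac HD:Some Folder:copy.eps" -- hypothetical path
set chunkSize to 2000

set sourceRef to open for access file sourcePath
set destRef to open for access file destPath with write permission
try
	set eof destRef to 0 -- start with an empty output file
	set totalBytes to get eof sourceRef
	set readPos to 1 -- first byte of the file
	repeat while readPos <= totalBytes
		-- never ask for more bytes than are left
		set thisCount to totalBytes - readPos + 1
		if thisCount > chunkSize then set thisCount to chunkSize
		-- tempStr is replaced on every pass, so only one chunk is held at a time
		set tempStr to read sourceRef from readPos for thisCount
		write tempStr to destRef starting at eof
		set readPos to readPos + thisCount
	end repeat
	close access sourceRef
	close access destRef
on error errMsg number errNum
	-- close both files before passing the error on
	try
		close access sourceRef
	end try
	try
		close access destRef
	end try
	error errMsg number errNum
end try
```

Because the next read position is always the previous position plus the number of bytes just read, no byte is read twice and none is skipped.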

I agree with Rob about the error checking. I usually don’t include that in my posts so I can save space and focus on the task at hand. A bunch of Try’s never hurt anyone.

Best,

Mytzlscript · December 9, 2002, 10:05pm

I am a numbskull. The lines of code:

set startread to startread + 2000

set endread to endread + 2000

should read

set startread to startread + 2001

set endread to startread + 2000

Otherwise it includes the last character at the beginning of the next read. I am very sorry if that caused any problems.
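One way to check chunk boundaries without touching a file is to compute the ranges on their own. The sketch below is an alternative formulation, not the script above: it assumes positions start at 1, uses a made-up file length of 4500, and starts each chunk one past the end of the previous one so that nothing overlaps.

```applescript
set chunkSize to 2000
set EOFmarker to 4500 -- hypothetical file length
set startread to 1
set ranges to {}
repeat while startread <= EOFmarker
	set endread to startread + chunkSize - 1
	if endread > EOFmarker then set endread to EOFmarker
	set end of ranges to {startread, endread}
	set startread to endread + 1 -- next chunk begins right after this one
end repeat
ranges -- {{1, 2000}, {2001, 4000}, {4001, 4500}}
```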

Best,

Juerg · December 10, 2002, 11:20am

Mytzlscript wrote:

> I am a numbskull. The lines of code:
>
> set startread to startread + 2000
> set endread to endread + 2000
>
> should read
>
> set startread to startread + 2001
> set endread to startread + 2000
>
> Otherwise it includes the last character in the beginning of the next read. I am very sorry if that caused any problems.
>
> Best,

No worries - I sussed that out for myself.

Thanks for your help on this - I have an idea of how I’m going to approach the problem now.

I’ll read the data in chunks of 4096 (2 ^ 12) characters and append each chunk immediately to a new file. Once I’ve copied the first part of the EPS file, I’ll perform the same routine on the font file (though I’ll need to do some data conversions along the way), and finally I’ll do the same again for the remaining part of the original EPS file.

I think this should work out fine.

Once it’s up and running I’ll take a look at reading/writing larger chunks of data - 65536 (2 ^ 16) maybe!? Though looking at a PostScript font’s POST resource, it may make more sense to read in chunks of 2048… So maybe I’ll just write a handler which uses a passed parameter to determine the ‘chunk’ size… That way I could use large sizes for the original EPS file and smaller, more convenient sizes for the font files…
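A handler along those lines might look something like this. It is an untested sketch under several assumptions: the handler name, the paths and the split position are all made up, positions start at 1, and the font data is copied as-is, with the conversions mentioned above left out.

```applescript
-- Appends bytes firstByte..lastByte of sourceRef to the end of destRef,
-- holding at most chunkSize bytes in memory at a time.
on appendRange(sourceRef, destRef, firstByte, lastByte, chunkSize)
	set readPos to firstByte
	repeat while readPos <= lastByte
		set thisCount to lastByte - readPos + 1
		if thisCount > chunkSize then set thisCount to chunkSize
		set tempStr to read sourceRef from readPos for thisCount
		write tempStr to destRef starting at eof
		set readPos to readPos + thisCount
	end repeat
end appendRange

-- Hypothetical paths and split point, for illustration only
set epsRef to open for access file "Mac HD:Some Folder:original.eps"
set fontRef to open for access file "Mac HD:Some Folder:font data"
set outRef to open for access file "Mac HD:Some Folder:new.eps" with write permission
set splitAt to 1500 -- last byte of the EPS that comes before the inserted data

try
	set eof outRef to 0 -- start with an empty output file
	appendRange(epsRef, outRef, 1, splitAt, 65536) -- first part of the EPS
	appendRange(fontRef, outRef, 1, (get eof fontRef), 2048) -- font data, smaller chunks
	appendRange(epsRef, outRef, splitAt + 1, (get eof epsRef), 65536) -- rest of the EPS
on error errMsg
	display dialog "Copy failed: " & errMsg
end try

-- close everything whether or not the copy succeeded
close access epsRef
close access fontRef
close access outRef
```

Passing the chunk size as a parameter keeps the copying logic in one place, so the same handler can serve both the large EPS reads and the smaller font reads.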

Anyway, going back to your script, the line ‘set newtext to newtext & ThisRead as string’ concatenates the 2000 character string to newtext on each iteration, so at the end of the whole routine newtext is the size of the original file - as stated in the comment at the top of the script… It was this that caused me to mention it in my previous post.


repeat until endreached = true
	set ThisRead to read EPSFile from startread to endread
	-- read the EPS file for 2000 characters. You can adjust this to whatever works for you
	
	set newtext to newtext & ThisRead as string
	set startread to startread + 2000
	
	set endread to endread + 2000
	
	if endread >= EOFmarker then
		set endread to EOFmarker
		set endreached to true -- this ends the repeat
	end if
end repeat

Regardless of that, as I said earlier, I think I’ve got a better idea of what I’m going to do now. I also agree about the error checking though I don’t expect people to insert the necessary code when they post on the forum so don’t worry on my account!

Thanks very much to yourself and to Rob - who I think is probably more than just a 1 cent scripter!

I’ll let you know how I get on…

Juerg