Read/Write query, help required please....

TecNik · January 30, 2008, 3:56pm

Hi there,

I’m trying to write some text to a file which works ok apart from one slight niggle.
When the copy gets added it looks like it’s some sort of Chinese copy?

Here’s the little test:-

set LogFile to "PATH-TO-FILE:test_text3_rw"

try
	
	open for access file LogFile with write permission
	
	set addThis1 to "TEST TEXT"
	
	write addThis1 to file LogFile starting at eof
	
	close access file LogFile
	
	
on error errText number errNum
	close access file LogFile
	display dialog errText
end try

Please would someone point me in the right direction.

Thank you

Nick

P.S. If I try this:-

write addThis1 to file LogFile starting at 0

the text is added correctly?
Could it be that there’s some sort of character at the end of the file changing the format???

chrys · January 31, 2008, 8:04am

First, we are lacking some important information to better help you. Under what OS and AppleScript version are you seeing this problem? What method or application are you using to read the resulting file? How was the file created? What other programs or scripts or AppleScript write commands put data in this file?

The bit about Chinese copy (I assume you mean copy in the publishing sense of the word: text to be printed) makes me think it is an encoding issue. When ASCII/Latin-1/MacRoman/UTF-8 (first 128 code points only) encoded text is read in as UTF-16, it often takes the form of Asian glyphs.

Try deleting/renaming the file and adding “as Unicode text” to all your write command(s).

I suspect that the initial writes to this file were from a string of class Unicode text (you might not notice, since the system produces strings like this when you ask Finder for filenames and such). In Mac OS X before 10.5, if no as parameter is supplied to the write command, Unicode text strings are written as UTF-16, but normal non-Unicode strings are written out in the system’s “primary” encoding (probably MacRoman, which looks a lot like UTF-8, Latin-1, or ASCII for plain English text). Because of changes in Unicode handling in Mac OS X 10.5, Apple recommends always supplying either as text to use the system’s primary encoding or as Unicode text to use UTF-16. I think you can also safely use as «class utf8» to read/write UTF-8 (the AppleScript objects will be of class Unicode text). Even though the as parameter uses the same word as an AppleScript coercion, it is not a coercion and should not be confused with one.
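
As a rough sketch of that suggestion, here is your test with an explicit as parameter on the write. It assumes the placeholder path and text from your post; fileRef is just a name I picked, and I have not run this against your file.

```applescript
-- Sketch only: the path and text are the placeholders from the first post.
set LogFile to "PATH-TO-FILE:test_text3_rw"
set addThis1 to "TEST TEXT"

-- open for access returns a reference that write and close access can reuse
set fileRef to open for access file LogFile with write permission
try
	-- the as parameter asks write to emit UTF-16, whatever class addThis1 has
	write addThis1 to fileRef as Unicode text starting at eof
	close access fileRef
on error errText number errNum
	-- make sure the file is closed again before reporting the problem
	close access fileRef
	display dialog errText
end try
```

If the existing file turns out to be MacRoman rather than UTF-16, the same sketch with as text in place of as Unicode text would be the thing to try.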

TecNik:

P.S. If I try this:-
write addThis1 to file LogFile starting at 0
the text is added correctly?
Could it be that there’s some sort of character at the end of the file changing the format???

It seems more likely to me that there is UTF-16 encoded text (or possibly a Unicode byte-order mark) at the beginning of the file that indicates to the read command that the entire file is encoded as UTF-16. This assumption is broken when your write command writes its new data out as MacRoman encoded text. This would cause the appended text to be garbled (most likely showing up as Asian glyphs). By clobbering the early bytes of the file you will overwrite the first bit of UTF-16 encoded text (or byte-order mark). This means that applications will generally either use a default encoding (probably UTF-8, maybe Latin-1, or MacRoman in the case of AppleScript’s read with no as parameter) or try to auto-detect the encoding. If there are subsequent UTF-16 encoded characters in the U+0000-U+007F range (plain English text) that are not overwritten (are you doing set eof of fileRef to 0 to truncate the file?), they might appear normal, but they will have extra null characters (U+0000) between them from having been written as UTF-16 (16 bits per character) but read as UTF-8/MacRoman/Latin-1 (8 bits per character).
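
If you want to look at the raw bytes yourself, something like this would show them. It is only a sketch: it assumes /usr/bin/hexdump is present (it is part of the standard BSD tools), and theFile is just a name I picked.

```applescript
-- Sketch only: dump the first 32 bytes of a chosen file as hex plus ASCII.
set theFile to choose file with prompt "Pick the file to inspect"

-- -C prints offset, hex bytes and printable characters; -n limits the byte count
do shell script "/usr/bin/hexdump -C -n 32 " & quoted form of POSIX path of theFile
```

A UTF-16 byte-order mark shows up as fe ff or ff fe at offset 0, and UTF-16 encoded English text has a 00 byte next to every letter. MacRoman, ASCII, or UTF-8 English text has one byte per letter with no 00 bytes in between.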

TecNik · January 31, 2008, 12:03pm

Hi Chris,

Thank you for your reply and for all the information which is very useful.

To answer a few of your questions, I’m on a G5 running 10.4.10 and AppleScript 1.1.
The AppleScript version came as a surprise given I’m running OS 10.4.10.

The file I’m trying to write to is one that has been exported from Quark 7.31 and is an Xpress Tags file.
This may explain the encoding problem you’ve talked about in your reply.

I’d written a script that pulls data from relevant columns in Excel, concatenates it and puts tabs in where required. It also puts in style tags as required.
This would all be saved in a text file and look something like this:-

<@interBold>1 <@interReg>CELL INFO 1
<@interBold>2 <@interReg>CELL INFO 2
<@interBold>3 <@interReg>CELL INFO 3
<@interBold>4 <@interReg>CELL INFO 4
<@interBold>5 <@interReg>CELL INFO 5

The script would then open up a Quark 6.52 doc and import the text into a named text box.
This all still works fine in Quark 6.52, which is the version it was developed in, and the text is styled correctly as per the style tags.

However, enter Quark 7! The script still imports the text, but it doesn’t interpret the style tags and they end up as text in the box.
I’ve since found that Quark 7 is a little more fussy regarding Xpress Tags and it puts some header info in the file too.

Hence the route I’ve taken. I’ve exported an Xpress Tags file from Q7 with no content so in theory all I have is the header.
What I’m trying to do is open the file with write permissions so I can add the content I’ve created from Excel.

Hopefully this has given you a better picture of what I’m trying to achieve.

Thanks again for your help and any further thoughts would be appreciated though I think the encoding one is a good place to start.

Regards,

Nick

Model: G5
AppleScript: 1.1
Browser: Safari 312.3
Operating System: Mac OS X (10.4)

chrys · February 1, 2008, 12:45am

Thanks for the info. It does seem to give credence to the idea that the garbled text is plain English text in ASCII/MacRoman/UTF-8 being interpreted as UTF-16.

Unfortunately, I do not have any version of Quark, so I cannot help with those parts of your task.

If you have not yet tried putting as Unicode text on all your script’s write commands, do try that. You might also do some research into the encoding of the file to which you are attempting to append. I suspect it is UTF-16. It might even have a BOM. The small script below might tell you something useful if there is a BOM. If it is just plain English text, though, it will probably just return “ASCII text”, unless you are using an odd encoding. If it is UTF-16 without a byte order mark, it will unfortunately just return “data”.

choose file with prompt "Pick a (text) file to attempt to identify its encoding"
do shell script "/usr/bin/file " & quoted form of POSIX path of result

Oh, and the AppleScript version is probably 1.10.7. Whatever reported it as 1.1 probably stored it in a floating point representation during transit, which is usually a bad idea for just this reason (besides losing the “point release” info, too).

TecNik · February 1, 2008, 8:28am

Hi Chris,

Thanks for your reply.

With your help and something mentioned by a Scripting Ace on another site, I’ve managed to find out what was causing the problem. It was an encoding problem: I tried exporting the Xpress Tags text file with Mac-Roman encoding and everything worked ok. Originally I’d saved the file I was using as a template with Unicode (UTF-16) encoding, hence the problem.

Thank you once again for your time on this, it’s much appreciated.

Regards,

Nick