Why does READ a file only return the first character?

John_Mosby · June 16, 2009, 3:18am

I have a text file that contains the following…

That is all that is in the file. The file stores a default printer name for later reference. I want to read the file and retrieve the name of the printer for comparison with the current printer. My problem is that when I read the file I only get a “D” (the first character) as the file contents. If I tell the read command to read from 1 to 20, I get the first 10 characters, so I know the information is there. Why is the “Read” command only getting the first character of the text file?

Here is some of my code…

set myFile to "Macintosh HD 90:Applications:4D Product line 2004.4:4th Dimension 2004.4:Smackdefaultprtr.txt"
set dir_POSIX to (POSIX path of myFile)
set dir_POSIX to ReplaceText(dir_POSIX, " ", "\\ ")

-- Read the stored printer name from the file
set fid to (open for access file myFile)
set old_Printer to read fid as text
close access fid

-- Switch to the stored printer if it is not already the current one
tell application "Printer Setup Utility"
    set the_printer to the current printer
    set the_name to the name of the_printer
    if the_name is not old_Printer then
        set the_count to the count of printers
        repeat with x from 1 to the_count
            if the name of printer x is old_Printer then
                set the current printer to printer x
            end if
        end repeat
    end if
end tell

-- Replace every occurrence of fString in theString with rString
on ReplaceText(theString, fString, rString)
    set current_Delimiters to AppleScript's text item delimiters
    set AppleScript's text item delimiters to fString
    set sList to every text item of theString
    set AppleScript's text item delimiters to rString
    set newString to sList as string
    set AppleScript's text item delimiters to current_Delimiters
    return newString
end ReplaceText

chrys · June 16, 2009, 3:53am

This “ask for 20, got 10” symptom suggests that the data is stored in a UTF-16 encoding, which uses two or four bytes per character. For characters before U+0100 (all the normal ASCII characters plus some others), every other byte will be NUL. However, the read … as text command always decodes the data as if it were in the system’s primary encoding (usually MacRoman, which is a single-byte encoding). Reading a two-byte encoding as a one-byte encoding creates the “10 for 20” symptom: 20 bytes hold only 10 characters.

If the encoding is UTF-16BE (or the character data is preceded by a byte order mark (BOM)), you should be able to read it by changing the command to read … as Unicode text. If the encoding is something else, you will need to transcode the data before reading it. You might find iconv useful.

set theText to do shell script "iconv -f UTF-16LE -t UTF-8 < " & quoted form of posixPathToUTF16LETextFile

Or maybe you could just arrange for the input file to be encoded in MacRoman instead of whatever it is currently using.
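For the UTF-16BE (or BOM) case mentioned above, the change to the reading part of your script might look something like this. Treat it as a sketch only: it reuses the myFile path from your post and assumes the file really is UTF-16BE or starts with a BOM, which I have not checked. The try block is my own addition, so that the file is not left open if the read fails.

```applescript
set myFile to "Macintosh HD 90:Applications:4D Product line 2004.4:4th Dimension 2004.4:Smackdefaultprtr.txt"

set fid to (open for access file myFile)
try
	-- Decode the bytes as UTF-16 instead of the system's primary encoding.
	-- Assumption: the file is UTF-16BE or begins with a byte order mark.
	set old_Printer to read fid as Unicode text
	close access fid
on error errMsg number errNum
	-- Do not leave the file open if the read fails, then re-raise the error.
	close access fid
	error errMsg number errNum
end try
```

If old_Printer still comes back garbled after this change, the file is probably in some other encoding and the iconv route is the one to try.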

By the way, your problems with the truncated posts were likely from pasting a NUL byte into the Message text box. “NUL terminated string” is a common string representation. One of its problems is that it cannot directly represent the NUL character. Apparently the BBS system here uses such a representation at some point.

Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari 4 Public Beta (4528.17)
Operating System: Mac OS X (10.4)

John_Mosby · June 16, 2009, 4:08am

This is how the file is created. How can I designate the “MacRoman” format?

tell application "Printer Setup Utility"
    set oldprinter to name of current printer
    set last_printer_file to open for access file "Macintosh HD 90:Applications:4D Product line 2004.4:4th Dimension 2004.4:Smackdefaultprtr.txt" with write permission
    write oldprinter to last_printer_file starting at eof
    close access last_printer_file
end tell

chrys · June 16, 2009, 5:51am

If all the writers and all the readers are in AppleScript, all you need to do is be consistent in the as parameters.

In your examples you used write … (no as) and read … as text. Prior to Leopard, this combination results in the inconsistency you saw if the original value written was of the Unicode text class: the write stores the value as UTF-16, while the read decodes it with the primary encoding. Under Leopard, both encoding and decoding would have been done with the primary encoding (no as = as text = as string). The AppleScript Release Notes for Leopard’s AppleScript say that it is best to always specify as text or as Unicode text for all writes and reads (mostly for cross-compatibility, but also just for consistent results).

Use write … as text to write text encoded in the system’s primary encoding and read … as text to read it.
Use write … as Unicode text to get text encoded in UTF-16BE and read … as Unicode text to read it.
Or possibly use write … as «class utf8» to get text encoded in UTF-8 and read … as «class utf8» to read it. I have never seen any documentation for this one, though, so it may be best to consider it unsupported unless you really need UTF-8 and you are willing to risk future incompatibility.
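
As a rough sketch of the Unicode text pairing from the list above, applied to your two scripts: the variable name prefFile is just something I made up for the example, and resetting eof to 0 is an assumption on my part that the file should only ever hold the most recent printer name (your version appends with starting at eof, so names would pile up across runs).

```applescript
-- prefFile is a hypothetical name; the path is the one from your posts.
set prefFile to "Macintosh HD 90:Applications:4D Product line 2004.4:4th Dimension 2004.4:Smackdefaultprtr.txt"

tell application "Printer Setup Utility"
	set oldprinter to name of current printer
end tell

-- Writer: always state the encoding explicitly.
set fid to open for access file prefFile with write permission
try
	-- Assumption: discard any earlier contents so only one name is stored.
	set eof of fid to 0
	write oldprinter to fid as Unicode text
	close access fid
on error errMsg number errNum
	-- Close the file before passing the error along.
	close access fid
	error errMsg number errNum
end try

-- Reader: use the same "as" parameter that the writer used.
set old_Printer to read file prefFile as Unicode text
```

The same shape should work with as text on both sides if you would rather have the file in the primary encoding; the point is only that the two sides match.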