I’m just testing a few things with different text encodings
in conjunction with AppleScript’s read/write file capabilities.
I found this routine by jj, which also takes the Byte Order Mark of Unicode text into account.
-- Convert a plain text file to a utf-16 (aka "Unicode text") file
textfile2utf16(choose file, "BE")
to textfile2utf16(theFile, BOM)
set oldContents to (read theFile)
set f to (open for access theFile with write permission)
set eof of f to 0
if BOM is not "" then write (ASCII number 254) & (ASCII number 255) to f
write oldContents to f as Unicode text starting at eof
close access f
end textfile2utf16
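For comparison, the byte layout the routine is aiming for (a FE FF BOM followed by big-endian UTF-16 text) can be sketched in Python; the helper name is mine, not from the thread:

```python
import codecs

def to_utf16(text, byte_order="BE"):
    # Prepend the appropriate BOM, then encode without one --
    # the endian-specific codecs never add a BOM themselves.
    if byte_order == "BE":
        return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Big-endian BOM is FE FF; little-endian is FF FE.
assert to_utf16("A")[:2] == b"\xfe\xff"
```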
But instead of FE FF it writes this character string (hexadecimal view):
I think it’s how write works. If you write something as a record, for example, there’s also a preamble indicating that the following data is a record. I think listlong2long2 refers to a list of two integers: 'list' followed by two 'long' values. (Presumably the two ASCII number results are integers, so concatenating them with & yields a list rather than a string, and write then serializes the list with those type markers.)
This:
set f to open for access ((path to desktop as text) & "record") with write permission
set eof of f to 0
write {name:"Adam C Bell"} as record to f
close access f
Produces this text: “reco[unprintable char]pnamTEXT[unprintable char] Adam C Bell”
In Nigel Garvey’s paper in unScripted, he showed that write has an as parameter similar to AppleScript coercion: it causes the data to be written to the file as some type other than what it already is. Without it, the item is written in a form that represents whatever it is already, which isn’t necessarily the same as its AppleScript format. Among the coercions that do mirror the language’s own, you can write plain text as Unicode text, but then you’ll have to read it back that way too.
As another example, he wrote that two of the thirty-two bits in an AppleScript integer are devoted to the code that identifies it as being an AppleScript integer. (That’s why AppleScript integers only have 30-bit signed values.) But an integer value written to file will be a full 32 bits wide.
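As a byte-level cross-check (in Python’s struct rather than AppleScript), a 32-bit big-endian integer written to disk really does occupy the full four bytes:

```python
import struct

# 42 packed as a signed 32-bit big-endian integer -- the full
# four bytes, with no type tag stealing any bits.
data = struct.pack(">i", 42)
assert len(data) == 4
assert data == b"\x00\x00\x00\x2a"
```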
With this parameter, write can mimic some of the coercions the AppleScript language can do, and can do a few that the language can’t. For instance, not only can reals be written to file as integer, or integers as real; but either of these (or their text equivalents) can be written as double integer (eight bytes), as extended real (ten bytes), as short (two bytes), or as small real (four bytes), none of which exist in the AppleScript language itself. (as short can also be rendered as short integer or as small integer.) If a number’s written as a type that’s too small to hold it, information will be lost: typically the high-order bytes of an integer or the precision of a real. These non-AppleScript number classes are really for specialist use.
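The byte widths above can be sanity-checked with Python’s struct module (a rough stand-in only: struct has no ten-byte extended real, and the format codes are my mapping, not anything AppleScript uses):

```python
import struct

assert len(struct.pack(">q", 1)) == 8    # double integer: eight bytes
assert len(struct.pack(">h", 1)) == 2    # short: two bytes
assert len(struct.pack(">f", 1.0)) == 4  # small real: four bytes

# A value forced into too small a type keeps only its low-order
# bytes -- the high-order bytes are the information that's lost:
low_two = struct.pack(">i", 0x12345)[-2:]
assert struct.unpack(">H", low_two)[0] == 0x2345
```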
When numbers are written to a file as string or as Unicode text, the text number produced has greater precision, and absorbs more digits before being rendered as “scientific notation”, than the result of the equivalent AppleScript coercion.
The AppleScript values true and false can’t be written to file as themselves unless they’re in a list or a record, but they can be written discretely as boolean (!), which in this case is a single-byte value of 1 or 0.
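The same check works for the single-byte boolean encoding; Python’s struct happens to use the identical 1/0 convention:

```python
import struct

assert struct.pack(">?", True) == b"\x01"   # true  -> single byte 1
assert struct.pack(">?", False) == b"\x00"  # false -> single byte 0
```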
During my tests I discovered that the write command doesn’t automatically write the Byte Order Mark at the
beginning of the file if the text class is Unicode text.
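Python’s UTF-16 codecs behave the same way, which makes the point easy to check: an endian-specific codec writes no BOM (much like write ... as Unicode text here), while the generic one prepends it:

```python
import codecs

s = "Hi"
# Endian-specific encoding: raw code units, no BOM.
assert not s.encode("utf-16-be").startswith(codecs.BOM_UTF16_BE)
# Generic "utf-16": a BOM in the platform's byte order comes first.
assert s.encode("utf-16")[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
```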
TextEdit, for example, doesn’t recognize a Unicode plain text file properly without the BOM.
This demonstrates it: