I’m working on an HTML editor and have created a table that holds all the HTML character entities (" " and so forth). The table is filled from a text file I found, in which each line has the format:
¢ & #162; ¢ ¢ cent sign
During startup I load the table from the text file, reordering the items to make them easier to read and dropping the duplicate glyph. So far so good: the table loads just fine.
The problem is that I can’t find a character encoding for the text file that works. If I save it as Unicode (UTF-8 or UTF-16, it doesn’t matter), every character glyph gets a ¬ character in front of it, and some glyphs come out as the wrong character. So the line above comes out as:
¢ & #162; ¬¢ ¬¢ cent sign
and (for example) the currency sign line
¤ & #164; ¤ ¤ currency sign
comes out like this:
¤ & #164; ¬¢ ¬¢ currency sign
And if I try to resave the text file as Mac OS Roman, TextWrangler and Xcode both tell me that the chosen encoding can’t handle all the characters.
Telling AppleScript to read the file as Unicode text doesn’t help, and neither does international text or plain text. Someone please tell me that there is a way around this!
Edit: Oops, this is in the AppleScript Studio forum. Maybe read f as «class utf8» is not available in that environment; if so, please disregard, or let me know in a PM and I’ll delete this post. Either way, the line continuation characters (¬) seem to indicate that UTF-8 encoded text is being read as MacRoman: in UTF-8, ¢ is the two bytes C2 A2, and MacRoman displays those bytes as ¬ and ¢. You could also check eacute against my result for further evidence one way or the other.
on run
    -- text uses the system's default encoding, which is probably Mac Roman.
    -- Is there an explicit encoding specifier for Mac Roman or Latin-1?
    set readAsEncoding to {text, «class utf8», Unicode text}
    set encodingExt to {"macroman", "utf8", "utf16", "latin1"}
    set txt to "" as Unicode text
    -- Try every test file with every read encoding and collect the results.
    repeat with ext in encodingExt
        set falias to alias ((path to desktop as Unicode text) & "test text." & ext & ".txt")
        repeat with enc in readAsEncoding
            try
                read falias as enc
            on error e
                "<error: " & e & ">"
            end try
            -- result holds either the text that was read or the error string
            set t to ext & " as " & enc & return & result & return & return
            log t
            set txt to txt & t
        end repeat
    end repeat
    return txt
end run
My result for “utf8 as text” (which is UTF-8 as Mac Roman on my system) is
AS Studio doesn’t “take away” any of AS’s basic functionality; it just adds Cocoa objects via the ASkit dictionary. So yes, I can still try read f as «class utf8». That’s a good thought. I had assumed (probably wrongly) that as Unicode text would handle that. I’ll give that a try!
Thanks for the info about AppleScript Studio. I have a vague idea of what it does, but I have not had cause to need it yet, so I have not read any proper documentation about it. I had read one or two posts here that said something like “Oh, that does not work in AppleScript Studio, only in vanilla AppleScript”, so I was concerned that read from StandardAdditions might have been one of those things. My impression was that AppleScript Studio replaces some small fraction of the base “vanilla AppleScript” dictionary with slightly incompatible stuff.
It is my understanding (and the full result of the experiment in my previous post bears it out) that Unicode text means UTF-16, and there is no auto-detection for the case that the data is actually UTF-8.
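For anyone who wants to see the difference in isolation, here is a minimal sketch, not a definitive fix. It writes one sample line as UTF-8 and then reads it back twice, once with no encoding given and once as «class utf8». The file name entity-demo.txt and the variable names are made up for illustration, and I am assuming the system's primary encoding is MacRoman, as discussed above.

```applescript
-- Sketch only: "entity-demo.txt" is a made-up name in the temporary items folder.
set demoPath to (path to temporary items as Unicode text) & "entity-demo.txt"

-- Write one sample line as UTF-8 so the file's encoding is known.
set fileRef to open for access file demoPath with write permission
try
	set eof fileRef to 0 -- truncate any earlier contents
	write "¢ cent sign" to fileRef as «class utf8»
	close access fileRef
on error errMsg
	close access fileRef
	error errMsg
end try

-- No "as" clause: the bytes are interpreted in the system's primary encoding
-- (assumed to be MacRoman here).
set asDefault to read file demoPath
-- Explicit UTF-8: the two bytes C2 A2 are decoded as the single character ¢.
set asUTF8 to read file demoPath as «class utf8»

return {asDefault, asUTF8}
```

If the MacRoman assumption holds, I would expect the first item to show the stray ¬ in front of the glyph and the second to be clean. For the real entity list, the same idea would be to save the file as UTF-8 and load it with read falias as «class utf8», then split it with paragraphs of.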