Text encoding problems

Apparently when AppleScript (AS) does file I/O it uses Mac Roman encoding, but “do shell script” uses UTF-8. I’m trying to keep everything in UTF-8 throughout my app, but the problem is that there seems to be no command to read a UTF-8 encoded file (and do the implicit conversion to Mac Roman needed for display, etc.).

I’ve tried this code

set s to "��"
set tmpPath to (path to home folder) as string

set fooF to (open for access file (tmpPath & "foo.txt") with write permission) -- tmpPath already ends with ":"
set eof fooF to 0 --empty anything previously in the file
write s as Unicode text to fooF
close access fooF

but the file is still in Mac Roman encoding.

Does anyone have suggestions? Thanks!

Found the solution - just change the write statement to

write s as «class utf8» to fooF

I guess Unicode text is UTF-16.
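For anyone who wants the whole thing in one place, here is a minimal sketch of the UTF-8 write wrapped in a handler. The handler name writeUTF8 and its parameters are made up for illustration; the only real change from the code above is the «class utf8» coercion, plus a try block so the file does not stay open if the write fails.

```applescript
-- Hypothetical helper: write theText to the file at hfsPath as UTF-8.
on writeUTF8(theText, hfsPath)
	set fileRef to open for access file hfsPath with write permission
	try
		set eof fileRef to 0 -- empty anything previously in the file
		write theText to fileRef as «class utf8»
		close access fileRef
	on error errMsg number errNum
		close access fileRef -- don't leave the file open on failure
		error errMsg number errNum -- pass the original error along
	end try
end writeUTF8

-- (path to home folder) as string already ends with ":"
set tmpPath to (path to home folder) as string
writeUTF8("tést", tmpPath & "foo.txt")
```

Writing as Unicode text instead would, as far as I can tell, produce UTF-16 rather than UTF-8.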

You can also do it via shell (for little strings):

set x to "tést"
do shell script "printf '%s' " & quoted form of x & " > ~/Desktop/test" --> write (printf is more portable than echo -n)
do shell script "cat ~/Desktop/test" --> read

Oddly enough, one would expect file reading to work the same way, like this

set tmpPath to (path to home folder) as string

set fooF to (open for access file (tmpPath & "foo.txt")) -- tmpPath already ends with ":"
set s to read fooF as «class utf8»
close access fooF

and this indeed works. BUT, when I added using delimiter "\n" to the read statement:

set s to read fooF as «class utf8» using delimiter "\n"

s goes back to non-Unicode again! I’ve also tried

set s to read fooF as «class utf8» using delimiter ("\n" as «class utf8»)

and it doesn’t work either.

Any suggestion on this?

Yes, this is handy. However, I’d like to use “read … using delimiter …” instead of reading in one big string and then using “every text item” to split it into a list; that approach seems to be much slower than “read … using delimiter”.

Somewhere in this forum or applescript-users list there is an explanation about this issue. You can’t read as UTF-8 and use a delimiter to read it.
Use instead (pseudo-code):

paragraphs of (read ...)
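Spelled out, the pseudo-code above might look like the sketch below. The handler name readUTF8Lines is hypothetical, and it assumes the file is not empty (read raises an end-of-file error on an empty file).

```applescript
-- Hypothetical helper: read a UTF-8 file and return its lines as a list.
on readUTF8Lines(hfsPath)
	set fileRef to open for access file hfsPath
	try
		-- Read the whole file as UTF-8, without "using delimiter"
		set fileText to read fileRef as «class utf8»
		close access fileRef
	on error errMsg number errNum
		close access fileRef -- don't leave the file open on failure
		error errMsg number errNum
	end try
	-- Split into lines after decoding, so the text stays Unicode
	return paragraphs of fileText
end readUTF8Lines

set tmpPath to (path to home folder) as string -- already ends with ":"
set theLines to readUTF8Lines(tmpPath & "foo.txt")
```

The point is that the splitting happens after the text has been decoded, so the delimiter never gets involved in the read itself.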

Can never say how MUCH I want to thank you!!!

OK, turns out one problem is solved, and I’m now banging my head over another…

This one is related to AppleScript Studio (ASS), but since we already started this thread on Unicode, I’ll ask it here.

I have no problem using “as «class utf8»” for reading/writing UTF-8 files in Script Editor, but I get mixed results in ASS. In the main script of my app, it’s perfectly OK. But in another supporting script, the “as «class utf8»” statement compiles fine, yet at run time I get a small red “x” in front of that line, with an error basically saying the token is invalid.

Any hint on this one? Thanks!

What is the code and context?

This is sort of following the topic I started about calling handler/sharing properties of another script in ASS. The scenario is like this: say I have two scripts A and B (two files) and each of them is directly connected to some widget in the main NIB file, and I have a 3rd script C, which is not connected to any widget, and is basically a collection of utility functions that both A and B will call.

Now I can use the “as «class utf8»” trick in both A and B, but I can’t use it in C. Using it in C compiled fine but when run it simply showed the red ‘x’ and an invalid token error message in Xcode.

I wonder whether being connected to a NIB file has anything to do with this… If so, is there another way to read/write UTF-8 (other than the do shell script trick)?

Sometimes osaxen commands will fail when they are inside tell blocks (for example). That’s why I asked about the code.
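To illustrate what I mean (a sketch only; I don’t know yet whether this is what is happening in your script C): if the read sits inside a tell block, one common workaround is to move it into a handler and call that handler with “my”, so the osax command is not sent to the tell target. The handler name readUTF8 and the Finder tell block are just placeholders.

```applescript
-- Hypothetical helper: the file commands live here, outside any tell block.
on readUTF8(hfsPath)
	set fileRef to open for access file hfsPath
	try
		set fileText to read fileRef as «class utf8»
		close access fileRef
	on error errMsg number errNum
		close access fileRef -- don't leave the file open on failure
		error errMsg number errNum
	end try
	return fileText
end readUTF8

tell application "Finder"
	-- "my" sends the call to the script's own handler, not to Finder
	set s to my readUTF8(((path to home folder) as string) & "foo.txt")
end tell
```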