AppleScript and UTF-8 bug?

Hi all,

Suppose you have a text file saved as UTF-8 that contains these two words separated by a linefeed:

Français (with the ç stored in the file as 2 bytes in UTF-8)
France

Now you want to read this UTF-8 file into a list using this simple script:

set utf8File to "Macintosh HD:Users:john:Desktop:test.txt"
set fileRef to open for access file utf8File
set bufferList to read fileRef as «class utf8» using delimiter linefeed
close access fileRef

The resulting list no longer contains the word Français written correctly.

Instead, if I use:

set utf8File to "Macintosh HD:Users:devsc:Desktop:test.txt"
set fileRef to open for access file utf8File
set bufferList to read fileRef as «class utf8»
close access fileRef

set AppleScript's text item delimiters to linefeed
set bufferList to text items of bufferList
set AppleScript's text item delimiters to ""

bufferList

The list contains the word Français written correctly.

It seems that read with “using delimiter” is “broken”.
Can anyone confirm this?
OS X 10.6.8

Hagi

Hi,

Same here on 10.6.8. In any case, I recommend:


set utf8File to ((path to desktop as text) & "test.txt")
set bufferList to paragraphs of (read file utf8File as «class utf8»)

‘using delimiter’, like most other ‘read’ parameters, works at a byte level rather than at a character level. I’m not sure if it just hasn’t been upgraded to be compatible with Unicode or whether it’s not necessarily only meant for text.

Your successful method [Edit: or better still, Stefan’s] is a better one to use anyway, since it’s much faster. The file gets read in all at once and the editing takes place in memory instead of interrupting the disk read.
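
As a minimal sketch, the read-everything-then-split approach described above could be wrapped in a handler like the one below. The handler name readUTF8Lines and its variable names are illustrative only and do not come from this thread; it assumes the file is valid UTF-8 with linefeed line endings and that the path is an HFS-style path, as in the scripts above.

```applescript
-- Hypothetical helper (not from the thread): read a whole UTF-8 file,
-- then split it into lines in memory.
on readUTF8Lines(hfsPath)
	-- Read the entire file in one go, decoding it as UTF-8.
	set fileText to read file hfsPath as «class utf8»
	-- Remember the current delimiters so they can be put back afterwards.
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to linefeed
	set lineList to text items of fileText
	set AppleScript's text item delimiters to savedDelimiters
	return lineList
end readUTF8Lines

-- Example call, assuming a file named test.txt on the desktop.
set utf8File to ((path to desktop as text) & "test.txt")
set bufferList to readUTF8Lines(utf8File)
```

Saving and restoring the previous text item delimiters keeps the handler from affecting the rest of a script. If the file might use other line endings, ‘paragraphs of’ as in Stefan’s version is probably the simpler choice.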

Hi Nigel and Stefan,

Thanks for confirming the strange “using delimiter” behavior.

Hagi