Specify Unicode Output

Hi,

I have a series of text files that are being saved (from an AppleScript) as UTF-16 (big-endian) with Unix line breaks. I’d like the script to save them as UTF-8 with Macintosh line breaks instead.

Is it possible to specify these parameters in AppleScript?

This is the save portion of my script:

set AppleScript's text item delimiters to "."
			set textFileName to docName & ".txt"
			set AppleScript's text item delimiters to astid
			--save front document as "TEXT" in theTargetFile
			set theTargetFile to ("Macintosh_HD:Users:roadrocket:Desktop:convertedASP:" & textFileName) as text
			save front document as "TEXT" in theTargetFile

I have also tried the expression "set theTargetFile to Unicode text", but that doesn’t get the Macintosh line breaks right either.

TIA
Cliff

Both BBEdit and TextWrangler are capable of saving as UTF-8 with Mac paragraph endings, but I don’t see a way to specify those Save As… Options in AppleScript. Perhaps if you were to save a blank document with those options you could write to it and have it saved that way, but I didn’t try it.

Adam,

I’ll give that a go. I appreciate the directional assist. At the risk of looking completely incompetent: this file is simply being imported into a FileMaker text field, but its current encoding is not being interpreted correctly, as the carriage returns are treated as “spaces”. I’ve tried to use the Troi File Plugins to restore the FMP encodings, and while it kind of works, it adds an extra “space” to each character in the file, which won’t fit the solution.

I had actually discovered TextWrangler’s ability to save text files as UTF-8 with Mac line breaks, and that led me to post here to see if I could automate or script it, as I have thousands of text files.

I have seen posts that indicate a shell script will force the correct (Macintosh) line endings. I’ll pull out my Unix in a Nutshell and see if that might work.

Cheers,
Cliff

Hi Cliff,

I think I got this Unicode text stuff right after reading this article:

http://www.satimage.fr/software/en/unicode_and_applescript.html

Here’s an example that reads utf-16 and writes utf-8 with mac line breaks (carriage return).


set f to choose file
set t to read f as Unicode text
set r to t as record
set p to «class ktxt» of r
-- replace linefeeds
set lf to (ASCII character 10)
set user_tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {lf}
set _temp to text items of p
set AppleScript's text item delimiters to {return}
set p to _temp as string
set AppleScript's text item delimiters to user_tid
-- write as utf-8
set dp to (path to desktop as string)
set fs to (dp & "utf8.txt") as file specification
set ref_num to (open for access fs with write permission)
write p to ref_num as «class utf8»
close access ref_num

Not sure if this is what you’re trying to do.

gl,

Hi, kel.

Thanks for the link to the “Unicode and AppleScript” article, which I found very interesting. I’m always amazed at how flexible the File Read/Write commands are.

Your approach to what Cliff seems to want should be very effective when incorporated into a script to handle thousands of files. I don’t know if it’s relevant for Cliff’s purposes, but because you use the intermediate medium of strings, any Unicode-only characters will either be lost or cause the script to error. To preserve any such characters, it’s better to keep the text as Unicode within the script:

set f to choose file
set t to read f as Unicode text

-- make sure paragraph endings are carriage returns.
set _temp to t's paragraphs
set user_tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {return as Unicode text}
set p to _temp as Unicode text
set AppleScript's text item delimiters to user_tid

-- write as utf-8
set dp to (path to desktop as string)
set fs to (dp & "utf8.txt") as file specification
set ref_num to (open for access fs with write permission)
try
	write p to ref_num as «class utf8»
end try
close access ref_num
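
For anyone who wants to check the result outside AppleScript, the same read, normalize, write pipeline can be sketched in Python. This is only an illustration under stated assumptions: the function name utf16_lf_to_utf8_cr is made up for this example, and the input is assumed to be UTF-16 that is big-endian unless a byte-order mark says otherwise.

```python
def utf16_lf_to_utf8_cr(raw: bytes) -> bytes:
    """Decode UTF-16 bytes, force CR line endings, and re-encode as UTF-8."""
    # Assumption: honor a byte-order mark if one is present,
    # otherwise treat the data as big-endian (as described in the thread).
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-16-be")
    # Collapse CRLF first so it does not turn into two carriage returns.
    text = text.replace("\r\n", "\r").replace("\n", "\r")
    return text.encode("utf-8")


# Sample data with non-ASCII characters and Unix line breaks.
sample = "caf\u00e9\nna\u00efve\n".encode("utf-16-be")
converted = utf16_lf_to_utf8_cr(sample)
print(converted)
```

Because the text stays Unicode from decode to encode, the accented characters survive the round trip, which is the same point as keeping the text as Unicode within the AppleScript.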

Hi Nigel,

Thanks for the correction. So that’s how you replace returns in unicode text (return as unicode text). I didn’t read to the bottom of the article.

I still don’t get it when they say UTF-8 characters use one or more bytes and UTF-16 characters use two or more bytes. What’s the “more”, and how does anything reading the Unicode text know that the extra bytes belong with the others? For instance, if a character uses three bytes, how does whatever is reading it know that the third byte is part of the first two? Unicode text is strange.

Thanks again,

Hi, kel. The ‘as Unicode text’ probably isn’t necessary, but I sometimes do an explicit coercion when setting a delimiter to act with Unicode text, just to be sure!

I believe it’s something to do with the high-order bits in the one-byte (utf-8) or two-byte (utf-16) “character” elements. If these high-order bits are set, they don’t contribute to the Unicode number themselves, but indicate that the elements are part of a larger grouping that makes up a single character. The other bits from each one- or two-byte element are then recombined according to a certain formula to make up the Unicode value of that character. That’s roughly how it works with utf-16 (though I don’t have the exact details to hand) and I imagine that something similar happens with utf-8.
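
The high-order-bit idea described above can be seen directly by printing the encoded forms of a few characters. The following is a small Python sketch, not anything from the article; the helper name show is made up for this example. In UTF-8, a byte starting with 0 stands alone, a byte starting with 110, 1110 or 11110 announces a two-, three- or four-byte group, and every following byte of the group starts with 10. In UTF-16, characters above U+FFFF are split into two 16-bit units taken from the reserved ranges D800-DBFF and DC00-DFFF.

```python
def show(ch):
    """Return (UTF-8 bytes as bit strings, UTF-16 code units as hex) for one character."""
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")
    # Group the UTF-16 bytes into 16-bit code units.
    units = [int.from_bytes(utf16[i:i + 2], "big") for i in range(0, len(utf16), 2)]
    return [format(b, "08b") for b in utf8], [format(u, "04X") for u in units]


# "A", e-acute, the euro sign, and a musical symbol beyond U+FFFF.
for ch in ("A", "\u00e9", "\u20ac", "\U0001D11E"):
    bits, units = show(ch)
    print(f"U+{ord(ch):04X}  utf-8: {' '.join(bits)}  utf-16: {' '.join(units)}")
```

The euro sign, for instance, comes out as 11100010 10000010 10101100 in UTF-8: the leading 1110 says "three bytes", and the two bytes starting with 10 can only be continuations, so a reader always knows which bytes belong together.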

Hi Nigel,

I see now. That’s why the article says that if only seven bits are used, then UTF-8, Mac Roman, and ISO are the same. Thanks a lot.

Have a good day,
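
The seven-bit point is easy to confirm with a quick Python check. This is a sketch that assumes “ISO” in the article means ISO-8859-1 (Latin-1); the variable names are only for illustration.

```python
encodings = ("utf-8", "mac_roman", "latin-1")

# Plain seven-bit ASCII text encodes to the same bytes in all three.
ascii_text = "plain text"
ascii_forms = {ascii_text.encode(e) for e in encodings}
print(len(ascii_forms))

# One accented character is enough to make the three encodings disagree.
accented = "caf\u00e9"
accented_forms = {e: accented.encode(e) for e in encodings}
for name, data in accented_forms.items():
    print(name, data)
```

As soon as a character falls outside the seven-bit range, each encoding uses different bytes for it, which is why a file read with the wrong encoding shows garbage only on accented or special characters.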

Kel & Nigel,

Thanks so much for the help. This site is such a fantastic resource. I appreciate all of the information you’ve provided.

The solution(s) work great. However, when I tried inserting the code into my existing repeat loop, I got the following error:

-1409

I am simply choosing a folder, reading the contents of the files, converting them, and saving them to another folder.

Here’s the snippet of the script that is offending:

tell application "Finder"
	activate
	-- choose ASP folder 
	set ASPpath to (choose folder with prompt "Choose folder containing ASP's:") as string
	-- choose location to place folder 
	set the_items to items in folder ASPpath
	repeat with i from 1 to number of items in the_items
		set this_item to item i of the_items as string
		
		-- make sure paragraph endings are carriage returns.
		set _temp to this_item's paragraphs
		set user_tid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to {return as Unicode text}
		set p to _temp as Unicode text
		set AppleScript's text item delimiters to user_tid
		
		-- write as utf-8
		set dp to ("Macintosh_HD:Users:roadrocket:Desktop:SaveFile1:" as string)
		set fs to dp as file specification
		set ref_num to (open for access fs with write permission)
		try
			write p to ref_num as «class utf8»
		end try
		close access ref_num
	end repeat
end tell

Hi synapTECH,

Try not to place everything in one big tell application “Finder” block. You could get errors from keyword conflicts. Also, some apps or scripting additions may not be able to use Finder-style references, so you should change Finder references to alias references. I rewrote your script a little:


-- choose ASP folder 
set ASPpath to (choose folder with prompt "Choose folder containing ASP's:")
tell application "Finder"
	-- get the folder's items as a list of aliases
	try
		set the_items to (items in folder ASPpath) as alias list
	on error -- error occurs if there is only one item
		set the_items to (items in folder ASPpath) as alias as list
	end try
end tell
repeat with i from 1 to number of items in the_items
	set this_item to item i of the_items
	-- make sure paragraph endings are carriage returns.
	set _temp to this_item's paragraphs
	set user_tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {return as Unicode text}
	set p to _temp as Unicode text
	set AppleScript's text item delimiters to user_tid
	-- write as utf-8
	set file_name to "SaveFile" & i
	set fs to ("Macintosh_HD:Users:roadrocket:Desktop:" & file_name) as file specification
	set ref_num to (open for access fs with write permission)
	try
		write p to ref_num as «class utf8»
	end try
	close access ref_num
end repeat

I didn’t test it out, but I think it looks OK. Watching something on TV.

gl,

Cheers Kel,

I have been working on your latest post, and it seems to have a wee bit o’ heartburn on the following lines:

set this_item to item i of the_items
	-- make sure paragraph endings are carriage returns.
	set _temp to (this_item)'s paragraphs

It errors with : Can’t get every paragraph of alias “Macintosh_HD:Users:roadrocket:Desktop:tempsf:rm5101zarcq.txt”.

I’ve tried various ways to get around it, but of course, I am a stumped noob.

Hi synapTECH,

You left out the reading the file part:


-- choose ASP folder 
set ASPpath to (choose folder with prompt "Choose folder containing ASP's:")
tell application "Finder"
	-- get the folder's items as a list of aliases
	try
		set the_items to (items in folder ASPpath) as alias list
	on error -- error occurs if there is only one item
		set the_items to (items in folder ASPpath) as alias as list
	end try
end tell
repeat with i from 1 to number of items in the_items
	set this_item to item i of the_items
	set t to read this_item as Unicode text
	-- make sure paragraph endings are carriage returns.
	set _temp to t's paragraphs
	set user_tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {return as Unicode text}
	set p to _temp as Unicode text
	set AppleScript's text item delimiters to user_tid
	-- write as utf-8
	set file_name to "SaveFile" & i & ".txt"
	set fs to ("Macintosh_HD:Users:roadrocket:Desktop:" & file_name) as file specification
	set ref_num to (open for access fs with write permission)
	try
		write p to ref_num as «class utf8»
	end try
	close access ref_num
end repeat

gl,

Kel,

Works great, however, I need to pass the filenames within the folder through as a part of the transformation.

As the script goes through the list, I have tried to capture the original filename by using the set name or get name function, but it errors with “Can’t get name of an alias”.

My thinking is that I need to tell the Finder to get the name of “this_item” as a variable for use in writing the appended file.

I am frustrated because I can’t seem to set just the filename (not the whole path) as a variable and then append it to the write section

 set file_name to "SaveFile" & i & ".txt"

of the script?

Feel free to digitally slap me for missing something very basic here.


Hi synapTECH,

You can use the ‘info for’ command to get the name. I thought you were renaming the file with “SaveFile”.


-- choose ASP folder 
set ASPpath to (choose folder with prompt "Choose folder containing ASP's:")
tell application "Finder"
	-- get the folder's items as a list of aliases
	try
		set the_items to (items in folder ASPpath) as alias list
	on error -- error occurs if there is only one item
		set the_items to (items in folder ASPpath) as alias as list
	end try
end tell
repeat with i from 1 to number of items in the_items
	set this_item to item i of the_items
	set t to read this_item as Unicode text
	-- make sure paragraph endings are carriage returns.
	set _temp to t's paragraphs
	set user_tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {return as Unicode text}
	set p to _temp as Unicode text
	set AppleScript's text item delimiters to user_tid
	-- write as utf-8
	set file_name to name of (info for this_item)
	set fs to ("Macintosh_HD:Users:roadrocket:Desktop:" & file_name) as file specification
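	set ref_num to (open for access fs with write permission)
	set eof ref_num to 0 -- truncate first, so rewriting an existing, longer file leaves no stale bytes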
	try
		write p to ref_num as «class utf8»
	end try
	close access ref_num
end repeat

gl,
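
For comparison, the whole batch job (read every file in a folder, convert it, and write it out under its original name) can be sketched in Python. This is a hedged illustration rather than a replacement for the script above: the function name convert_folder and the temporary folders are made up for this example, the input files are assumed to be big-endian UTF-16 without a byte-order mark, and hidden files are skipped.

```python
import tempfile
from pathlib import Path


def convert_folder(src: Path, dst: Path) -> list:
    """Convert each file in src from UTF-16 BE with LF to UTF-8 with CR, keeping its name."""
    dst.mkdir(parents=True, exist_ok=True)
    written = []
    for item in sorted(src.iterdir()):
        # Skip subfolders and hidden files such as .DS_Store.
        if not item.is_file() or item.name.startswith("."):
            continue
        text = item.read_bytes().decode("utf-16-be")
        # Collapse CRLF first so it does not become two carriage returns.
        text = text.replace("\r\n", "\r").replace("\n", "\r")
        out = dst / item.name  # same filename, different folder
        out.write_bytes(text.encode("utf-8"))
        written.append(out.name)
    return written


# Demo on throwaway folders so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    src, dst = Path(tmp, "in"), Path(tmp, "out")
    src.mkdir()
    (src / "rm5101zarcq.txt").write_bytes("one\ntwo\n".encode("utf-16-be"))
    names = convert_folder(src, dst)
    result = (dst / names[0]).read_bytes()

print(names, result)
```

Writing to a separate destination folder, as here, avoids overwriting a source file while it is still being read.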