XML & diacriticals (accented letters)

I have a script that writes an XML file. It works just fine unless some text in the file includes a diacritical (an accented letter such as î, û, é, etc.). Then the file appears to be corrupted, and I can neither import it into Illustrator nor open it with TextEdit. If I take out the diacriticals then it’s fine.

Does anyone know any way around this?

Thanks, Rick

You probably have to encode your high-ASCII characters just as in HTML, so a diacritical like é should become either &eacute; or &#233;, and so on. There are probably better ways of doing this, but you can just do a straight search and replace:

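property search_entities : (characters of "&\"<>ÄÅÇÉÑÖÜáàâäãåçéèêëíìîïñóòôöõúùûü†°¢£§•¶ß®©™´¨≠ÆØ∞±≤≥¥µ∂∑∏π∫ªºΩæø¿¡¬√ƒ≈Δ…ÀÃÕŒœ–—“”‘’÷◊ÿŸ⁄€‹›‡·‚„‰ÂÊÁËÈÍÎÏÌÓÔÒÚÛÙˆ˜¯¸fiflı˘˙˚˝˛ˇ") & {character id 160}
property replace_entities : {"&amp;", "&quot;", "&lt;", "&gt;", "&Auml;", "&Aring;", "&Ccedil;", "&Eacute;", "&Ntilde;", "&Ouml;", "&Uuml;", "&aacute;", "&agrave;", "&acirc;", "&auml;", "&atilde;", "&aring;", "&ccedil;", "&eacute;", "&egrave;", "&ecirc;", "&euml;", "&iacute;", "&igrave;", "&icirc;", "&iuml;", "&ntilde;", "&oacute;", "&ograve;", "&ocirc;", "&ouml;", "&otilde;", "&uacute;", "&ugrave;", "&ucirc;", "&uuml;", "&dagger;", "&deg;", "&cent;", "&pound;", "&sect;", "&bull;", "&para;", "&szlig;", "&reg;", "&copy;", "&trade;", "&acute;", "&uml;", "&ne;", "&AElig;", "&Oslash;", "&infin;", "&plusmn;", "&le;", "&ge;", "&yen;", "&micro;", "&part;", "&sum;", "&prod;", "&pi;", "&int;", "&ordf;", "&ordm;", "&Omega;", "&aelig;", "&oslash;", "&iquest;", "&iexcl;", "&not;", "&radic;", "&fnof;", "&asymp;", "&Delta;", "&hellip;", "&Agrave;", "&Atilde;", "&Otilde;", "&OElig;", "&oelig;", "&ndash;", "&mdash;", "&ldquo;", "&rdquo;", "&lsquo;", "&rsquo;", "&divide;", "&loz;", "&yuml;", "&Yuml;", "&frasl;", "&euro;", "&lsaquo;", "&rsaquo;", "&Dagger;", "&middot;", "&sbquo;", "&bdquo;", "&permil;", "&Acirc;", "&Ecirc;", "&Aacute;", "&Euml;", "&Egrave;", "&Iacute;", "&Icirc;", "&Iuml;", "&Igrave;", "&Oacute;", "&Ocirc;", "&Ograve;", "&Uacute;", "&Ucirc;", "&Ugrave;", "&circ;", "&tilde;", "&macr;", "&cedil;", "&#64257;", "&#64258;", "&#305;", "&#728;", "&#729;", "&#730;", "&#733;", "&#731;", "&#711;", "&nbsp;"}
property replace_entities_decimal : {"&#38;", "&#34;", "&#60;", "&#62;", "&#196;", "&#197;", "&#199;", "&#201;", "&#209;", "&#214;", "&#220;", "&#225;", "&#224;", "&#226;", "&#228;", "&#227;", "&#229;", "&#231;", "&#233;", "&#232;", "&#234;", "&#235;", "&#237;", "&#236;", "&#238;", "&#239;", "&#241;", "&#243;", "&#242;", "&#244;", "&#246;", "&#245;", "&#250;", "&#249;", "&#251;", "&#252;", "&#8224;", "&#176;", "&#162;", "&#163;", "&#167;", "&#8226;", "&#182;", "&#223;", "&#174;", "&#169;", "&#8482;", "&#180;", "&#168;", "&#8800;", "&#198;", "&#216;", "&#8734;", "&#177;", "&#8804;", "&#8805;", "&#165;", "&#181;", "&#8706;", "&#8721;", "&#8719;", "&#960;", "&#8747;", "&#170;", "&#186;", "&#937;", "&#230;", "&#248;", "&#191;", "&#161;", "&#172;", "&#8730;", "&#402;", "&#8776;", "&#916;", "&#8230;", "&#192;", "&#195;", "&#213;", "&#338;", "&#339;", "&#8211;", "&#8212;", "&#8220;", "&#8221;", "&#8216;", "&#8217;", "&#247;", "&#9674;", "&#255;", "&#376;", "&#8260;", "&#8364;", "&#8249;", "&#8250;", "&#8225;", "&#183;", "&#8218;", "&#8222;", "&#8240;", "&#194;", "&#202;", "&#193;", "&#203;", "&#200;", "&#205;", "&#206;", "&#207;", "&#204;", "&#211;", "&#212;", "&#210;", "&#218;", "&#219;", "&#217;", "&#710;", "&#732;", "&#175;", "&#184;", "&#64257;", "&#64258;", "&#305;", "&#728;", "&#729;", "&#730;", "&#733;", "&#731;", "&#711;", "&#160;"}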

set t to "This is my résumé"
-- false: named entities such as &eacute;; true: numeric references such as &#233;
set t to my replace_special_chars(t, false)

on replace_special_chars(t, use_decimal)
	if use_decimal then
		set r to replace_entities_decimal
	else
		set r to replace_entities
	end if
	-- compare case-sensitively so that "É" does not also match "é"
	considering case
		repeat with i from 1 to count search_entities
			if t contains (item i of search_entities) then set t to my snr(t, (item i of search_entities), (item i of r))
		end repeat
	end considering
	return t
end replace_special_chars

on snr(t, s, r)
	tell (a reference to my text item delimiters)
		set {o, contents} to {contents, s}
		set {t, contents} to {t's text items, r}
		set {t, contents} to {"" & t, o}
	end tell
	return t
end snr

Jon
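For what it's worth, here is a rough sketch of a table-free variant of the same idea. It is not Jon's method: instead of looking characters up, it computes a numeric reference from each character's code point. It assumes AppleScript 2.0 or later (for "id of" and "character id"), and the handler name encode_as_numeric is made up for illustration. Numeric references may be the safer choice for plain XML, since XML itself predefines only &amp;, &lt;, &gt;, &quot; and &apos;.

```applescript
-- Hypothetical helper: replaces markup characters and everything
-- above code point 126 with numeric references such as &#233;
on encode_as_numeric(t)
	set out to ""
	repeat with c in (characters of t)
		set ch to contents of c
		-- "id of" gives one integer, or a list if ch is a combined sequence
		repeat with n in ((id of ch) as list)
			set cp to contents of n
			if cp > 126 or cp is in {34, 38, 60, 62} then
				set out to out & "&#" & cp & ";"
			else
				set out to out & (character id cp)
			end if
		end repeat
	end repeat
	return out
end encode_as_numeric

set t to encode_as_numeric("This is my résumé")
```

This builds the result one character at a time, so it will be slow on very long text; the search-and-replace handlers above are likely faster there.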

Or, if your file is (or should be) UTF-8 encoded, as declared in its first line (e.g., <?xml version="1.0" encoding="UTF-8"?>), just write the data “as «class utf8»”; special characters will be translated automagically to UTF-8 encoding and you will end up with a properly formatted file.
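A minimal sketch of what that might look like with the standard file read/write commands. The output path and the sample XML text are placeholders, not taken from the original script:

```applescript
-- Placeholder path and content; substitute your own
set xmlPath to (path to desktop as text) & "test.xml"
set xmlText to "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" & linefeed & "<note>résumé</note>" & linefeed

set f to open for access file xmlPath with write permission
try
	set eof of f to 0 -- empty the file if it already exists
	write xmlText to f as «class utf8» -- encode the text as UTF-8 on disk
	close access f
on error errMsg number errNum
	close access f -- do not leave the file open if the write fails
	error errMsg number errNum
end try
```

The important part is the "as «class utf8»" on the write command; the rest is ordinary file housekeeping.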

Jon, I couldn’t get your solution to work. I think I just don’t know what I’m doing. Thank you for being so quick with your reply.

jj, I got yours to work. Thank you. Elsewhere I found a suggestion to start the file with three ASCII characters (“write (ASCII character 239) & (ASCII character 187) & (ASCII character 191)”). It works, but I’m not sure why.

Thank you both for taking the time.

Windows’ Notepad (or whatever it is called) adds these three characters automatically when you save text files in UTF-8 encoding. They are the UTF-8 byte order mark (the bytes EF BB BF), and some “folks” need them to recognize the UTF-8 encoding of the file (for example, the PC version of the Flash player).
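If you do need those three bytes, one way to write them that should not depend on how "ASCII character" maps values above 127 is a raw data literal. This is only a sketch: the path and XML text are placeholders, and it assumes the write command accepts «data rdat…» values, which is a common idiom.

```applescript
-- Placeholder path and content; substitute your own
set xmlPath to (path to desktop as text) & "test-bom.xml"
set xmlText to "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" & linefeed & "<note>résumé</note>" & linefeed

set f to open for access file xmlPath with write permission
try
	set eof of f to 0 -- empty the file if it already exists
	write «data rdatEFBBBF» to f -- the three marker bytes: 239, 187, 191
	write xmlText to f as «class utf8» -- then the UTF-8 encoded text
	close access f
on error errMsg number errNum
	close access f
	error errMsg number errNum
end try
```

Readers that do not need the marker can usually cope with it, but some strict parsers may not, so it is worth testing with whatever will consume the file.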