Will somebody be so kind as to tell me how to return the Unicode value of a character, for example “00DC” for “Ü”?
I’ve been trying to find out how to do it for more than three hours now, and the more I try, the more confused I get.
I’m pretty sure I figured out how to do it before by myself, but I just can’t remember.
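UnicodeCharToHex("Ü" as Unicode text) --> "00DC"
to UnicodeCharToHex(u)
    try
        -- Deliberately trigger an error: the message then shows the C string's raw data as hex
        ((text -2 thru -1 of ({{a:u}} as string)) as C string) * 5
    on error msg
        -- Extract the hex digits between "cstr" and the trailing "00»"
        text ((offset of "cstr" in msg) + 4) thru ((offset of "00»" in msg) - 1) of msg
    end try
end UnicodeCharToHex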
Note that it relies on the “as C string” coercion, and C string is now a deprecated data type (it still works in Tiger, though, and I think it will keep working unless AS itself is redesigned from scratch, since the dictionary still holds lots of things from ancient times…).
For more powerful conversions (long text), you can use something like:
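UnicodeTextToHex("Üchá" as Unicode text) --> "00DC0063006800E1"
to UnicodeTextToHex(u)
    -- Write the Unicode text to a temp file, emptying the file first
    set q to (open for access ("/tmp/u.txt" as POSIX file) with write permission)
    set eof of q to 0
    write u to q
    close access q
    -- Read the bytes back as raw «class paca» data
    read ("/tmp/u.txt" as POSIX file) as «class paca»
    try
        -- Deliberately trigger an error so the message contains the data as hex
        result * 5
    on error msg
        -- Extract the hex digits between "paca" and the closing "»"
        text ((offset of "paca" in msg) + 4) thru ((offset of "»" in msg) - 1) of msg
    end try
end UnicodeTextToHex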
This may work fine for, say, 100 KB of Unicode text. If you need more power (speed), you can use specialized tools (such as TextCommands, which is not as portable as a handler, but beats this routine in speed).
And, be warned, it is a naughty exploit of an AppleScript bug too (the list-of-records-to-string coercion).
If you need a vanilla solution, the easiest thing (as usual) is to use a shell script, e.g.:
on unicodeToHex(txt)
    return do shell script (("python -c \"import sys; print unicode(sys.argv[1], 'utf8').encode('UTF-16BE').encode('hex')\" " as Unicode text) & quoted form of (txt as Unicode text))
end unicodeToHex
You’ll be limited in the amount of data you can convert, of course, unless you want to muck about with temp files instead of passing it on the command line. Curse Apple’s wretched ‘do shell script’ command for its continuing lack of stdin support, and go file a feature request on it.
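For what it's worth, here is a rough sketch of the temp-file route as a standalone Python script. Treat it as an illustration under assumptions, not a drop-in replacement: it is written for Python 3 (the one-liner above uses Python 2 syntax), the function names text_to_hex and file_to_hex are made up for this example, and it assumes the temp file was saved as UTF-8 unless you pass another encoding. It returns uppercase hex to match the handlers above, whereas the one-liner prints lowercase.

```python
def text_to_hex(text):
    """Return the UTF-16BE code units of `text` as uppercase hex digits."""
    # Encode without a byte-order mark, then hex-encode the raw bytes.
    return text.encode("utf-16-be").hex().upper()


def file_to_hex(path, encoding="utf-8"):
    """Read a text file and return its contents as UTF-16BE hex.

    `encoding` is the encoding the file was written in (assumed UTF-8
    here); reading from a file sidesteps the command-line length limit.
    """
    with open(path, "r", encoding=encoding) as handle:
        return text_to_hex(handle.read())
```

Calling text_to_hex("Ü") gives "00DC", and text_to_hex("Üchá") gives "00DC0063006800E1". Characters outside the Basic Multilingual Plane come out as two 16-bit code units (a surrogate pair), since the output is UTF-16.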
Folks, I don’t know what to say… You’re great. Thank you so much. Looking at some of your code makes me doubt I did it before all by myself.
At first I had a little trouble using your code, but then I realized that “Tex-Edit Plus” (which I like scripting for its nice search/replace functions) is not good at handling Japanese/Chinese characters. I’m using BBEdit now and everything is working A-OK. Thanks a lot!