Either I have made an odd discovery or I am making an error in observation. Hopefully someone can explain this. I have a code segment that builds a string in a loop. Before starting the loop, I set the string to empty (set my_string to “”), which seems like a pretty standard thing to do. However, after appending things to the string, there is a null (ASCII character 0) at the beginning. This makes no sense to me. What am I doing wrong? TIA.
As it turns out, my initial post was based on a misunderstanding of where things were coming from. What I have discovered is so bizarre that it was not even on my radar until I did more in-depth tracing, particularly since everything appeared correct with normal tools until I dumped out the data in hex. Anyway, here is what is really happening. Given the following code segment:
write "dn: cn=" & FirstName & " " & LastName & ",mail=" & theemail & new_line to ref_num
write "objectclass: top" & new_line & "objectclass: person" & new_line & "objectclass: organizationalPerson" & new_line & "objectclass: inetOrgPerson" & new_line & "objectclass: mozillaAbPersonAlpha" & new_line to ref_num
write "givenName: " & FirstName & new_line to ref_num
write "sn: " & LastName & new_line to ref_num
write "cn: " & FirstName & " " & LastName & new_line to ref_num
write "mail: " & theemail & new_line to ref_num
write "modifytimestamp: 0Z" & new_line to ref_num
write "telephonenumber: " & directno & new_line to ref_num
write "telephonenumber1: " & corporateno & new_line to ref_num
write "fax: " & faxno & new_line to ref_num
write "street: " & thestreet & new_line to ref_num
write "l: " & thecity & new_line to ref_num
write "st: " & thestate & new_line to ref_num
write "postalCode: " & zipcode & new_line to ref_num
write "c: USA" & new_line to ref_num
write "company: " & Org & new_line & new_line to ref_num
Here is the bizarre part. The hex output from the first line shows that a null (ASCII character 0) is inserted before each character. In other words, the hex dump of that line is:
0064 006e 003a 0020 0063 006e 003d 0043 0061 … 000a
The output from the next line is fine:
6f62 6a65 6374 636c 6173 733a 2074 6f70 0a6f 626a 6563 7463 6c61 7373 3a20 7065 7273 6f6e 0a6f 626a 6563 7463 6c61 7373 … 610a
The next three lines contain the nulls like the first. The 4th line is again correct. The next 3 lines again have the nulls, the rest of the lines are good except the last one which has nulls. I am completely and totally baffled.
In a desperation attempt, I got it to work (even a blind squirrel finds an occasional nut). I am still clueless as to why I got the nulls like I did but the fix was to add ‘as string’ to each write.
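For anyone following along, the fix described above might look something like the sketch below for the first few writes. The sample values, the output path, and the way ref_num is opened are all assumptions made for illustration, since the original post does not show them; new_line is assumed to be a linefeed because the hex dumps end in 0a.

```applescript
-- Hypothetical sample values standing in for the poster's variables
set FirstName to "Jane"
set LastName to "Doe"
set theemail to "jane@example.com"
set new_line to ASCII character 10 -- assumed: the hex dumps end in 0a

-- Hypothetical output path; the original post does not show how ref_num was opened
set out_path to (path to desktop as text) & "contacts.ldif"
set ref_num to open for access file out_path with write permission
try
	set eof of ref_num to 0 -- discard any previous contents
	-- 'as string' is added to every write so each line uses the same encoding
	write "dn: cn=" & FirstName & " " & LastName & ",mail=" & theemail & new_line to ref_num as string
	write "givenName: " & FirstName & new_line to ref_num as string
	write "sn: " & LastName & new_line to ref_num as string
	close access ref_num
on error err_msg number err_num
	-- make sure the file is closed before passing the error along
	close access ref_num
	error err_msg number err_num
end try
```

The remaining write lines would get the same 'as string' suffix.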
By default, the Standard Additions write command writes text in your primary encoding (e.g. MacRoman) when you pass it a string value and in UTF16-BE when you pass it a Unicode text value. To ensure text is written in a specific encoding, specify the desired type via the write command’s ‘as’ parameter:
write txt to f as string -- primary encoding
write txt to f as «class utf8» -- UTF8
write txt to f as Unicode text -- UTF16-BE
If you want to add a Byte Order Mark to Unicode files, write the following to the file first:
write ((ASCII character 254) & (ASCII character 255)) to f as string -- UTF16-BE BOM
write ((ASCII character 239) & (ASCII character 187) & (ASCII character 191)) to f as string -- UTF8 BOM
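Putting the two pieces together, a small handler along these lines could write a whole file as UTF8 with an optional Byte Order Mark. This is only a sketch: the handler name write_utf8_file, its parameters, and the example path are made up for illustration and are not part of Standard Additions.

```applescript
-- Hypothetical helper: writes the_text to the file at hfs_path as UTF8,
-- optionally preceded by the UTF8 Byte Order Mark
on write_utf8_file(the_text, hfs_path, add_bom)
	set f to open for access file hfs_path with write permission
	try
		set eof of f to 0 -- start from an empty file
		if add_bom then
			-- the three BOM bytes (EF BB BF), written as in the snippet above
			write ((ASCII character 239) & (ASCII character 187) & (ASCII character 191)) to f as string
		end if
		write the_text to f as «class utf8»
		close access f
	on error err_msg number err_num
		-- close the file before re-raising the error
		close access f
		error err_msg number err_num
	end try
end write_utf8_file

-- Example call with a made-up path and sample text
write_utf8_file("sn: Müller" & (ASCII character 10), (path to desktop as text) & "utf8_test.txt", true)
```

Keeping the 'as' parameter inside one handler means every write to the file uses the same encoding, which avoids the mix of encodings seen in the hex dumps earlier in the thread.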
Search the forum archives for more information on how AppleScript deals with Unicode and text encodings. e.g. Here’s a list of past threads on the subject that I’ve contributed to.
You’ll also find tons of general info all over the internet, e.g. here’s a good starting point.