Hi,
I am attempting to replace the ASCII 10 line feed characters in a string variable with the Unicode line separator character (code point U+2028). This code point has a UTF-16 value of 2028 hex, and in a hex editor it appears as E2 80 A8 (hex) when the text is stored as UTF-8.
I have written the following function, but I am unable to specify the code point correctly.
on CleanText(pText)
set tFind to character id 10
set tReplace to character id 2028 --8232 does not work
-- save the existing delimiters
set prevTIDs to text item delimiters of AppleScript
-- find the newline hex 0A characters
set text item delimiters of AppleScript to tFind
set tText to text items of pText
-- add the replacement character / string
set text item delimiters of AppleScript to tReplace
set tText to "" & tText
-- reset the delimiters back to how they were
set text item delimiters of AppleScript to prevTIDs
return tText
end CleanText
I believe that \u2028 is a hex value which in UTF16 is 8232 decimal but this does not work. I believe that I need to specify the value in hex E2 80 A8 but the decimal value of 14844072 fails.
\u2028 names the code point U+2028 rather than any particular encoding. Hex 2028 is decimal 8232, and character id expects a decimal value.
Your handler works if the text is UTF-8 encoded. There is also a constant, linefeed, that represents character id 10, and "of AppleScript" is redundant in this case:
on CleanText(pText)
set tFind to linefeed
set tReplace to character id 8232
-- save the existing delimiters
set prevTIDs to text item delimiters
-- find the newline hex 0A characters
set text item delimiters to tFind
set tText to text items of pText
-- add the replacement character / string
set text item delimiters to tReplace
set tText to tText as text
-- reset the delimiters back to how they were
set text item delimiters to prevTIDs
return tText
end CleanText
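To make the arithmetic behind the two numbers in the question explicit, here is a minimal sketch. The variable names lsCodePoint, utf8AsNumber and ls are made up for illustration. It shows that hex 2028 is decimal 8232, and that 14844072 is simply the three UTF-8 bytes E2 80 A8 read as one number, which is larger than the highest Unicode code point (1114111, hex 10FFFF) and so cannot be a character id.

```applescript
-- Hex 2028 expanded digit by digit: 2*16^3 + 0*16^2 + 2*16 + 8
set lsCodePoint to (2 * 4096) + (0 * 256) + (2 * 16) + 8
--> 8232

-- The UTF-8 bytes E2 80 A8 (226, 128, 168) read as one big number.
-- This is a byte sequence, not a code point.
set utf8AsNumber to (226 * 65536) + (128 * 256) + 168
--> 14844072

-- character id takes the code point, written in decimal
set ls to character id lsCodePoint
id of ls
--> 8232
```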
I finally worked it out. I had to ensure that UTF-8 was written to the clipboard; otherwise AppleScript changes the text to ASCII, which means the LS character is replaced with LF.
You shouldn't need to. AppleScript will write it correctly:
on CleanText(pText)
set tFind to linefeed
set tReplace to character id 8232
-- save the existing delimiters
set prevTIDs to text item delimiters
-- find the newline hex 0A characters
set text item delimiters to tFind
set tText to text items of pText
-- add the replacement character / string
set text item delimiters to tReplace
set tText to tText as text
-- reset the delimiters back to how they were
set text item delimiters to prevTIDs
return tText
end CleanText
set x to "one" & linefeed & "two"
set x to my CleanText(x)
set the clipboard to x
set y to the clipboard
id of character 4 of y
--> 8232
I meant the former at the time, but on reflection the latter is probably better. The general principle with the clipboard is to pass the “richest” version of what you have. The clipboard will then be able to offer it where requested, as well as simpler versions for clients that can’t cope with it (unlikely to be any these days anyway).
Code points (backslash + 'u' notation) are Unicode character values, not byte codes. Therefore it's neither UTF-8 nor UTF-16. It may look like UTF-16 because a large portion of all valid Unicode code points have the same value as their UTF-16 encoding, but for characters that require two 16-bit integers the two will differ.
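As a rough illustration of where the two diverge, the sketch below works out the UTF-16 surrogate pair for a code point above the 16-bit range using the standard UTF-16 formula. U+1F600 (decimal 128512) is picked only as an example, and the variable names are made up. U+2028, by contrast, fits in one 16-bit unit, so its UTF-16 value and its code point are both 8232.

```applescript
set codePoint to 128512 -- U+1F600, too large for a single 16-bit unit

-- Standard UTF-16 encoding: subtract hex 10000 (65536), then split
-- the remaining 20 bits into two 10-bit halves
set offsetValue to codePoint - 65536
set highSurrogate to 55296 + (offsetValue div 1024) -- hex D800 + top 10 bits
set lowSurrogate to 56320 + (offsetValue mod 1024) -- hex DC00 + low 10 bits

{highSurrogate, lowSurrogate}
--> {55357, 56832}, i.e. hex D83D DE00, not 1F600
```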