Nuanced handling of line endings by "the clipboard" command

I’d like to share some recent observations on the nuanced way the clipboard handles literal line ending characters. I couldn’t find a previous discussion of this either on MacScripter or on the internet in general. Just to recap:

linefeed character:
Unicode code point = 10, AppleScript constant = linefeed, representation in text strings for this discussion = [LF]
carriage return character:
Unicode code point = 13, AppleScript constant = return, representation in text strings for this discussion = [CR]

First, let’s start with the expected behavior. When text containing literal linefeed and carriage return characters is saved to the clipboard with the Standard Additions command set the clipboard to and then retrieved with the command the clipboard, the line ending characters are preserved in all of the clipboard’s internal encodings:


set the clipboard to "v" & linefeed & "w" & return & "x" & return & "y" & linefeed & "z"

the clipboard --> "v[LF]w[CR]x[CR]y[LF]z"
the clipboard as «class ut16» --> "v[LF]w[CR]x[CR]y[LF]z"
the clipboard as «class utf8» --> "v[LF]w[CR]x[CR]y[LF]z"

Now, the unexpected behavior. When the same text is selected by the user and then copied to the clipboard by a Command-C key press or by clicking on an application menu’s Copy menu item (typically in the Edit menu), the following results are observed:


the text selected and copied by the user:

	v[LF]w[CR]x[CR]y[LF]z

the clipboard --> "v[CR]w[CR]x[CR]y[CR]z"
the clipboard as «class ut16» --> "v[LF]w[LF]x[LF]y[LF]z"
the clipboard as «class utf8» --> "v[LF]w[CR]x[CR]y[LF]z"

Thus, for text copied to the clipboard via a Command-C key press or Copy menu item click, the plain version of the command, the clipboard, converts all line endings to carriage return characters, and the UTF16 version, the clipboard as «class ut16», converts all line endings to linefeed characters. Only the UTF8 version, the clipboard as «class utf8», preserves the original line ending characters. Incidentally, the ASObjC approach to retrieving clipboard contents via the NSPasteboard class also preserves the original line endings:


use framework "Foundation"
(current application's NSPasteboard's generalPasteboard()'s stringForType:(current application's NSPasteboardTypeString)) as text
 --> "v[LF]w[CR]x[CR]y[LF]z"
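To make these comparisons easier to reproduce, here is a minimal sketch of a helper that rewrites a string with the [LF] and [CR] markers used in this discussion. The handler name showLineEndings is hypothetical (it is not part of Standard Additions), and the sketch assumes the clipboard currently holds text copied by the user as described above.

```applescript
-- Hypothetical helper: return theText with each linefeed shown as "[LF]"
-- and each carriage return shown as "[CR]".
on showLineEndings(theText)
	-- "id of" yields an integer for one character and a list for longer text,
	-- so coerce to a list to handle both cases uniformly
	set codePoints to (id of theText) as list
	set visibleText to ""
	repeat with i from 1 to (count of codePoints)
		set codePoint to item i of codePoints
		if codePoint is 10 then
			set visibleText to visibleText & "[LF]"
		else if codePoint is 13 then
			set visibleText to visibleText & "[CR]"
		else
			set visibleText to visibleText & (string id codePoint)
		end if
	end repeat
	return visibleText
end showLineEndings

-- Retrieve the same clipboard contents in the three forms discussed above
set plainVersion to the clipboard
set utf16Version to the clipboard as «class ut16»
set utf8Version to the clipboard as «class utf8»

-- Compare the three visible renderings side by side
{showLineEndings(plainVersion), showLineEndings(utf16Version), showLineEndings(utf8Version)}
```

Working from code points rather than from characters of theText is a deliberate choice here: it avoids any question of how a CR followed by an LF is grouped when text is split into characters.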

While the particular form of line ending character will have no impact in most scripting situations, one situation where it may be critically important is in the processing of text strings by Unix shell commands such as echo, sed, etc. NSRegularExpression pattern matching could also be affected depending on the specifics of the search pattern. For these applications where line ending type may be important, one should consider using the UTF8 version, the clipboard as «class utf8», or ASObjC’s NSPasteboard class, for retrieving text copied to the clipboard by the user.
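As an illustration of that recommendation, the following sketch retrieves the clipboard as «class utf8» and then normalizes every line ending to a linefeed before handing the text to a shell command. The handler names replaceText and normalizeToLF are hypothetical, and the wc command is only a stand-in for whatever shell processing is actually needed.

```applescript
-- Hypothetical helper: replace every occurrence of findText in sourceText
-- with replacementText, using text item delimiters.
on replaceText(findText, replacementText, sourceText)
	set savedDelimiters to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set theParts to text items of sourceText
	set AppleScript's text item delimiters to replacementText
	set resultText to theParts as text
	set AppleScript's text item delimiters to savedDelimiters
	return resultText
end replaceText

-- Hypothetical helper: convert CRLF pairs first, then any remaining lone CRs,
-- so that a CRLF pair becomes one linefeed rather than two.
on normalizeToLF(theText)
	set theText to replaceText(return & linefeed, linefeed, theText)
	set theText to replaceText(return, linefeed, theText)
	return theText
end normalizeToLF

-- Retrieve the user's copied text with its original line endings intact
set rawText to the clipboard as «class utf8»
set unixText to normalizeToLF(rawText)

-- Example use: count linefeeds with a shell command.
-- "without altering line endings" keeps do shell script from converting
-- linefeeds in the command's output to carriage returns.
do shell script "printf '%s' " & quoted form of unixText & " | wc -l" without altering line endings
```

Swapping the replacement argument in normalizeToLF would give the opposite conversion (everything to carriage returns) under the same assumptions.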

Very interesting. Looks like someone thought they were being helpful when that code was written.

It’s probably also worth pointing out that you should avoid “the clipboard as string” as well.

Strange. I copied with Copy menu this text: “v\nw\rx\ry\nz”

Then I ran this, and got correct results (all):

set t to the clipboard --------------------> "v\nw\rx\ry\nz"
set t to the clipboard as «class ut16» --> "v\nw\rx\ry\nz"
set t to the clipboard as «class utf8» ---> "v\nw\rx\ry\nz"

How do you get different results?

Now I copied this text from this topic’s page in Safari (with Safari’s Edit > Copy): v[LF]w[CR]x[CR]y[LF]z

The result returned was the same in all cases: “v[LF]w[CR]x[CR]y[LF]z”

You need real returns and linefeeds in there. Run the first part:

set the clipboard to "v" & linefeed & "w" & return & "x" & return & "y" & linefeed & "z"

Then paste into TextEdit, or better still Script Debugger or some other app that lets you see invisibles.

Then select it and Copy. Now run the script.

Thanks, Shane. I got different results your way :)

Happy to share the information. I made this discovery recently while working on a script involving a substantial amount of NSRegularExpression text processing. As I was putting the script through its paces and trying to break it, I noticed that literal linefeed characters in text strings were unexpectedly being replaced by carriage return characters in text that was copied to the clipboard then retrieved via the the clipboard command. It was a serendipitous discovery.

For some reason, the the clipboard as string command throws an error, even though the clipboard info command lists one of the internal encodings as “string”. On the other hand, another internal encoding, “Unicode text,” can be used, and it behaves like the plain the clipboard command. In fact, all of the following forms convert line endings to carriage returns:

the clipboard
the clipboard as text
the clipboard as Unicode text

One other point that should be noted is that text copied to the clipboard via a Command-C key press or Copy menu item click, then pasted via a Command-V key press or Paste menu item click (without intermediate processing through the the clipboard command!), retains original line endings, thankfully mimicking the behavior of the the clipboard as «class utf8» command and the ASObjC NSPasteboard method described above.

Finally, the crazy thought came to me that this behavior could be leveraged into yet one more method to convert a text file’s line endings. I’m not recommending this approach, but just as a proof of concept, let’s say that you would like to change the line endings of a document open in the front window of the TextEdit application (or any text-editing application, for that matter). You would then run the following script to change all line endings:


tell application "System Events" to tell process "TextEdit" -- or any desired process
	set frontmost to true
	keystroke "a" using command down -- to select all the text
	delay 0.5
	keystroke "c" using command down -- to copy all the text to the clipboard
	delay 0.5
	set the clipboard to (the clipboard) -- to convert to carriage return line endings and save the result back to the clipboard
	-- or
	-- set the clipboard to (the clipboard as «class ut16») -- to convert to linefeed line endings and save the result back to the clipboard
	keystroke "v" using command down -- to paste the result, thereby replacing the text with the text with converted line endings
	delay 0.5
end tell

Regarding the script in my prior post: sorry, it turns out I was premature in posting it. When I tried this technique in a TextEdit RTF document, the line endings always ended up as carriage return characters, regardless of which form of the set the clipboard to command was used. The same result was found in a BBEdit text document. I presume this reflects internal text processing that permits only carriage return line endings in those applications.

The two applications where the “crazy” script was found to change line endings are Script Editor and Script Debugger. Prior to recompiling the changed documents, all line endings were found to be those determined by the set the clipboard to (the clipboard) or set the clipboard to (the clipboard as «class ut16») command (i.e., carriage returns or linefeed characters, respectively.) However, once the changed documents were recompiled, only literal line ending characters within text strings retained the changed line endings; all others were converted to carriage returns independent of the form of set the clipboard to command. Again, I presume this reflects internal text processing by the Script Editor and Script Debugger applications.

In summary, based on these preliminary tests, it appears that this technique of changing line endings works only for converting literal line ending characters within text strings to carriage returns [set the clipboard to (the clipboard)] or linefeed characters [set the clipboard to (the clipboard as «class ut16»)] in the Script Editor and Script Debugger applications.

It should work, depending on your test copy. But it truncates if you use, for example, an emoticon.

Not quite – they get turned to carriage returns by the AppleScript compiler, which accepts all versions of line-breaks but always outputs returns.

FWIW, OSAKit defines undocumented methods for converting line-breaks. Because the method names include underscores, they have to be called via performSelector:.

use AppleScript version "2.5" -- macOS 10.11 or later
use framework "Foundation"
use framework "OSAKit"
use scripting additions
set theString to current application's NSMutableString's stringWithString:("v" & linefeed & "w" & return & "x" & return & "y" & linefeed & "z")
theString's performSelector:"_osa_standardizeEndOfLineToCRLF"
theString as text

Shane, thank you for the insights into what goes on under the hood of the script-editing applications and for the description of the undocumented OSAKit method for changing line endings. Talk about undocumented. I didn’t get a single hit on _osa_standardizeEndOfLineToCRLF with a Google search. Great stuff!