Result of 'Do JavaScript'

I must be doing something obviously wrong here. The first half of the script works by itself: I can grab the title of the web page with JavaScript. The second part also works by itself, stripping spaces from the title. But they won’t work together. Any ideas? Thanks…


tell application "Safari"
	activate
	set the_title to do JavaScript "document.title" in front document
end tell

set new_title to do shell script "echo " & the_title & " | sed -e 's/^s*//' -e 's/s*$//'"

set the clipboard to new_title

First, when passing string data to programs via do shell script, always use quoted form of to make sure the string data is not interpreted by the shell.
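To see what the shell receives once the title has been quoted, here is a minimal sketch in plain shell. The helper quoted_form is hypothetical; it only imitates the usual result of AppleScript's quoted form of (wrap the text in single quotes and rewrite each embedded ' as '\''), and the sample title is made up.

```shell
# Hypothetical helper imitating AppleScript's "quoted form of":
# wrap the text in single quotes and rewrite each embedded ' as '\''.
quoted_form() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# A made-up title full of characters the shell would otherwise interpret.
title="Tom's \$HOME page; echo *"

# Show the quoted text that would be spliced into the command line.
quoted_form "$title"; echo

# Running echo on the quoted text gives the title back unchanged.
eval "echo $(quoted_form "$title")"
```

Without the quoting, the apostrophe, the semicolon, $HOME, and the asterisk would all be interpreted by the shell instead of being passed through to echo.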

Second, it looks like you were intending to use the Perl-style \s whitespace expression. If so, you need to write your AppleScript inline string value like this: " | sed -e 's/^\\s*//' -e 's/\\s*$//'". Whenever you want to represent a single backslash in an AppleScript string, you have to double it (something similar happens in C, for example).

However (on my system) sed does not support the Perl-style \s expression. But maybe yours is different. I found that I could use [[:space:]] though (see the re_format(7) manpage on your system). Or you could construct your own character class using the square brackets and actual space, tab, newline, return, form feed, and vertical tab characters (or whichever subset you actually want).

tell application "Safari" to set the_title to do JavaScript "document.title" in front document

set the_title to space & tab & the_title & tab & space -- put some spaces on, just for testing

-- Use the builtin character class.
set new_title to do shell script "echo " & quoted form of the_title & " | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'"

-- Use a custom character class.
set spaces to space & tab
set new_title to do shell script "echo " & quoted form of the_title & " | sed -e 's/^[" & spaces & "]*//' -e 's/[" & spaces & "]*$//'"

-- Same as above, but more fragile since it uses actual tab characters in the inline string value, which might be eaten by some programs/work-flows.
set new_title to do shell script "echo " & quoted form of the_title & " | sed -e 's/^[ 	]*//' -e 's/[ 	]*$//'"

--set the clipboard to new_title
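
The two sed expressions above can also be tried on their own in Terminal, outside AppleScript. A minimal sketch, assuming a POSIX shell; the function name trim is only illustrative and is not part of the script above.

```shell
# Hypothetical helper: trim leading and trailing whitespace from each input line.
trim() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# printf expands \t into a real tab, so the input starts and ends with mixed blanks.
printf ' \tSome Page Title\t \n' | trim   # prints: Some Page Title
```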

Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari 4.0.4 (4531.21.10, r51708)
Operating System: Mac OS X (10.4)

Thanks for the help; that was an extensive answer.

We killed a number of birds with one rock. Not using the “quoted form of” was a problem, and as soon as I changed that, I needed the different sed string to strip spaces and tabs, even though it was working before. And using the character class is a better idea, anyway.

Now the script works fine, except for what was to be my next question after this was solved: I need to get rid of a carriage return in the_title. My ham-fisted attempts to add something like -e 's/[[:return:]]*$//' don’t work, and I haven’t had any luck with different sed commands either, as AppleScript doesn’t like an unescaped \r.

All this to clean up a dirty page title of a URL that I email several times a day…

There is no character class for just a return, and there is no octal/hex/Unicode syntax like in Perl-style regexps, so we have to embed the character directly. AppleScript’s inline string syntax does understand '\r', but there are problems with using it in 10.4. The problem is that Script Editor, upon first compiling a script with '\r' in a string, changes every '\r' to a real return character (it actually edits your source code). If this first compilation happens as part of running the script, it will work OK. But if you use Compile, then Run (or Run and Run again), the replacement happens on the first compilation, and a second replacement (return to line feed) happens on the second compilation. I read that 10.5 or 10.6 fixes the problem, but I could not find a reference to it in the AppleScript Release Notes.

Anyway, for something that should work everywhere (that is, wherever do shell script is supported), try this:

tell application "Safari" to set the_title to do JavaScript "document.title" in front document

set the_title to space & tab & ¬
	text 1 through 10 of the_title & ¬
	return & text 11 through end of the_title & ¬
	tab & space -- put some spaces on (and an embedded return in), just for testing

set new_title to do shell script "echo " & quoted form of the_title & " | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e 's/" & return & "//g'"

-- Same as above, but with meaningful names for the sed expressions
set trim_leading to "s/^[[:space:]]*//"
set trim_trailing to "s/[[:space:]]*$//"
set strip_returns to "s/" & return & "//g"
set new_title to do shell script "echo " & quoted form of the_title & " | sed -e " & quoted form of trim_leading & " -e " & quoted form of trim_trailing & " -e " & quoted form of strip_returns

--set the clipboard to new_title
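
The same "embed the real character" idea can be checked in Terminal. This is only a sketch, assuming a POSIX shell; the variable cr and the function strip_returns are illustrative names that mirror the strip_returns expression above.

```shell
# Put a real carriage return (ASCII 13) into a variable; printf expands \r.
cr=$(printf '\r')

# Hypothetical helper: delete every carriage return from the input.
# The sed script contains the actual character, not an escape sequence.
strip_returns() {
  sed -e "s/${cr}//g"
}

printf 'Some Page\r Title\n' | strip_returns   # prints: Some Page Title
```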

Here is a Ruby version.

do shell script "echo " & quoted form of the_title & " | ruby -e \"print STDIN.read.gsub(/\\s/, '').chomp\""

chrys, stripping the return with -e 's/" & return & "//g' works great, but the sed string that strips spaces and returns only works when the URL is on the clipboard in a “test” script, not when Safari and JavaScript are used to grab the URL. For lack of a better way to say it, there’s a disconnect between Safari getting the URL and passing it to the sed string. So, does the URL that JavaScript grabs need to be quoted or stipulated to be in some format?

Craig, the Ruby version won’t work as the return is in the middle of the title string that Safari grabs…

Then you could add another gsub which would remove any return in the line.
If this does not work, then change the \n to \r.

do shell script "echo " & quoted form of the_title & " | ruby -e \"print STDIN.read.gsub(/\\s/, '').gsub(/\\n/, '').strip\""

There should be nothing special concerning the string’s source ('clipboard' vs. Safari via JavaScript). I do not think you ever mentioned the URL of the page you are trying to treat, so it is difficult to reproduce the problems you are encountering.

That said, my current guess is that the character you want to remove is actually a line feed (ASCII 10), not a carriage return (ASCII 13). It is difficult to handle line feeds in some versions of sed (like the one on my 10.4 machine) because the pattern part of the s command does not recognize \n as an escaped line feed, and an actual line feed prematurely ends the whole s command before it is syntactically complete. So I would turn to tr to strip out all the line feeds and carriage returns before sed does the start/end trimming.

tell application "Safari" to set the_title to do JavaScript "document.title" in front document

set the_title to ¬
	space & tab & text 1 through 10 of the_title & ¬
	return & text 11 through 16 of the_title & ¬
	(ASCII character 10) & text 17 through end of the_title & ¬
	tab & space -- put some spaces on, just for testing; also separately embed a return and linefeed

set new_title to do shell script "echo " & quoted form of the_title & " | tr -d '\\r\\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'"
{the_title, new_title}

--set the clipboard to new_title
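
The tr-then-sed pipeline can be tested by itself in Terminal. A minimal sketch, assuming a POSIX shell; clean_title is an illustrative name, and the sample input is made up to contain both a carriage return and a line feed in the middle.

```shell
# Hypothetical helper: delete all carriage returns and line feeds first,
# then trim whitespace from the start and end of what is left.
clean_title() {
  tr -d '\r\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Input has leading and trailing blanks plus an embedded \r and \n.
printf ' \tSome\r Page\n Title \t\n' | clean_title; echo   # prints: Some Page Title
```

The extra echo is there because tr removes the final line feed too, and some versions of sed do not add one back.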

Late getting back to this; thanks for your help. What I ended up with is this:

set new_title to do shell script "echo " & quoted form of the_title & "  | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | tr -d '\\r\\n'"

The carriage return was in the middle of the string, so I changed your shell script around to strip leading and trailing spaces first, then get the return with tr, which was a great idea to use.
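
The order of the two steps can matter when a line feed has blanks next to it, because sed trims every line separately. The following sketch, assuming a POSIX shell, compares both orders on a made-up title; tr_first and sed_first are illustrative names only.

```shell
# Order from the earlier suggestion: remove line breaks, then trim the ends.
tr_first() {
  tr -d '\r\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//'
}

# Order from the final script: trim each line, then remove line breaks.
sed_first() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' | tr -d '\r\n'
}

# The only blank between the two words sits right before the line feed.
printf '  Page \nTitle  \n' | tr_first;  echo   # prints: Page Title
printf '  Page \nTitle  \n' | sed_first; echo   # prints: PageTitle
```

When the break has no blanks around it, both orders give the same result, so either is fine for a title like the one described here.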

I also added (not shown here) another sed operation to remove the “http://” from the URL, as it was causing a link inside the generated link…
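
The exact expression for that last step was not posted. One possible form, given purely as an assumption, anchors the match at the start of the text and uses | as the sed delimiter so the slashes need no escaping; strip_scheme and the example address are hypothetical.

```shell
# Hypothetical helper: drop a leading "http://" and leave everything else alone.
strip_scheme() {
  sed -e 's|^http://||'
}

printf 'http://example.com/some/page\n' | strip_scheme   # prints: example.com/some/page
```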