do shell script is not working like the Terminal command

Brazuca · December 21, 2007, 12:35am

Living and learning, I’m getting pretty far on my goal, but keep hitting hurdles. I promise that I do search here and on Google for the answers, and I read the Apple docs on AppleScript, but I just can’t figure some things out.

My script to escape all quotes in an .xml file:

set myFolder to ((path to desktop as text) & "gcal")
tell application "Finder" to set theFiles to every file of folder myFolder whose name extension is "xml"

repeat with j from 1 to the count of theFiles
	
	set thisFile to name of item j of theFiles
	
	set theCommand to "sed -i -e 's/\"/\\\"/g' /Users/mlafleur/Desktop/gcal/" & thisFile
	
	do shell script theCommand
end repeat

The result of the script above is just: “” (exactly like that, two double quotes). The files are not modified.

The result of this Terminal command is a successful escaping of the quotes:

sed -i -e 's/\"/\\\"/g' /Users/mlafleur/Desktop/gcal/P31733.xml

(P31733.xml is one of the files that the script above processes.)

So, why does the Terminal command work, but the AppleScript command doesn’t? It isn’t a shell thing, because the sed command is running. I can’t figure out where I’m messing up the command, since it gives no error when it’s run and it is run on each file (I checked the Event Log). I don’t get it…

Craig_Smith · December 21, 2007, 12:43am

This line is probably your culprit:

set thisFile to name of item j of theFiles

That only works within the Finder, so try this:

Tell app "Finder" to set thisFile to name of item j of theFiles

Or, you may be able to use info for:

set thisFile to (info for (item j of theFiles))'s name

Good luck,

chrys · December 21, 2007, 2:24pm

Well, you did not say exactly how your script was failing (except mentioning that it returns an empty string, which is normal for a successful shell command that issues no output to stdout). From playing with the script, I suspect that you meant that the double quote characters in the targeted files do not end up with an (extra) backslash in front of them. That is caused by a context error when you moved the string you typed at the command line into an AppleScript string literal.

First, let us start with the desired sed program in a block of its own:

s/"/\\"/g

This is the literal program text we would like sed to interpret. It does not have any special quoting for the shell or AppleScript. It is the actual data that you would put in a file if you were passing the program to sed by using sed’s -f option.

To understand why we need two backslashes in the replacement text (in this case the replacement text is the text after the second slash character and before the third slash character) we need to understand a bit about how backslashes function in that context. When a backslash is followed by a single, non-zero digit character it has a special meaning that I will not describe here (see the man page for sed). When a backslash comes before any other character, it means that the character that follows it will be inserted into the replacement text literally. If the replacement text is a\ce, the result would be ace (the lowercase C character is inserted as a literal). To get a literal backslash into the result we need to backslash the backslash. Thus the double backslash in the s sed command’s replacement text. It means we want a single, literal backslash in the resulting text.
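To watch the replacement rule work without touching any files, the program can be fed a line on standard input. This is only a sketch: the sample line is made up, and the variable name escaped is just for illustration.

```shell
# Hypothetical sample input; the sed program is the one discussed above.
# Inside the shell's single quotes the program reaches sed unchanged.
escaped=$(printf '%s\n' 'title="Holiday party"' | sed -e 's/"/\\"/g')
printf '%s\n' "$escaped"
```

Each double quote in the input should come back with exactly one backslash in front of it.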

Based on the description of typing it into a Terminal window, the original sed program in this thread was this:

s/\"/\\\"/g

Since the double quote is not special in the pattern part or the replacement text part of sed’s s command, and because the backslash plus double quote combination represents a literal double quote, the backslashes that come immediately before the double quotes are superfluous. The double backslash itself needs to stay so that we get a single, literal backslash in the replacement text.

Next, if we are going to supply our sed program by using sed’s -e command line option, we need to make sure we pass it to the shell with proper quotation. Here I am also going to change the -i option that is used. As it was (-e following -i), sed took -e as the backup suffix, so the backup copies ended up with -e appended to their names. If you intended to have sed not save a backup copy, you need -i ''. I will have sed append .bak, since I like to have backup copies in case something goes wrong. The sed command line ends up looking like this (this is one of many ways of doing quoting for the shell; there are other, equivalent representations):

sed -i .bak -e 's/"/\\"/g'

All we had to do was to wrap the program part in single quotes. Everything is literal inside the shell’s single quotes (excluding single quotes themselves, since they delimit the region in question). This means that backslashes have no special meaning inside the shell’s single quotes (they are literal). If you want to embed a single quote, you have to do it another way (we do not need to do that here, so I will not describe it right now). At this point, this text is the command you could type or copy-and-paste into a Terminal window running /bin/sh.
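As an illustration of the "other, equivalent representations", here is a small sketch that writes the same sed program once in single quotes and once in double quotes. The variable names single and double are made up; in the double-quoted form the shell itself removes one level of backslashes, so more of them are needed.

```shell
# Both variables should end up holding the same sed program text.
single='s/"/\\"/g'        # single quotes: every character is literal
double="s/\"/\\\\\"/g"    # double quotes: \" and \\ each lose one backslash
printf '%s\n' "$single" "$double"
```

Both lines of output should read s/"/\\"/g, which is why the single-quoted form is the easier one to get right.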

Next we want to move the command into an AppleScript string literal so it can be used with do shell script. AppleScript string literals are delimited by double quote characters. To embed a double quote in an AppleScript string literal, we need to put a backslash immediately before the double quote (well, that means that string literals are not literal, but that is the term used in the AppleScript Language Guide). To embed a backslash in an AppleScript string literal, we need to put a backslash immediately before the original backslash. There are several other characters that have special meaning when preceded by a backslash in an AppleScript string literal, but I will not describe them here (see the Special Characters in Strings section of the AppleScript Language Guide). So here is the command line as an AppleScript string literal:

"sed -i .bak -e 's/\"/\\\\\"/g'"

We can save this string in a variable or use it directly as part of the parameter to the do shell script command.
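One way to check a literal like this is to strip one level of backslash escaping by hand and look at what the shell will actually receive. The sketch below approximates AppleScript's decoding with sed. It is an approximation only: it handles backslash-plus-character pairs such as \" and \\ but not \n, \t, or \r, and the helper name unescape_once is made up for illustration.

```shell
# unescape_once: hypothetical helper that drops one level of backslash escaping
# (a backslash followed by any character becomes just that character), which is
# roughly what AppleScript does to \" and \\ inside a string literal.
unescape_once() { sed -e 's/\\\(.\)/\1/g'; }

# Text between the outer double quotes of the literal in the original script.
original=$(unescape_once <<'EOF'
sed -i -e 's/\"/\\\"/g'
EOF
)

# Text between the outer double quotes of the corrected literal.
corrected=$(unescape_once <<'EOF'
sed -i .bak -e 's/\"/\\\\\"/g'
EOF
)

printf '%s\n' "$original" "$corrected"
```

The original literal decodes to a program whose replacement text is \" and sed reads that as a plain double quote, so every quote is replaced by itself and the file content does not change. The corrected literal decodes to the command line built in the previous step.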

What Craig Smith was talking about in post #2 was that the objects in the theFiles list are Finder objects. Usually when you access an object that belongs to a specific application you must do so inside a tell block or statement. (My guess for why it is not strictly necessary in this case revolves around the fact that name is a standard property label («class pnam»). The compiler does not need to know anything Finder-specific to compile the property reference form (name of), and the runtime is able to access the property via generic property access code.) So normally, if you want to play with a property from an application-specific object, you must extract the property inside the scope of a tell application ... block or statement, or maybe just a using terms from application ... block.

Latent Bugs

I think the above string formulation is enough to get your script “working”, but it still has a couple of hidden bugs.

In your original script, thisFile contains the HFS filename of a file. But your script appends that string to the POSIX pathname of a folder. Slashes are valid in HFS names (files or folders) but are invalid in POSIX names, so they are represented by colons in that context. The reverse also holds: colons in POSIX names are represented as slashes in HFS-oriented contexts. For most filenames, concatenating a POSIX directory pathname with an HFS filename will work out OK. However, it will fail if the HFS version of the filename contains a slash character (or, equivalently, the POSIX version of the filename contains a colon character). To make this concatenation valid for all(?) possible file names, you would need to convert any slashes in the HFS filename to colons before appending it to the POSIX directory pathname.
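The conversion itself is a one-character substitution. Here is a minimal sketch of it on the shell side, assuming the name is a single filename and not a full path. The helper name hfs_name_to_posix and the sample filename are made up.

```shell
# hfs_name_to_posix: hypothetical helper that turns the slashes of one HFS
# filename (not a whole path) into colons.
hfs_name_to_posix() { printf '%s' "$1" | tr '/' ':'; }

posix_dir='/Users/mlafleur/Desktop/gcal'   # folder from the original script
hfs_name='12/21 events.xml'                # made-up HFS filename containing a slash
posix_path="$posix_dir/$(hfs_name_to_posix "$hfs_name")"
printf '%s\n' "$posix_path"
```

Without the conversion, the slash in the filename would be read as one more directory level in the POSIX path.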

Normally I would recommend using POSIX path of to convert from HFS to POSIX, but (at least on my system) POSIX path of anHFSNameString does not work reliably for non-absolute pathnames (like a filename without any folder names).

POSIX path of "/foo" -- An HFS filename starting with a slash. This is valid.
-- "/foo" -- Oops. Maybe the leading slash makes it think it already is a POSIX path, so it does nothing. This is a potential bug in the "POSIX path of aString" command since volume names can legally start with slashes. Thankfully, AppleScript aliases and file references that refer to items whose volume name starts with a slash work correctly with "POSIX path of".
POSIX path of "bar/foo"
-- "/bar:foo" -- Oops. It makes the file look like it is in the root directory. Actually, this might be OK if you are appending to a POSIX pathname, since multiple consecutive slashes are OK.

These behaviors may be reasonable for the way POSIX path of is usually used (with alias, file refs, or strings that have absolute paths), but they mean that we cannot apply POSIX path of to just the filename (as would be convenient in context of the original script).

If you KNOW that none of your HFS filenames will have slashes, then you might just live with the latent bug inherent in appending an HFS filename to a POSIX pathname. If you KNOW that your HFS filenames might have slashes but never leading slashes, using POSIX path of on the filenames and appending them to the POSIX path will work if you are willing to live with a latent bug that would be triggered when your script encounters a filename with a leading slash. The preceding notes also hold if you frame them in terms of the POSIX filename, colons, and leading colons, since that is the identical situation, just viewed from a POSIX context. Otherwise (if you do not want the bugs, or the HFS filenames might contain leading slashes (equivalently, the POSIX filename might have a colon)), you should probably extract the whole path from AppleScript’s aliases or file references, or Finder file objects.

Another latent bug in the script relates to handling pathnames that have characters that are treated specially by the shell. These characters include, but are not limited to, space, asterisk, question mark, semicolon, greater than, less than, and exclamation mark. If a file’s pathname had any of the special characters the shell command would fail in various ways. To fix this latent bug, you should use quoted form of for any strings that you want to pass verbatim to a shell command.
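What quoted form of does can be imitated in the shell: it wraps the text in single quotes and rewrites each embedded single quote as '\''. The sketch below is a stand-in for illustration, not AppleScript's own code; the function name shell_quote and the sample pathname are made up.

```shell
# shell_quote: hypothetical stand-in for AppleScript's "quoted form of".
# Wraps the text in single quotes and rewrites each embedded ' as '\''.
shell_quote() {
    printf "'%s'" "$(printf '%s' "$1" | sed -e "s/'/'\\\\''/g")"
}

path="/tmp/gcal/Bob's party; 2007 *.xml"   # made-up pathname full of special characters
quoted=$(shell_quote "$path")
printf '%s\n' "$quoted"

# Let the shell parse the quoted text again; it should come back as one word.
eval "set -- $quoted"
count=$#
roundtrip=$1
printf '%s\n' "$count" "$roundtrip"
```

After the round trip the shell should see exactly one argument, identical to the original pathname, even though it contains a space, a semicolon, an asterisk, and a single quote.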

Here is one way to rewrite the script to account for the problems I have described in this post:

set myFolder to ((path to desktop as Unicode text) & "gcal")
tell application "Finder" to set theFiles to (every file of folder myFolder whose name extension is "xml")

repeat with j from 1 to the count of theFiles
    -- Coercing the Finder file object to an alias seems to work for me outside of a tell statement. The tell line below may need to be uncommented if you run into conversion errors.
    --tell application "Finder" to ¬
    set thisFilePP to quoted form of POSIX path of (item j of theFiles as alias)
    
    set theCommand to "sed -i .bak -e 's/\"/\\\\\"/g' " & thisFilePP
    
    do shell script theCommand
end repeat

I do not know the larger context of this script, but we could make the shell do all the work:

set myFolderQFPP to quoted form of (get POSIX path of file ((path to desktop as Unicode text) & "gcal:"))

set theCommand to "find " & myFolderQFPP & " -maxdepth 1 -type f -name \\*.xml -print0 | xargs -0 sed -i .bak -e 's/\"/\\\\\"/g'"

do shell script theCommand

You could drop the find and xargs stuff (and replace it with a cd command and the sed command with a final argument of *.xml), but it might break in the case where the combined length of the command line arguments (mostly the pathnames) is too large (nearly 256KB; usually several thousand pathnames, depending on their lengths).
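For comparison, here is a sketch of that simpler cd plus *.xml variant, run against a scratch folder so nothing real is touched. The folder and file contents are made up. It writes the backup suffix attached to the option (-i.bak) rather than as a separate word, on the assumption that this form is accepted by both BSD and GNU sed.

```shell
# Scratch folder with made-up files, so nothing real is modified.
dir=$(mktemp -d)
printf '%s\n' '<event name="Party"/>' > "$dir/a.xml"
printf '%s\n' '<event name="Lunch"/>' > "$dir/b c.xml"

# The subshell keeps the cd from affecting the rest of the script.
# ./*.xml is expanded by the shell, so every pathname lands on one command line.
(cd "$dir" && sed -i.bak -e 's/"/\\"/g' ./*.xml)

cat "$dir/a.xml" "$dir/b c.xml"
ls "$dir"
```

Because the shell expands the glob into a single argument list, this is the form that can run into the command line length limit, whereas find with xargs splits the list into several sed invocations when needed.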

Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari 3.0.4 (523.12)
Operating System: Mac OS X (10.4)

Brazuca · December 21, 2007, 11:12pm

Wow, thanks a lot guys. Chris, that was a fantastic tutorial and really eye opening. I thought I had escaped enough times for AppleScript, but I guess I needed to add a few more backslashes. I actually thought that I had too many (since the Terminal version of the sed command worked with fewer backslashes, as you pointed out). I also didn’t know about the “-e after -i” thing. I read the man page for sed but I got that syntax from a different script in these forums. Now I know why I was getting all these filenames that ended with “-e”.

The script you made at the end works great. I wasn’t worried about the bugs you pointed out (though it is nice to know about them for future reference) since the path and the filenames will always be similar to what is there. But it doesn’t hurt to write the script well…

FWIW, the script exists as part of a larger project. Where I work, the event calendar has an RSS version that isn’t updated. I would like to make a Google Calendar version of the web calendar for some co-workers who would benefit, and to show them the benefits of having a system where people can use the info with other tools. Since they won’t fix the RSS feed, I’m trying to automate the process of getting the data (which is in .xml format) and moving it into GCal. Since GCal cannot import an .xml, I’m going to reshape the files as .ics first, then have GCal get that, somehow.

I have an Automator script that gets all the files and places them in the /gcal directory. This is the part where I need to massage some things in the file to be able to parse it via the XML Tools scripting addition (it parses .xml files and allows you to refer to elements in your script). But I need to escape things like the quotes for it to work. Amazing how such a small need takes so much time.

My todo list looks like this:
-write a script to fix the line endings on the file to CRLF (I have a TextWrangler Find/Replace script that does it, but I want to do it in AppleScript). I imagine there is a simple command to do this…
-write a script to place specific parts of the file (determined by their XML tags) into the .ics format (I’ve been reading the tutorial on Text Delimiters here, which I think will do the trick)
-find a way to get it into iCal or straight into GCal
-create a work flow that will keep the calendar updated. Either by deleting, rerunning the whole thing, re-uploading, or something else
-???
-Profit!

I would love any help/suggestions, and I hope you guys don’t think I’m bothering the forums too much with all the questions I will have.

Thanks again!!!