Hmm, that's odd. Give this a try for the return command in the handler:
return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " -print | xargs -I {} osascript -e 'return \"{}\" as POSIX file as alias'"
Now either way you aren’t going to get back a true alias, but you will get something back in a mac-formatted reference for easy use later. Tell me though if this removes the “file” descriptor you were experiencing before. (because I wasn’t, but I am on Leopard)
Well, now instead of “file” prefacing each entry, I’m getting “alias” prefacing each entry. I’m on Tiger…some of us won’t see Leopard until next spring after Apple gets more kinks out.
You say this isn’t a true alias, which I agree, but that it’s “mac formatted” so I’m wondering how I’d use it if it has that preface on it? Unless I then filter the list to remove the word “alias” on each entry, I must be missing something.
While I have your attention on the “find” command…where in that huge do shell line would I use “-empty” if I didn’t want an empty folder returned (i.e. skip empty folders)? Would save me a filtering step later.
I will post my finished script, if I ever get out of it what I want.
Yeah I thought as much, so my next try before the train ride then is to return the mac-style as a string. Logic would say the alias or file specifier should be removed. (I really can’t wait for my Tiger dev box to get back from repair /sigh)
return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " -print | xargs -I {} osascript -e 'return \"{}\" as POSIX file as string'"
This seems to work so far, though if I had to hazard a guess, I’d say it’s about 3x slower now, even slower than when I took the original find results and manually coerced the entire list to aliases via a handler one at a time. I’m assuming this is one of those instances where the shell isn’t always faster than AppleScript?
Gonna leave it as-is for now (with your latest suggestion), since this will eventually run overnight and speed isn’t my primary concern. If you find a better way or more expedient way, I’ll keep checking back.
I’ll give it this, it’s the fastest, by a landslide, of any of the attempts so far. I was about to post it was a rousing success, but I was looking over the results, and found one serious issue:
–It’s converting all “0” (zero) characters to a “/” (slash).
I know sed less than I know find, but I’m guessing it’s in there somewhere:
sed -ne \"s/:/\\0/g; s/\\//:/g; s/\\0/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\"
I see a couple zeroes in the first part of the statement, not sure if they are relevant.
Yes, that does seem to be the intent. But my copy of sed does not seem to share the same interpretation of the command string. It might not for the originator either (Cole at the Apple Unix forum), it would be easy to overlook.
I was unable to come up with a good way to represent the null/zero character to sed, so I made an alternate solution. Delete the first three sed commands (the ones that attempt to swap colon for slash and slash for colon) and replace them with a tr command. Among a few other things, tr is made to do exactly what we need: translate one set of characters to another set.
return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | tr ':/' '/:' | sed -ne \"/^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""
I started writing up an explanation of how all this works, but I got bogged down in the details of the three levels of string interpretation (AppleScript, the shell, and sed). And right now there are other, more pressing things to do. I thought I would at least post my workaround for the 0 problem though.
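For anyone who wants to see the tr step on its own, outside the AppleScript quoting, here is a minimal sketch. The function name swap_separators is made up for illustration; only the tr ':/' '/:' call comes from the line above.

```shell
# swap_separators is a hypothetical helper name, not something from the thread.
# tr maps each character of the first set to the character in the same
# position of the second set: ':' becomes '/' and '/' becomes ':'.
# No intermediate placeholder character is needed, so zeros are left alone.
swap_separators() {
    printf '%s\n' "$1" | tr ':/' '/:'
}

swap_separators '/Volumes/ExtDisk/Dir ell triple zero:l000'
# prints  :Volumes:ExtDisk:Dir ell triple zero/l000
```

The leading ":Volumes:" is what the remaining sed commands in the pipeline then strip or replace.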
Looks like James’ latest submission does the same thing as chrys’, but James’ takes about 2.75s to run, while chrys’ takes about 3.00s. Exact same results as near as I can tell.
Anyone care to comment on the merits of each way of doing it, and which may be more bulletproof? Am using James’ for reasons of speed alone, for now.
Great job both! Always, I am impressed with the caliber of solutions and helpfulness of the contributors here.
Yes, I did try something like that. Neither my original approach nor this one works in my testing. When it gets to sed, your version supplies a lowercase letter L, a backslash, and three zeros. This ends up being interpreted as lowercase-L, an escaped zero digit (in both contexts the escape has no effect, so it ends up working like a normal zero digit), and two normal zero digits. Try it with a file or directory name that contains the sequence “l000” (lowercase-L, followed by three zero digits). The code below is a demonstration:
do shell script "echo '/Volumes/ExtDisk/Dir ell triple zero:l000' | sed -ne \"s/:/l\\000/g; s/\\//:/g; s/l\\000/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""
set a to result
do shell script "echo '/Volumes/ExtDisk/Dir ell triple zero:l000' | tr ':/' '/:' | sed -ne \"/^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""
{a, result}
--> {"ExtDisk:Dir ell triple zero//","ExtDisk:Dir ell triple zero/l000"}
-- the ell triple zero sequence was clobbered in the first one, but preserved in the second
Also in plain shell+sed (od -a dumps its input in a hex-dump style that shows the non-printing ASCII control characters by name):
$ echo -e '\000' | od -a # -e tells echo to interpret \000 as the nul byte
0000000 nul nl
0000002
$ echo | sed -e 's/^/\000/' | od -a # sed just produces three zero digits
0000000 0 0 0 nl
0000004
$ echo | sed -e 's/^/l\000/' | od -a # even with the ell, sed produces three zero digits
0000000 l 0 0 0 nl
0000005
When I was searching the sed manpage, I found the reference to the \000 syntax, but if you read the whole section, it is only for the output of the lowercase-L command. I am guessing that is how the lowercase-L got into your text, but that is not quite how sed commands work. Because the sequences are inside the delimiters for the lowercase-S sed command (slashes in this case), the lowercase-L you put in them is interpreted as part of a search-and-replace replacement text and a regular expression (the first and second occurrences, respectively), not as a sed command.
This kind of thing is probably why it was overlooked in the first place in the Apple Unix forum. Correspondingly, nobody should expect my version to be completely bulletproof either. There are probably things I have overlooked as well (especially since I just entered into the middle of the thread to fix a single bug).
Edit History: Reworded a bit. Added shell and sed nul byte examples (dumped with od -a).
Edit: I recall that on Tiger some shares (SMB, at least) used the IP address for the mount point (i.e. POSIX path), while using the share name for the HFS path. If that’s the case, then any text manipulation method would likely fail if coerced to an alias for such instances. (Leopard seems to use the share name as the mount point.)
As an example, there are likely problems “further down the stream” when dealing with pathnames with embedded newlines (ASCII character 10) and carriage returns (ASCII character 13). find will print them out OK, but then how does one’s program determine what is the start of a new pathname vs an embedded newline at the end of a path component:
/Volumes/testvol/Funny Thing
/Volumes/testvol/NormalThing
Is that two paths, or just one with an embedded line break (something like “/Volumes/testvol/Funny Thing\n/Volumes/testvol/NormalThing”)?
/Volumes/testvol/new
line dir/somefile
Is that one absolute and one relative pathname, or one absolute pathname with an embedded line break?
Absurd? Sure. Algorithmically determinable? Almost, if you assume that all the pathnames you are dealing with refer to extant files or directories, and you are willing to spend the CPU and/or disk time to verify that various combinations of the files exist. Or maybe you are willing to live with a heuristic (all pathnames are absolute: starts with a /? → new pathname, does not start with a slash? → continuation of a pathname after an embedded linebreak). Bulletproof? No way.
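To make the heuristic concrete, here is a rough sketch of it in shell plus awk. The name join_continuations is invented for this example, and it assumes every real pathname is absolute and that the first input line starts with a slash.

```shell
# join_continuations is a hypothetical helper, not part of anyone's script here.
# Heuristic: a line starting with "/" begins a new pathname; any other line is
# glued onto the previous pathname as the text after an embedded line break.
# Each rebuilt pathname is printed on one line, with the embedded break shown
# as a literal backslash-n so the result is easy to eyeball.
join_continuations() {
    awk '
        /^\// { if (have) print path; path = $0; have = 1; next }
              { path = path "\\n" $0 }
        END   { if (have) print path }
    '
}

printf '%s\n' '/Volumes/testvol/new' 'line dir/somefile' '/Volumes/testvol/NormalThing' |
    join_continuations
```

This gets the "new line dir" case right, but it still splits the "Funny Thing" case into two pathnames, because the text after that embedded line break happens to start with a slash. That is exactly the kind of input the heuristic cannot tell apart.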
As a hint, this is why GNU find added -print0. The slash and the nul character are the only two disallowed characters in most UNIX-like file systems. So when working with full pathnames (like in find), the nul character is a nice one to use to separate independent items. Alas, not every other shell tool works well with such nul-delimited output (often because the nul character is the string terminator in C-style strings!).
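A small sketch of the difference, assuming your find supports -print0 (GNU find does, and so does the BSD find I have used on Mac OS X). The function name count_paths and the demo directory are made up for the example.

```shell
# count_paths is a hypothetical helper: it counts the items find reports under
# a directory by counting the nul terminators that -print0 emits, so a
# newline inside a name cannot inflate the count.
count_paths() {
    find "$1" -mindepth 1 -print0 | tr -dc '\0' | wc -c | tr -d ' '
}

# Demo: a directory whose name contains an embedded newline.
demo=$(mktemp -d)
mkdir "$demo/new
line dir"
touch "$demo/new
line dir/somefile" "$demo/NormalThing"

find "$demo" -mindepth 1 -print | wc -l   # counts lines: more lines than items
count_paths "$demo"                       # counts nul terminators: one per item
rm -rf "$demo"
```

Counting is about the simplest thing you can do with nul-delimited output. Anything that has to hand the names on to another tool needs that tool to accept nul-delimited input as well, which is the catch mentioned above.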