Scan File/Folder List down X-levels of hierarchy

James_Nierodzik · November 14, 2007, 10:32pm

Hmm, that’s odd. Give this a try for the return command in the handler:

return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " -print | xargs -I {} osascript -e 'return \"{}\" as POSIX file as alias'"

Either way you aren’t going to get back a true alias, but you will get back a Mac-formatted reference for easy use later. Tell me, though, whether this removes the “file” descriptor you were seeing before (I wasn’t seeing it, but I am on Leopard).

Glad you’re learning

CalvinFold · November 14, 2007, 10:40pm

Well, now instead of “file” prefacing each entry, I’m getting “alias” prefacing each entry. I’m on Tiger…some of us won’t see Leopard until next spring after Apple gets more kinks out.

You say this isn’t a true alias, which I agree with, but that it’s “Mac-formatted”, so I’m wondering how I’d use it with that preface on it. Unless I filter the list afterwards to remove the word “alias” from each entry, I must be missing something.

While I have your attention on the “find” command: where in that huge do shell script line would I put “-empty” if I didn’t want empty folders returned (i.e. skip empty folders)? It would save me a filtering step later.
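One possible answer, as a minimal shell sketch, assuming a find that supports -empty (GNU find does; recent BSD versions do as well, but that is not verified here for Tiger). Grouping -type d with -empty and negating the group skips only empty folders, whereas a bare ! -empty would also skip zero-length files. The scratch folder and file names are hypothetical test data.

```shell
#!/bin/sh
set -e
# Hypothetical scratch tree: an empty folder, a non-empty folder, an empty file.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir "$dir/empty" "$dir/full"
touch "$dir/full/file" "$dir/zero-length-file"

# Depth options go first (GNU find warns if they follow a test).
# "-type d -empty" matches only empty folders; negating the group skips them
# but keeps zero-length files, which a bare "! -empty" would also drop.
find "$dir" -mindepth 1 -maxdepth 1 ! \( -type d -empty \) -print
```

In the handler’s do shell script line, the same group would sit next to the existing ! \\( -name … \\) test, with its parentheses escaped the same way.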

I will post my finished script, if I ever get out of it what I want.

Bruce_Phillips · November 14, 2007, 10:45pm

See this post by StefanK.

James_Nierodzik · November 14, 2007, 10:56pm

Well, I have no idea how this is actually being returned by osascript, but for giggles: does this strip out the “alias ” or does it error?

	return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " -print | xargs -I {} osascript -e 'return text 7 thru -1 of (\"{}\" as POSIX file as alias)'"

CalvinFold · November 14, 2007, 11:01pm

Yup, error:

7:21: execution error: Can’t get text 7 thru -1 of alias “Drive:Folder:”. (-1728)

James_Nierodzik · November 14, 2007, 11:04pm

Yeah, I thought as much, so my next try before the train ride is to return the Mac-style path as a string. Logically, that should drop the alias or file specifier. (I really can’t wait for my Tiger dev box to get back from repair /sigh)

return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " -print | xargs -I {} osascript -e 'return \"{}\" as POSIX file as string'"

I’ll be checking back soon!

CalvinFold · November 14, 2007, 11:18pm

This seems to work so far, though if I had to hazard a guess, I’d say it’s about 3x slower now, even slower than when I took the original find results and manually coerced the entire list to aliases via a handler one at a time. I’m assuming this is one of those instances where the shell isn’t always faster than AppleScript?

Gonna leave it as-is for now (with your latest suggestion), since this will eventually run overnight and speed isn’t my primary concern. If you find a better or more expedient way, I’ll keep checking back.

As always, THANKS!

James_Nierodzik · November 15, 2007, 5:15pm

Hi Calvin, okay how about this one?

	return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | sed -ne \"s/:/\\0/g; s/\\//:/g; s/\\0/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""

Huge thanks to Cole over at the Apple Unix forum for writing up the sed statement for me!

CalvinFold · November 15, 2007, 6:22pm

I’ll give it this, it’s the fastest, by a landslide, of any of the attempts so far. I was about to post it was a rousing success, but I was looking over the results, and found one serious issue:

–It’s converting all “0” (zero) characters to a “/” (slash).

I know sed less than I know find, but I’m guessing it’s in there somewhere:

sed -ne \"s/:/\\0/g; s/\\//:/g; s/\\0/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\"

I see a couple of zeroes in the first part of the statement; not sure if they are relevant.

We’re so close. We…more like you…heheh.

Bruce_Phillips · November 15, 2007, 6:34pm

I think the intent is to replace null bytes, which are used to temporarily replace colons in the (POSIX) names.
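To make that intent concrete, here is a sketch of the same three-step placeholder swap written in awk instead of sed. It uses a newline as the placeholder rather than a nul byte, on the assumption (true for line-by-line input such as find’s output) that a newline can never already be inside an awk record. The function name posix_to_hfs_order is hypothetical.

```shell
#!/bin/sh
# Three-step placeholder swap. Assumption: a newline stands in for the nul
# byte; awk reads one line per record, so the record cannot already hold one.
posix_to_hfs_order() {
  awk '{
    gsub(/:/, "\n")    # 1. park every colon as the placeholder
    gsub(/\//, ":")    # 2. turn slashes into colons
    gsub(/\n/, "/")    # 3. turn the placeholders into slashes
    print
  }'
}

printf '%s\n' '/Volumes/ExtDisk/Dir ell triple zero:l000' | posix_to_hfs_order
# prints :Volumes:ExtDisk:Dir ell triple zero/l000
```

Unlike the sed version, this leaves literal zeros alone, because the placeholder is a character the input line cannot contain.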

StefanK · November 15, 2007, 8:23pm

James Nierodzik:

Hi Calvin, okay how about this one?

	return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | sed -ne \"s/:/\\0/g; s/\\//:/g; s/\\0/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""

I admire you guys for knowing this stuff.
To me it looks like a headache.

CalvinFold · November 15, 2007, 9:25pm

I’m just taking James’ word for it, no real idea how it works.

“Nice man give code. Grog use code. Makes Grog happy when code work.”

Once we get it working, I’m hoping to get help understanding it. Though if it stumped you, Stefan, that’s saying something.

chrys · November 15, 2007, 10:00pm

CalvinFold:

I’ll give it this, it’s the fastest, by a landslide, of any of the attempts so far. I was about to post it was a rousing success, but I was looking over the results, and found one serious issue:

–It’s converting all “0” (zero) characters to a “/” (slash).

I know sed less than I know find, but I’m guessing it’s in there somewhere:
sed -ne \"s/:/\\0/g; s/\\//:/g; s/\\0/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\"
I see a couple zeroes in the first part of the statement, not sure if they are relevant.

Indeed, the problem is with those zeros.

Yes, that does seem to be the intent. But my copy of sed does not seem to share the same interpretation of the command string. It might not for the originator either (Cole at the Apple Unix forum); it would be easy to overlook.

I was unable to come up with a good way to represent the null/zero character to sed, so I made an alternate solution: delete the first three sed commands, which attempt to swap colon for slash and slash for colon, and replace them with a tr command. Among a few other things, tr is made to do exactly what we need: translate one set of characters into another.
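A small sketch of the tr step on its own (the function name swap_colons_and_slashes is hypothetical). Because tr maps each input character independently in a single pass, the colon-to-slash and slash-to-colon conversions happen at the same time and no placeholder is needed. The last command shows sed’s y command doing the same one-to-one transliteration, with commas as delimiters so the slash needs no escaping.

```shell
#!/bin/sh
# tr maps each input character on its own, in one pass, so ':' -> '/' and
# '/' -> ':' happen simultaneously; no placeholder character is needed.
swap_colons_and_slashes() {
  tr ':/' '/:'
}

printf '%s\n' '/Volumes/ExtDisk/Dir ell triple zero:l000' | swap_colons_and_slashes
# prints :Volumes:ExtDisk:Dir ell triple zero/l000

# sed's y command is the in-sed equivalent (commas used as delimiters here).
printf '%s\n' '/Volumes/ExtDisk/Dir ell triple zero:l000' | sed 'y,:/,/:,'
```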


return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | tr ':/' '/:' | sed -ne \"/^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""
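The part of the pipeline after tr can be tried in isolation. In this sketch the startup-volume name is passed in as an argument (the hypothetical 'Macintosh HD') instead of coming from ls -F /Volumes, where the startup disk is listed as a symlink marked with @. Lines that do not start with :Volumes belong to the startup disk and get its name prefixed; lines that do start with :Volumes: have that prefix stripped, leaving the mounted volume’s name first. The helper name to_hfs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: swap separators, then fix up the volume prefix.
# $1 is the startup volume name; the thread gets it from
#   ls -F /Volumes | sed -ne 's/@$//p'
# (the symlink entry in /Volumes). A name containing / or & would break
# the sed replacement, just as it would in the original.
to_hfs() {
  tr ':/' '/:' |
    sed -ne "/^:Volumes/!s/^/$1/p; s/^:Volumes://p"
}

printf '%s\n' '/Users/me/Doc:1' '/Volumes/ExtDisk/Folder' | to_hfs 'Macintosh HD'
# prints:
# Macintosh HD:Users:me:Doc/1
# ExtDisk:Folder
```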

I started writing up an explanation of how all this works, but I got bogged down in the details of the three levels of string interpretation (AppleScript, the shell, and sed). And right now there are other, more pressing things to do. I thought I would at least post my workaround for the 0 problem though.
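The shell level of that interpretation can at least be checked directly. AppleScript has already turned "\\0" into \0 and "\\/" into \/ before the shell sees the command. Inside double quotes the shell only treats a backslash as an escape before $, `, ", \ or a newline, so those two sequences reach sed unchanged and it is left to sed to decide what \0 means. The sketch uses printf '%s\n' so the strings are printed verbatim.

```shell
#!/bin/sh
# AppleScript has already turned "\\0" into \0 and "\\/" into \/.
# Inside double quotes the shell keeps a backslash unless the next character
# is $, `, ", \ or a newline, so both sequences reach sed unchanged.
printf '%s\n' "s/:/\0/g; s/\//:/g"
# prints s/:/\0/g; s/\//:/g

# By contrast, these escapes are consumed by the shell itself.
printf '%s\n' "\\ \$ \""
# prints \ $ "
```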

Bruce_Phillips · November 15, 2007, 10:18pm

Also, I’d remove the trailing slash before using find.

quoted form of (text 1 thru -2 of (POSIX path of folder_to_scan))

Double colons won’t coerce directly to an alias.
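The same trim in shell terms, as a sketch with a hypothetical helper name. The assumption behind it is that some find versions simply append /item to the start path, so a start path ending in / yields Folder//item, which the colon swap then turns into Folder::item. The root directory is left alone so that "/" does not become an empty path.

```shell
#!/bin/sh
# Hypothetical helper mirroring: text 1 thru -2 of (POSIX path of folder_to_scan)
# Drops one trailing slash so a find that appends /item to the start path
# cannot emit Folder//item (Folder::item after the swap). "/" is kept as-is.
strip_trailing_slash() {
  case $1 in
    /) printf '%s\n' "$1" ;;
    */) printf '%s\n' "${1%/}" ;;
    *) printf '%s\n' "$1" ;;
  esac
}

strip_trailing_slash '/Volumes/ExtDisk/Folder/'
# prints /Volumes/ExtDisk/Folder
```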

CalvinFold · November 15, 2007, 10:22pm

Near as I can tell, that did it!

Woohoo!

Maybe one day I’ll get bright enough at this stuff to figure out why and how it works.

Thanks folks!

James_Nierodzik · November 15, 2007, 10:23pm

Okay this is seeming to work for me by replacing with a null byte, did you try this chrys?

	return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | sed -ne \"s/:/l\\000/g; s/\\//:/g; s/l\\000/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""

Kevin, I would be interested to know if this works on your end

CalvinFold · November 15, 2007, 10:30pm

James / chrys:

Looks like James’ latest submission does the same thing as chrys’, but James’ takes about 2.75 s to run, while chrys’ takes about 3.00 s. Exact same results as near as I can tell.

Anyone care to comment on the merits of each way of doing it, and which may be more bulletproof? Am using James’ for reasons of speed alone, for now.

Great job both! Always, I am impressed with the caliber of solutions and helpfulness of the contributors here.

chrys · November 16, 2007, 12:36am

James Nierodzik:

Okay this is seeming to work for me by replacing with a null byte, did you try this chrys?

    return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | sed -ne \"s/:/l\\000/g; s/\\//:/g; s/l\\000/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""

Yes, I did try something like that. Neither my original approach nor this one works in my testing. When it gets to sed, your version supplies a lowercase letter L, a backslash, and three zeros. This ends up being interpreted as lowercase-L, an escaped zero digit (in both contexts the escape has no effect, so it works like a normal zero digit), and two normal zero digits. Try it with a file or directory name that contains the sequence “l000” (lowercase-L, followed by three zero digits). The code below is a demonstration:

do shell script "echo '/Volumes/ExtDisk/Dir ell triple zero:l000' | sed -ne \"s/:/l\\000/g; s/\\//:/g; s/l\\000/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""
set a to result
do shell script "echo '/Volumes/ExtDisk/Dir ell triple zero:l000'  | tr ':/' '/:' | sed -ne \"/^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""
{a, result}
--> {"ExtDisk:Dir ell triple zero//","ExtDisk:Dir ell triple zero/l000"}
-- the ell triple zero sequence was clobbered in the first one, but preserved in the second

Also in plain shell+sed (od -a dumps its input in a hex-dump style that shows the non-printing ASCII control characters by name):

$ echo -e '\000' | od -a    # -e tells echo to interpret \000 as the nul byte
0000000  nul  nl
0000002
$ echo | sed -e 's/^/\000/' | od -a    # sed just produces three zero digits
0000000    0   0   0  nl
0000004
$ echo | sed -e 's/^/l\000/' | od -a    # even with the ell, sed produces three zero digits
0000000    l   0   0   0  nl
0000005

When I was searching the sed manpage, I found the reference to the \000 syntax, but if you read the whole section, it is only for the output of the lowercase-L command. I am guessing that is how the lowercase-L got into your text, but that is not quite how sed commands work. Because the sequences are inside the delimiters for the lowercase-S sed command (slashes in this case), the lowercase-L you put in them is interpreted as part of a search-and-replace replacement text and a regular expression (the first and second occurrences, respectively), not as a sed command.
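A quick way to see where the \000 notation comes from: it is how sed’s l (list) command displays a nul byte, not an escape that s/// understands. This sketch assumes a printf that accepts octal escapes in its format string (POSIX printf does) and a sed whose l output uses three-digit octal for non-printing characters, as GNU sed does.

```shell
#!/bin/sh
# printf turns the octal escape \000 in its format string into a real nul byte;
# sed's l command then lists that byte in its own \000 notation and marks
# the end of the line with $.
printf 'a\000b\n' | sed -n l
# prints a\000b$

printf 'a\tb\n' | sed -n l
# prints a\tb$
```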

This kind of thing is probably why it was overlooked in the first place in the Apple Unix forum. Correspondingly, nobody should expect my version to be completely bulletproof either. There are probably things I have overlooked as well (especially since I just jumped into the middle of the thread to fix a single bug).

Edit History: Reworded a bit. Added shell and sed nul byte examples (dumped with od -a).

Bruce_Phillips · November 16, 2007, 12:54am

I just use POSIX file. :rolleyes:

Edit: I recall that on Tiger some shares (SMB, at least) used the IP address for the mount point (i.e. the POSIX path), while using the share name for the HFS path. If that’s the case, then for such shares any text-manipulation method would likely produce HFS paths that fail to coerce to an alias. (Leopard seems to use the share name as the mount point.)

chrys · November 16, 2007, 1:34am

As an example, there are likely problems “further down the stream” when dealing with pathnames with embedded newlines (ASCII character 10) and carriage returns (ASCII character 13). find will print them out OK, but then how does one’s program determine what is the start of a new pathname vs an embedded newline at the end of a path component:

/Volumes/testvol/Funny Thing
/Volumes/testvol/NormalThing
Is that two paths or just one with an embedded line break (something like “/Volumes/testvol/Funny Thing\n/Volumes/testvol/NormalThing”).

/Volumes/testvol/new line dir/somefile
Is that one absolute and one relative pathname, or one absolute pathname with an embedded line break?

Absurd? Sure. Algorithmically determinable? Almost, if you assume that all the pathnames you are dealing with refer to extant files or directories, and you are willing to spend the CPU and/or disk time to verify that various combinations of the files exist. Or maybe you are willing to live with a heuristic (all pathnames are absolute: starts with a /? → new pathname, does not start with a slash? → continuation of a pathname after an embedded linebreak). Bulletproof? No way.
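A sketch that reproduces the problem in a scratch folder (the names are hypothetical and the folder is removed on exit). One name contains an embedded newline, so newline-separated find output shows three lines for two entries, and the stray “Thing” line looks like a separate relative pathname.

```shell
#!/bin/sh
set -e
# Hypothetical scratch folder, removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
nl='
'
# Two entries; the first name has an embedded newline.
touch "$dir/Funny${nl}Thing" "$dir/NormalThing"

# Newline-separated output: three lines for two entries, and the "Thing"
# line looks like a separate relative pathname.
find "$dir" -mindepth 1
find "$dir" -mindepth 1 | wc -l
```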

As a hint, this is why GNU find added -print0. The slash and the nul character are the only two disallowed characters in most UNIX-like file systems. So when working with full pathnames (like in find), the nul character is a nice one to use to separate independent items. Alas, not every other shell tool works well with such nul-delimited output (often because the nul character is the string terminator in C-style strings!).
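A sketch of the nul-delimited alternative, assuming a find and xargs that support -print0 and -0 (GNU versions do, as do the BSD versions on current macOS). The scratch folder again holds one name with an embedded newline; because each pathname now ends in a nul byte and xargs -0 splits on nul as well, the inline sh receives exactly two arguments.

```shell
#!/bin/sh
set -e
# Hypothetical scratch folder with one newline-containing name, removed on exit.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
nl='
'
touch "$dir/Funny${nl}Thing" "$dir/NormalThing"

# -print0 ends each pathname with a nul byte; xargs -0 splits on nul too,
# so the inline sh reports exactly how many arguments it received.
find "$dir" -mindepth 1 -print0 | xargs -0 sh -c 'echo "$#"' sh
# prints 2
```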