Scan File/Folder List down X-levels of hierarchy

I have mostly finished writing an attempt at an explanation of how the shell interprets the do shell script strings. It uses the sed part of the shell script code in this thread as an example, but it does not really explain how we tell sed to do what it does.

So, Kevin, are you more interested in learning how the shell works, how sed works, (how tr works,) how find works, or seeing how all of it works together in one bundle?

I could post the shell stuff I have (though I think I want to redo some of the examples because I have found yet another latent bug in the original code), but it really just scratches the surface of the whole package. The shell, sed, and find are each really their own little mini language. I could not reasonably explain everything about each one (one should read the fine manpages for that), but I could try to explain enough about each one to describe what they are doing in this case and maybe give enough of a foundation to understand other, similar invocations of the commands.

Then again, none of this is really AppleScript stuff. It might help people with do shell scripts but it is maybe 99.5% Unix, 0.499% Mac OS X, and only 0.001% AppleScript. I still remember the last time do shell script debates raged on the AppleScript-Users mailing list…

Sorry been out a while. Well, sounds like maybe I’ll skip the UNIX explanations here for now…don’t want to start a war over “do shell” stuff…and I’ll just be super-thankful for all the help and the willingness of folks here to go above-n-beyond to solve someone else’s problems! :wink:

So which version should I use…James’ or Chrys’?

Bruce: I’m kinda confused by your response…are you saying to use POSIX file in the middle of one of James’ or Chrys’ scripts somehow? Or did you have a different non-“do shell” solution to my original problem?

If you decide to use “my” version, here is a newer version that has fewer bugs:

on listGetter(folder_to_scan, scan_level, folder_exceptions)
	--exceptions formatted for shell find
	copy folder_exceptions to folder_exceptions
	repeat with fe_ref in folder_exceptions
		set contents of fe_ref to quoted form of contents of fe_ref
	end repeat
	set ASTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " -or -name "
	set exclude_code to text 6 thru -1 of ("" & ({""} & folder_exceptions))
	set AppleScript's text item delimiters to ASTID
	--do shell find with exceptions
	do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( \\( " & exclude_code & " \\) -prune \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | sed -e 's|//*|/|g; y|:/|/:|; /^:Volumes:/ ! s/^/'\"$(ls -F /Volumes | sed -ne 'y|:/|/:|; s|[/&\\]|\\\\&|g; s|@$||p;')\"'/; s|^:Volumes:||'"
	paragraphs of result
end listGetter
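
The sed part of that handler can also be tried on its own in a terminal. This is only a minimal sketch of the same steps: the function name posix_to_mac is made up for illustration, and the startup disk name is passed in by hand as a second parameter (the handler derives it from ls -F /Volumes instead). It also assumes the disk name contains no characters that are special to sed.

```shell
# Hypothetical helper: convert one POSIX path to a Mac (colon-delimited) path.
#   $1 = POSIX path
#   $2 = startup disk name (assumed to contain no |, & or backslash)
# Steps, in order:
#   1. collapse runs of slashes into a single slash
#   2. swap all colons and slashes
#   3. if the path does NOT start with :Volumes:, prepend the startup disk name
#   4. if it does start with :Volumes:, strip that prefix
posix_to_mac() {
  printf '%s\n' "$1" | sed \
    -e 's|//*|/|g' \
    -e 'y|:/|/:|' \
    -e "/^:Volumes:/ ! s|^|$2|" \
    -e 's|^:Volumes:||'
}

posix_to_mac '/Users//kevin/a:b' 'Macintosh HD'
# prints: Macintosh HD:Users:kevin:a/b
posix_to_mac '/Volumes/Server/Transfer Files' 'Macintosh HD'
# prints: Server:Transfer Files
```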

Here is that do shell script command again, with some embedded comments:

	-- Embedded newlines below...
	do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( \\( " & exclude_code & " \\) -prune \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | sed -e ' # Convert POSIX to Mac, prepending name of startup disk or stripping /Volumes/
s,//*,/,g;			# delete multiple adjacent slashes, they are ignored in POSIX paths, but adjacent colons can cause problems in Mac paths
y|:/|/:|;			# swap all colons and slashes
/^:Volumes:/ ! s/^/'\"$(ls -F /Volumes | sed -ne '# Convert POSIX to Mac and print out only symlinks from ls -F
	y|:/|/:|;			# swap all colons and slashes
	s|[/&\\]|\\\\&|g;	# the output will be used as the replacement for a slash delimited s command; escape slash, ampersand and backslash in the name of the startup disk
	s|@$||p;			# only print out lines that end with the at sign (which means that the named directory entry is a symlink)
	')\"'/;	# if it did NOT start with /Volumes/, prepend the name of the startup disk
s/^:Volumes://;	# if it started with /Volumes/, then just strip it, the next part should be the volume name
'"

Changes from previous versions:
• included paragraphs of … to make it produce a list (I presume you were always doing this with the output of this handler, so I pulled it into the handler; this also makes the output (a list instead of a long newline delimited string) comparable to the output of the last version in this post)
• regularized the folder exception building code a bit
• added -prune to the find commands to abort descending into trees that will be excluded from the output (this may not be as important since you are using -mindepth and -maxdepth (which seem to always be evaluated, even if in places where I expect otherwise), but it is a common time saver in most uses of find where directories are skipped)
• use an alternate delimiter in some sed commands to avoid having to escape actual slash characters
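
The exclusion-plus-prune part of the find command can be tried by itself in a terminal. The following is a minimal sketch and not the handler's exact command line: list_at_depth is a made-up name, it assumes at least one name to exclude, and it puts -mindepth and -maxdepth first (the handlers in this thread put them last) only because GNU find prints a warning otherwise.

```shell
# Hypothetical helper: list entries exactly $2 levels below folder $1,
# skipping (and not descending into) anything whose name matches one of
# the remaining arguments. Assumes at least one name is given.
list_at_depth() {
  dir=$1
  depth=$2
  shift 2
  # Rebuild the argument list as: -name A -o -name B -o -name C ...
  first=yes
  for name in "$@"; do
    if [ "$first" = yes ]; then
      set -- -name "$name"        # start over with a fresh argument list
      first=no
    else
      set -- "$@" -o -name "$name"
    fi
  done
  # ! ( ( tests ) -prune ) : matching entries are pruned and not printed,
  # everything else falls through to the implicit -print.
  find "$dir" -mindepth "$depth" -maxdepth "$depth" ! \( \( "$@" \) -prune \)
}

# Usage (the path and names are placeholders):
#   list_at_depth "/Volumes/Server/Transfer Files" 1 "Trash" ".DS_Store"
```

Passing each name as its own argument sidesteps the quoting problem entirely, which is the same thing quoted form of is doing for the string that do shell script hands to the shell.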

Bugs fixed from my most recently posted version:
• folder exceptions quoted properly (a double quote, dollar sign, back quote, or back slash in one of the exceptional names could have caused problems in the previous code that just wrapped them in double quotes)
• repeated slashes in the POSIX pathname are collapsed into a single slash (a different fix for the double colon problem that Bruce Phillips noted)
• tr invocation subsumed by equivalent sed functionality (the sed y-command has all of the functionality of the tr program that we need here, I must have had tunnel vision for sed s-commands at the time I proposed using tr); this is not really a bug but using tr was less efficient
• colons in the POSIX startup disk name are translated to slashes in the Mac startup disk name
• slashes, ampersands and backslashes in the startup disk’s name no longer cause syntax errors for sed
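
The last fix in that list is easier to see in isolation. Here is a minimal sketch of the escaping step, written as plain shell without the extra layer of AppleScript backslashes. The function names are hypothetical, and the volume name is passed in as an argument instead of being read from ls -F /Volumes.

```shell
# Hypothetical helper: make a string safe to use as the replacement text
# of a slash-delimited sed s command by putting a backslash in front of
# every slash, ampersand and backslash.
escape_for_sed_replacement() {
  printf '%s\n' "$1" | sed -e 's|[/&\\]|\\&|g'
}

# Hypothetical helper: prepend volume name $1 to path $2 using sed.
prepend_volume() {
  vol=$(escape_for_sed_replacement "$1")
  printf '%s\n' "$2" | sed -e "s/^/$vol/"
}

escape_for_sed_replacement 'R&D/Disk'
# prints: R\&D\/Disk
prepend_volume 'R&D/Disk' ':Users:kevin'
# prints: R&D/Disk:Users:kevin
```

Without the escaping step, the ampersand would be replaced by the matched text (empty here) and the slash would end the s command early, which is the syntax error mentioned above.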

Known bugs:
• file/folder names with embedded newlines will not be properly handled (impossible to fix with the newline delimited data format that find and sed are using here)

Fixing the newline bug with any version of sed is going to be impossible or pretty unreliable (depending on how the implementation of sed chooses to behave). One way to fix the newline bug would be to use the -print0 primary of find to produce null-terminated pathnames and combine that with a language that can handle embedded null characters and that can also do the required string manipulation. Various UNIXy languages could fill this role, but so can AppleScript.
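
To see why the newline-delimited format is the problem, here is a minimal sketch that counts the entries in a folder two ways. Both function names are made up for illustration; the only find feature relied on is -print0.

```shell
# Hypothetical helper: count entries directly inside $1 by counting lines.
# A name containing a newline is counted once per line it occupies.
count_by_newline() {
  find "$1" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' '
}

# Hypothetical helper: count entries directly inside $1 by counting the
# null bytes that -print0 writes after every pathname. A null byte can
# not appear in a pathname, so each entry is counted exactly once.
count_by_nul() {
  find "$1" -mindepth 1 -maxdepth 1 -print0 | tr -cd '\000' | wc -c | tr -d ' '
}
```

For a folder holding one ordinary file and one file with a newline in its name, the first count comes out one too high while the second matches the real number of entries.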

Here is my take on teaming up “find -print0” with AppleScript (maybe this is something like what Bruce Phillips had in mind when he mentioned using POSIX file). It is (much) slower than the sed versions, but the conversion it does is probably about as reliable as possible:

on listGetter(folder_to_scan, scan_level, folder_exceptions)
	--exceptions formatted for shell find
	copy folder_exceptions to folder_exceptions
	repeat with fe_ref in folder_exceptions
		set contents of fe_ref to quoted form of contents of fe_ref
	end repeat
	set ASTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to " -or -name "
	set exclude_code to text 6 thru -1 of ("" & ({""} & folder_exceptions))
	set AppleScript's text item delimiters to ASTID
	--do shell find with exceptions
	do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( \\( " & exclude_code & " \\) -prune \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " -print0 ; true" without altering line endings
	set find0 to result
	set {ASTID, text item delimiters} to {text item delimiters, {ASCII character 0}}
	try
		set POSIX_pathnames to text items 1 through -2 of find0 -- Drop the last text item because it is always empty (find -print0 always prints a trailing null).
		set text item delimiters to ASTID
	on error m number n from o partial result r to t
		set text item delimiters to ASTID
		error m number n from o partial result r to t
	end try
	script speedHack
		property Mac_pathnames : {}
	end script
	repeat with P_pn in POSIX_pathnames
		set end of speedHack's Mac_pathnames to (POSIX file (contents of P_pn)) as Unicode text
	end repeat
	speedHack's Mac_pathnames
end listGetter

All the bugs I have found (as of this moment) in the sed versions are handled in this “find -print0”/AppleScript version (including the newline bug). Like the first handler in this post, this one yields a list of Mac pathnames. Symbolic links show up a bit differently in the outputs of the first and last versions in this post. The sed versions yield the Mac path to the symlink itself. This “find -print0” and “(POSIX file …) as Unicode text” version yields the Mac path to the target of the symlink. Since the target of the symlink may be at a different depth, you may not be expecting it to be included in the output. For those familiar with typical UNIX program options, this is similar to the “follow symlinks” behavior of most UNIX programs. If you want to exclude symlinks, you could add "-type l -or " to the front of exclude_code.
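
The symlink exclusion can be checked in a terminal before changing the handler. This is a minimal sketch with a made-up function name and a single excluded name; -o is the POSIX spelling of -or.

```shell
# Hypothetical helper: list entries directly inside $1, leaving out
# symbolic links and anything named $2.
list_without_symlinks() {
  find "$1" -mindepth 1 -maxdepth 1 ! \( \( -type l -o -name "$2" \) -prune \)
}

# Usage (the path and name are placeholders):
#   list_without_symlinks "/Volumes/Server/Transfer Files" ".DS_Store"
```

Because find does not follow symlinks by default, -type l matches the link itself, whatever its target is.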

chrys,

Tried to use your last two versions of listGetter and both returned nothing when I swapped them into my script.

Here’s my last version:

on listGetter(folder_to_scan, scan_level, folder_exceptions)
	--exceptions formatted for shell find
	set ASTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "\" -or -name \""
	set exclude_code to text 2 thru -1 of ("" & ({""} & folder_exceptions)) & "\""
	set AppleScript's text item delimiters to ASTID
	
	--do shell find with exceptions
	return do shell script "/usr/bin/find " & (quoted form of POSIX path of folder_to_scan) & " ! \\( -name" & (text 11 thru -1 of exclude_code) & " \\) -maxdepth " & scan_level & " -mindepth " & scan_level & " | sed -ne \"s/:/l\\000/g; s/\\//:/g; s/l\\000/\\//g; /^:Volumes/ ! s/^/`ls -F /Volumes | sed -ne 's/@$//p'`/p; s/^:Volumes://p\""
end listGetter

It returns results just fine, but neither of yours do…they both return 0 results. Is it possible your script changed the format of the output in a way that has messed up my use of the handler?

Here’s my main script:

--
-- DECLARE PROPERTIES
--
-- debugging/logging on?
property g_debug : true

--basic file path and names
property g_home_folder_path : path to home folder
property g_log_file_name : "LOG--Overnight Automation.txt"
property g_transfer_files_location : "Server:Transfer Files"

--Mac OS names to be ignored
property g_exclusions_macosx : {"Temporary Items", "Trash", ".DS_Store", "TheFindByContentFolder", "TheVolumeSettingsFolder", "Icon
"}

--folder names not to be scanned
property g_exclusions_folders : g_exclusions_macosx & {"_VACATION REQUESTS", "Bob Doe"}

--folder names of users whose contents should never be deleted automatically
property g_exclusions_users : g_exclusions_macosx & {"SKU LISTS", "Bob Doe", "Joe Doe", "Steve Dude", "Jane Doe", "Valerie Joe", "Kevin Quosig"}

--
-- MAIN SCRIPT
--
--get transfer folder categories (include empties)
set category_folders to {}
set category_folders to paragraphs of listGetter(g_transfer_files_location, 1, g_exclusions_folders)

--get user folders (include empties)
set user_folders to {}
repeat with j from 1 to (number of items in category_folders)
	set user_folders to user_folders & paragraphs of listGetter(item j of category_folders, 1, g_exclusions_users)
end repeat

--get user folders that aren't empty
set user_folders_filtered to {}
repeat with e from 1 to (number of items in user_folders)
	tell application "Finder"
		set folder_to_check to (item e of user_folders)
		set item_contents to number of items in folder folder_to_check
		if number of items in folder folder_to_check > 0 then
			set user_folders_filtered to user_folders_filtered & folder_to_check
		end if
	end tell
end repeat

--get user folder contents (skip empties)
set user_folders_filtered_contents to {}
repeat with k from 1 to (number of items in user_folders_filtered)
	set user_folders_filtered_contents to user_folders_filtered_contents & paragraphs of listGetter(item k of user_folders_filtered, 1, g_exclusions_macosx)
end repeat

Try taking out the “paragraphs of” that occurs before each call to listGetter.

The handlers in post #43 in this thread return a list of Mac pathnames (a list of strings). Your original handler returns a single string that you have to break down with paragraphs of every time you want the list of pathnames (see the first bullet of the “Changes from previous versions:” section of that post).

Asking for the paragraphs of a list seems to yield an empty list:

set original_listGetter_result to "some list" & return & "of" & return & "string values"
set chrys_listGetter_result to {"some list", "of", "string values"}

paragraphs of original_listGetter_result
--> {"some list", "of", "string values"}

paragraphs of chrys_listGetter_result
--> {} --> I think this is why you are getting empty results

chrys_listGetter_result is equal to paragraphs of original_listGetter_result
--> true --> no need for "paragraphs of" with the result from my version

Getting rid of “paragraphs of” seems to work like a charm. I went with the “slower but better” find -print0 version. Not because I know what the blazes is going on, but because I’m taking everyone’s word for it that it’s the most bulletproof version. :wink:

It’s too bad that AppleScript couldn’t handle this efficiently without the shell. Dunno, seems to violate the spirit of using AppleScript. :wink:

Thanks to everyone who helped!

In the main script you included in post #44 you were always calling listGetter() with a depth of 1. I think that can reasonably be handled in AppleScript. Do your requirements still include processing stuff at depth more than 1? Maybe we just got side tracked by all this find stuff. The code is complicated by having to work around a Finder bug, but the core is a single Finder statement:

to listGetter_Finder(folder_to_scan, scan_level, folder_exceptions)
	if scan_level is not 1 then error "listGetter_Finder can not handle scan_level other than 1 at this time"
	try
		tell application "Finder" to (every item of folder_to_scan whose name is not in folder_exceptions) as alias list
	on error m number n from o partial result r to t
		-- Try to pass on errors other than the single-item as alias list error
		using terms from application "Finder"
			set aliasListClass to alias list
		end using terms from
		if n is -1700 and t is aliasListClass then
			tell application "Finder" to {(first item of folder_to_scan whose name is not in folder_exceptions) as alias}
		else
			error m number n from o partial result r to t
		end if
	end try
end listGetter_Finder

I had some ideas for deeper depths, but I could never get them to work out properly (generate the Finder code for the required depth as a string and execute it with run script, but the filter always seemed to complicate things and cause my reference forms to break).

FYI, that bug has been fixed in AppleScript 2.0 (Mac OS X v10.5).

Currently, no. I ended up looking at the problem as “what is in there 1 deep, then examine the innards of each of those separately.” But I can foresee a future need where I may want to look 2-3 levels deep in a given structure. Planning ahead, basically. :wink:

My script is basically designed to get rid of old files from transfer folders on the server (user-named folders used to move files between workstations). There is a category hierarchy above the user folder. I wanted to make sure the hierarchy could change without breaking the script, and users could come and go (dynamic scanning at a certain level each time). Right now I “scan down” to the first level of a user folder to get modification dates, but I may want to go deeper to root out more old files (for users trying to be clever and fooling the folder modification date).

In addition, I also go back and, if they still have too much data in the folder despite the cleaning of older files, an e-mail is sent to pester them. All of this is logged into Excel for data capture and charting.

Holy crap this thread has progressed a bit… WTH have I been LOL

I meant to just use POSIX file with a repeat loop after the do shell script.

I suppose it wouldn’t help if I were to mention using the find expression -mtime
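
For what it is worth, a minimal sketch of the -mtime idea follows. The function name and the idea of passing the age in days as a parameter are assumptions for illustration; nothing like this appears in the handlers above.

```shell
# Hypothetical helper: list entries directly inside $1 that were last
# modified more than $2 days ago. find measures age in whole 24-hour
# periods, so "+30" means 31 or more full days.
older_than_days() {
  find "$1" -mindepth 1 -maxdepth 1 -mtime "+$2"
}

# Usage (the path is a placeholder):
#   older_than_days "/Volumes/Server/Transfer Files/Some Category/Some User" 30
```

Dropping the -maxdepth limit (or raising it) would look at the modification dates of the files inside nested folders, which is harder to fool than the date of the top folder.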