Loading file list from a folder

Just for completeness, here are a couple of other ways of getting the paths of folder items. None are as lightning-fast as ASObjC, but all are reasonably efficient:

  1. Using the shell’s find command. Advantage: less typing than ASObjC. Disadvantage: returns posix paths rather than hfs paths.

set posixPaths to (do shell script "find " & [parent_folder_hfs_path]'s POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -name '*.pdf'")'s paragraphs

 -- posixPaths contains a list of the posix paths of all pdf files in the parent folder

  2. Using Finder + AppleScript’s text item delimiters property. Advantage: simple AppleScript solution. Disadvantage: can’t subselect items based on file extension, etc.; instead, returns the hfs paths of all top-level folder items.

set {tid, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return}
tell application "Finder" to set hfsPaths to (files of folder [parent_folder_hfs_path] as text)'s paragraphs
set AppleScript's text item delimiters to tid

-- hfsPaths contains a list of the hfs paths of all files in the parent folder	

  3. Using the System Events application. Advantages: 1) AppleScript solution even simpler than #2; 2) returns hfs and posix paths. Disadvantage: can’t subselect items based on file extension, etc.; instead, returns the paths of all top-level folder items. (Note: Unlike the Finder solution in #2, the System Events solution also returns the paths of hidden files.)

tell application "System Events"
	set hfsPaths to path of files of folder [parent_folder_hfs_path]
	-- and/or --
	set posixPaths to POSIX path of files of folder [parent_folder_hfs_path]
end tell

-- or alternatively --

tell application "System Events" to tell files of folder [parent_folder_hfs_path] to set {hfsPaths, posixPaths} to {its path, its POSIX path}

-- hfsPaths and posixPaths contain a list of the hfs and posix paths, respectively, of all files in the parent folder (including hidden files)

Why are POSIX paths a disadvantage? POSIX file … is no more effort than using alias …, and arguably easier than creating a file reference from an HFS path.

(The code also needs to be modified if the file type is a package.)

The Finder may return hidden files too; it depends on whether you have invisible files visible or not. So you should account for them as a matter of course (and there goes some of the simplicity).
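One way to account for invisible items, sketched here with System Events rather than the Finder, is to filter on the visible property of its disk items. The Downloads folder and the variable names are only illustrative, and a whose filter like this is likely to cost some speed on large folders.

```applescript
-- A sketch: the Downloads folder is an arbitrary example location.
set parentFolderPath to (path to downloads folder) as text

tell application "System Events"
	-- 'visible' is a boolean property of System Events disk items.
	set visiblePaths to path of (files of folder parentFolderPath whose visible is true)
end tell
```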

I don’t know if it’s still true, but it used to be that certain icon files had names ending with returns. For this reason, I’d use a linefeed as the delimiter here and return the text items rather than the paragraphs:


set {tid, AppleScript's text item delimiters} to {AppleScript's text item delimiters, linefeed}
tell application "Finder" to set hfsPaths to (files of folder [parent_folder_hfs_path] as text)'s text items
set AppleScript's text item delimiters to tid

As this and bmose’s other examples show, the Finder and System Events, although not nearly as fast as ASObjC, aren’t particularly slow at returning particular information about every file in a folder. It’s only when they have to think about whether or not the values conform to certain criteria that they begin to struggle. The same’s true for many applications.

Vanilla AppleScript, on the other hand, can usually check the results very quickly, so it can make sense to have an application simply dump all the relevant data to the script and let the script pick out what it needs. The code below is ten times slower on my machine than the ASObjC script, but that’s three hundredths of a second instead of three thousandths. The user could still live long enough to see the results. :wink:

set extensionList to {"pdf", "zip", "dmg", "jpg"}
set sourceFolder to (path to downloads folder)

set myFileList to listFilesWithGivenExtensions(sourceFolder, extensionList)

on listFilesWithGivenExtensions(sourceFolder, extensionList)
	-- Make some attempt to check the input.
	if ((sourceFolder's class is text) and (sourceFolder begins with "/")) then set sourceFolder to POSIX file sourceFolder
	set sourceFolder to sourceFolder as alias
	tell application "System Events"
		if (properties of disk item (sourceFolder as text) does not contain {class:folder, package folder:false}) then error (sourceFolder as text) & " is not a folder."
	end tell
	copy extensionList to extensionList
	repeat with thisExtension in extensionList
		if (thisExtension does not start with ".") then set thisExtension's contents to "." & thisExtension
	end repeat
	
	-- Get the HFS paths of all the files in the folder.
	tell application "System Events" to set HFSPaths to path of files of sourceFolder
	
	-- Replace all paths having the required extensions with aliases.
	repeat with thisPath in HFSPaths
		repeat with thisExtension in extensionList
			if (thisPath ends with thisExtension) then
				set thisPath's contents to thisPath as alias
				exit repeat
			end if
		end repeat
	end repeat
	
	-- Return the aliases.
	return HFSPaths's aliases
end listFilesWithGivenExtensions

Edit: Unnecessary ‘its’ removed from the handler call in the script.

Fast and elegant ! And a lot of interesting “tricks” in your script.
Thanks !

I actually have a question:
why do you “copy” the extensionList?

copy extensionList to extensionList

With AppleScript Toolbox:

set extensionList to {"pdf", "zip", "dmg", "jpg"}
set sourceFolder to (path to downloads folder)

tell AppleScript
	set oldTIDs to text item delimiters
	set text item delimiters to "|"
	set theGroup to extensionList as string
	set text item delimiters to oldTIDs
end tell

set regex to "\\.(" & theGroup & ")$"
AST list folder sourceFolder matching regex regex with returning HFS paths

With the option returning HFS paths (or returning file specifiers, or returning POSIX paths) set to true, the command returns the entire path of each item rather than just its name.

Hi.

It’s to preserve the contents of the list set at the top of the script, in case it’s needed again somewhere else in the script.

The list passed to the handler is the actual one set at the top of the script. Although the extensionList variables inside and outside the handler are different variables with the same name, the list they contain is initially the same one. copy extensionList to extensionList makes a copy of the list and resets the local extensionList variable to the copy, so that any changes made during the repeat which follows happen in the copy and not in the original.

It may be a bit confusing that I’ve used the same variable label throughout. But it’s nothing to do with the variables, it’s to do with the list itself. A copy would still have to be made (if you wanted to preserve the original) even if the variables all had different labels. It’s something of which you have to be aware when passing lists to handlers or assigning them to other variables.

set extensionList to {"pdf", "zip", "dmg", "jpg"}
set sourceFolder to (path to downloads folder)

set myFileList to its listFilesWithGivenExtensions(sourceFolder, extensionList)

on listFilesWithGivenExtensions(theSourceFolder, theExtensionList)
	-- Blah blah blah
	
	copy theExtensionList to extensionListCopy -- Make a copy of the list.
	repeat with thisExtension in extensionListCopy
		-- Possibly modify the copy.
	end repeat
	
	-- etc.
end listFilesWithGivenExtensions
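The difference between sharing a list and copying it can be seen in a few lines. This is just a minimal illustration with made-up variable names, not part of the handler above.

```applescript
set originalList to {"pdf", "zip"}

set sameList to originalList -- 'set' shares the one list between both variables.
copy originalList to separateList -- 'copy' makes an independent duplicate.

-- Changing the list through one variable changes it for every variable sharing it.
set item 1 of sameList to ".pdf"

{originalList, separateList}
--> {{".pdf", "zip"}, {"pdf", "zip"}}
```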

Only if the user wishes to work with HFS paths or Applescript aliases rather than POSIX paths, and only marginally so because of the repeat loop that would be needed for the conversions.

While the shell solution presented above is robust owing to the find command’s extraordinary capabilities (of which my example, of course, barely scratches the surface), to my knowledge, the shell offers no easy way to convert from POSIX paths to HFS paths other than to “cheat” via an osascript command. Still, one can do amazing things (and amazing damage if used improperly) with the find command, all the more so if options such as -exec or -delete are incorporated. I would think it should be somewhere in one’s toolbox. :slight_smile:
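For reference, the repeat loop mentioned above might look something like this. It's only a sketch: the two standard folders are hypothetical stand-ins for the output of find, and the variable names are made up.

```applescript
-- Hypothetical input: POSIX paths of two standard folders, standing in for 'find' output.
set posixPaths to {POSIX path of (path to desktop), POSIX path of (path to documents folder)}

set hfsPaths to {}
repeat with thisPath in posixPaths
	-- Coerce each POSIX path to a file reference, then to HFS path text.
	set end of hfsPaths to ((thisPath's contents) as POSIX file) as text
end repeat
```

Replacing "as text" with "as alias" in the loop should give aliases instead, provided every item actually exists.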

Thanks for the tip. I wasn’t aware of that potential problem.

Thanks for the nice handler that adds filtering to the System Events solution.

What a clever way to subselect items from an input list!

I was thinking that the text item delimiters property might be a particularly efficient way of filtering paths. I did a quick execution speed test comparing a variation of your method vs Nigel’s technique of nested repeat loops. I must admit that I fully expected the former to be faster, but lo and behold, the nested repeat loops turned out to be about 1.7 x faster:


tell application "System Events" to set hfsPaths to path of files of folder [HFS path to parent folder containing 600 files with various file endings, including "pdf", "txt", and others]

set extensionList to {".pdf", ".txt"}

on getPathsViaTID(hfsPaths, extensionList)
	set tid to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to extensionList
		set targetPaths to {}
		repeat with thisPath in hfsPaths
			tell thisPath's text items to if (length > 1) and (last item = "") then set end of targetPaths to thisPath's contents
		end repeat
	end try
	set AppleScript's text item delimiters to tid
	return targetPaths
end getPathsViaTID

on getPathsViaRepeatLoops(hfsPaths, extensionList)
	set targetPaths to {}
	repeat with thisPath in hfsPaths
		tell thisPath's contents
			repeat with thisExtension in extensionList
				if (it ends with thisExtension's contents) then
					set end of targetPaths to it
					exit repeat
				end if
			end repeat
		end tell
	end repeat
	return targetPaths
end getPathsViaRepeatLoops

-- Result: For 100 repetitions of each handler, getPathsViaRepeatLoops was about 1.7 x faster than getPathsViaTID!

It’s not what the text items are for in my example. The AST list folder command returns the contents of a folder using CoreFoundation’s URL enumerator. I use the text items to create a regular expression to filter out names, like you can do with the AST copy list command.

No it’s not, it’s not even a superset. It’s a bridge, it’s a hack into a runtime of another environment whose paradigm is completely different. It’s as pure as PyObjC is pure Python, which it is not. ASObjC is more limited than PyObjC and basically works in one direction only. Even if the AppleScript engineers have updated AppleScript many times to make ASObjC less like an alien in the AS language, it doesn’t make it pure AppleScript. It’s in principle the same as the call method command back in AS-Studio (which was limited to classes and didn’t work with instances).

Not so much clever as not particularly well known, nor, I suppose, very often useful. If a list contains AppleScript objects, they can be referenced by class in the same way as, say, words or paragraphs in text, or files or folders in a folder belonging to the Finder or System Events.

set aList to {path to desktop, "aardvark", 17, {1, 2, 3}, 4, "hello", {a:"apple", b:"banana"}, 5.0, 7, "world"}

aList's records --> {{a:"apple", b:"banana"}}
aList's third integer --> 7
count aList --> 10
count aList's text --> 3
count aList's first text --> 8
-- etc.
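The same kind of class reference is what picks the aliases out of a mixed list at the end of the handler earlier in the thread. A stripped-down sketch, with placeholder strings standing in for the paths that didn't match:

```applescript
-- Hypothetical mixed list: two placeholder strings and two aliases.
set mixedList to {"skip me", path to desktop, "skip me too", path to documents folder}

-- Referencing by class returns only the items of that class.
set onlyAliases to mixedList's aliases
count onlyAliases --> 2
```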

DJ’s already pointed out that the delimiters in his script are only used to put together the regex. Everything else is done by his OSAX. On my machine, the speed’s about the same as the ASObjC solution further up this thread — that is, about ten times as fast as my vanilla script.

‘my extensionList’ refers to the extensionList variable in the run handler, not the local parameter variable of the same name. They contain the same list, so it’s not a problem. But correctly, either the ‘my’ should be omitted or the extension list not passed as a parameter.

A basic idea behind AppleScript is its “plug-in” architecture. There’s the core language, which is actually quite small but can do a lot. Then there’s the ability to add commands supplied by separate OSAXen and commands belonging to applications whose authors have included suitable scripting interfaces. Over the years, the abilities to run shell scripts, to simulate user actions in the GUI, and recently to access some of the system’s Objective-C frameworks have been added. On the one hand, it’s a bewildering array of things to learn. On the other, it offers a vast choice of solutions from which an expert can select what he/she feels is the most appropriate. On the third hand ( :slight_smile: ), it can be approached from a number of different directions to suit people coming from different programming backgrounds. Complete beginners (English speakers, at least) should find the core language fairly easy to grasp. People familiar with Unix or languages like Python or Ruby can go straight to ‘do shell script’ and achieve a lot of what they want to do straight away in a way that they already know. Hard-core Objective-C programmers should be able to adapt to ASObjC without too much trouble if they need to. Once a start’s been made, you can “add on” any additional knowledge you need much as the language adds on extensions.

So “pure AppleScript” is a term rather like “thoroughbred mongrel”. In as far as it means anything, I’d personally regard “pure AppleScript” as being the core language and (when I’m in a flexible mood!) the StandardAdditions OSAX.

I could probably live with your inflexible definition. But I struggle with the need to Balkanize in the first place, especially with loaded terms like “pure”. The core language is not very useful by itself – that’s why the hack that is scripting additions was added before it was even released – and in many ways it’s stuck in a time warp. It’s the “impure” bits that have helped keep it alive.

Sorry, that was a typo left behind from when I first put the code together and had extensionList coded as a property rather than a local variable. I corrected the entry.

Of course. My mistake. I had been thinking of using text item delimiters as in my example and noticed its presence in yours, but didn’t look closely enough to see the difference in usage.

It’s clear from the posts above that there are multiple ways of getting the HFS paths or Applescript aliases of files of a folder. But as I mentioned earlier, I am not aware of any shell solutions to getting that information, other than to “cheat” with the osascript command. So for the fun of it and just so that it’s out there, I put together the following shell solution that is not quite pure since it requires an Applescript run script command, but the heavy lifting is done by the shell.

To get the HFS paths or Applescript aliases of all pdf files in a parent folder:


set hfsPaths to run script "{" & (do shell script "find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' -exec echo '\"'{}'\" as POSIX file as text' \\; | tr '\\n' ',' | sed -E 's/,$//'") & "}"

--or--

set applescriptAliases to run script "{" & (do shell script "find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' -exec echo '\"'{}'\" as POSIX file as alias' \\; | tr '\\n' ',' | sed -E 's/,$//'") & "}"

And to get the HFS paths or Applescript aliases of all pdf, txt, and jpg files in a parent folder:


set hfsPaths to run script "{" & (do shell script "find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 \\( -iname '*.pdf' -o  -iname '*.txt' -o  -iname '*.jpg' \\) -exec echo '\"'{}'\" as POSIX file as text' \\; | tr '\\n' ',' | sed -E 's/,$//'") & "}"

--or--

set applescriptAliases to run script "{" & (do shell script "find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 \\( -iname '*.pdf' -o  -iname '*.txt' -o  -iname '*.jpg' \\) -exec echo '\"'{}'\" as POSIX file as alias' \\; | tr '\\n' ',' | sed -E 's/,$//'") & "}"

Notes: 1) The -name primaries have been changed to -iname so that file extension searching will be case-insensitive. 2) Since this post was first submitted, the curly braces have been transferred from the do shell script command to the run script command so that an empty list will be returned in the case of no matching files. 3) This approach will fail if an HFS path of an item in the parent folder has a double-quote character in its name.

Here are the same solutions but with the curly braces incorporated into the do shell script command, and with additional examples in which all files are returned (i.e., no filtering is performed based on file name extension):

To get the HFS paths or Applescript aliases of all files in a parent folder:


set hfsPaths to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -exec echo '\"'{}'\" as POSIX file as text' \\; | tr '\\n' ',' | sed -E 's/,$// ; s/(.+)/\\1/')}\"")

--or--

set applescriptAliases to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -exec echo '\"'{}'\" as POSIX file as alias' \\; | tr '\\n' ',' | sed -E 's/,$// ; s/(.+)/\\1/')}\"")

To get the HFS paths or Applescript aliases of all pdf files in a parent folder:


set hfsPaths to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' -exec echo '\"'{}'\" as POSIX file as text' \\; | tr '\\n' ',' | sed -E 's/,$// ; s/(.+)/\\1/')}\"")

--or--

set applescriptAliases to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' -exec echo '\"'{}'\" as POSIX file as alias' \\; | tr '\\n' ',' | sed -E 's/,$// ; s/(.+)/\\1/')}\"")

And to get the HFS paths or Applescript aliases of all pdf, txt, and jpg files in a parent folder:


set hfsPaths to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 \\( -iname '*.pdf' -o -iname '*.txt' -o -iname '*.jpg' \\) -exec echo '\"'{}'\" as POSIX file as text' \\; | tr '\\n' ',' | sed -E 's/,$// ; s/(.+)/\\1/')}\"")

--or--

set applescriptAliases to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 \\( -iname '*.pdf' -o -iname '*.txt' -o -iname '*.jpg' \\) -exec echo '\"'{}'\" as POSIX file as alias' \\; | tr '\\n' ',' | sed -E 's/,$// ; s/(.+)/\\1/')}\"")

Hi bmose.

‘run script’ isn’t “cheating”, of course. :wink:

I find these to be faster and slightly more thorough:

set parent_folder_hfs_path to (path to downloads folder)

set hfsPaths to run script (do shell script "find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' | sed -E 's/\\\\|\\\"/\\\\&/g; s/^.*$/\"&\" as POSIX file as text,¬/; 1 s/^/{/; $ s/,¬$/}/'")

--or--

set hfsPaths to run script (do shell script "find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' | sed -E 's/\\\\|\\\"/\\\\&/g; s/^.*$/\"&\" as POSIX file as alias,¬/; 1 s/^/{/; $ s/,¬$/}/'")

‘find’ simply returns the relevant files’ POSIX paths. The ‘sed’ code double-escapes any quotes or backslashes in them, enquotes them and adds the AppleScript code, inserts an opening brace at the beginning of the first line, and edits a closing brace onto the end of the last.
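To see what the ‘sed’ code produces before ‘run script’ gets hold of it, the same ‘sed’ string can be fed a couple of made-up paths and the resulting text inspected. This is only a sketch: the sample paths are hypothetical (one contains double quotes on purpose), and printf stands in for ‘find’.

```applescript
-- Hypothetical sample paths; the second contains double-quote characters.
set samplePaths to "/tmp/plain.pdf" & linefeed & "/tmp/a \"quoted\" name.pdf"

-- printf emits the sample paths one per line, as 'find' would; the 'sed' code is the same as above.
set generatedSource to do shell script "printf '%s\\n' " & samplePaths's quoted form & " | sed -E 's/\\\\|\\\"/\\\\&/g; s/^.*$/\"&\" as POSIX file as text,¬/; 1 s/^/{/; $ s/,¬$/}/'"

-- generatedSource now holds the AppleScript source text that would be passed to 'run script'.
generatedSource
```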

Nice tweaks! It’s more streamlined. I made one slight adjustment: I pulled your curly braces out of the sed command and put them in a wrapping echo command so that in the case where no matching files are found, an empty list rather than no result is returned. (Also, I used applescriptAliases for the second statement’s variable name :).)


set hfsPaths to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' | sed -E 's/\\\\|\\\"/\\\\&/g; s/^.*$/\"&\" as POSIX file as text,¬/; $s/,¬$//;')}\"")

--or--

set applescriptAliases to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' | sed -E 's/\\\\|\\\"/\\\\&/g; s/^.*$/\"&\" as POSIX file as alias,¬/; $s/,¬$//;')}\"")

Ah. Right! I hadn’t realised sed wouldn’t be triggered in such cases.

Oops! :rolleyes:

I thought I might submit this to Code Exchange, given that there is pretty much nothing out there about using the shell to get HFS paths and AppleScript aliases. One question: Is it really necessary to “doubly” escape the double-quote character in the first sed command? This seems to work just as well:


set hfsPaths to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' | sed -E 's/\\\\|\"/\\\\&/g; s/^.*$/\"&\" as POSIX file as text,¬/; $s/,¬$//;')}\"")

Under your ‘echo’ scheme, the ‘sed’ code’s a string embedded in an ‘echo’ string in a shell script represented by an AppleScript string. The ‘sed’ code edits text returned by ‘find’ which may contain quote or backslash characters. These characters have to be doubly escaped in the AppleScript text to be received correctly by ‘sed’, which then has to add enough escapage to any matches so that, after everything’s gone through ‘echo’, there’s enough escapage left to doubly escape the characters in the path string(s) represented within the AppleScript text returned by the shell script. Simple really. :wink:

set parent_folder_hfs_path to (path to downloads folder)

set hfsPaths to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' | sed -E 's/\\\\/\\\\\\\\\\\\\\\\/g; s/\\\"/\\\\\\\\\"/g; s/^.*$/\"&\" as POSIX file as text,¬/; $ s/,¬$//')}\"")

--or--

set applescriptAliases to run script (do shell script "echo \"{$(find " & parent_folder_hfs_path's POSIX path's quoted form & " -mindepth 1 -maxdepth 1 -iname '*.pdf' | sed -E 's/\\\\/\\\\\\\\\\\\\\\\/g; s/\\\"/\\\\\\\\\"/g; s/^.*$/\"&\" as POSIX file as alias,¬/; $ s/,¬$//')}\"")

Edit: Explanation rewritten.

Or try the ‘list files’ command in ‘satimage.osax’:
http://www.satimage.fr/software/en/dictionaries/dict_satimage.html#SatimageFileAdditions.listfiles