Newbie Here...Would Appreciate Some Help With a Finder Script...

I have done some – but much more to do – reading on AppleScript to do the following:

  1. Identify those files in my Documents folder whose name [based on their full path] is > x characters

  2. Import the list of the files identified [including their full paths] and their character counts from 1. above into an Excel spreadsheet

  3. Sort the resulting Excel spreadsheet by character count from highest to lowest

I have searched the forum to see whether I could find some code to get me started but came up empty.

Would appreciate either being pointed in the right direction or some code to get me started.

Thanks.

Try this

set minCount to 50 -- minimum number of characters you want included in your list
set docPath to (path to documents folder) as text -- HFS path to the Documents folder, ending in a colon
set fileList to list folder docPath without invisibles
set pathList to {}
set charCounts to {}
repeat with i in fileList
	set j to docPath & i as string
	set charCount to (count of j)
	if charCount > minCount then
		set end of pathList to j
		set end of charCounts to charCount
	end if
end repeat
set listCount to (count of items of pathList)
tell application "Microsoft Excel"
	activate
	make new workbook
	set theBook to the active workbook
	set theSheet to active sheet of theBook
	set bookName to name of theBook
	set sheetName to name of theSheet
	tell workbook bookName
		tell worksheet sheetName
			repeat with i from 1 to listCount
				set value of cell ("A" & i as string) to (item i of pathList) as string
				set value of cell ("B" & i as string) to (item i of charCounts) -- keep the count numeric so the sort is numeric
			end repeat
			sort special range ("A1:B" & listCount as string) key1 (range "B1") order1 sort descending
		end tell
	end tell
end tell

bebout:

Appreciate the script noting that I am in the midst of writing my own – figured it would be a good learning experience – and will certainly use yours should I run into problems.

Thanks and greatly appreciated,

Joel

I am updating this script as I have done some work on this on my own [i.e. I want to write my own script and, in point of fact, I have made a good deal of progress but have now hit a wall].

The extract from the script in question is as follows:



-- Pick the folder (and sub-folders) whose filename lengths will be compared / tested against maxCount
	set listFiles to choose folder with prompt "Choose the disk / directory / folder whose file lengths' will be compared against maxCount" default location (path to home folder) without invisibles
	set FolderChosen to result
	set FolderChosenList to list folder FolderChosen without invisibles		
	
	repeat with aFolder in FolderChosenList
		set folderPath to FolderChosen & aFolder & ":" as string		
		
		set fileList to entire contents of folderPath
		
		repeat with aFile in fileList
			set fileCount to count aFile
		end repeat

	end repeat

I am trying to cycle through the files in the user selected folder [and all sub-folders] so that I can count / test the length of each file and add any file whose length exceeds maxCount to a list.

The problems I am having are:

  1. The above code is not working as I think it should in that a) I am getting an error message, and the Script Editor error message is not helpful in debugging it, and b) I am not sure it will cycle through the sub-folders or the sub-folders of the sub-folders; and

  2. Though I would like to get the above code fixed I wonder whether there is an easier way to write this section of code in a way that an AppleScript beginner can understand [noting that I don’t understand the code in the immediately preceding post as it appears to test the character count of the folders and not the files therein].

Thanks.

Hi. “Entire contents” is a Finder command that gets contents, en masse. There’s no need for you to list the chosen folder and reconstruct names, because Finder can do that work. That said, if there are many files, entire contents will be incredibly slow. Observe how this works in the event log:

set maxCount to 20
set FolderTarget to (choose folder with prompt "Choose the disk / directory / folder whose file lengths' will be compared against maxCount" default location (path to documents folder))
set theList to {} --for population 

tell application "Finder" to repeat with aFile in (get FolderTarget's entire contents's files) --a target in (all files of all folders)
	set theName to aFile's name
	tell (count theName) to if it > maxCount then set theList's end to {theName, it} --returns paired lists in a list
end repeat

theList

Marc:

Appreciate the helpful response but I do have a few follow ups:

  1. As you note, entire contents is extremely slow and, in point of fact, it is so slow that it times out. How would the code change to simply extract the “full filenames” inclusive of their paths?

  2. I also note that the code needs to successfully deal with all folders and sub-folders of Documents because I need to check each and every file embedded within the Documents folder for its length. This is proving – at least for me – to be difficult [i.e. dealing with the sub-folders].

  3. I get your point about not needing to reconstruct Finder’s native functionality.

I should add that I am NOT familiar with shell scripts [at least not as of yet] so am looking for AppleScript code to do this.

Thanks…

Hello.

Here is a way to return POSIX paths of the file list returned from Finder.

First of all we ask to get an alias list back, which is a special type that (only?) Finder returns.

Then we use the POSIX path property of the individual aliases and stuff them into a new list.

I have used a script object - and a property to speed up the process some. Now, you could have also used the aliases in the list and used text processing to turn the aliases into paths, but I somehow doubt that to be any faster.

script o
	property l : {}
end script

tell application "Finder"
	set fileAliasList to items of target of window 1 as alias list
end tell
if fileAliasList = {} then error number -128 # nothing to do
set o's l to fileAliasList # Assigns the fileAliasList to o's l to speed things up.
set posixPathList to {}
repeat with i from 1 to (count o's l)
	set end of posixPathList to (POSIX path of item i of o's l)
end repeat
posixPathList

I have included a version that returns HFS paths as well: here we just coerce the aliases to text.

script o
	property l : {}
end script

tell application "Finder"
	set fileAliasList to items of target of window 1 as alias list
end tell
if fileAliasList = {} then error number -128 # nothing to do
set o's l to fileAliasList # Assigns the fileAliasList to o's l to speed things up.
set hfsPathList to {}
repeat with i from 1 to (count o's l)
	set end of hfsPathList to (item i of o's l as text)
end repeat
hfsPathList

McUsril:

Appreciate your response but the script that you provided does not work as it suffers from the same problem that my very own code does in that it does not work its way through all the various subfolders and files…for example, if window 1 contains only folders [call these the “A Items”] with said A Items containing both nested subfolders and files [call these the “B Items”] then your script only lists the “A Items”.

I / we need a way to also list the “B Items”. Any ideas?

Thanks for trying.

Hi,

you can search for all files with a specific length (relating to the full path) in a folder including subfolders with the shell using find and awk, for example


set maxLength to 50
set documentsFolder to POSIX path of (path to documents folder)
set foundFileList to paragraphs of (do shell script "/usr/bin/find " & quoted form of documentsFolder & " -type f | awk 'length>" & maxLength & "'")

The result is a list of POSIX paths
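
For anyone who would like to try the same pipeline directly in Terminal before wrapping it in do shell script, here is a minimal sketch. The helper name long_paths and the throwaway demo folder are made up for illustration; only find and awk themselves come from the post above. It assumes no file name contains a line break, since awk reads one path per line.

```shell
#!/bin/sh
# long_paths DIR MAXLEN (hypothetical helper name): print every regular file
# under DIR whose full path is longer than MAXLEN characters.
long_paths() {
    find "$1" -type f | awk -v max="$2" 'length($0) > max'
}

# Demo on a throwaway folder so that nothing in ~/Documents is touched.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/a.txt" "$demo/sub/a_much_longer_file_name_than_the_other_one.txt"

limit=$(( ${#demo} + 10 ))        # set so that only the long name exceeds it
found=$(long_paths "$demo" "$limit")
echo "$found"

rm -rf "$demo"                    # remove the demo folder again
```

Passing the limit with -v keeps the number out of the awk program text, which is roughly what the string concatenation in the AppleScript line achieves.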

StefanK

Appreciate the response – truly – but as I am very new to scripting I am trying to do this with AppleScript as I have yet to read up or tackle shell scripts.

I would be happy to learn and take a look at shell scripts but would appreciate a link / reference to a good starting point. For example, is there a library of shell scripts with their related parameters that is similar to DOS’ where I could, again for example, type “ipconfig /?” in a command prompt window to get all the parameters / settings for ipconfig?

Thanks.

Hello. It was really just an example of how to return the file list, as something different than file references. I think Stefan has shown you the way to go with the find command. You can read about the find command in the terminal window by entering man find, or you can google it, and maybe see some clever usages.

Hi. If you really want to use a vanilla method with Finder, you could attempt a subsearch of folders; this should make it a bit faster, as there are (probably) far fewer folders than files. It will still be relatively slow, just maybe not glacially so.

set maxCount to 20
set FolderTarget to (choose folder with prompt "Choose the disk / directory / folder whose file lengths' will be compared against maxCount" default location (path to documents folder))
set theList to {}

tell application "Finder" to repeat with oneFolder in FolderTarget's entire contents's folders as alias list --all subfolders within the selection
	repeat with aFile in (get oneFolder's files as alias list)
		set theName to aFile's name
		tell (count theName) to if it > maxCount then set theList's end to {theName, it} --returns paired lists in a list
	end repeat
end repeat

theList

Stefan’s method will be significantly faster at obtaining the paths*, although I’m not sure if counting the entire path length will work as intended, as folder length will also vary. There are a few searchable shell tutorials on this site, and, if you want to see the command options, you can either type “man”, followed by the command, in Terminal, or use this script, courtesy of Nigel Garvey, in the script editor:

do shell script "man find | col -b | sed ' s/`/'\\''/g ; s/'\\'\\''/\"/g ;' "

–the only thing you change is the word after man; e.g. “man ls” or “man man.”

EDIT:

  • I explored a similar idea to Stefan’s find. This variant sidesteps the need for Awk.
set theFiles to (do shell script "find -E " & my ((choose folder)'s POSIX path's quoted form) & " -iregex '.{51,}' -type f")'s paragraphs

If I’ve understood the aim correctly (i.e. get a list of all the files in a certain folder hierarchy whose full HFS paths contain more than a certain number of characters), this variation may be worth trying:

set maxCount to 100 -- Or whatever.
set FolderTarget to (choose folder with prompt "Choose the disk / directory / folder whose file lengths' will be compared against maxCount" default location (path to documents folder))

script o
	property allpaths : missing value
	property theList : {} --for population
end script

-- Get the paths to all the files in the chosen hierarchy.
set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to linefeed
tell application "Finder" to set o's allpaths to text items of (files of entire contents of FolderTarget as text)
set AppleScript's text item delimiters to astid

-- Pick out the paths which are longer than the trigger length and append the corresponding aliases to the result list.
repeat with i from 1 to (count o's allpaths)
	set thisPath to item i of o's allpaths
	if ((count thisPath) > maxCount) then set end of o's theList to thisPath as alias
end repeat

return o's theList

Hello.

As I understood it, the question was to find all filenames that were longer than 50 characters.

I revamped Stefan’s solution a little bit by giving awk a field separator of “/” and only counting the length of the last field, which is the filename in a POSIX path.

set maxLength to 50
set documentsFolder to POSIX path of (path to documents folder)
set foundFileList to paragraphs of (do shell script "/usr/bin/find " & quoted form of documentsFolder & " -type f 2>/dev/null | awk 'BEGIN {FS=\"/\"} {if (length($NF) > " & maxLength & ") print $0}'")
foundFileList

Edit

I had forgotten that awk doesn’t automatically print the result of a test when the test is within braces, and I needed the braces because I used a BEGIN block to specify the field separator.

That issue is now fixed.
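
As a side note, the same filename-only test can be tried straight in Terminal. This is only a sketch: the helper name long_names and the demo folder are invented for the example. It uses awk's -F option to set the field separator, so no BEGIN block is needed, and a bare test without braces, which awk prints automatically when it is true.

```shell
#!/bin/sh
# long_names DIR MAXLEN (hypothetical helper name): print the paths whose last
# component, i.e. the file name itself, is longer than MAXLEN characters.
long_names() {
    find "$1" -type f 2>/dev/null | awk -F/ -v max="$2" 'length($NF) > max'
}

# Throwaway demo: a short file name inside a long folder name, and a longer
# file name at the top level.
demo=$(mktemp -d)
mkdir -p "$demo/a_rather_long_folder_name"
touch "$demo/a_rather_long_folder_name/x.txt" "$demo/twelve_chars.txt"

found=$(long_names "$demo" 10)
echo "$found"

rm -rf "$demo"
```

Only twelve_chars.txt is reported, because the long folder name is not part of the last field.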

If you’re running 10.9 or 10.10, this is considerably quicker:

use AppleScript version "2.3.1"
use scripting additions
use framework "Foundation"

-- get the path
set thePosixPath to POSIX path of (choose folder)
set theList to my getFilesIn:thePosixPath withNamesAbove:50

on getFilesIn:thePosixPath withNamesAbove:charLimit
	-- make it into an NSURL
	set anNSURL to current application's |NSURL|'s fileURLWithPath:thePosixPath
	-- get the file manager
	set theNSFileManager to current application's NSFileManager's defaultManager()
	-- ask file manager for enumerator of the folder, skipping invisibles and files in packages
	set theOptions to (current application's NSDirectoryEnumerationSkipsPackageDescendants as integer) + (current application's NSDirectoryEnumerationSkipsHiddenFiles as integer)
	set theEnumerator to theNSFileManager's enumeratorAtURL:anNSURL includingPropertiesForKeys:{} options:theOptions errorHandler:(missing value)
	-- make array from enumerator
	set theNSArray to theEnumerator's allObjects()
	-- convert array of NSURLs into array of paths
	set theNSArray to theNSArray's valueForKey:"path"
	-- create filter
	set thePred to current application's NSPredicate's predicateWithFormat_("lastPathComponent.length > %@", charLimit)
	-- filter the array and coerce to a list
	set theList to (theNSArray's filteredArrayUsingPredicate:thePred) as list
	return theList
end getFilesIn:withNamesAbove:

If you want the test based on total path length, change lastPathComponent.length to just length.

Apologies for the delay in responding but I had a number of time sensitive items to deal with at work.

Appreciate the assistance and input…it is interesting that something that – at least to me – would be a basic need [i.e. spanning / searching a folder and all its subfolders] is so complicated.

A few follow-ups to the previous posts:

  1. Although I posted that I did not want to use a shell script and wanted to use AppleScript, I did squeeze in some reading time and am reversing course, as it seems the “find” command was made for this thing since it recursively descends the directory tree [which AppleScript has no way of easily doing].

I do however need to understand the code I am using so – at least for now – will not be using the awk command as I have yet to read up on it.

I have therefore decided to use – at least for now – a less efficient two-step approach in that I will i) assign all the filenames in the selected folder and all its subfolders to a variable as a list and ii) loop through the resulting list to identify those filenames [inclusive of the full path] that are over a user-defined number of characters.

I tried using the below code but am getting an error message which reads “Finder got an error: sh: /usr/bin/find/Users/JoelC/Documents/American Express/type: Not a directory”…how do I fix the syntax to get this to work…this is terribly frustrating as I can get the find command to work in Terminal but for some reason not within a shell script…would greatly appreciate the fix!



-- Pick the folder (and sub-folders) whose filename lengths will be compared / tested against maxCount	
	set folderTarget to (choose folder with prompt "Choose the disk / directory / folder whose file lengths' will be compared against maxCount" default location (path to home folder))
	set folderTarget to POSIX path of folderTarget
	
	
	--set filesAll to get files of folderTarget 
	set filesAll to paragraphs of (do shell script "/usr/bin/find" & quoted form of folderTarget & "type -f")
	

<<EDIT @ 7:15 EDT: I think that this has been fixed…I have changed the code to:

set filesAll to paragraphs of (do shell script "/usr/bin/find " & quoted form of folderTarget & " -type f")>>

  2. I have a number of files with long file names – some in excess of 255 characters – with the consequence that when I copy / move files from the originating drive to secondary drives not all the files are being properly copied.

I therefore want to identify all files over 255 characters to avoid this happening but – and here is the question – does the 255 character limit apply to the HFS file format/nomenclature OR the POSIX file format/nomenclature?

While on the point – if the answer is that I should be using the POSIX file format/nomenclature then, at the risk of asking a naive question, what character does the count start with…for example, if a file were named /User/Joelc/Documents/test_file.docx then what is the first character in the 255 character limit [i.e. is it the “/” in “/User”, is it the “U” in “User”, etc.]?

  3. @StefanK at your post #9

Appreciate this and will give it a go once i) I understand the fix to 1. above and ii) I have time to read up on the awk command.

  4. @ Marc Anthony at your post #12

Appreciate your suggestion which I tried but cannot report back on as my MBA timed out…it seems – as I had this experience with my own attempted coding – that the entire contents command/dump is very slow to execute.

I will come back to this if I cannot get a solution to 1. above which I hope will work once I get my syntax sorted.

Thank you for your referral to Terminal / man find which was helpful…this learning curve – though enjoyable – is steeper than I anticipated!

  5. @ Nigel Garvey at your post #13

Appreciate you jumping in and taking the time to help again – truly!

Appreciate – as in post # 12 – your suggestion which I tried but cannot report back on as my MBA timed out…it seems – as I had this experience with my own attempted coding – that the entire contents command/dump is very slow to execute.

I will come back to this if I cannot get a solution to 1. above which I hope will work once I get my syntax sorted.

  6. @ McUsrll at your post # 14

Appreciate your response and will try this out later tonight and report back…thank you!
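
For reference, the two-step approach described in this post (first collect the list with find, then loop over it and test each length) can be sketched as a plain shell script. This is only an illustration under stated assumptions: the variable names mirror the ones in the post, the demo folder and its files are invented, and paths are assumed not to contain line breaks.

```shell
#!/bin/sh
# Throwaway demo folder standing in for the folder picked with "choose folder".
folder_target=$(mktemp -d)
mkdir -p "$folder_target/Folder N"
touch "$folder_target/short.txt" "$folder_target/Folder N/a_file_with_a_fairly_long_name.docx"
max_count=$(( ${#folder_target} + 15 ))   # set so that only the long path exceeds it

# Step i: collect every regular file, one path per line.
# Note the space between each argument; leaving the spaces out is what
# glued the pieces together in the error message quoted above.
files_all=$(find "$folder_target" -type f)

# Step ii: loop through the list and keep "length path" for the long ones.
long_list=""
while IFS= read -r one_path; do
    if [ "${#one_path}" -gt "$max_count" ]; then
        long_list="${long_list}${#one_path} ${one_path}
"
    fi
done <<EOF
$files_all
EOF

printf '%s' "$long_list"
rm -rf "$folder_target"
```

Because each kept line starts with its character count, piping the result through sort -rn would order it from longest to shortest, which matches the sorting step in the original request.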

Hello Joel.

I think you should look away from my solution then, and stick with Stefan’s, as I have understood that you mean the filename to include the full path. (The jargon, by the way, is that a filename is just the filename, whereas a pathname is the whole thing.)

There is a lot to read up on at once when you dive into the Unix world of arcane commands. I suggest you just try Stefan’s command, and just trust it. Awk regards each line of input as a record. What Stefan uses awk for is just to test the length of each line, which is a path returned from the find command; awk then prints only those lines whose length exceeds the given number of characters.
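
To make the "each line is a record" idea concrete, here is a small sketch that feeds awk three made-up paths instead of real find output. The sample paths and the limit of 10 are arbitrary choices for the demonstration.

```shell
#!/bin/sh
# Three sample paths, one per line. awk reads each line as one record ($0).
sample='/a/b.txt
/a/much/longer/path/to/file.txt
/c.txt'

# Show the record number and the length awk sees for each line.
printf '%s\n' "$sample" | awk '{ print NR ": " length($0) " characters" }'

# A bare test with no braces prints every record for which it is true.
kept=$(printf '%s\n' "$sample" | awk 'length($0) > 10')
echo "$kept"
```

The first awk call lists all three records with their lengths; the second keeps only the one line that is longer than 10 characters.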

To clear this confusion up I do mean the pathname [i.e. \User\Joelc\Documents\Folder N\filname_of_this_file.ext]. Apologies for any confusion this caused.

No kidding…it can be overwhelming!

I am almost there through a combination of methods as I note in post # 16 to this thread [i.e. shell script to get the list of files followed by a repeat loop to test for length] and I really want to try, to the greatest extent possible, to use code that I understand…I think for now my focus is coding as much of the script as possible based on my working knowledge and cleaning it up / improving it as my knowledge improves.

Really appreciate your help!

A change of pace…a non-programming related question…

I am working through my script and testing the results…I noticed that when I run the script on a sub-folder of the Documents folder I get the proper folder and file count [i.e. the script’s count matches Finder’s Get Info count], at least for the sub-folders I have used for testing purposes…BUT I noticed that when I run the script on the Documents folder the script’s folder and file count is less than Finder’s folder and file count…this is a problem because I want to make sure that I am processing all files…

Though I will try to identify whether there are any folders that are being omitted by the shell script find command I am wondering whether anyone has an obvious / known explanation for this anomaly.

Thx…

It’s unclear what code you are actually testing, but Finder’s Get Info isn’t accommodating for entire path lengths, and the shell also returns invisible files, such as .DS_Store, so that may account for the discrepancies.
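
One way to check whether invisible files explain a difference is to count with and without them. The sketch below does this on an invented demo folder; it only tests the dot-file explanation and says nothing about other reasons the two counts might differ. The pattern '.*' matches file names that start with a dot, so files sitting inside a hidden folder would still be counted as visible here.

```shell
#!/bin/sh
# Throwaway demo folder with two ordinary files and two .DS_Store files.
demo=$(mktemp -d)
mkdir -p "$demo/sub"
touch "$demo/report.txt" "$demo/.DS_Store" "$demo/sub/notes.txt" "$demo/sub/.DS_Store"

# Count every regular file, then only those whose name does not start with a dot.
all_count=$(find "$demo" -type f | wc -l | tr -d ' ')
visible_count=$(find "$demo" -type f ! -name '.*' | wc -l | tr -d ' ')
echo "all: $all_count  visible: $visible_count"

# List the dot files themselves to see what makes up the difference.
find "$demo" -type f -name '.*'

rm -rf "$demo"
```

Running the same three find commands on the real folder, in place of the demo folder, shows how many of the extra or missing items are dot files.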