Parse a list of files based on extensions and process accordingly

Hi, I’m really stuck and need help badly. This is a simplified example:


-- no need to choose this folder, as it is preset by a weekly variable.
set theFolder to ((path to documents folder from user domain as text) & "Week01")
set contentsList to list folder theFolder

Here is where I need help. The list may consist of anything from 400 to 900 files and a few subfolders.

Edit and process the list automatically:

Remove the following from the list:

  • all subfolders
  • all files with the extension “jpg”, “tiff”, “png”

Check if exists and if yes, move them to the trash:

  • any files with a 7-character extension, except when the extension is “convert”
  • any files without extension

Check if exists

  • any files with an extension of more than 4 (but not 7) characters
  • any remaining files from the original list
    and if yes, write their file name and extension (no path required) as a list to a text file:
    ((path to documents folder from user domain as text) & “Week01:'Files_to_Check.txt”)

Please, can anybody code this? My gratitude will be endless.

Hi guys, did I say anything offensive or did I ask for too much? Can any charitable soul please put me out of my misery?

This is a slightly shorter version of my original post. I tried a number of options and they all fail at my inability to sort a list of files by the ‘name extension’:


tell application "Finder"
	set theItems to every file of folder ((path to documents folder from user domain as text) & "Week01")
	-- set theItems to (sort theItems by name) -- works
	-- set theItems to (sort theItems by size) -- works
	-- set theItems to (sort theItems by modification date) -- works
	set theItems to (sort theItems by name extension) -- AppleEvent handler failed, number -10000
	set theNames to {}
	repeat with oneFile in theItems
		set end of theNames to name of oneFile
	end repeat
end tell

Sorting by ‘name’, ‘size’ or ‘modification date’, directly or in reverse, works. However, sorting by ‘name extension’ fails with the error ‘AppleEvent handler failed. Number -10000’. Why? How should it be done?

The second part of my question follows a successful sort by ‘name extension’:
How to move all files with a 7 character extension, with the exception of the extension ‘convert’, to the trash automatically?

If anybody wonders, the files to delete have extensions of 7 random alphanumeric characters, created by an app when the user all too often does silly things. Bad app too. The files with the ‘convert’ extension are the good ones.

Again, please help me.

Hi, flex20.

Probably too much. You asked for “help” with a fairly involved script for which you’d only written two lines. Coupled with that, it’s the weekend and the Olympics are on, so possibly not many people are at their computers. Whatever the reason(s) for not getting a reply, importuning never helps.

However, as it happens, I was looking at your file-identification problem myself this afternoon. Hopefully this will get you started:


-- no need to choose this folder, as it is preset by a weekly variable.
set theFolder to ((path to documents folder from user domain as text) & "Week01:")

try
	-- Get the names of the folder's contents, excluding subfolders and files with extensions ".jpg", ".tiff", ".png", or ".convert". There may be an empty line at the end, but it's harmless.
	set contentsList to (do shell script ("ls -p " & quoted form of POSIX path of theFolder & " | grep -vE '([.](jpg|tiff|png|convert)|/)$'") without altering line endings)
on error
	error "The folder's either empty or non-existent."
end try

try
	-- Get a subset of the above names which have seven-character extensions (excluding the dots) or no extensions.
	set forTrash to paragraphs of (do shell script ("echo " & quoted form of contentsList & " | grep -E '(^[^.]+|[.].{7})$'"))
	-- Delete the files with these names. (Other ways may be faster.)
	tell application "Finder" to delete (every file of folder theFolder whose name is in forTrash)
on error msg
	-- No matches.
end try

try
	-- Get a subset having four-to-six-character extensions or eight-or-more-character extensions. Write these data to a text file in the folder. (Overwrites any existing data.)
	set forNoting to (do shell script ("echo " & quoted form of contentsList & " | grep -E '([.](.{4,6}|.{8,}))$' > " & quoted form of (POSIX path of theFolder & "Files_to_Check.txt")))
on error
	-- No matches.
end try
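
For anyone wanting to try the three grep stages outside AppleScript, here is a minimal sketch run on a made-up listing. The file names are hypothetical, the exclusion pattern is anchored with `$` so that it only matches at the end of a name, and the third stage uses the 5-to-6 and 8-or-more lengths discussed in this thread. It is an illustration of the idea, not a replacement for the script.

```shell
# Hypothetical sample of what `ls -p` might print for the folder.
listing='Sub/
photo.jpg
scan.tiff
good.convert
junk.a1B2c3D
noext
odd.abcde
long.abcdefgh
'

# Stage 1: drop subfolders (trailing "/") and the known-good extensions.
contents=$(printf '%s' "$listing" | grep -vE '([.](jpg|tiff|png|convert)|/)$')

# Stage 2: names with no extension at all, or with a seven-character extension.
for_trash=$(printf '%s\n' "$contents" | grep -E '(^[^.]+|[.].{7})$')

# Stage 3: names with five-to-six or eight-or-more character extensions.
for_noting=$(printf '%s\n' "$contents" | grep -E '[.](.{5,6}|.{8,})$')

printf 'trash:\n%s\nnote:\n%s\n' "$for_trash" "$for_noting"
```

With this sample, the "trash" group holds junk.a1B2c3D and noext, and the "note" group holds odd.abcde and long.abcdefgh.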

With regard to your ‘sort theItems by name extension’ difficulty, I don’t think the Finder can sort things by ‘name extension’, only by the properties by which it sorts columns in list view. The nearest equivalent to ‘name extension’ would be ‘kind’.
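
If a list ordered by extension is really wanted, one possible workaround is to do the sorting in the shell rather than in the Finder. A rough sketch, assuming the names contain no tabs or newlines; the sample names are made up, and names without a dot are treated as having an empty extension:

```shell
# Hypothetical file names, one per line.
names='b.tiff
a.png
c.convert
d'

# Prefix each name with its extension and a tab, sort on that, then cut the prefix off again.
sorted=$(printf '%s\n' "$names" |
	awk -F. '{ ext = (NF > 1) ? $NF : ""; print ext "\t" $0 }' |
	LC_ALL=C sort |
	cut -f2-)

printf '%s\n' "$sorted"
```

The names come out as d, c.convert, a.png, b.tiff: the extensionless name first, then the rest in extension order.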

Hi Nigel,
Sorry for my original post with the two-line code. I had made many more attempts at the code, too silly to show. On top of it, I’m suffering from Olympics-induced sleep deprivation here in Sydney, Australia. Just watched Bolt winning. By the way, Britain is wiping the floor with Australia when it comes to medals. Congratulations.

Many, many thanks for your script. Using shell scripts with ls and grep was the secret. I’m just reading more on grep. It is incredibly powerful.

There is just one issue I’d appreciate if you could fix:


try
	-- Get a subset having four-to-six-character extensions or eight-or-more-character extensions. Write these data to a text file in the folder. (Overwrites any existing data.)
	set forNoting to (do shell script ("echo " & quoted form of contentsList & " | grep -E '([.](.{5,6}|.{8,}))$' > " & quoted form of (POSIX path of theFolder & "'Files_to_Check.txt")))
on error
	-- No matches.
end try

As per your comment, this section of the script is supposed to write a file with the subset of the original list, consisting only of the files with extensions of 5, 6, 8, and more characters (I removed 4 because of “.tiff”). In fact it also adds all files excluded in the first section.

If I haven’t exhausted my welcome, could you please fix that?

The only changes I will then make to my final script will be adding a date & time stamp “_ww_hhmmss” to the output file name and lock it for editing to keep a record of the files which needed checking that week. The date & time stamp will also prevent overwriting.

Thanks again,
Chris
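
The time-stamped, write-protected record described above could be sketched in the shell roughly as follows. This assumes "ww" means the week number; the temporary folder and the two sample names are placeholders, and `chmod a-w` is only a portable stand-in for locking the file.

```shell
# Placeholder folder; the real script would use the Week01 folder instead.
dir=$(mktemp -d)

# Week number plus hours, minutes and seconds, e.g. 32_173005.
stamp=$(date +%V_%H%M%S)
out="$dir/Files_to_Check_${stamp}.txt"

# Hypothetical names of files needing a check, one per line.
printf '%s\n' 'odd.abcde' 'long.abcdefgh' > "$out"

# Remove write permission so the record is not edited by accident.
# (On macOS, `chflags uchg` would set the Finder's "Locked" flag instead.)
chmod a-w "$out"

echo "wrote $out"
```

Because the stamp changes every second, a new run does not overwrite an earlier record.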

Nigel,

Sorry, but my sleep deprivation seems to really affect me.

The questionable files can in fact have extensions of anything between 4 to 6 and 8 and 9 (can be 4 but not 7, which are deleted in section 2) alphanumeric characters. Their names must be written to the output text file, but not any files with “.tiff” or “.convert”, or any other of the files and subfolders excluded in the first section of the script (the excluded files and folders are the correct ones and there are many hundreds of them, better than 90% of all).

Can you please modify the third section accordingly?

My embarrassment is also real.

Many thanks,
Chris
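
The revised rule (extensions of 4 to 6 or 8 to 9 alphanumeric characters) might be expressed as a grep pattern along these lines. This is only a sketch on hypothetical names, assuming the good files such as “.tiff” and “.convert” have already been filtered out by the first section:

```shell
# Hypothetical names left over after the first section's exclusions.
contents='report.abcd
notes.abc12
data.abcdef
junk.a1B2c3D
long.abcdefgh
longer.abcdefghi
toolong.abcdefghij
plain.txt'

# Keep names whose final extension is 4-6 or 8-9 alphanumeric characters.
to_check=$(printf '%s\n' "$contents" |
	grep -E '[.]([[:alnum:]]{4,6}|[[:alnum:]]{8,9})$')

printf '%s\n' "$to_check"
```

Here the 7-character, 10-character and 3-character extensions are skipped, and the other five names are listed.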

Hello!

I’ll just give you some nuts 'n' bolts here.

This one can help you with the sort issues by extension, when fed a list with sublists like:

set ml to {{"fileone.withextension","withextension"},{"filetwo.withotherext","withotherext"}}

Say these were the only two elements.

then you could use the handler like this:

quickSortL(ml, 1, 2, 2)

here is the handler:


on quickSortL(theList, theLeft, theRight, itemNo)
	set i to theLeft
	set j to theRight
	set v to item itemNo of item ((theLeft + theRight) div 2) of theList -- pivot
	repeat while (j > i)
		repeat while (item itemNo of item i of theList < v)
			set i to i + 1
		end repeat
		repeat while (item itemNo of item j of theList > v)
			set j to j - 1
		end repeat
		if (not i > j) then
			tell theList to set {item i, item j} to {item j, item i} -- swap
			set i to i + 1
			set j to j - 1
		end if
	end repeat
	if (theLeft < j) then quickSortL(theList, theLeft, j, itemNo)
	if (theRight > i) then quickSortL(theList, i, theRight, itemNo)
end quickSortL

You would have to use AppleScript to construct such a list; you’ll find plenty of code around here doing just that.

As for returning a list with just those extensions with a length of 5, 6 or 8, you could use something like this, assuming the same list:



set resultList to _filterL(ml,correctExtLength,2)

on _filterL(L, crit, itemNo)
	-- © Matt Neuburg AppleScript The Definitive Guide Second Edition.
	-- reworked to work on lists of lists, finding criteria in chosen item of the sublist
	script filterer
		property criterion : crit
		property itemnum : itemNo
		on _filter(L)
			if L = {} then return L
			if criterion(item itemnum of item 1 of L) then
				return {item 1 of L} & (_filter(rest of L))
			else
				return _filter(rest of L)
			end if
		end _filter
	end script
	return filterer's _filter(L)
end _filterL

on correctExtLength(x)
	local ct
	set ct to count x
	if ct = 5 or ct = 6 or ct = 8 then return true
	return false
end correctExtLength

Hi McUsr,

Many thanks for the much-needed nuts and bolts. I’m learning quite a bit about sorting and filtering techniques. I’ll do some testing when I get up. It’s 7:30 am here in Sydney. I’m going to bed for a few hours. This is what the Olympics do to you Down Under.

Best regards,
Chris

Gosh! It’s on all through the night in your part of the world! You must be knackered!

I don’t understand this. The subset’s derived from ‘contentsList’, so clearly it can’t contain anything excluded in that, and I can’t see how it could share any content with ‘forTrash’. Maybe ‘contentsList’ contains stuff it shouldn’t. Can you give a few examples of file names you don’t want in the file which are slipping through?

Sorry. I should have written ‘5’ myself instead of ‘4’ in the grep pattern. You did specify extensions greater than 4 characters. Also, the ‘forNoting’ variable was from earlier tests when I just wanted to see what was in that particular text. There’s no point in setting a variable now the shell script writes the text to the file instead of returning it to the script.

Hello!

Here is another little piece, which constructs the file list the two other handlers use. You must modify it to point to the correct folder.


tell application "Finder" to set {theNames, theExtensions} to (get {name, name extension} of its every file of desktop folder)
set ct to (get count theNames)
set theList to {}
set i to 1
repeat ct times
	set end of theList to {item i of theNames, item i of theExtensions}
	set i to i + 1
end repeat
log theList

Hi Nigel,

After a six-hour sleep during the day here and two mugs of black coffee, I’m almost as good as new, or so I think. I also have proof positive that late-night Olympics watching with friends, beer, posting on the forum and testing AppleScripts don’t mix at all - posting and AS testing being the losers.

I just completed a few tests on real data and your script worked absolutely perfectly. BIG THANK YOU.

The reason for the false alarm was that in my initial post I left out two more extensions of good files, “datum” and “jpeg”. My bad. As soon as I added them to the exclusion list, everything worked as intended.

Regarding the “forNoting” variable, I left it in because, after the Olympics, I want to expand the script to check and process the suspect files automatically and do away with the text file record.

Many thanks again.

Best regards,
Chris

Why so complicated, with its and get?


tell application "Finder" to set {theNames, theExtensions} to {name, name extension} of every file of desktop
set ct to count theNames

does the same thing in the same time

Hi McUsr and Stefan,

That is a very nice technique to get an extensions list. I did try a similar script myself, but I couldn’t get the syntax of the handlers right. I added yours to my bag of tricks. Many thanks.

Best Regards,
Chris

Hello!

I agree that the its is superfluous, but I at least believe it doesn’t make the script longer, and for me, it adds to the readability. I like to have as little implicitly stated as possible. But that is me! :slight_smile:

As for the get: I haven’t totally wrapped my head around when it is required and when not, and knowing I do nothing wrong by adding it, I see no harm really in doing so. :slight_smile:

I just conclude that where you see it as complicating, I see it as clarifying and assuring. Perceptions diverge, you having the higher level of confidence!

I think Nigel’s solution is perfectly good. I’ll return with another solution, just for a comparative study, as it is often interesting to view the different ways to solve problems.

An explicit get is actually only needed in a few cases:

• to set and get a property in one line of code
• to retrieve a value of a property in a Finder tell block after display dialog
• to get the strong reference to a list in a repeat with anItem in aList loop

There is always an implicit get in AS when you use set var to value. But sometimes you want the get to behave differently, because by default the get will always sit between to and value.

For example the following code

tell application "Finder" to display dialog name of current application
--is the same as 
tell application "Finder" to get display dialog name of current application

Both will return an error. To make this code work we have to change the scope by overriding the default get command. So use

tell application "Finder" to display dialog (get name of current application)

Hello!

My solution is more unstable, for one thing, as the Finder did crash while I was testing this.

It is not totally tested either, I just guess it works!

@ DJ Bazzie Wazzie : Sometimes you actually explicitly need to get something! It doesn’t always work automagically! Something that can be observed when scripting Mail.app for instance! As for my construct, by using get, I don’t get dialogs for the first! but say I want a counter for a loop variable, then I usually get count to ensure it has a static value as I start iterating. :slight_smile:


repeat with i from 1 to (get count theList)
	-- do stuff
end repeat

So to ensure that behaviour, and so that I don’t miss out on it, I usually put a get in front of count and of members of object models, as it is quicker to type those three letters than to figure out what went wrong afterwards, as the reasons for failure may be many!

Edit: Removed the ignoring application responses block!


set theFolder to ((path to documents folder from user domain as text) & "Week01:")

tell application "Finder"
	-- Only files without an extension go to the trash here. Subfolders and the "jpg", "tiff" and "png" files are the good ones, so they stay put and are simply left out of the final list below.
	delete (every file of its folder theFolder whose name extension is "")
end tell
-- get a list of what is left in the folder 

tell application "Finder" to set {theNames, theExtensions} to (get {name, name extension} of its every file of folder theFolder)
set ct to (get count theNames)
set theList to {}
set i to 1
repeat ct times
	set end of theList to {item i of theNames, item i of theExtensions}
	set i to i + 1
end repeat

-- get a list of every file name with a name extension that is 7 chars long and isn't convert
set of7list to _filterL(theList, offending7s, 2)

set ct to get count of7list
set i to 1
repeat ct times
	try
		tell application "Finder" to delete file (item 1 of item i of of7list) of its folder theFolder
	end try
	set i to i + 1
end repeat

tell application "Finder"
	-- Names of the files still to be checked, skipping the good ones. The Finder's sort command works on Finder items rather than on a list of names, so the names are taken in the order the Finder returns them.
	set theNames to (get name of its every file of folder theFolder whose name extension is not in {"jpg", "tiff", "png", "convert"})
end tell
set {tids, AppleScript's text item delimiters} to {AppleScript's text item delimiters, return}

set theNames to theNames as text
set AppleScript's text item delimiters to tids

writeFile(theNames, (theFolder & "Files_to_Check.txt"))



to writeFile(theData, theFile)
	(*For writing a file.  Handles situations where the new file may be
	 shorter than the original file, since AppleScript's write command doesn't 
	 reset EOF to the new data length.*)
	--returns boolean success (true=success)
	--Assumes: theFile is a file path string and file exists.	
	try
		-- open file
		open for access (theFile) with write permission
		copy the result to theFile_ID
		
		-- Set the file length to zero
		set eof theFile_ID to 0
		
		-- Write our message
		write theData to theFile_ID
		
		-- close the file
		close access theFile_ID
		return true
		
	on error
		try
			close access theFile_ID
		end try
		return false
	end try
end writeFile

on _filterL(L, crit, itemNo)
	-- © Matt Neuburg AppleScript The Definitive Guide Second Edition.
	-- reworked to work on lists of lists, finding criteria in chosen item of the sublist
	script filterer
		property criterion : crit
		property itemnum : itemNo
		on _filter(L)
			if L = {} then return L
			if criterion(item itemnum of item 1 of L) then
				return {item 1 of L} & (_filter(rest of L))
			else
				return _filter(rest of L)
			end if
		end _filter
	end script
	return filterer's _filter(L)
end _filterL

on offending7s(x)
	local ct
	set ct to count x
	if ct = 7 and x is not "convert" then return true
	return false
	
end offending7s
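
For comparison with the grep approach earlier in the thread, the same "seven characters and not convert" test can be sketched as a small shell function. The name is_offending and the sample file names are made up for illustration:

```shell
# is_offending NAME: succeeds when NAME has a seven-character extension other than "convert".
is_offending() {
	ext=${1##*.}                    # text after the last dot (the whole name if there is no dot)
	[ "$ext" != "$1" ] || return 1  # no dot, so no extension
	[ "${#ext}" -eq 7 ] && [ "$ext" != "convert" ]
}

for name in junk.a1B2c3D good.convert noext odd.abcde; do
	if is_offending "$name"; then
		echo "trash: $name"
	else
		echo "keep:  $name"
	fi
done
```

Only junk.a1B2c3D is reported for the trash; files without an extension would need a separate test, as in the scripts above.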

Thanks! Hmmm... very interesting! :slight_smile: “to set and get a property in one line of code”: I seldom set a property in one line of code, but it is interesting that you have to get it in order to set it. I understand that to go like this:


property myprop : 2

set myprop to (get myprop) + 5 

When I get problems, I actually tend to put a my in front of them, I have never used get for this, as far as I can recollect. :frowning:

“To retrieve a value of a property in a Finder tell block after display dialog”: that would be a script property, right? That is very interesting, that the Finder gets its scope obfuscated by display dialog! :slight_smile: I use “SystemUIServer” for display dialog!

And the strong reference, that would be a real reference to the list that is iterated over. That is also very interesting, and maybe the most logical of the three cases here, since we are in fact operating on the list while we are asking to get a reference to it. (There are no nodding emoticons here) :smiley:

Well, in addition to this, I think I have noticed, that I have to use get here and there in different apps, when I operate within their object models.

the second case is exactly DJ’s example in post #15
the third case is like your example


repeat with i from 1 to (get count theList)

but it affects only a list of references in the repeat … in form,
not an integer index variable in the repeat … from … to form


set ct to (get count theNames)

has no effect at all

I know, Stefan. It doesn’t add anything either, except to show off my ignorance! :slight_smile: I’ll drop those in future, that is, if my fingers don’t betray me!

We both wrote our posts at the same time again :slight_smile: . Anyway, I like your answer better.

That is exactly why I wrote the post and purpose of my examples, knowing when to override the ‘default’ get behaviour; explicit get.

BTW: I forgot that another way to override it is using a coercion.

display dialog name --doesn't work
display dialog (get name)
display dialog name as string --which I use mostly