Highlight files using non-acceptable characters

I have a client who can only accept filenames using alphabetical, numerical characters or spaces or underscores – plus a period before the file extension (only). % signs (for example) are not allowed.

I’d like to be able to scan a folder for files that don’t conform to these requirements, and highlight them in, say, yellow.

(I’m a total scripting newbie and barely understand what the heck I’m doing.)

Thanks for any help.

How’s this?

tell application "Finder"
	set comm to " | perl -pe 's/[a-z]//g' | perl -pe 's/[A-Z]//g' | perl -pe 's/[0-9]//g' |perl -pe 's/_//g' | perl -pe 's/ //g'"
	set theFolder to choose folder with prompt "Select a folder"
	set theFiles to every file of theFolder
	repeat with eachfile in theFiles
		set theFile to name of eachfile as string
		set command to "echo " & (quoted form of theFile) & comm
		set badchars to do shell script command
		if (length of badchars) > 1 then
			set label index of eachfile to 3
		end if
	end repeat
end tell

Edit: label index 3 is yellow. :smiley:

Wow! Great! Only thing is, it’s flagging hyphens also, which are actually permitted. My fault; I forgot to include them in the list of permissible characters. I totally don’t understand the code:
set comm to " | perl -pe 's/[a-z]//g' | perl -pe 's/[A-Z]//g' | perl -pe 's/[0-9]//g' |perl -pe 's/_//g' | perl -pe 's/ //g'"
…or much of the rest either, or I would try to edit the script myself. Is it a big deal to fix?

No problem:

tell application "Finder"
	set comm to " | perl -pe 's/[a-z]//g' | perl -pe 's/[A-Z]//g' | perl -pe 's/[0-9]//g' | perl -pe 's/_//g' | perl -pe 's/-//g' | perl -pe 's/ //g'"
	set theFolder to choose folder with prompt "Select a folder"
	set theFiles to every file of theFolder
	repeat with eachfile in theFiles
		set theFile to name of eachfile as text
		set command to "echo " & (quoted form of theFile) & comm
		set badchars to do shell script command
		if (length of badchars) > 1 then
			set label index of eachfile to 3
		end if
	end repeat
end tell

The perl commands strip out the valid characters. For example, perl -pe 's/[a-z]//g' replaces every character in the range lower case a to lower case z with whatever sits between the second and third slashes. In this case, there is nothing there, so it simply deletes them. After deleting all valid characters, you should be left with just the dot from between the name and its extension, so the length should be 1. If it is greater than 1, the name contains invalid characters!

Note that I’m assuming that file names can’t have a second dot: filename.something.txt is invalid but filename.txt is valid.

I refined it a bit:

tell application "Finder"
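	-- Inside [...] every character is literal, so commas between the ranges would themselves count as permitted; list the ranges back to back.
	set comm to " | perl -pe 's/[a-zA-Z0-9_ -]//g'"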
	set theFolder to choose folder with prompt "Select a folder"
	set theFiles to every file of theFolder
	repeat with eachfile in theFiles
		set theFile to name of eachfile as text
		set command to "echo " & (quoted form of theFile) & comm
		set badchars to do shell script command
		if (length of badchars) > 1 then
			set label index of eachfile to 3
		end if
	end repeat
end tell

Must not have been thinking clearly before because there’s no need for multiple perl calls. :confused:
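
A different way to run the same check, sketched here as an alternative rather than as what the script in this thread does, is to test the whole name against one anchored pattern instead of stripping and counting. The helper name is_valid_name is hypothetical, and the pattern assumes every file must have an extension made only of letters and digits; that is stricter than the script, which lets names without an extension through.

```shell
#!/bin/sh
# is_valid_name NAME: succeed when NAME is one or more permitted characters,
# then exactly one dot, then an alphanumeric extension.
# Assumption: an extension is required and contains only letters and digits.
is_valid_name() {
    printf '%s\n' "$1" | grep -Eq '^[A-Za-z0-9_ -]+\.[A-Za-z0-9]+$'
}

for name in 'photo_01.jpg' 'photo 50%.jpg' 'notes' 'a,b.txt'; do
    if is_valid_name "$name"; then
        echo "ok:  $name"
    else
        echo "bad: $name"
    fi
done
```

Because the pattern is anchored at both ends, a single stray character anywhere in the name makes the match fail, so there is no length to compare.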

Ah, now I see what it’s doing. Know nothing about Perl either, but I get the idea of how you specified it. Thanks so much for posting this! I hope there are other folks out there who need to do this kind of thing. What an amazing resource this site is!

I saw in another thread that you were interested in also labeling files with more than 31 characters in their names. This version will do that as well as label the ones with illegal characters. It uses a different color for length and illegal.

tell application "Finder"
	-- No commas inside the brackets: a comma there would be treated as one more permitted character.
	set comm to " | perl -pe 's/[[:alnum:]_ -]//g'"
	set theFolder to choose folder with prompt "Select a folder"
	set theFiles to every file of theFolder
	repeat with eachfile in theFiles
		set theFile to name of eachfile as text
		set len to length of theFile
		set command to "echo " & (quoted form of theFile) & comm
		set badchars to do shell script command
		if (((length of badchars) > 1) or (len > 31)) then
			if len > 31 then
				set label index of eachfile to 2
			else
				set label index of eachfile to 3
			end if
		else
			set label index of eachfile to 0
		end if
	end repeat
end tell

Thanks! But what happens if the file has both conditions (too long and illegal characters)…I thought it might be clearer just to run the two scripts sequentially. I think adding a third color to indicate both sounds confusing to the end user. What do you think?

Then it gets labeled as too long. If you then shorten the name to 31 characters or less and run the script again, it will label it as having illegal characters, if it still contains them, or remove the label, indicating that it is short enough and contains no invalid characters.

Here’s one that labels each file one of four ways:
Too long & invalid characters
Too long
Invalid characters
No invalid characters and not too long

tell application "Finder"
	-- No commas inside the brackets: a comma there would be treated as one more permitted character.
	set comm to " | perl -pe 's/[[:alnum:]_ -]//g'"
	set theFolder to choose folder with prompt "Select a folder"
	set theFiles to every file of theFolder
	repeat with eachfile in theFiles
		set theFile to name of eachfile as text
		set len to length of theFile
		set command to "echo " & (quoted form of theFile) & comm
		set badchars to do shell script command
		if ((length of badchars) > 1) and (len > 31) then
			-- too long and invalid characters
			set label index of eachfile to 4
		else if len > 31 then
			-- too long
			set label index of eachfile to 2
		else if (length of badchars) > 1 then
			-- invalid characters
			set label index of eachfile to 3
		else
			-- nothing wrong: clear any label
			set label index of eachfile to 0
		end if
	end repeat
end tell

Wow, this is great. Thanks for your help with this problem.

Can anyone help me with this quick query, please?

I’ve been trying to understand why the final script doesn’t pick up spaces.

Regards,

Nick

Must be because I told him spaces were legal characters.

Hey Nick,

Change the following line.


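set comm to " | perl -pe 's/[[:alnum:]_ -]//g'"

to the version below, which removes the space from the character class. (The class is written without commas here; inside the brackets a comma would itself count as a permitted character.)


set comm to " | perl -pe 's/[[:alnum:]_-]//g'"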

hth,

Craig

Hi Craig,

Thanks for the reply. It did help, and the script now picks up spaces. Great!

I’ve not dabbled much with shell scripting, hence the next question.
Working forward with this script, is it possible to amend the perl so that all the illegal characters are replaced with an underscore?

Regards,

Nick

Hi Nick,

Now you are going the other direction. The Perl script is stripping out all the acceptable characters.
You are talking about replacing unacceptable characters.

Change the following code to include all the characters you want to replace
in the “characters_to_replace” area and enter the character you want to replace
it with in the “replace_with” area.

If you do a lot of this you should get a good book on regex. My favorite is
“Mastering Regular Expressions” by Jeffrey E.F. Friedl. Also, Python and Ruby
have very good regex support as well.


set comm to " | perl -pe 's/[characters_to_replace]/replace_with/g'"

Cheers,

Craig

Hi Craig,

Thanks for your reply. It highlighted something else about the script.

I’d not realised that the script was supposed to strip out illegal characters. All it does on my machine is relabel the files that fail the test, so those files would still need their characters stripping out.
Do you have any ideas why that might be?

Thanks for explaining the find/replace thing, it’s clearer now. As for the regex info, I’ve spent many a happy hour playing with BBEdit and Text Wrangler to manipulate data though there’s still plenty to learn. I think I may get a copy of the book for reference tho.

Thanks again for the recommendation and the help.

Regards,

Nick

Hi Nick,

It would help me a lot if you would share what you are trying to accomplish.
Would you mind posting some example text of what you have and what you
would like it to be?

Cheers,

Craig

Hi, Nick…if you go to the top of the thread, you’ll see the original intent of this script. Maybe that’ll enlighten you about how and why it does what it does. It was written for a very specific use, and worked perfectly for my needs. (Thanks again, Craig, you are amazing!) Here’s the gist of it:

Good luck, both of you!

Hi Craig,

Here’s a sample of the filenames I’ve been trying to rename.

_ one&lick_itunes.gif
_ 10-6_copy.jpg
----1&0
6.jpg
10_6_copycopy copy.jpg
10_6_copycopy copy 1.jpg
10_6_copycopy.jpg

I’ve written something else to strip out anything that’s not alphanumeric; it was the part to do with using perl that interested me. Having looked at the script again, am I right in thinking that it doesn’t actually rename the files and simply generates the new filename?

Thanks,

Nick