Regular Expressions or Wildcards in AppleScript

Hi All,
I have a script that I use as a folder action for moving files into folders based on file name. Basically when I download a TV show my folder action passes that filename and then in my script I have it use all of the text before the season (usually entered as S1, S2 etc…) to create a folder based on the show name. If the folder already exists then it just moves the file into that folder.

Right now I am hard-coding the delimiters that mark where the folder name is cut off, so I have it look for S0, S1, S2, etc. This works well but is very limited. What I would like is something like S##E##, so there isn’t the limitation that I currently have. I would also like it to look at date formats, for example ####.##.## or ##.##.####.

I believe this type of thing is possible in the shell using grep or regular expressions; I just can’t figure out how to implement it in AppleScript. If someone wanted to do me a huge favor and look at my script and tell me what I need to do, or even make the necessary changes, I would be forever grateful.

Thanks,

Tanner



--Script Setup Variables--
set sendGrowl to "Yes" -- if Yes then growl notifications will be sent, if set to No they won't
--End Script Setup Variables--




--Functions--

--Function to replace text
on str_replace(find, replace, subject)
	set prevTIDs to text item delimiters of AppleScript
	set returnList to true
	
	-- This wouldn't make sense (you could have it raise an error instead)
	if class of find is not list and class of replace is list then return subject
	
	if class of find is not list then set find to {find}
	if class of subject is not list then ¬
		set {subject, returnList} to {{subject}, false}
	
	set findCount to count find
	set usingReplaceList to class of replace is list
	
	try
		repeat with i from 1 to (count subject)
			set thisSubject to item i of subject
			
			repeat with n from 1 to findCount
				set text item delimiters of AppleScript to item n of find
				set thisSubject to text items of thisSubject
				
				if usingReplaceList then
					try
						item n of replace
					on error
						"" -- `replace` ran out of items
					end try
				else
					replace
				end if
				
				set text item delimiters of AppleScript to result
				set thisSubject to "" & thisSubject
			end repeat
			
			set item i of subject to thisSubject
		end repeat
	end try
	
	set text item delimiters of AppleScript to prevTIDs
	if not returnList then return beginning of subject
	return subject
end str_replace
--end replace text function

--Function to trim leading and trailing spaces
on trim(someText)
	repeat until someText does not start with " "
		set someText to text 2 thru -1 of someText
	end repeat
	
	repeat until someText does not end with " "
		set someText to text 1 thru -2 of someText
	end repeat
	
	return someText
end trim
--end trim function

--Function to change case of text.  Used to create title case folders
property lower_alphabet : "abcdefghijklmnopqrstuvwxyz"
property upper_alphabet : "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
property white_space : {space, tab, return, ASCII character 10, ASCII character 13}

on change_case(this_text, this_case)
	set new_text to ""
	if this_case is not in {"UPPER", "lower", "Title", "Sentence"} then
		return "Error: Case must be UPPER, lower, Title or Sentence"
	end if
	if this_case is "lower" then
		set use_capital to false
	else
		set use_capital to true
	end if
	repeat with this_char in this_text
		set x to offset of this_char in lower_alphabet
		if x is not 0 then
			if use_capital then
				set new_text to new_text & character x of upper_alphabet as string
				if this_case is not "UPPER" then
					set use_capital to false
				end if
			else
				set new_text to new_text & character x of lower_alphabet as string
			end if
		else
			if this_case is "Title" and this_char is in white_space then
				set use_capital to true
			end if
			set new_text to new_text & this_char as string
		end if
	end repeat
	return new_text
end change_case
--End title case function

--End Functions--

--Watch for items added to the folder specified by the folder action--
on adding folder items to this_folder
	--
	--Main Script--	
	--Set delimiter for where to end the folder name, in this case we want everything before the season
	tell application "Finder"
		
		try
			--Set chosen_folder variable to the folder passed via the folder action
			set chosen_folder to this_folder as alias
			
			set file_list to files of chosen_folder
			set file_names to name of files of chosen_folder as string
		on error
			set file_list to {}
		end try
		
		repeat with this_file in file_list
			
			
			set folder_name to name of this_file
			
			if (folder_name contains "S0") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S0"}
			else if (folder_name contains "S1") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S1"}
			else if (folder_name contains "S2") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S2"}
			else if (folder_name contains "S3") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S3"}
			else if (folder_name contains "1x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "1x"}
			else if (folder_name contains "2x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "2x"}
			else if (folder_name contains "3x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "3x"}
			else if (folder_name contains "4x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "4x"}
			else if (folder_name contains "5x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "5x"}
			else if (folder_name contains "6x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "6x"}
			else if (folder_name contains "7x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "7x"}
			else if (folder_name contains "8x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "8x"}
			else if (folder_name contains "9x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "9x"}
			else if (folder_name contains "0x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "0x"}
			end if
			
			set folder_name to ((text items 1 thru -2 of folder_name) as string)
			
			
			--Run the string replace command on the folder name variable
			set folder_name to str_replace(".", " ", folder_name as Unicode text) of me
			set folder_name to str_replace("-", " ", folder_name as Unicode text) of me
			set folder_name to str_replace("_", " ", folder_name as Unicode text) of me
			set folder_name to str_replace("  ", " ", folder_name as Unicode text) of me
			
			--Run the trim command on the folder variable
			set folder_name to trim(folder_name) of me
			
			--Run title case command
			set folder_name to change_case(folder_name, "Title") of me
			
			--
			set new_folder to ((chosen_folder as string) & folder_name & ":")
			try
				get new_folder as alias
			on error
				make new folder at chosen_folder with properties {name:folder_name}
			end try
			move this_file to folder new_folder
			
			set my text item delimiters to old_delim
			
		end repeat
		
	end tell
	
	--End Main Script--
	
	--Send Growl Notification--
	
	--Check if growl is running--
	tell application "System Events"
		set isRunning to ¬
			(count of (every process whose name is "GrowlHelperApp")) > 0
	end tell
	
	--If growl is running then proceed--
	if isRunning is true then
		
		tell application "GrowlHelperApp"
			set fileName to file_names
			
			if fileName is not "" then
				
				set the allNotificationsList to {"Welcome", "Transfer Complete", "Error"}
				
				set the enabledNotificationsList to {"Welcome", "Transfer Complete", "Error"}
				
				register as application "TV Show Mover" all notifications allNotificationsList default notifications enabledNotificationsList icon of application "Plex"
				
				notify with name "Transfer Complete" title "Transfer Successful" description "File " & fileName application name "TV Show Mover" sticky "no"
				
			end if
		end tell
	end if
	--End Send Growl Notification--
	--	
end adding folder items to


I realized that posting my whole script may have been overload. Here is the specific chunk of code I need help with in using some type of pattern replace or regex style setup.


	tell application "Finder"
		
		try
			--Set chosen_folder variable to the folder passed via the folder action
			set chosen_folder to "Path To Folder"  --Set this to the path of the folder you want to run this script on
			
			set file_list to files of chosen_folder
			set file_names to name of files of chosen_folder as string
		on error
			set file_list to {}
		end try
		
		repeat with this_file in file_list
			
			
			set folder_name to name of this_file
			
			--Standard S01E01 format--
			if (folder_name contains "S0") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S0"}
			else if (folder_name contains "S1") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S1"}
			else if (folder_name contains "S2") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S2"}
			else if (folder_name contains "S3") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "S3"}
				
				--1x01 format
			else if (folder_name contains "1x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "1x"}
			else if (folder_name contains "2x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "2x"}
			else if (folder_name contains "3x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "3x"}
			else if (folder_name contains "4x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "4x"}
			else if (folder_name contains "5x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "5x"}
			else if (folder_name contains "6x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "6x"}
			else if (folder_name contains "7x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "7x"}
			else if (folder_name contains "8x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "8x"}
			else if (folder_name contains "9x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "9x"}
			else if (folder_name contains "0x") then
				tell (a reference to my text item delimiters) to set {old_delim, contents} to {contents, "0x"}
			end if
			
			set folder_name to ((text items 1 thru -2 of folder_name) as string)
			
			--
			set new_folder to ((chosen_folder as string) & folder_name & ":")
			try
				get new_folder as alias
			on error
				make new folder at chosen_folder with properties {name:folder_name}
			end try
			move this_file to folder new_folder
			
			set my text item delimiters to old_delim
			
		end repeat
		
	end tell

With text item delimiters you may specify a list of delimiters. So, for example, to split a string at every digit, you might write something like:


set s to "S0E9"
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}
set theList to the text items of s
set AppleScript's text item delimiters to tid
theList

Similarly, if you want to split at the season (e.g., “Sx”), you may set the text item delimiters as follows:


set AppleScript's text item delimiters to {"S1", "S2", "S3"} -- and so on, for as many seasons as you have

That said, for more complex parsing, I agree with you that it is better to delegate the job to a Perl or Ruby script. Since I’m in a Ruby mood currently, this is what I would use:


on matchRegExp(regex, txt, |caseInsensitive?|)
	-- Ruby's "i" flag makes the match ignore case
	if |caseInsensitive?| then
		set ci to "i"
	else
		set ci to ""
	end if
	set theRubyOneLiner to quote & "s = '" & txt & "'; s =~ /" & regex & "/" & ci & "; puts Regexp.last_match.to_a" & quote
	set theCommand to "ruby -e " & theRubyOneLiner
	set theMatchData to do shell script theCommand
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to character id 13 -- return; do shell script converts line feeds to returns by default
	set theMatchData to the text items of theMatchData
	set AppleScript's text item delimiters to tid
	theMatchData
end matchRegExp

-- Example of usage:
set anEpisode to "MyTVShow-Season12Ep3"
set caseInsensitive to false
matchRegExp("Season(\\d+)Ep(\\d+)", anEpisode, caseInsensitive)
-- {"Season12Ep3", "12", "3"}

Note that:

  1. Item 1 of the result is the match of the whole regular expression.
  2. Subsequent items are the matches of the subexpressions in parentheses.
  3. In an AppleScript string, a backslash must be written as \\.
  4. Other than that, the syntax for regular expressions is Perl-like (search the web for the details).
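
For reference, the Ruby that matchRegExp passes to `ruby -e` can also be run on its own to see where those items come from. This is only a minimal sketch using the example string from this thread; the helper name `match_items` is made up for illustration and is not part of the handler above.

```ruby
# Hypothetical helper mirroring the one-liner built inside matchRegExp.
def match_items(pattern, text, ignore_case = false)
  options = ignore_case ? Regexp::IGNORECASE : 0
  match = Regexp.new(pattern, options).match(text)
  # MatchData#to_a is [whole match, group 1, group 2, ...].
  # When nothing matched, match is nil and nil.to_a is [].
  match.to_a
end

p match_items('Season(\d+)Ep(\d+)', 'MyTVShow-Season12Ep3')
p match_items('Season(\d+)Ep(\d+)', 'mytvshow-season12ep3')       # case differs, no match
p match_items('Season(\d+)Ep(\d+)', 'mytvshow-season12ep3', true) # matches with the i flag
```

Note that inside a single-quoted Ruby string the backslash is written once; the doubling is only needed on the AppleScript side.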

Hope this helps.

Ah, I see that you want the prefix of your filename rather than the series and episode numbers. So, an example that adheres to what you want is the following:


set anEpisode to "MyTVShow-S12E3"
matchRegExp("(.+)-S\\d+E\\d+", anEpisode, caseInsensitive)
-- {"MyTVShow-S12E3", "MyTVShow"} -- item 2 is the name

For the sake of completeness, here is how you might parse dates in the formats you want:


set aDate to "21.10.2010"
matchRegExp("(\\d{2})\\.(\\d{2})\\.(\\d{4})", aDate, true)
-- {"21.10.2010", "21", "10", "2010"}
set aDate to "2010.10.21"
matchRegExp("(\\d{4})\\.(\\d{2})\\.(\\d{2})", aDate, true)
-- {"2010.10.21", "2010", "10", "21"}

Depending on what you want to achieve, however, something like


set aDate to "21/10/2010"
date aDate

might be all you need.
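
If the date is embedded in a filename and the goal is the text in front of it, the two date patterns can be combined with a capture group for the name. The following is a rough sketch on the Ruby side, assuming filenames such as "ShowName.2010.10.21.avi" and separators limited to period, dash, underscore, or space; `DATE_SEP`, `YEAR_FIRST`, `YEAR_LAST`, and `show_name_before_date` are illustrative names only.

```ruby
# Assumed set of separators between the name and the date parts.
DATE_SEP = /[.\-_ ]/

# Name, then YYYY.MM.DD (or YYYY-MM-DD, etc.).
YEAR_FIRST = /\A(.+?)#{DATE_SEP}(\d{4})#{DATE_SEP}(\d{2})#{DATE_SEP}(\d{2})/
# Name, then two 2-digit parts followed by a 4-digit year.
YEAR_LAST = /\A(.+?)#{DATE_SEP}(\d{2})#{DATE_SEP}(\d{2})#{DATE_SEP}(\d{4})/

# Returns the text before the date, or nil when no date is found.
def show_name_before_date(filename)
  match = YEAR_FIRST.match(filename) || YEAR_LAST.match(filename)
  match && match[1]
end

p show_name_before_date('ShowName.2010.10.21.avi')
p show_name_before_date('ShowName.10.21.2010.avi')
p show_name_before_date('NoDateHere.avi')
```

The patterns only check the digit counts, not whether the digits form a valid date.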

Thanks Druido,
This is definitely getting me close to what I want. You will have to forgive me for being a bit slow, as I am pretty new to AppleScript and have only gotten as far as I have by borrowing code that I have found on this site.

This bit of code:


-- Example of usage:

set caseInsensitive to false
set anEpisode to "MyTVShow-S12E3"
matchRegExp("(.+)-S\\d+E\\d+", anEpisode, caseInsensitive)
-- {"MyTVShow-S12E3", "MyTVShow"} -- item 2 is the name

is working well. I do have a couple of questions. What would be the easiest way to get just item 2 into another variable?

My other question: I want to check for multiple patterns, so one might be the S01E01 pattern that is being looked at here, and another might be 1x01 or S1E1. Is there a way to test whether a regex pattern matched and then run the appropriate code based on that, maybe with a chain of if statements or something like that?

I see the code you posted for dates and it is helpful but I think I am missing how to implement it properly. Basically some shows are named as “ShowName.2010.10.21.avi” or “ShowName.10.21.2010.avi”, sometimes separated with periods or sometimes dashes or other characters.

I really appreciate the help you provided me so far and would be really grateful for any further assistance you can lend.

Actually, now that I look at it some more, it looks like I can test for a match by comparing item 1 against item 2 of what the regex match returns.

Basically, if it is a proper match then item 1 should be different from item 2, so I think I have solved that issue. That just leaves what the best way is to get just item 1 or item 2 into a variable, and how to properly parse the filenames with dates in them.

Thanks

Alright, I am starting to figure all of this out a bit and have a working model. There are just a couple of nagging issues. Here is my code so far:



--RegEx Function--
on matchRegExp(regex, txt, |caseInsensitive?|)
	-- Ruby's "i" flag makes the match ignore case
	if |caseInsensitive?| then
		set ci to "i"
	else
		set ci to ""
	end if
	set theRubyOneLiner to quote & "s = '" & txt & "'; s =~ /" & regex & "/" & ci & "; puts Regexp.last_match.to_a" & quote
	set theCommand to "ruby -e " & theRubyOneLiner
	set theMatchData to do shell script theCommand
	set tid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to character id 13 -- return; do shell script converts line feeds to returns by default
	set theMatchData to the text items of theMatchData
	set AppleScript's text item delimiters to tid
	theMatchData
end matchRegExp
--End RegEx Function--

			

set folder_name to "TVShowName.S01.E01.avi"

--Run Regex on filename--
set caseInsensitive to true --true ignores case, false is case specific

matchRegExp("(.+)S\\d+E\\d+", folder_name, caseInsensitive)
-- {"MyTVShow.S12E3", "MyTVShow"} -- item 2 is the name
set regex to result

if length of regex is 0 then
	matchRegExp("(.+)\\d+x\\d+", folder_name, caseInsensitive)
	-- {"MyTVShow.1x01", "MyTVShow"} -- item 2 is the name
	set regex to result
end if

if length of regex is 0 then
	matchRegExp("(.+)\\d\\d\\d\\d.\\d+.\\d+", folder_name, caseInsensitive)
	-- {"MyTVShow.2010.10.21", "MyTVShow"} -- item 2 is the name
	set regex to result
end if

if length of regex is 0 then
	matchRegExp("(.+)\\d\\d.\\d\\d.\\d\\d\\d\\d", folder_name, caseInsensitive)
	-- {"MyTVShow.10.21.2010", "MyTVShow"} -- item 2 is the name
	set regex to result
end if

--Return value of folder name--
set folder_name to ((text items 2 thru 2 of regex) as string)

Things are working but there are a few snags.

If I have a file in the format of “TVShow.1x01.avi” it works just fine, but if it is “TVShow.12x01.avi” then it adds a 1 to the end of the name.

I think what is happening is the regex is having problems with \d+ if there is nothing solid before it. It is not repeating unless there is a hard coded character before it.

Any ideas how to resolve this?

You mean something like


set theName to item 2 of matchRegExp("(.+)-S\\d+E\\d+", anEpisode, caseInsensitive)

or something else?

One possibility is calling matchRegExp() multiple times, as you do. Another possibility is to build a more complex regular expression.

If a match is not found, then matchRegExp() returns an empty list. Checking the length of the list, as you do in your latest post, is a way to determine whether the filename matches the pattern.
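
As a rough illustration of the "more complex regular expression" option, the episode markers discussed in this thread could be folded into one Ruby pattern, so a single call decides whether a filename matches and what the show name is. This is a sketch under assumptions: the separator set, the marker list, and the names `SEP`, `MARKERS`, `SHOW_NAME`, and `show_name` are all illustrative and not taken from the script above.

```ruby
# Assumed separators between the show name and the episode marker.
SEP = /[.\-_ ]/

# Alternatives tried at the same position; Regexp.union joins them with "|".
MARKERS = Regexp.union(
  /S\d+#{SEP}?E\d+/i,              # S01E01, S1E1, S01.E01
  /\d+x\d+/i,                      # 1x01, 12x01
  /\d{4}#{SEP}\d{2}#{SEP}\d{2}/,   # 2010.10.21
  /\d{2}#{SEP}\d{2}#{SEP}\d{4}/    # 10.21.2010
)

# Group 1 is the show name; the lazy .+? stops at the first marker.
SHOW_NAME = /\A(.+?)#{SEP}+#{MARKERS}/

# Returns the show name, or nil when no marker is recognised.
def show_name(filename)
  match = SHOW_NAME.match(filename)
  match ? match[1] : nil
end

p show_name('TVShow.S01E01.avi')
p show_name('TVShow.12x01.avi')
p show_name('My.Show.2010.10.21.avi')
p show_name('Random.File.avi')
```

A nil result plays the same role as the empty list returned by matchRegExp(): it signals that none of the patterns applied. The name still contains its original separators, so any period-to-space clean-up would happen afterwards.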

Ruby (and Perl) regular expressions are “greedy” by default: each quantifier matches as much as possible. So, for example, if the string “abc123” is matched against (.+)(\d+), then abc12 will match .+ and 3 will match \d+. To get the opposite behaviour, append a ? to the quantifier that should match in a “non-greedy” way. For example, “abc123” will match (.+?)(\d+) as (abc)(123).
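
The difference is easy to check directly in Ruby. The small sketch below uses the "abc123" example and the "TVShow.12x01.avi" filename from this thread; `captures_for` is just an illustrative helper name.

```ruby
# Returns the captured groups, or [] when the pattern does not match.
def captures_for(pattern, text)
  match = pattern.match(text)
  match ? match.captures : []
end

p captures_for(/(.+)(\d+)/, 'abc123')    # greedy: ["abc12", "3"]
p captures_for(/(.+?)(\d+)/, 'abc123')   # non-greedy: ["abc", "123"]

# Greedy .+ swallows the first digit of the season number.
p captures_for(/(.+)\d+x\d+/, 'TVShow.12x01.avi')    # ["TVShow.1"]
# Non-greedy .+? leaves both digits to \d+.
p captures_for(/(.+?)\d+x\d+/, 'TVShow.12x01.avi')   # ["TVShow."]
```

The trailing period in the last result is still part of the captured name and would be removed by whatever clean-up step follows.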

You are the man!

Not only did you write some code that works great, but you have also helped me understand more about AppleScript, and now I actually understand a small sliver of regular expressions.

My script is working beautifully for what I need it to do. Thanks so much!