Editing a string containing a word that can be converted to a number.

Hello

The issue is actually a bit more complicated than what the title of the question says.

I am looking for help with renaming the files in my film archive collection.
I don’t know how to continue, or how to correct the errors.

The objective is to find a word in the file name that, converted to a number, falls within the period in which cinema has existed (greater than 1894 and less than the year following the current year), and copy it to the beginning of the file name.
If the first word of the file name already falls within that range, the file should not be processed.

Examples:
number film title [director] plus words year_of_production plus words.ext
or also (without an initial number):
film title [director] plus words year_of_production plus words.ext

—> year_of_production (and the unchanged file name)


use AppleScript version "2.4"
use scripting additions


property yearFilm : ""
set curYear to year of (current date)

tell application "Finder"
	activate
	set sel to selection as alias list
end tell

tell application "System Events"
	
	repeat with i from 1 to count sel
		set nComplet to name of (item i of sel)
		set nBasic to my nBasicATID(nComplet)
		
		set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, " "}
		set list_nBasic to every text item of nBasic
		
		repeat with y from 2 to (count of list_nBasic)
			try
				if (((item y of list_nBasic) as number is greater than 1894) and ((item y of list_nBasic) as number is less than (curYear + 1))) then
					set yearFilm to (text item y of list_nBasic) & " "
					--exit repeat
				end if
				
			end try
			set AppleScript's text item delimiters to ATID
		end repeat
		
		tell application "System Events" to set name of (item i of sel) to (yearFilm & nComplet)
	end repeat
	
end tell


--------
on nBasicATID(_nComp)
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, "."}
	if _nComp contains "." then
		set nBsc to (text items 1 thru -2 of _nComp) as text
	else
		set nBsc to _nComp -- folder
	end if
	set AppleScript's text item delimiters to ATID
	
	return nBsc
end nBasicATID

Still to be solved: cases in which the first word, converted to a number, already falls within the range of cinema’s existence.

Also still open: a better way to check whether a string, converted to a number, falls within the range, without relying on the try… end try block.

I welcome any help or changes to a better approach.
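Here is one way to address both open points. It assumes a year is always written as exactly four ASCII digits. The idea is to check the characters before coercing, so the "as integer" can never fail and no try block is needed. The handler names isFilmYear and yearPrefixedName are hypothetical. Like the original loop, yearPrefixedName keeps the last in-range word after the first one, and it leaves the name alone when the first word is already in range.

```applescript
use AppleScript version "2.4"
use scripting additions

-- Hypothetical helper: true when aWord is exactly four ASCII digits
-- and lies in the range used above (greater than 1894, less than curYear + 1).
on isFilmYear(aWord, curYear)
	set aWord to aWord as text
	if (count aWord) is not 4 then return false
	repeat with c in (characters of aWord)
		if (contents of c) is not in "0123456789" then return false
	end repeat
	-- Safe now: the text is known to be four digits, so no try block is needed
	set n to aWord as integer
	return n > 1894 and n < curYear + 1
end isFilmYear

-- Hypothetical helper: prefix the last in-range year found after the first word,
-- unless the first word is already an in-range year.
on yearPrefixedName(baseName, curYear)
	set {saveTID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, " "}
	set wordList to text items of baseName
	set AppleScript's text item delimiters to saveTID
	if isFilmYear(item 1 of wordList, curYear) then return baseName
	set foundYear to missing value
	repeat with y from 2 to (count wordList)
		if isFilmYear(item y of wordList, curYear) then set foundYear to item y of wordList
	end repeat
	if foundYear is missing value then return baseName -- no year found: unchanged
	return foundYear & " " & baseName
end yearPrefixedName

yearPrefixedName("Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", year of (current date))
```

Inside the Finder/System Events loop, the call needs "my" (my yearPrefixedName(nComplet, curYear)). It is also sensible to rename only when the result differs from nComplet. With this check, case 4 (1941 [Steven Spielberg] 1979) is left unchanged, because its first word is in range.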

Never mind. Looking closer at your post, I see you were doing something different.

I took a different approach, using this collection of handlers (this is very basic AppleScript that has been in use for about 20 years).

Basically, I have a database that includes movie and TV titles and other information (description, rating, cast, etc.). The script adds a new field (titleSort) to each record, and uses that for sorting.

Converting numbers to words in titles is indeed complicated.

 
use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

set listOfTitles to {"2001: A Space Odyssey", "1900", "The Magnificent 7"}
set fixedTitles to {}
repeat with aTitle in listOfTitles
	set titleSort to FixTitleForSorting(aTitle as text)
	set the end of fixedTitles to titleSort & tab & "|" & tab & aTitle as text
end repeat
set AppleScript's text item delimiters to {return}
return fixedTitles as text
on FixTitleForSorting(titleToFix)
	set titleToFix to titleToFix as text
	set titleToFix to PrepTitleForSort(titleToFix)
	set titleToFix to findAndFixNumbers(titleToFix)
	set titleToFix to FixLeadingNonAlphas(titleToFix)
	
	return titleToFix
end FixTitleForSorting

on PrepTitleForSort(titleText)
	set saveTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {""}
	set prefixList to {"A", "An", "The", "El", "La", "Los"}
	if word 1 of titleText is in prefixList then
		set titleWords to every word of titleText
		set the last item of titleWords to the last item of titleWords & ","
		set the end of titleWords to item 1 of titleWords
		set titleWords to the rest of titleWords
		set AppleScript's text item delimiters to " "
		set titleText to titleWords as text
	end if
	set AppleScript's text item delimiters to saveTID
	
	return titleText
end PrepTitleForSort

on NumberToWords(aNum)
	set saveTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {""}
	set onesList to {"one", "two", "three", "four", "five", "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"}
	set tensList to {"ten", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"}
	set AppleScript's text item delimiters to ""
	set aNumList to every text item of (aNum as text)
	set aNum to aNum as integer
	if aNum = 0 then
		set numWord to "zero"
		if the (count of aNumList) > 1 then
			set aNumList to the rest of aNumList
			set numWord to {numWord}
			set the end of numWord to NumberToWords(aNumList as text)
		end if
	else if aNum < 20 then
		set numWord to (item aNum of onesList) as text
	else if aNum < 100 then
		set tenNum to (item 1 of aNumList) as integer
		set oneNum to (item 2 of aNumList) as integer
		set numWord to {item tenNum of tensList}
		if oneNum ≠ 0 then set the end of numWord to item oneNum of onesList
	else if aNum < 1000 then
		set hundNum to (item 1 of aNumList) as integer
		set aNumList to the rest of aNumList
		set hundRem to (aNumList as text) as integer
		set numWord to {item hundNum of onesList, "hundred"}
		if hundRem ≠ 0 then set the end of numWord to NumberToWords(hundRem)
	else if aNum < 1100 then
		set numWord to {"one-thousand"}
		set aNumList to the rest of aNumList
		set thouRem to (aNumList as text) as integer
		if thouRem ≠ 0 then set the end of numWord to NumberToWords(thouRem)
	else if aNum < 2000 then
		set thouNum to items 1 thru 2 of aNumList
		set thouNum to (thouNum as text) as integer
		set numWord to {NumberToWords(thouNum)}
		set aNumList to items 3 thru -1 of aNumList
		set the end of numWord to "hundred"
		set AppleScript's text item delimiters to ""
		set thouRem to (aNumList as text) as integer
		if thouRem ≠ 0 then set the end of numWord to NumberToWords(thouRem)
		
	else if aNum < 10000 then
		set thouNum to item 1 of aNumList
		set aNumList to the rest of aNumList
		set thouNum to (thouNum as text) as integer
		set numWord to {NumberToWords(thouNum)}
		set the end of numWord to "thousand"
		set AppleScript's text item delimiters to ""
		set thouRem to (aNumList as text) as integer
		if thouRem ≠ 0 then set the end of numWord to NumberToWords(thouRem)
		
	else if aNum < 100000 then
		set thouNum to items 1 thru 2 of aNumList
		set aNumList to items 3 thru -1 of aNumList
		set thouNum to (thouNum as text) as integer
		set numWord to {NumberToWords(thouNum)}
		set the end of numWord to "thousand"
		set AppleScript's text item delimiters to ""
		set thouRem to (aNumList as text) as integer
		if thouRem ≠ 0 then set the end of numWord to NumberToWords(thouRem)
	else if aNum < 1000000 then
		set thouNum to items 1 thru 3 of aNumList
		set aNumList to items 4 thru -1 of aNumList
		set thouNum to (thouNum as text) as integer
		set numWord to {NumberToWords(thouNum)}
		set the end of numWord to "thousand"
		set AppleScript's text item delimiters to ""
		set thouRem to (aNumList as text) as integer
		if thouRem ≠ 0 then set the end of numWord to NumberToWords(thouRem)
	else if aNum < 10000000 then
		set milNum to item 1 of aNumList
		set aNumList to the rest of aNumList
		set milNum to (milNum as text) as integer
		set numWord to {NumberToWords(milNum)}
		set the end of numWord to "million"
		set AppleScript's text item delimiters to ""
		set milRem to (aNumList as text) as integer
		if milRem ≠ 0 then set the end of numWord to NumberToWords(milRem)
	else if aNum < 100000000 then
		set milNum to items 1 thru 2 of aNumList
		set aNumList to items 3 thru -1 of aNumList
		set milNum to (milNum as text) as integer
		set numWord to {NumberToWords(milNum)}
		set the end of numWord to "million"
		set AppleScript's text item delimiters to ""
		set milRem to (aNumList as text) as integer
		if milRem ≠ 0 then set the end of numWord to NumberToWords(milRem)
	else if aNum ≤ 536870911 then
		--this is the largest integer that does not get displayed as an exponent
		set milNum to items 1 thru 3 of aNumList
		set aNumList to items 4 thru -1 of aNumList
		set AppleScript's text item delimiters to ""
		set milNum to (milNum as text) as integer
		set numWord to {NumberToWords(milNum)}
		set the end of numWord to "million"
		set AppleScript's text item delimiters to ""
		set milRem to (aNumList as text) as integer
		if milRem ≠ 0 then set the end of numWord to NumberToWords(milRem)
	else
		set numWord to aNum as text
	end if
	set AppleScript's text item delimiters to "-"
	set numWord to numWord as text
	set AppleScript's text item delimiters to saveTID
	return numWord
	
end NumberToWords

on FixLeadingNonAlphas(titleText)
	set saveTID to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {""}
	set titleList to characters of titleText
	repeat while titleList is not {}
		-- stop at the first letter ("is in" ignores case by default)
		if item 1 of titleList is in "ABCDEFGHIJKLMNOPQRSTUVWXYZ" then exit repeat
		set titleList to the rest of titleList
	end repeat
	if titleList is {} then set titleList to "A"
	set titleText to titleList as text
	set AppleScript's text item delimiters to saveTID
	return titleText
end FixLeadingNonAlphas

on findAndFixNumbers(titleToFix)
	set newTitle to {}
	set x to 0
	set titleSize to the count of characters in titleToFix
	repeat
		set x to x + 1
		if x > the (titleSize) then exit repeat
		set thisChar to character x of titleToFix
		if thisChar is in "1234567890" then
			set thisNum to {thisChar}
			set lastChar to ""
			repeat
				set x to x + 1
				if x > the (titleSize) then exit repeat
				set thisChar to character x of titleToFix
				if thisChar is in "1234567890" then
					set the end of thisNum to thisChar
				else
					set lastChar to thisChar
					exit repeat
				end if
			end repeat
			set AppleScript's text item delimiters to ""
			NumberToWords(thisNum as text)
			set wordNum to the result
			set the end of newTitle to wordNum
			set the end of newTitle to lastChar
			
		else
			set the end of newTitle to thisChar
		end if
		
	end repeat
	set AppleScript's text item delimiters to ""
	
	return newTitle as text
end findAndFixNumbers

Hello, stockly
Thank you very much for replying.

I think we are not talking about the same thing.

I am not working with a movie database.
What I need is to capture the string in each file name that represents the year of production and, if the file name does not already begin with it, put it at the beginning of the file name.

Examples:

  1. Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6) —> 2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)
    (The first word is not the year of production.)

  2. 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0) —> 1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)
    (The first word is a number, but it does not fit the range of the film’s existence; we put the year of production at the beginning of the name).

  3. 1945 The Stranger [Orson Welles] 1945 (7,4) —> 1945 The Stranger [Orson Welles] 1945 (7,4)
    (The first word is a number, the year of production, which does fit within the range of the film’s existence. The name is not changed)

  4. 1941 [Steven Spielberg] 1979 (5,5) —> 1941 [Steven Spielberg] 1979 (5,5) (!!!)
    (A special case with no automatic solution, since the title itself is a number that falls within the range of cinema’s existence.)

Without seeing what your data or files look like, I can only guess that this is what you need:
set curYear to year of (current date)

 
	
		-->>>>
		repeat with y from 2 to (count of list_nBasic)
			set thisFilmYear to (item y of list_nBasic) as number
			try
				if thisFilmYear > 1894 and thisFilmYear < (curYear + 1) then
					set yearFilm to (thisFilmYear as text) & " "
					set newFileNameEnd to (yearFilm & nComplet)
				end if
				
				set nComplet to nComplet & newFileNameEnd
				
			end try
			set AppleScript's text item delimiters to ATID
		end repeat
		-->>>>
	


repeat with y from 2 to (count of list_nBasic)
           set thisFilmYear to (item y of list_nBasic) as number 
           try


When the script tries to convert words that are not numeric (in order to isolate the one that is), an error occurs, because the "as number" statement sits outside the try… end try block that would catch it.

Moving that statement inside the try… end try block suppresses the error but, unfortunately, the code then has no effect on the selected files.

I can’t do anything without seeing samples of your data and your desired output.

The data samples and desired output are in Response #3:

Possible situations:

Data samples —> desired output

  1. Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6) —> 2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)
    (The first word is not the year of production.)

  2. 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0) —> 1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)
    (The first word is a number, but it does not fit the range of the film’s existence; we put the year of production at the beginning of the name).

  3. 1945 The Stranger [Orson Welles] 1945 (7,4) —> 1945 The Stranger [Orson Welles] 1945 (7,4)
    (The first word is a number, the year of production, which does fit within the range of the film’s existence. The name is not changed to avoid duplication of the year at the beginning of the file name)

  4. 1941 [Steven Spielberg] 1979 (5,5) —> 1941 [Steven Spielberg] 1979 (5,5) (!!!)
    (A special case with no automatic solution, since the title itself is a number that falls within the range of cinema’s existence.)

You’ll want to use a regex that captures the four digits:
(\d\d\d\d)

Then you need to add conditions to it, i.e. the digits must follow “] ” and be followed by a space:
\]\s(\d\d\d\d)\s

I see some names where the year also appears at the beginning. The pattern above won’t capture that one, so cleaning those up will take some more work.
^(\d\d\d\d)\s
will capture four digits at the start of the string. You may need further conditions to establish that they are not a four-digit movie title, which seems to be the case when they are followed by “[”:
^(\d\d\d\d)\s\w
makes sure the digits are followed by a space and a word character (a letter, digit, or underscore).

I can give AppleScript examples later. Regexes are powerful: look at what you’re analyzing and try to find key text that stays constant to anchor on.
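Here is a minimal AppleScriptObjC sketch of the last pattern above. It is an illustration only, not technomorph’s promised example. It relies on NSRegularExpression’s numberOfMatchesInString:options:range:, which counts matches. In an AppleScript string every backslash in the pattern has to be doubled. The handler name startsWithYearAndWord is hypothetical.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

-- Hypothetical helper: true when aName starts with four digits, a space and a word character,
-- i.e. the pattern ^(\d\d\d\d)\s\w (backslashes doubled inside the AppleScript string).
on startsWithYearAndWord(aName)
	set nsName to (current application's NSString's stringWithString:aName)
	set theRegex to (current application's NSRegularExpression's regularExpressionWithPattern:"^(\\d\\d\\d\\d)\\s\\w" options:0 |error|:(missing value))
	set matchCount to (theRegex's numberOfMatchesInString:nsName options:0 range:{0, nsName's |length|()})
	return matchCount > 0
end startsWithYearAndWord

startsWithYearAndWord("1945 The Stranger [Orson Welles] 1945 (7,4)")
```

Because "[" is not a word character, "1941 [Steven Spielberg] 1979" does not match, so its leading 1941 is treated as a title rather than a year.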

Also, will you have names where the year is at the head but not at the tail, so it needs to be added there? For example:
1945 The Stranger [Orson Welles]

Based upon the four examples I would take this approach. I saved the examples as paragraphs within a text file. If your actual data is inconsistent then the results may vary — especially the ‘]’.

The script reads the paragraphs into a list. For each list item, it uses the closing ‘]’ to isolate the release year. It then checks whether the record begins with the release year (true only for The Stranger) and, if not, prepends the release year to the record. When complete, it returns the resulting records as paragraphs.

Hope this is what you’re looking for.

set filmList to paragraphs of (read (choose file) as «class utf8»)
-- {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", "20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", "1945 The Stranger  [Orson Welles] 1945 (7,4)  ", "1941 [Steven Spielberg] 1979 (5,5)"}

set nList to {}
set AppleScript's text item delimiters to "]"
repeat with filmString in filmList
	set split2 to last text item of filmString
	set rYear to word 1 of split2 --> release year
	
        -- does filmString begin with release year
	set w1 to word 1 of first text item of filmString
	if w1 is not equal to rYear then
		set end of nList to rYear & space & filmString
	else
		set end of nList to contents of filmString
	end if
	
end repeat
set AppleScript's text item delimiters to linefeed
set newText to nList as text

(*
"2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)
1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)
1945 The Stranger  [Orson Welles] 1945 (7,4)  
1979 1941 [Steven Spielberg] 1979 (5,5)"
*)

You might wish to clear out extraneous spaces beforehand — I counted two.

Hello, technomorph.

First of all, I would like to express my gratitude for your willingness to help.

However, the alternative you propose does not seem suitable for my rudimentary knowledge of AppleScript.

Sincere thanks again.

Hello, Mockman

Thank you very much for taking a different approach, and especially for finding a way to place the year at the beginning of the file name that works in all of the situations above.

Although you read the names with “choose file” instead of working on a Finder selection treated as a list of aliases, that does not change the approach much.

In the examples I used a simplified version of the film name for clarity, but the actual format consists of some fixed fields (those shown in the examples) and some variable ones (original title; icons representing nationality and genres; audio: original version / original version subtitled / dual; and, finally, any awards won, especially Oscars):

Year Title (Original Title) [Director]Icons for nationality and genres, audio (OV/OVS/Dual), Year, (score), Awards
Example:

— > 2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations

For this reason I can’t use the closing bracket to locate the year. (I do use similar anchors for other fields: the first pair of parentheses gives me the original title, and the icon immediately after the closing bracket gives me the nationality.)
We will have to think of another strategy to get the year of production.

However, once the year string has been obtained, the way to handle the different cases that can appear at the front of the file name,


   -- does filmString begin with release year
   set w1 to word 1 of first text item of filmString
   if w1 is not equal to rYear then…

is simple and powerful, and it avoids complicated filters based on date ranges and other unnecessary details. I think it is simply brilliant.

Thank you very much for your valuable help.
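The closing bracket can’t be used, but the format above offers another possible anchor: the score. In every example, the production year is the four-digit number immediately before “(n,n)”. Below is a sketch of that idea with NSRegularExpression. It assumes the score always has that shape (digits, comma, digits, inside parentheses) and directly follows the year. The handler name productionYear is hypothetical.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later, so NSRange comes back as a record
use framework "Foundation"
use scripting additions

-- Hypothetical helper. Assumption: the production year is the four-digit number
-- directly before the score, as in "... Dual 2021 (6,9) Oscar ...".
on productionYear(aName)
	set nsName to (current application's NSString's stringWithString:aName)
	set theRegex to (current application's NSRegularExpression's regularExpressionWithPattern:"(\\d{4})\\s+\\(\\d+,\\d+\\)" options:0 |error|:(missing value))
	set theMatch to (theRegex's firstMatchInString:nsName options:0 range:{0, nsName's |length|()})
	if theMatch is missing value then return missing value -- no "year (n,n)" found
	-- capture group 1 holds the four digits
	return (nsName's substringWithRange:(theMatch's rangeAtIndex:1)) as text
end productionYear

productionYear("2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations")
```

The returned text can go straight into Mockman’s comparison (is it the first word of the name?) in place of the bracket-based rYear.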

Hello, stockly.

Thank you very much for your collection of handlers.

I am carefully reading the code of one of them, findAndFixNumbers(titleToFix), and I think it could be modified to get the numeric string representing the year.

Iterating over the characters of each selected file name and separating the numeric ones from the rest seems a good strategy for extracting the year. The year could then be used with the year-range filters or, even better, with the alternative proposed by Mockman:


   set w1 to word 1 of first text item of filmString
   if w1 is not equal to rYear then
       set end of nList to rYear & space & filmString
   else
       set end of nList to contents of filmString
   end if

Thank you very much for your valuable help.
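Here is a sketch of the modification described above, adapted from the character loop in findAndFixNumbers. It collects every run of digits, then keeps the last four-digit run in the year range. Both handler names are hypothetical. Taking the last match assumes that no in-range four-digit number (in the awards text, for instance) follows the production year.

```applescript
use AppleScript version "2.4"
use scripting additions

-- Adapted from the character loop in findAndFixNumbers:
-- collect every run of consecutive digits as a separate text item.
on digitRuns(aText)
	set runList to {}
	set currentRun to ""
	repeat with c in (characters of aText)
		set ch to contents of c
		if ch is in "0123456789" then
			set currentRun to currentRun & ch
		else if currentRun is not "" then
			set end of runList to currentRun
			set currentRun to ""
		end if
	end repeat
	if currentRun is not "" then set end of runList to currentRun -- digits at the very end
	return runList
end digitRuns

-- Hypothetical helper: the last four-digit run within the range, or missing value.
on lastFilmYear(aText, curYear)
	set foundYear to missing value
	set runList to digitRuns(aText)
	repeat with r in runList
		set runText to contents of r
		if (count runText) = 4 then
			set n to runText as integer
			if n > 1894 and n < curYear + 1 then set foundYear to runText
		end if
	end repeat
	return foundYear
end lastFilmYear

lastFilmYear("20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", year of (current date))
```

Since "20,000" splits into "20" and "000", neither of which has four digits, the title number is never mistaken for the year in that example.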

This is a Regular Expression solution with help of the Foundation Framework.

The regex pattern “\]\s(\d{4})” searches for a closing bracket followed by a whitespace character and 4 digits and captures the year information.

The result of the operation is in the variable mappedFilmList


use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

set filmList to {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", "20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", "1945 The Stranger [Orson Welles] 1945 (7,4) ", "1941 [Steven Spielberg] 1979 (5,5)"}

set regexPattern to "\\]\\s(\\d{4})"
set regex to my (NSRegularExpression's regularExpressionWithPattern:regexPattern options:0 |error|:(missing value))
set mappedFilmList to {}
repeat with aFilm in filmList
	set firstMatch to (regex's firstMatchInString:aFilm options:0 range:{0, (count aFilm)})
	set extractedRange to (firstMatch's rangeAtIndex:1)
	set yearLocation to extractedRange's location() as integer
	
	set extractedText to text (yearLocation + 1) thru (yearLocation + 4) of aFilm
	if contents of aFilm begins with extractedText then
		set end of mappedFilmList to contents of aFilm
	else
		set end of mappedFilmList to extractedText & space & contents of aFilm
	end if
end repeat

The fourth example behaves like the first two.

Edit:

To get the result in your example you have to capture also the first year representation, if present.

use AppleScript version "2.5"
use framework "Foundation"
use scripting additions

set filmList to {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", "20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", "1945 The Stranger [Orson Welles] 1945 (7,4) ", "1941 [Steven Spielberg] 1979 (5,5)"}

set regexPattern to "(\\d{4})?[^]]+\\]\\s(\\d{4})"
set regex to my (NSRegularExpression's regularExpressionWithPattern:regexPattern options:0 |error|:(missing value))
set mappedFilmList to {}
repeat with aFilm in filmList
	set firstMatch to (regex's firstMatchInString:aFilm options:0 range:{0, (count aFilm)})
	set extractedPrefix to (firstMatch's rangeAtIndex:1)
	set hasYearPrefix to extractedPrefix's |length| = 4
	set extractedRange to (firstMatch's rangeAtIndex:2)
	set yearLocation to extractedRange's location() as integer
	
	set extractedText to text (yearLocation + 1) thru (yearLocation + 4) of aFilm
	if contents of aFilm begins with extractedText or hasYearPrefix then
		set end of mappedFilmList to contents of aFilm
	else
		set end of mappedFilmList to extractedText & space & contents of aFilm
	end if
end repeat

The first part of the pattern, “(\d{4})?[^]]+”, means: optionally match four digits and capture them, then match all the following characters that are not “]”.

The magic you’re looking for is in the pattern and the replacement.

Pattern (note that it looks different in AppleScript, because you have to double-escape the backslashes):
^(\d\d\d\d(?=\s\w))?\s?(.*?)\s+\[(.*)\]\s+(\d\d\d\d).*?$

The replacement:
$4 - $2 [$3] ($4)
In the replacement I’ve added " - " between the year and the title, and I’ve wrapped the trailing year in parentheses, e.g. (19xx). Both changes are meant to make matching easier in the future.

This runs very quickly.
It only fails when the title itself starts with four digits followed by a space and a letter, because the pattern then takes those digits for a leading year.
For:
2001: A Space Odyssey [Stanley Kubrick] 1968
it works because of the colon after 2001.
But For:
2001 A Space Odyssey [Stanley Kubrick] 1968
it fails.

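Also notice that it removes any extra spaces at the end or between the parts. Note, too, that everything after the final year (such as the (7,6) score) is matched by .*?$ and is therefore dropped from the result.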

[Screenshots from the program RegexKit: main pattern with breakdown; group match captures; generated code]

See the next post for the AppleScript.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property NSRegularExpression : a reference to current application's NSRegularExpression
property NSRegularExpressionCaseInsensitive : a reference to 1
property NSRegularExpressionUseUnicodeWordBoundaries : a reference to 64 -- 1 << 6 in NSRegularExpressionOptions
property NSRegularExpressionAnchorsMatchLines : a reference to 16
property NSRegularExpressionSearch : a reference to 1024
property NSString : a reference to current application's NSString

property myTestName : ""

property mySourceA : ""
property mySourceB : ""
property myPattern1 : ""
property myPattern2 : ""
property myReplace : ""

property myTestA1 : ""
property myTestA2 : ""

property myTestB1 : ""
property myTestB2 : ""
property myTestExpect1 : ""
property myTestExpect2 : ""

property logRegEx : true
property logResults : true
property logDebug : false



-- RUN TEMPLATE

-- \\b(WAV|24 bit|96|19\\.2)\\b
-- NEED FLAC MISSING BAD LOW REPLACE NOT LIVE

set aWordsPattern1 to "^(\\d\\d\\d\\d(?=\\s\\w))?\\s?(.*?)\\s+\\[(.*)\\]\\s+(\\d\\d\\d\\d).*?$"
set aWordsPattern2 to ""
set aSource1 to "20,000 Leagues Under the Sea [Georges Méliès] 1907"
set aSource2 to "1941 [Steven Spielberg] 1979"
set aReplace to "$4 - $2 [$3] ($4)"

my testRegWithName:"MOVIE FILE NAME SCANNING" pattern1:aWordsPattern1 pattern2:aWordsPattern2 source1:aSource1 source2:aSource2 replaceWith:aReplace expecting1:"" expecting2:""


-- MAIN SCRIPT OBJECT FUNCTIONS
on testRegWithName:aName pattern1:patternNo1 pattern2:patternNo2 ¬
	source1:sourceA source2:sourceB replaceWith:aReplace ¬
	expecting1:expectNo1 expecting2:expectNo2
	my resetValues()
	set myTestName to aName
	if not patternNo1 is "" then set myPattern1 to patternNo1
	if not patternNo2 is "" then set myPattern2 to patternNo2
	if not sourceA is "" then set mySourceA to sourceA
	if not sourceB is "" then set mySourceB to sourceB
	if not aReplace is "" then set myReplace to aReplace
	if not expectNo1 is "" then set myTestExpect1 to expectNo1
	if not expectNo2 is "" then set myTestExpect2 to expectNo2
	
	my runTestA()
	my runTestB()
	if logResults then my logTestResults()
end testRegWithName:pattern1:pattern2:source1:source2:replaceWith:expecting1:expecting2:

on resetValues()
	set myTestName to ""
	set myPattern1 to "NONE"
	set myPattern2 to "NONE"
	set mySourceA to "NONE"
	set mySourceB to "NONE"
	set myReplace to ""
	
	set myTestA1 to "NONE"
	set myTestA2 to "NONE"
	
	set myTestB1 to "NONE"
	set myTestB2 to "NONE"
	
	set myTestExpect1 to "NONE"
	set myTestExpect2 to "NONE"
end resetValues

on runTestA()
	if mySourceA is "NONE" then
		return
	end if
	if not myPattern1 is "NONE" then
		set myTestA1 to my findInString:mySourceA withPattern:myPattern1 replaceWith:myReplace
	end if
	if not myPattern2 is "NONE" then
		set myTestA2 to my findInString:mySourceA withPattern:myPattern2 replaceWith:myReplace
	end if
end runTestA

on runTestB()
	if mySourceB is "NONE" then
		return
	end if
	if not myPattern1 is "NONE" then
		set myTestB1 to my findInString:mySourceB withPattern:myPattern1 replaceWith:myReplace
	end if
	if not myPattern2 is "NONE" then
		set myTestB2 to my findInString:mySourceB withPattern:myPattern2 replaceWith:myReplace
	end if
end runTestB

on logTestResults()
	log ("------------------------------------------- TEST RESULTS LOG")
	log {"----------------myTestName is", myTestName}
	
	log {"myPattern1 is", myPattern1}
	log {"myPattern2 is", myPattern2}
	log {"myReplace is", myReplace}
	
	log {"--------------mySourceA is", mySourceA}
	
	log {"myTestA1 is", myTestA1}
	log {"myTestA2 is", myTestA2}
	if not myTestExpect1 is "NONE" then
		log {"myTestExpect1 is", myTestExpect1}
	end if
	
	log {"--------------mySourceB is", mySourceB}
	log {"myTestB1 is", myTestB1}
	log {"myTestB2 is", myTestB2}
	if not myTestExpect2 is "NONE" then
		log {"myTestExpect2 is", myTestExpect2}
	end if
end logTestResults

-- MAIN FUNCTIONS


on findInString:aString withPattern:aRegExString replaceWith:aReplace
	set aRegEx to my createRegularExpressionWithPattern:aRegExString
	if logDebug then
		log {"aRegEx is:", aRegEx}
	end if
	return (my findInString:aString withRegEx:aRegEx replaceWith:aReplace)
end findInString:withPattern:replaceWith:

on findInString:aString withRegEx:aRegEx replaceWith:aReplace
	if logDebug then log ("findInString:withRegEx:replaceWith: START")
	set aSource to NSString's stringWithString:aString
	set aRepString to NSString's stringWithString:aReplace
	set aLength to aSource's |length|()
	set aRange to (current application's NSMakeRange(0, aLength))
	set aCleanString to (aRegEx's stringByReplacingMatchesInString:aSource options:0 range:aRange withTemplate:aRepString)
	
	return aCleanString
end findInString:withRegEx:replaceWith:

on createRegularExpressionWithPattern:aRegExString
	if (class of aRegExString) is equal to (NSRegularExpression's class) then
		log ("it already was a RegEx")
		return aRegExString
	end if
	set aPattern to NSString's stringWithString:aRegExString
	set regOptions to NSRegularExpressionCaseInsensitive + NSRegularExpressionUseUnicodeWordBoundaries
	set {aRegEx, aError} to (NSRegularExpression's regularExpressionWithPattern:aPattern options:regOptions |error|:(reference))
	if (aError ≠ missing value) then
		log {"regEx failed to create aError is:", aError}
		log {"aError debugDescrip is:", aError's debugDescription()}
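		-- AppleScript has no "break" statement; raise an error so the caller stops here
		error ("Could not create the regular expression: " & (aError's localizedDescription() as text))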
	end if
	return aRegEx
end createRegularExpressionWithPattern:



on createPatternForMatchAnyWords:aLine
	set aString to NSString's stringWithString:aLine
	set aArray to aString's componentsSeparatedByString:" "
	set aPattern to NSString's stringWithString:"\\b("
	if (logRegEx) then
		log {"createPatternForMatchAnyWords aArray is:", aArray}
	end if
	
	set aTotal to (aArray's |count|())
	repeat with i from 1 to aTotal
		set aWord to aArray's item i
		set aWord to (aWord's stringByReplacingOccurrencesOfString:"%" withString:" ")
		set aWordPattern to (NSRegularExpression's escapedPatternForString:aWord)
		if (i ≠ aTotal) then
			set aWordPattern to (aWordPattern's stringByAppendingString:"|")
		end if
		if (logRegEx) then
			log {"aWord is:", aWord}
			log {"aWordPattern is:", aWordPattern}
		end if
		set aPattern to (aPattern's stringByAppendingString:aWordPattern)
	end repeat
	set aPattern to aPattern's stringByAppendingString:")\\b"
	if (logRegEx) then
		log {"final pattern is:", aPattern}
	end if
	return aPattern
end createPatternForMatchAnyWords:


on createPatternForMatchAllWords:aLine
	set aString to NSString's stringWithString:aLine
	set aArray to aString's componentsSeparatedByString:" "
	set aPattern to NSString's stringWithString:"^"
	if (logRegEx) then
		log {"createPatternForMatchAllWords aArray is:", aArray}
	end if
	
	repeat with i from 1 to (aArray's |count|())
		set aWord to aArray's item i
		if ((aWord's |length|()) > 1) then
			set aWordPattern to (my createPatternForMatchWord:aWord)
		else
			set aWordPattern to (my createPatternForMatchLetter:aWord)
		end if
		if (logRegEx) then
			log {"aWordPattern is:", aWordPattern}
		end if
		set aPattern to (aPattern's stringByAppendingString:aWordPattern)
	end repeat
	set aPattern to aPattern's stringByAppendingString:".*$"
	if (logRegEx) then
		log {"final pattern is:", aPattern}
	end if
	return aPattern
end createPatternForMatchAllWords:

-- (?=.*\\bYou\\b)
on createPatternForMatchWord:aWord
	set aWordPattern to NSString's stringWithString:"(?=.*\\b"
	set aWordPattern to (aWordPattern's stringByAppendingString:aWord)
	set aWordPattern to (aWordPattern's stringByAppendingString:".?\\b)")
	return aWordPattern
end createPatternForMatchWord:

on createPatternForMatchLetter:aWord
	set aWordPattern to NSString's stringWithString:"(?=.*\\b"
	set aWordPattern to (aWordPattern's stringByAppendingString:aWord)
	set aWordPattern to (aWordPattern's stringByAppendingString:".{0,2}\\b)")
	return aWordPattern
end createPatternForMatchLetter:



Hello, stefanK and technomorph

You both propose a pattern-based tool (regular expressions) that is totally unknown to me, but judging by your comments it seems worth learning about.

I will carefully study the examples you sent me, and I would be grateful if you could tell me where I can find information about it.

The examples I chose in order to state my question clearly suggest a pattern: the closing bracket followed by the four digits of the year. In reality there is no such pattern, because, as I said in comment #12, the name consists of fixed fields and variable ones.

“I have put in the examples a simple format in the film name for clarity, but the format of the film consists of some fixed fields (those shown in the examples) and other variables (original title; icons representing nationality and genres; audio: original version / original version subtitled / dual; and, finally, if it has won awards, especially in the Oscars awards)”

A filename may also begin with an underscore, which indicates that I have already seen the movie.

_Year Title (Original Title) [Director]Icons for nationality and genres, audio (OV/OVS/Dual), Year, (n,n), Awards

Example:

— > _2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations

Thank you very much for all your help.
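To apply the rule from the original question (a name whose first word is already a year between 1895 and the current year is not processed) to this format, the optional leading underscore has to be skipped before looking at the first word. A minimal sketch, assuming the year, when present, is always the first word after the optional underscore; `startsWithFilmYear` is a hypothetical handler name.

```applescript
use AppleScript version "2.4"
use scripting additions

-- Hypothetical handler: true if the name (after an optional leading "_") already
-- starts with a year between 1895 and the current year
on startsWithFilmYear(fileName)
	set curYear to year of (current date)
	try
		-- "_" marks a film already seen; skip it before looking at the first word
		if fileName begins with "_" then set fileName to text 2 thru -1 of fileName
		set firstWord to word 1 of fileName
		if (count firstWord) is not 4 then return false
		set n to firstWord as integer
	on error
		-- Empty name, no words, or a first word that is not a number
		return false
	end try
	return n > 1894 and n < curYear + 1
end startsWithFilmYear
```

Note that a title that is itself a year, such as "1941 [Steven Spielberg] 1979 (5,5)", also returns true, so names like that still need a manual check.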

Typo…

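Have a look at this. It seems to do what you want, although I’m still not sure what the purpose is. Titles that themselves look like a year, such as 1941, get the production year wrapped in question marks so you can handle them differently.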

You also said that some names may begin with an underscore, and I don’t know how that should work. Do you want the script to ignore it? Should it go before or after the first number?


set titleInfo to {¬
	"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
	"20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
	"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
	"1941 [Steven Spielberg] 1979 (5,5)"}

-- Split each name into: title, director, production year, rest
set savedTIDs to AppleScript's text item delimiters
set AppleScript's text item delimiters to {" [", "] ", " ("}
set fixedTitles to {}
repeat with thisTitle in titleInfo
	set thisTitleInfo to text items of thisTitle
	set {titleText, creator, productionYear, otherInfo} to thisTitleInfo
	try
		-- Errors (and jumps to "on error") if the title does not start with a number
		set titleNumber to word 1 of titleText as number
		set prodYear to productionYear as number
		if not titleNumber = prodYear then
			if (titleNumber > 1894) and titleNumber < (prodYear + 2) then
				-- The title itself looks like a year (e.g. "1941"): flag it for review
				set titleText to "?" & productionYear & "? " & titleText
			else
				set titleText to productionYear & " " & titleText
			end if
		end if
	on error
		set titleText to productionYear & " " & titleText
	end try
	set the end of fixedTitles to titleText & " [" & creator & "] " & productionYear & " (" & otherInfo
end repeat
-- Restore the delimiters so later code is not affected
set AppleScript's text item delimiters to savedTIDs
return fixedTitles

--		{"2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
--		"1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
--		"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
--		"?1979? 1941 [Steven Spielberg] 1979 (5,5)"}

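Please try this. It captures the last four-digit number in the name, plus an optional four-digit prefix (with or without a leading underscore) as the first captured group.

As others have already mentioned, regular expressions are a very powerful way to parse strings, and there are many tutorials available.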

use AppleScript version "2.5"
use framework "Foundation"
use scripting additions


set filmList to {"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
	"20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
	"1945 The Stranger [Orson Welles] 1945 (7,4) ", ¬
	"1941 [Steven Spielberg] 1979 (5,5)", ¬
	"_2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations"}

-- Group 1: optional leading year (with or without "_"); group 2: the last 4-digit run in the name
set regexPattern to "(_?\\d{4})?.+(\\d{4})"
set regex to current application's NSRegularExpression's regularExpressionWithPattern:regexPattern options:0 |error|:(missing value)
set mappedFilmList to {}
repeat with aFilm in filmList
	set filmName to contents of aFilm
	-- Work on an NSString so lengths and ranges use the same UTF-16 units as the regex
	-- (icons/emoji make AppleScript's character count differ)
	set cocoaFilm to current application's NSString's stringWithString:filmName
	set firstMatch to (regex's firstMatchInString:cocoaFilm options:0 range:{0, cocoaFilm's |length|()})
	if firstMatch is missing value then
		-- No 4-digit number at all: keep the name unchanged
		set end of mappedFilmList to filmName
	else
		-- NSRange values come back as AppleScript records {location:…, |length|:…}
		set prefixLength to |length| of (firstMatch's rangeAtIndex:1)
		set hasYearPrefix to (prefixLength = 4 or prefixLength = 5)
		set extractedText to (cocoaFilm's substringWithRange:(firstMatch's rangeAtIndex:2)) as text
		if prefixLength = 5 then
			-- "_" + year prefix: as written, this removes the underscore
			set end of mappedFilmList to text 2 thru -1 of filmName
		else if filmName begins with extractedText or hasYearPrefix then
			set end of mappedFilmList to filmName
		else
			set end of mappedFilmList to extractedText & space & filmName
		end if
	end if
end repeat
return mappedFilmList
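For comparison, a regular expression can also pick out the year without depending on the brackets at all. A minimal sketch, assuming the production year is the last standalone four-digit number in the name that falls between 1895 and the current year (a later year in the awards text would break that assumption); `findFilmYear` is a hypothetical handler name.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical handler: returns the last standalone 4-digit number in the name that
-- lies between 1895 and the current year, or missing value if there is none
on findFilmYear(fileName)
	set curYear to year of (current date)
	set theString to current application's NSString's stringWithString:fileName
	-- \b...\b only matches four digits standing alone, so "20,000" is ignored
	set regex to current application's NSRegularExpression's regularExpressionWithPattern:"\\b\\d{4}\\b" options:0 |error|:(missing value)
	set theMatches to regex's matchesInString:theString options:0 range:{0, theString's |length|()}
	set foundYear to missing value
	repeat with i from 1 to (theMatches's |count|())
		set aMatch to (theMatches's objectAtIndex:(i - 1))
		set n to ((theString's substringWithRange:(aMatch's range())) as text) as integer
		-- Later matches overwrite earlier ones, so the last plausible year wins
		if n > 1894 and n < curYear + 1 then set foundYear to n
	end repeat
	return foundYear
end findFilmYear
```

Taking the last match rather than the first is what lets "1941 [Steven Spielberg] 1979 (5,5)" yield 1979 instead of the title.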
 

Here’s a guess at how to handle underscores:


use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions


set titleInfo to {¬
	"Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
	"20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
	"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
	"1941 [Steven Spielberg] 1979 (5,5)", ¬
	"_2021 Drive My Car (Doraibu mai kâ) [Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations"}

set savedTIDs to AppleScript's text item delimiters
-- Also split on "_": a leading "_" produces an empty first text item
set AppleScript's text item delimiters to {" [", "] ", " (", "_"}
set fixedTitles to {}
repeat with thisTitle in titleInfo
	set thisTitleInfo to text items of thisTitle
	if item 1 of thisTitleInfo is not "" then
		set {titleText, creator, productionYear, otherInfo} to thisTitleInfo
		try
			set titleNumber to word 1 of titleText as number
			set prodYear to productionYear as number
			if not titleNumber = prodYear then
				if (titleNumber > 1894) and titleNumber < (prodYear + 2) then
					set titleText to "?" & productionYear & "? " & titleText
				else
					set titleText to productionYear & " " & titleText
				end if
			end if
		on error
			set titleText to productionYear & " " & titleText
		end try
	else
		-- Name started with "_": skip the empty first item and put the "_" back
		set {titleText, creator, productionYear, otherInfo} to the rest of thisTitleInfo
		set titleText to "_" & titleText
	end if
	set the end of fixedTitles to titleText & " [" & creator & "] " & productionYear & " (" & otherInfo
end repeat
-- Restore the delimiters so later code is not affected
set AppleScript's text item delimiters to savedTIDs
return fixedTitles

--{"2003 Kill Bill Volume 1 [Quentin Tarantino] 2003 (7,6)", ¬
--"1907 20,000 Leagues Under the Sea [Georges Méliès] 1907 (6,0)", ¬
--"1945 The Stranger  [Orson Welles] 1945 (7,4)", ¬
--"?1979? 1941 [Steven Spielberg] 1979 (5,5)", ¬
--"_2021 Drive My Car [Doraibu mai kâ)] Ryûsuke Hamaguchi]Various icons Dual 2021 (6,9) Oscar Best International Film. 4 nominations"}


You say your data is inconsistent. If a name contains additional “[”, “]” or “(” characters, as the Drive My Car example does, this won’t work.
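Because the name format is inconsistent, another option is to avoid delimiters altogether and work with AppleScript’s own words. A minimal sketch, assuming (a) the production year is the last word after the first one that is a four-digit number between 1895 and the current year, and (b) a leading "_" should stay in front of the added year, as in the "_Year Title ..." format described above. `addYearPrefix` and `isFilmYear` are hypothetical handler names. Following the rule in the original question, a name whose first word is already a year, such as "1941 [Steven Spielberg] 1979 (5,5)", is left unchanged rather than flagged.

```applescript
use AppleScript version "2.4"
use scripting additions

-- Hypothetical handler: puts the production year in front of a film name
-- without relying on brackets or parentheses
on addYearPrefix(fileName)
	set curYear to year of (current date)
	set seenMark to ""
	set baseName to fileName
	-- Keep a leading "_" (film already seen) in front of the added year
	if baseName begins with "_" and (count baseName) > 1 then
		set seenMark to "_"
		set baseName to text 2 thru -1 of baseName
	end if
	set nameWords to words of baseName
	if nameWords is {} then return fileName
	-- The first word is already a year: leave the name unchanged
	if isFilmYear(item 1 of nameWords, curYear) then return fileName
	-- Search from the end, skipping the first word
	repeat with i from (count nameWords) to 2 by -1
		set aWord to item i of nameWords
		if isFilmYear(aWord, curYear) then return seenMark & aWord & space & baseName
	end repeat
	-- No plausible year found
	return fileName
end addYearPrefix

-- True if aWord is exactly four characters and reads as a year from 1895 to curYear
on isFilmYear(aWord, curYear)
	if (count aWord) is not 4 then return false
	try
		set n to aWord as integer
	on error
		return false
	end try
	return n > 1894 and n < curYear + 1
end isFilmYear
```

Since it only looks at words, extra brackets or parentheses anywhere in the name do not affect the result.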