Script 2 of 4. IMDB information to the annotations of a video

My problem: I have a growing collection of videos that I’ve acquired over the years. I now have so many that it’s difficult to remember what they all are, what they’re about, who is in each, etc.

The solution: I developed a workflow of 4 AppleScripts to accomplish my task. The workflow ultimately places the information about each video in the video file itself, so that I can open a video and instantly see the data or use Spotlight to search on it. Each script contains a more detailed explanation of what it does.

This is script 2 of 4. IMDB (Internet Movie Database) information to the annotations of a video:
Due to the complexity of this script, it works on the front video in QuickTime Player rather than on a folder full of video files, because you need to check the accuracy of the returned results before they are written to the annotations of the video. The script searches the IMDB for the file name of the video and returns information about the video. Once you verify the returned information, the script writes that data to the annotations.
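
As a rough illustration of the first step, here is a minimal sketch of how a file name can be turned into search terms joined with "+". The handler names replaceText and searchTermsFromFileName are hypothetical and used only for this example; the script's own version of this step is the titleIMDB handler.

```applescript
-- Hypothetical helper: replace every occurrence of find_text in source_text
on replaceText(find_text, replace_text, source_text)
	set saved_tids to AppleScript's text item delimiters
	set AppleScript's text item delimiters to find_text
	set pieces to text items of source_text
	set AppleScript's text item delimiters to replace_text
	set new_text to pieces as text
	set AppleScript's text item delimiters to saved_tids
	return new_text
end replaceText

-- Hypothetical helper: treat ".", "_" and spaces as word separators
-- and join the words with "+" for use in a search URL
on searchTermsFromFileName(file_name)
	set terms to file_name
	repeat with separator in {".", "_", space}
		set terms to my replaceText(contents of separator, "+", terms)
	end repeat
	return terms
end searchTermsFromFileName

searchTermsFromFileName("The_Big.Sleep")
```

The file name "The_Big.Sleep" is made up for the example. The sketch does not escape other characters that are special in a URL, so it assumes plain titles.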

The 4 scripts are here:
http://bbs.applescript.net/viewtopic.php?pid=82570#p82570
http://bbs.applescript.net/viewtopic.php?pid=82571#p82571
http://bbs.applescript.net/viewtopic.php?pid=82572#p82572
http://bbs.applescript.net/viewtopic.php?pid=82573#p82573

(* this script will get the name of the front movie in quicktime. The name is derived from its file name. The script then searches the IMDB movie database for that name. The top result from the imdb search is used to return the following about the movie: {release_date, the_genre, user_rating, mpaa_rating, plot_outline, the_cast (the first 4 listed cast members)}. For this to work properly you need to make sure the file name of your movie is the movie title as defined in the IMDB database. The file extension in the file name will not affect the search.  *)

(* After the search, the results are presented to you in a dialog box at which point you can validate that they're accurate. *)

(* At this point you have 3 choices in the dialog box. 1) you can do nothing and quit the script, 2) you can perform another IMDB search (useful if your original search wasn't accurate), and 3) if your search was accurate then you can write this data as annotations in the movie file. *)

(* Note: If you write the results as annotations then you need to save the movie file before closing to retain the new annotations. *)
(* Tip: if the script is having trouble finding the correct movie information you can use the "New Search" button to type in new search terms. If the script still can't find the correct movie information then you can go to the imdb website and find it yourself. Once you go to the proper web page for the movie, you need to get the imdb movie number which is the number starting with "tt" in the url of the web page. Put that number into the script's search box and the script should then find the correct information. *)

-- check for the right system version and end the script if it isn't
-- 10.4 only features are...
-- 1) The way it writes annotations is new in 10.4 and is used in the setAnnotations() subroutine
tell (system attribute "sysv") to set vSys to (("1" & it mod 4096 div 256) as string) & "." & it mod 256 div 16 & "." & it mod 16
if vSys < "10.4" then
	display dialog "This script requires Mac OS X 10.4 or higher." buttons {"OK"} default button 1 with icon 2
	return
end if

-- get movie title from file name of front movie in quicktime
tell application "QuickTime Player"
	try
		if not (exists front movie) then error "Error: no movies are open in QuickTime Player!"
	on error error_message
		beep
		display dialog error_message buttons {"OK"} default button 1
		return
	end try
	set movie_name to name of front movie
	if saveable of movie 1 is false then error "This movie has previously been set so that it cannot be copied, edited, or saved."
	stop movies
	set movie_path to original file of movie movie_name
end tell
set nmExt to my getName_andExtension(movie_path)
set movie_title to item 1 of nmExt
set movie_title to my stripYear(movie_title)

repeat
	with timeout of 3600 seconds -- ie. do not time out for at least an hour
		-- setup the movie title into imdb search form ie. word1+word2+word3 etc
		set search_title to my titleIMDB(movie_title)
		
		try
			-- perform the search and find only the top result
			set search1Header to "http://www.imdb.com/find?s=all&q="
			try
				set top_result to do shell script "curl " & quoted form of (search1Header & search_title) & " | grep -i \"popular titles\""
			on error
				set top_result to do shell script "curl " & quoted form of (search1Header & search_title) & " | grep -i \"exact matches\""
			end try
			
			-- obtain the movie number from the top result
			set movie_number to my movieNum(top_result)
			
			-- get the movie web page from imdb using the movie number
			set search2Header to "http://www.imdb.com/title/"
			set movie_page to do shell script "curl " & quoted form of (search2Header & movie_number & "/") without altering line endings
		on error
			-- sometimes when you search for a movie title, the website jumps you directly to the movie's page instead of presenting a list of movies to pick from. The curl search above errors in these cases, so the script does it the hard way and uses Safari.
			try
				tell application "Safari"
					activate
					open location (search1Header & search_title)
					delay 1
					my web_page_loading()
					set thisurl to the URL of document 1
					tell application "System Events" to tell process "safari"
						keystroke "w" using command down
						keystroke "h" using command down
					end tell
				end tell
				set movie_page to do shell script "curl " & quoted form of thisurl without altering line endings
			end try
		end try
		
		-- strip out pertinent info from web page
		try
			set movie_title to do shell script "echo " & quoted form of movie_page & " | grep -i \"<title>\""
			set movie_title to my parseTitle(movie_title)
		on error
			set movie_title to "missing value"
		end try
		
		try
			set release_date to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"release date:\""
			set release_date to my parseReleaseDate(release_date)
		on error
			try
				set release_date to do shell script "echo " & quoted form of movie_page & " | grep -i \"Sections/Years\""
				set release_date to my parseReleaseDate2(release_date)
			on error
				set release_date to "missing value"
			end try
		end try
		
		try
			set the_genre to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"genre:\""
			set the_genre to my parseGenre(the_genre)
		on error
			set the_genre to "missing value"
		end try
		
		try
			set user_rating to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"<b>user rating:</b>\""
			set user_rating to my parseUserRating(user_rating)
		on error
			set user_rating to "missing value"
		end try
		
		try
			try
				set mpaa_rating to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"mpaa\""
				set mpaa_rating to my parseMPAARating(mpaa_rating)
			on error
				set mpaa_rating to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"certification:\" | grep -i \"usa\""
				set mpaa_rating to my parseCertificationRating(mpaa_rating)
			end try
		on error
			set mpaa_rating to "missing value"
		end try
		
		try
			try
				set plot_outline to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"plot outline:\""
			on error
				try
					set plot_outline to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"plot summary:\""
				on error
					set plot_outline to do shell script "echo " & quoted form of movie_page & " | grep -A 1 -i \"tagline:\""
				end try
			end try
			set plot_outline to my parsePlotOutline(plot_outline)
		on error
			set plot_outline to "missing value"
		end try
		
		try
			try
				set the_cast to do shell script "echo " & quoted form of movie_page & " | grep -i \"Cast overview, first billed only\""
			on error
				set the_cast to do shell script "echo " & quoted form of movie_page & " | grep -i \"Credited cast\""
			end try
			set the_cast to my castMembers(the_cast, 4)
		on error
			set the_cast to "missing value"
		end try
		
		-- fix html code in decimal unicode format ie. special characters in the form of &#231;
		set movie_title to my decHTML_to_string(movie_title)
		set release_date to my decHTML_to_string(release_date)
		set the_genre to my decHTML_to_string(the_genre)
		set user_rating to my decHTML_to_string(user_rating)
		set mpaa_rating to my decHTML_to_string(mpaa_rating)
		set plot_outline to my decHTML_to_string(plot_outline)
		set the_cast to my decHTML_to_string(the_cast)
		
		-- compile the results into a list of records
		set ann_records to {{ann_heading:"Full Name", ann_value:movie_title}, {ann_heading:"Copyright", ann_value:release_date}, {ann_heading:"Genre", ann_value:the_genre}, {ann_heading:"Warning", ann_value:user_rating}, {ann_heading:"Special Playback Requirements", ann_value:mpaa_rating}, {ann_heading:"Description", ann_value:plot_outline}, {ann_heading:"Performers", ann_value:the_cast}}
		
		-- display the dialog box to choose the next action
		set dialog_text to "Movie Title: " & movie_title & return & "Release Date: " & release_date & return & "Genre: " & the_genre & return & "User Rating: " & user_rating & return & "MPAA Rating: " & mpaa_rating & return & "Plot Outline: " & plot_outline & return & "The Cast: " & the_cast
		display dialog dialog_text buttons {"Cancel", "New Search", "Set Annotations"} default button 1
		set buttonEntered to the button returned of result
		if buttonEntered is "New Search" then
			repeat
				display dialog "Type in a new movie title to search." default answer (item 1 of nmExt) with icon note buttons {"Cancel", "OK"} default button "OK"
				set {text_entered, button_pressed} to {text returned, button returned} of the result
				if text_entered is not "" then
					set movie_title to text_entered
					exit repeat
				end if
			end repeat
		else if buttonEntered is "Set Annotations" then
			repeat with aRecord in ann_records
				my setAnnotations(aRecord)
			end repeat
			
			-- open movie properties window so you can check that the annotations were written properly
			tell application "QuickTime Player" to tell movie 1
				set movie_name to name
				set show detailed movie info window to true
			end tell
			exit repeat
		end if
	end timeout
end repeat

(*====================== SUBROUTINES ==========================*)
on setAnnotations(this_record)
	tell application "QuickTime Player" to tell movie 1
		if exists annotation (this_record's ann_heading) then
			beep 2
			set full text of annotation (this_record's ann_heading) to (this_record's ann_value)
		else
			try
				make new annotation with properties {name:(this_record's ann_heading), full text:(this_record's ann_value)}
			on error the error_message number the error_number
				display dialog "Error: " & the error_number & ". " & the error_message buttons {"OK"} default button 1
			end try
		end if
	end tell
end setAnnotations

on titleIMDB(movie_title)
	set text item delimiters to "."
	set search_title to text items of movie_title
	set text item delimiters to space
	set search_title to search_title as string
	set text item delimiters to "_"
	set search_title to text items of search_title
	set text item delimiters to space
	set search_title to search_title as string
	set search_title to text items of search_title
	set text item delimiters to "+"
	set search_title to search_title as string
	set text item delimiters to ""
	return search_title
end titleIMDB

on movieNum(the_string)
	set text item delimiters to "<a href=\"/title/"
	set first_cut to text items of the_string
	set part_result to item 2 of first_cut
	set text item delimiters to "/"
	set second_cut to text items of part_result
	set text item delimiters to ""
	set movie_number to item 1 of second_cut
	return movie_number
end movieNum

on parseTitle(movie_title)
	set text item delimiters to "<title>"
	set a to text items of movie_title
	set text item delimiters to ""
	set movie_title to a as string
	set text item delimiters to "</title>"
	set a to text items of movie_title
	set text item delimiters to ""
	set movie_title to a as string
	return movie_title
end parseTitle

on parseReleaseDate(release_date)
	set text item delimiters to return
	set a to text items of release_date
	set text item delimiters to ""
	set release_date to item 2 of a
	return release_date
end parseReleaseDate

on parseReleaseDate2(release_date)
	set text item delimiters to "</a>"
	set a to text items of release_date
	set text item delimiters to ""
	set release_date to characters -4 thru -1 of (item 1 of a) as string
	return release_date
end parseReleaseDate2

on parseGenre(the_genre)
	set remove_strings to {return, " / ", space, "><"}
	repeat with a_string in remove_strings
		set text item delimiters to a_string
		set a to text items of the_genre
		set text item delimiters to ""
		set the_genre to a as string
	end repeat
	set a to characters of the_genre
	set the_count to count of a
	set the_genre to {}
	repeat with i from 1 to the_count
		set i_char to item i of a
		if i_char is ">" then
			repeat with j from (i + 1) to the_count
				set j_char to item j of a
				if j_char is ":" or j_char is "=" then
					copy j + 1 to i
					exit repeat
				end if
				if j_char is "<" then
					set end of the_genre to (items (i + 1) thru (j - 1) of a) as string
					copy j + 1 to i
					exit repeat
				end if
			end repeat
		end if
	end repeat
	set text item delimiters to "," & space
	set the_genre to the_genre as string
	set text item delimiters to ""
	return the_genre
end parseGenre

on parseUserRating(user_rating)
	set text item delimiters to return --(ASCII character 10)
	set a to text items of user_rating
	set user_rating to item 2 of a
	set remove_strings to {space, "<b>", "</b>"}
	repeat with a_string in remove_strings
		set text item delimiters to a_string
		set a to text items of user_rating
		set text item delimiters to ""
		set user_rating to a as string
	end repeat
	return user_rating
end parseUserRating

on parseMPAARating(mpaa_rating)
	set text item delimiters to return
	set a to text items of mpaa_rating
	set text item delimiters to ""
	set mpaa_rating to item 2 of a
	return mpaa_rating
end parseMPAARating

on parseCertificationRating(mpaa_rating)
	set text item delimiters to "certificates=USA:"
	set a to text items of mpaa_rating
	set mpaa_rating to item 2 of a
	set text item delimiters to "&&heading="
	set a to text items of mpaa_rating
	set text item delimiters to ""
	set mpaa_rating to item 1 of a
	set mpaa_rating to "USA-" & mpaa_rating
	return mpaa_rating
end parseCertificationRating

on parsePlotOutline(plot_outline)
	set text item delimiters to return
	set a to text items of plot_outline
	set text item delimiters to ""
	set plot_outline to item 2 of a
	set text item delimiters to "<a class="
	set a to text items of plot_outline
	set text item delimiters to ""
	set plot_outline to item 1 of a
	return plot_outline
end parsePlotOutline

on castMembers(the_cast, how_many)
	set text item delimiters to "<td class=\"nm\">"
	set a to text items of the_cast
	set text item delimiters to ""
	if how_many > ((count of a) - 1) then set how_many to ((count of a) - 1)
	set cast_members to {}
	repeat with i from 2 to (how_many + 1)
		set end of cast_members to my castMember(item i of a)
	end repeat
	set text item delimiters to ", "
	set cast_string to cast_members as string
	set text item delimiters to ""
	return cast_string
end castMembers

on castMember(the_string)
	set c to characters of the_string
	repeat with i from 1 to (count of c)
		set i_char to item i of c
		if i_char is ">" then
			repeat with j from (i + 1) to (count of c)
				set j_char to item j of c
				if j_char is "<" then
					set real_name to (items (i + 1) thru (j - 1) of c) as string
					exit repeat
				end if
			end repeat
			exit repeat
		end if
	end repeat
	set text item delimiters to "<td class=\"char\">"
	set d to text items of the_string
	set e to item 2 of d
	set text item delimiters to "</td>"
	set d to text items of e
	set text item delimiters to ""
	set char_name to item 1 of d
	set cast_member to real_name & " as " & char_name as string
	return cast_member
end castMember

on stripYear(movie_title)
	if movie_title contains "(" then
		set x to offset of "(" in movie_title
		if character (x + 5) of movie_title is ")" then
			if length of movie_title > (x + 5) then
				if character (x - 1) of movie_title is space then
					set movie_title to (characters 1 thru (x - 2) of movie_title & characters (x + 6) thru -1 of movie_title) as string
				else
					set movie_title to (characters 1 thru (x - 1) of movie_title & characters (x + 6) thru -1 of movie_title) as string
				end if
			else
				if character (x - 1) of movie_title is space then
					set movie_title to (characters 1 thru (x - 2) of movie_title) as string
				else
					set movie_title to (characters 1 thru (x - 1) of movie_title) as string
				end if
			end if
		end if
	end if
	return movie_title
end stripYear

on getName_andExtension(F)
	set F to F as string
	set {name:Nm, name extension:Ex} to info for file F
	if Ex is missing value then set Ex to ""
	if Ex is not "" then
		set Nm to text 1 thru ((count Nm) - (count Ex) - 1) of Nm
		set Ex to "." & Ex
	end if
	return {Nm, Ex}
end getName_andExtension

on web_page_loading()
	set theDelay to 10 -- the time in seconds the script will wait to let a web page load
	set numTries to 3 -- the number of stop/reload cycles before giving up
	set my_delay to 0.25
	set myCounter to 0
	set finished to false
	repeat until finished is true
		set startTime to current date
		set myCounter to myCounter + 1
		set web_page_is_loaded to false
		delay my_delay
		tell application "Safari"
			activate
			repeat until web_page_is_loaded is true
				-- check time and do this if 10 seconds hasn't elapsed
				delay 1
				if (startTime + theDelay) > (current date) then
					if name of window 1 contains "Loading" then
						delay my_delay
					else if name of window 1 contains "Untitled" then -- failed
						delay 2
						if name of window 1 contains "Untitled" then
							set web_page_is_loaded to true
							set finished to true
							set frontApp to displayed name of (info for (path to frontmost application))
							tell application frontApp to display dialog "The web page will not load!"
						end if
					else if name of window 1 contains "Failed to open page" then
						tell application "System Events" to tell process "Safari"
							keystroke "." using command down -- stop the page
							delay my_delay
							keystroke "r" using command down -- reload the page
						end tell
						delay my_delay
						set web_page_is_loaded to true
					else
						delay my_delay * 6
						return true
					end if
				else -- if 10 seconds has elapsed then do this
					tell application "System Events" to tell process "Safari"
						-- if we tried 3 times then give up
						if myCounter is numTries then
							keystroke "." using command down -- stop the page
							return false
						else -- try again because we didn't try 3 times yet
							keystroke "." using command down -- stop the page
							delay my_delay
							keystroke "r" using command down -- reload the page
							delay my_delay
							set web_page_is_loaded to true
						end if
					end tell
				end if
			end repeat
		end tell
	end repeat
end web_page_loading

on decHTML_to_string(the_string)
	set {TIDs, text item delimiters} to {text item delimiters, "&#"}
	set b to text items of the_string
	set text item delimiters to TIDs
	set uniList to {item 1 of b}
	repeat with i from 2 to (count of b)
		set this_string to item i of b
		set string_count to count of this_string
		repeat with j from 1 to string_count
			if item j of this_string is ";" or item j of this_string is "\\" then
				set nDec to text 1 thru (j - 1) of this_string -- get the decimal value
				set nHex to do shell script "perl -e 'printf(\"%04X\", " & nDec & ")'" -- convert decimal to hex
				set uChar to run script "«data utxt" & nHex & "»" -- convert unicode hex to unicode character
				if string_count > j then
					set u_string to (uChar & (text (j + 1) thru string_count of this_string)) as string
				else
					set u_string to uChar
				end if
				set end of uniList to u_string
				exit repeat
			end if
		end repeat
	end repeat
	return uniList as string
end decHTML_to_string

I made a significant change to the above script: I added a handler called “decHTML_to_string(the_string)”. This handler converts HTML code to text when it contains decimal unicode values. You see these in results that contain things like “&#XXX;”, where “XXX” is some number. Each number stands for a unicode character, and the script now accounts for them accurately.
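
To show the idea on its own, here is a minimal sketch of decoding “&#XXX;” values. It is not the handler from the script: the name decodeDecimalEntities is hypothetical, and it uses "character id", which as far as I know needs AppleScript 2.0 (Mac OS X 10.5) or later, whereas the script above shells out to perl so that it runs on 10.4.

```applescript
-- Hypothetical handler, for illustration only
on decodeDecimalEntities(the_string)
	set saved_tids to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "&#"
	set pieces to text items of the_string
	set AppleScript's text item delimiters to saved_tids
	
	-- everything before the first "&#" is plain text
	set out_text to item 1 of pieces
	repeat with i from 2 to (count of pieces)
		set this_piece to item i of pieces
		set semi to offset of ";" in this_piece
		set decoded to false
		if semi > 1 then
			try
				-- the digits before ";" are the decimal unicode value
				set code_point to (text 1 thru (semi - 1) of this_piece) as integer
				set out_text to out_text & (character id code_point)
				if (count of this_piece) > semi then set out_text to out_text & (text (semi + 1) thru -1 of this_piece)
				set decoded to true
			end try
		end if
		-- not a well-formed decimal value: put the original text back unchanged
		if not decoded then set out_text to out_text & "&#" & this_piece
	end repeat
	return out_text
end decodeDecimalEntities

decodeDecimalEntities("Fran&#231;ois &#38; Co")
```

With the sample text above, 231 becomes “ç” and 38 becomes “&”. Anything that does not look like a decimal value followed by “;” is left in the text as it was.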

On July 11th, 2007, another small change was made to the “on movieNum(the_string)” handler so that it works better with the IMDB database.
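
For anyone adapting that handler, here is a minimal sketch of one way to pull the first movie number (the number starting with "tt") out of a piece of HTML. The handler name firstTitleNumber and the sample HTML are made up for illustration, and this is not the code used in movieNum. It assumes the number appears right after "/title/" in a link.

```applescript
-- Hypothetical handler, for illustration only
on firstTitleNumber(the_html)
	set saved_tids to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "/title/"
	set pieces to text items of the_html
	set AppleScript's text item delimiters to saved_tids
	
	-- every piece after the first one starts right after a "/title/"
	repeat with i from 2 to (count of pieces)
		set candidate to item i of pieces
		if candidate starts with "tt" then
			set id_text to "tt"
			-- collect the digits that follow "tt"
			repeat with j from 3 to (count of candidate)
				set this_char to character j of candidate
				if this_char is in "0123456789" then
					set id_text to id_text & this_char
				else
					exit repeat
				end if
			end repeat
			if (count of id_text) > 2 then return id_text
		end if
	end repeat
	return missing value -- no movie number found
end firstTitleNumber

firstTitleNumber("<a href=\"/title/tt0133093/\">Some Movie</a>")
```

Returning missing value when nothing matches lets the caller decide what to do, instead of erroring on "item 2" of a list that has only one item.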