Processing HTML with GREP and Safari and/or CURL

I’ve got a list of URLs (900+). These are catalog pages of items. I’ve got Safari already set up to download each of these to local HTML files for further processing.

Each page contains this HTML in the source:

Master Category > Sub Category

Which basically displays as

Master Category > Sub Category

Using GREP (I think), I’m trying to retrieve this text in order to:

  1. Insert MasterCategory > SubCategory into the page, replacing what’s there
  2. Rename the file to “MasterCategory_SubCategory.html”

Grep is a newfangled thing to me. Any assistance is muy appreciated.

Model: pb tibook 667
AppleScript: 1.9.3
Browser: Safari 312.3.1
Operating System: Mac OS X (10.3.9)

Update: I’ve found a pattern that will find my target string:

([^<]*) > ([^<]*)

But I’m stumped on how to:

  1. Leave this line in place
  2. Strip out the HTML and generate a regular string
  3. Put that string between the title tags
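
For the tag-stripping step, one possible approach is a sed substitution that deletes everything from each "<" up to the next ">". This is only a sketch: the sample line below is a hypothetical stand-in for the real page markup, and the variable names are made up for illustration.

```shell
# Hypothetical sample line; the markup on the real catalog pages may differ.
line='<a href="#"><nobr><font color="#0000ff">Master Category</font></a></nobr> > <nobr><strong>Sub Category</strong></nobr>'

# Delete every tag (a "<", then anything that is not ">", then ">"),
# which leaves only the visible text of the line.
plain=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')

printf '%s\n' "$plain"
```

Because the substitution works on a copy of the text, the original line in the file is left in place, and the plain string is then available to drop between the title tags.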

This works on a single file:

-- Ask for the HTML file and quote its path for use in shell commands
choose file without invisibles
set theFile to quoted form of POSIX path of result

-- Save the current delimiters, then split text on "<"
set ASTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"<"}

try
	-- Find the category markup; colrm drops the leading 22-character font tag
	do shell script "grep -o '<font color=\"#0000ff\">[^<]*</font></a></nobr> > <nobr><strong>[^<]*</strong></nobr>' " & theFile & " | colrm 1 22"
	get every text item of result
	-- Item 1 is the master category; item 6 is "strong>" followed by the sub category
	get first item of result & " > " & (text 8 thru -1 of (sixth item of result))
	
	-- Rewrite the title element in place with the extracted string
	do shell script "perl -p -i -e " & quoted form of ("s~<title>[^<]*</title>~<title>" & result & "</title>~") & " " & theFile
on error errorMsg number errorNum
	display dialog "Error (" & errorNum & "):" & return & return & errorMsg buttons "Cancel" default button 1 with icon caution
end try

-- Restore the original delimiters
set AppleScript's text item delimiters to ASTID

This assumes that “Master Category” and “Sub Category” do not contain a “<” character, since both the grep pattern and the text item delimiters split on it.