AppleScript with HTML source

I’m new to AppleScript and I’m trying to make a script that will:

  1. find links on a webpage that contain a certain URL path but have different endings in sequence (e.g. www.blablabla.com/1, /2, /3, and so on)
  2. follow the links and find certain text on each linked page. The trick is that I don’t know what the text will be, but I do know its formatting. So if I can search the HTML source of the page for the code that sets the font, color, size, etc., I could then somehow select the text following those tags and copy and paste it into a separate document.

Any idea if this is even possible, and any advice on how to accomplish it?

The actual application of this is to take an HTML calendar from a website, with links to all the different events planned for each date, copy the info (date, time, and description of each event), and create new events in a calendar in iCal.

Thanks!

If you’re using AppleScript, the quick-n-dirty solution would be:

  1. Download HTML documents with curl. See Standard Additions’ ‘do shell script’ command and do ‘man curl’ in Terminal.

  2. Search HTML documents with regular expressions. For example, see TextCommands.

  3. Clean up whitespace and decode HTML entities in resulting text. (See TextCommands.)

Might also need to futz with text encodings as a precursor to #2 and/or strip redundant tags as a precursor to #3; see TextCommands for that as well.
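For comparison, here is a rough Python sketch of steps 2 and 3: a regex search, then whitespace cleanup and entity decoding. Python is used because TextCommands' regular expressions follow Python's `re` syntax. The download step is left out. The sample markup and the choice of `<b>…</b>` as the target are assumptions for illustration only.

```python
import html
import re

def extract_bold_text(page_html):
    """Return the cleaned-up plain text of every <b>...</b> run in page_html."""
    # Step 2: regex search. DOTALL lets '.' cross line breaks; '.*?' is non-greedy,
    # so each match stops at the first </b> rather than the last one on the page.
    raw_matches = re.findall(r"<b>(.*?)</b>", page_html, flags=re.DOTALL)
    cleaned = []
    for raw in raw_matches:
        text = re.sub(r"<[^>]*>", "", raw)  # strip any tags nested inside the bold run
        text = html.unescape(text)          # step 3a: decode &amp;, &nbsp; and friends
        text = " ".join(text.split())       # step 3b: collapse whitespace (incl. non-breaking spaces)
        cleaned.append(text)
    return cleaned

# Hypothetical sample markup, not taken from the real calendar page
sample = "<td><b>Artist on Call:\n  Rachel&nbsp;Paiewonsky</b></td><b>Q&amp;A</b>"
print(extract_bold_text(sample))
```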

For the non-quick-n-dirty solution, look into HTML parsers. There’s a shortage of decent ones for AppleScript, but here’s one for Python: BeautifulSoup.

HTH

There has been a lot written about parsing HTML with AS, some good, some bad. I seem to get what I need with nothing but vanilla AS, but I’ve never tried an extreme case.
Here’s a working example of your idea; you just need to dial it in and write to a file:


tell application "Safari" to set BaseURL to URL of document 1
set ParsedURL to text returned of (display dialog "Enter the base URL without the numeric ending" default answer BaseURL)
set ThisPage to (text returned of (display dialog "Starting page number?" default answer "1")) as integer


try
	repeat
		tell application "Safari" to set HTMLsource to source of document 1
		
		-- split the source at every "<", so each block starts with one tag
		set AppleScript's text item delimiters to "<"
		set Blocks to text items of HTMLsource
		set AppleScript's text item delimiters to ""
		
		-- show every block whose tag mentions the formatting we are looking for
		repeat with ThisBlock in Blocks
			if ThisBlock contains "color" then display dialog ThisBlock as string
			if ThisBlock contains "font" then display dialog ThisBlock as string
			if ThisBlock contains "size" then display dialog ThisBlock as string
		end repeat
		
		-- advance the counter, otherwise every pass would reload the same page
		set ThisPage to ThisPage + 1
		set NextPage to ParsedURL & ThisPage
		
		tell application "Safari" to set URL of document 1 to NextPage
		delay 3 -- crude pause so the new page can load before its source is read
	end repeat
on error
	-- any error, such as clicking Cancel in one of the dialogs, ends the loop
end try

SC

Thanks this is all very helpful!

2 more questions:

  1. How can I set the individual results from a list of TextCommands search results to different variables, so that I can later set an iCal event’s fields (e.g. location, date, time, etc.) to these variables?

  2. This one’s a little more complicated :D. Since the online HTML calendar I’m trying to duplicate in iCal is constantly updated, I need to run this script periodically to update the iCal calendar. However, I figure it would make more sense, and save time, to ignore already-created iCal events and just add the new ones from the online HTML calendar, instead of creating an entirely new iCal calendar from scratch each time. I can think of two possible ways to do this:
      a. Check the actual iCal calendar to see whether events with the names found on the online HTML calendar already exist. The script would then ignore those events and move on to ones added since the last time I ran it.
      b. Create some kind of log file that records previously found URLs. When the script runs again, it would check the log file and ignore the URLs in it, so it would only follow links added since the script last ran.

If this even makes sense to you, let me know which sounds better, plus any advice on how to check the iCal calendar for those events, or how to create and check the log file!
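Option (b) is the easier of the two to sketch. Below is a minimal Python version of the idea. It assumes the log is a plain text file with one URL per line; the file name and event URLs are hypothetical.

```python
import tempfile
from pathlib import Path

def filter_new_urls(found_urls, log_path):
    """Return the URLs not yet recorded in the log file, and append them to it."""
    log_file = Path(log_path)
    # One URL per line; a missing log file just means nothing has been seen yet
    seen = set(log_file.read_text().splitlines()) if log_file.exists() else set()
    new_urls = []
    for url in found_urls:
        if url not in seen:
            seen.add(url)  # also skips duplicates within the same run
            new_urls.append(url)
    with log_file.open("a") as log:
        for url in new_urls:
            log.write(url + "\n")
    return new_urls

# Demo with hypothetical event URLs and a throwaway log file
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "seen_urls.txt"
    first_run = filter_new_urls(["view_event.asp?event=1", "view_event.asp?event=2",
                                 "view_event.asp?event=1"], log_path)
    second_run = filter_new_urls(["view_event.asp?event=2", "view_event.asp?event=3"], log_path)
print(first_run)   # first run: both distinct URLs are new
print(second_run)  # second run: only event=3 is new
```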

Thanks again!

What you’re doing sounds possible, maybe even complicated! I’ve never used TextCommands. I wouldn’t put the results of the search in a list; I would keep them separate for ease of use later. Just curious, what did you ask TextCommands to do? (Decode HTML, search for a string, etc.)

If you can post the link to the calendar, it will be much easier to start. If that’s not possible, just post the calendar hosting site; that way I can see what’s going on with the formatting.
SC

Here’s the calendar (a single day’s view, August 7th, to be exact):
http://intranet.risd.edu/ContentAdministrator/calendar/view/day.asp?month=8&day=7&year=2005
I didn’t post it before because I didn’t know how involved anyone would really want to get, but thanks for all the help!!

I thought I’d use TextCommands to search this page for the HTML tag that makes the event titles bold, “<b>”, and set the text following it (up to the closing tag “</b>”) to a variable. I could then enter that variable as the title of a new event in iCal, and repeat the procedure for each of the other fields: description, location, date, and time. I’m not sure, however, how to set TextCommands’ search results to a variable. Also, when there are multiple events listed on one day (as on August 7th), a search for “<b>” or any other such tag will return multiple matches, one for each event. So I assume I’ll need to set each title in the list to a different variable.

After all that’s done for each event on one day, the script will move on to the next day by adding 1 to the “day=” portion of the URL. I’m not sure when to make it stop, maybe just a few months into the future. And I guess I could make it start on the current day by asking the user to input the current date… or better yet, by getting the date from the computer’s clock!
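Adding 1 to the `day=` number works until the end of a month. An alternative is to step a real date forward one day at a time and build each URL from its month, day and year, so that 8/31 is followed by 9/1. Here is a sketch of that in Python for brevity. The URL pattern is the one linked above; the start date and the number of days are placeholders.

```python
from datetime import date, timedelta

# Day-view URL pattern as it appears earlier in the thread
DAY_VIEW = "http://intranet.risd.edu/ContentAdministrator/calendar/view/day.asp"

def day_urls(start, num_days):
    """Build one day-view URL per day, starting at `start`."""
    urls = []
    for offset in range(num_days):
        day = start + timedelta(days=offset)  # date arithmetic rolls over month/year ends
        urls.append(f"{DAY_VIEW}?month={day.month}&day={day.day}&year={day.year}")
    return urls

# In real use you would start from the computer's clock, e.g. day_urls(date.today(), 90)
for url in day_urls(date(2005, 8, 30), 4):
    print(url)
```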

(On a side note, that’s my school’s website, as you can see… it’s hard to explain what an art student is doing trying to program! But it does explain why I’m using a Mac! :D)

Well, if you save the page to a file with curl, you can get the event titles like so:

grep "<b>" /path.html | sed -e 's/<[^>]*>//g' | sed 's/^[[:punct:][:space:]]*//g'

and the pertinent info with:

grep "Location:" /path/RISD.html | sed -e 's/<[^>]*>//g' | awk -F";" '{ gsub("&nbsp;&nbsp", ""); gsub("&nbsp;-&nbsp", ""); print $0}' | sed 's/^[[:punct:][:space:]]*//g'

which will get you:

Location:;The RISD MuseumDate:;8/7/2005Time:;02:30 PM;03:30 PM
Location:;The RISD MuseumDate:;8/7/2005Time:;01:30 PM;02:30 PM
Location:;RISD MuseumDate:;8/7/2005Time:;1:30pm;3:30pm

and you can parse THAT a lot more easily in AppleScript for your own needs.
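As a rough illustration of that last parsing step, here is a Python sketch (not AppleScript) that splits one line of the output above into its fields with a non-greedy regular expression. It assumes every line has exactly the `Location:;…Date:;…Time:;start;end` layout shown. Note that the grep only selects lines containing “Location:”, so events without a location never reach this step.

```python
import re

# Field layout of the grep/sed/awk output shown above
LINE_RE = re.compile(r"Location:;(?P<location>.*?)Date:;(?P<date>.*?)Time:;(?P<start>.*?);(?P<end>.*)")

def parse_event_line(line):
    """Split one 'Location:;...Date:;...Time:;start;end' line into a dict."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

output_lines = [
    "Location:;The RISD MuseumDate:;8/7/2005Time:;02:30 PM;03:30 PM",
    "Location:;The RISD MuseumDate:;8/7/2005Time:;01:30 PM;02:30 PM",
    "Location:;RISD MuseumDate:;8/7/2005Time:;1:30pm;3:30pm",
]
for line in output_lines:
    print(parse_event_line(line))
```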

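-- htmlTxt is assumed to already hold the page's HTML source (e.g. fetched with do shell script and curl)
tell application "TextCommands"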
	set allMatches to search htmlTxt for "<b>(.*?)</b>.*?<td width=\"500\"><pre><font face=\"verdana\" size=\"2\">(.*?)</font>.*?<span class=\"gray\">Location:&nbsp;&nbsp;</span>(.*?)<br>.*?<span class=\"gray\">Date:&nbsp;&nbsp;</span>(.*?)<br>.*?<span class=\"gray\">Time:&nbsp;&nbsp;</span>(.*?)&nbsp;-&nbsp;(.*?)<br>" with regex -- ugly, fragile RE for scraping data out of horrid tag soup
end tell
repeat with matchRef in allMatches -- process each match in turn
	set {theTitle, theDescription, theLocation, theDate, startTime, endTime} to matchRef
	-- [convert title and description HTML to plain text, and date and time strings to AS date objects, then add entry to iCal here]
end repeat
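To see what the multi-group search hands back, here is the same pattern run through Python's `re`, which is the dialect TextCommands documents. The sample HTML is hypothetical and was built only to fit the pattern. The `DOTALL` flag is an assumption of this sketch; whether TextCommands needs an equivalent setting is not covered here. Each match comes back as one tuple with one item per `(...)` group. That is why a single `set {…} to matchRef` line can put every field into its own variable.

```python
import re

# hhas's pattern transcribed to Python raw strings; DOTALL (an assumption here)
# lets '.*?' also cross the line breaks between table cells.
EVENT_RE = re.compile(
    r'<b>(.*?)</b>.*?'
    r'<td width="500"><pre><font face="verdana" size="2">(.*?)</font>.*?'
    r'<span class="gray">Location:&nbsp;&nbsp;</span>(.*?)<br>.*?'
    r'<span class="gray">Date:&nbsp;&nbsp;</span>(.*?)<br>.*?'
    r'<span class="gray">Time:&nbsp;&nbsp;</span>(.*?)&nbsp;-&nbsp;(.*?)<br>',
    re.DOTALL,
)

# Hypothetical markup shaped to fit the pattern (not copied from the real page)
sample = (
    '<b>Artist on Call: Rachel Paiewonsky</b></td></tr>\n'
    '<tr><td width="500"><pre><font face="verdana" size="2">Description goes here.</font></pre>\n'
    '<span class="gray">Location:&nbsp;&nbsp;</span>The RISD Museum<br>\n'
    '<span class="gray">Date:&nbsp;&nbsp;</span>8/7/2005<br>\n'
    '<span class="gray">Time:&nbsp;&nbsp;</span>01:30 PM&nbsp;-&nbsp;02:30 PM<br>\n'
)

events = EVENT_RE.findall(sample)  # one 6-tuple per event, one item per (...) group
for title, description, location, day, start_time, end_time in events:
    print(title, "|", location, "|", day, start_time, "-", end_time)
```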

How’s this?

Note: This script contains ’ ', but due to the html formatting of this page it looks like a space, so make sure you click the ‘Open this Scriplet in your Editor:’ link instead of copying and pasting.

set old_delimiters to text item delimiters

set this_date to date "Sunday, 7 August 2005 12:00:00 AM" --(current date)
tell this_date to set the_url to "http://intranet.risd.edu/ContentAdministrator/calendar/view/day.asp?month=" & (month of it as number) & "&day=" & day of it & "&year=" & year of it
set source_text to do shell script "curl " & quoted form of the_url
set the_events to {}

set text item delimiters to "<b>"
if (count of text items of source_text) is not 1 then
	set source_text to text items 2 thru -1 of source_text
	repeat with this_text in source_text
		set text item delimiters to "</b>"
		set b to {text item 1 of this_text}
		set text item delimiters to "<br>"
		set this_text to text items of this_text
		repeat with i from 1 to 3
			set text item delimiters to item i of {"Location:  </span>", "Date:  </span>", "Time:  </span>"}
			-- set text item delimiters to item i of {"Location:&nbsp;&nbsp;</span>", "Date:&nbsp;&nbsp;</span>", "Time:&nbsp;&nbsp;</span>"}
			set temp_text to text item 2 of (item i of this_text)
			set text item delimiters to " "
			-- set text item delimiters to "&nbsp;"
			set temp_text to text items of temp_text
			set text item delimiters to space
			set end of b to temp_text as string -- rejoin the pieces with ordinary spaces
		end repeat
		set end of the_events to b
	end repeat
end if

set text item delimiters to old_delimiters
return the_events

:lol::lol::lol: My sentiments exactly. Then I realized: why are we parsing HTML when the TEXT of the pages all has the same format? The events in Safari are:

set EventList to paragraphs of text of document 1 whose font is "Verdana-Bold"
->{"Always on Sunday: This land is our land! ", "
", "Artist on Call: Rachel Paiewonsky
", "
", "Life in Our Eyes: The Art of Media and Social Chan
", "
"}

Then I parsed the paragraphs of the text for the date, and for lists of times and locations. Time and date are then coerced to a “date” (didn’t know that works!). Tell iCal to create a calendar; if it already exists, add a new event using the coerced date as the start date. One variable for date and time gets us started.
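The string-to-date coercion is doing a lot of work there. For comparison, here is a Python sketch of the same step. Python makes you spell out the format, so the sketch normalises the two time styles seen on the calendar (`02:30 PM` and `1:30pm`). It assumes the `Date:`/`Time:` labels have already been stripped and that the machine uses an English/C locale for AM/PM.

```python
from datetime import datetime

def to_datetime(date_text, time_text):
    """Combine e.g. '8/7/2005' with '02:30 PM' or '1:30pm' into a datetime.

    Assumes the 'Date:'/'Time:' labels are already stripped, and an English/C
    locale so that %p recognises AM/PM.
    """
    clock = time_text.replace(" ", "").upper()  # '02:30 PM' -> '02:30PM', '1:30pm' -> '1:30PM'
    return datetime.strptime(f"{date_text.strip()} {clock}", "%m/%d/%Y %I:%M%p")

print(to_datetime("8/7/2005", "02:30 PM"))
print(to_datetime("8/7/2005", "1:30pm"))
```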

tell this_calendar to set this_event to make new event at the end of events with properties {start date:ThisTime, summary:TheName, description:(item 1 of EventList as string), location:(item 1 of theLocations as string)}

Now we need to count the items and replace “item 1” in the properties with “item i”, so that event names are matched with their data. Not really a good description, but I think veteran scripters know where I’m headed. Each event on a page will then be matched to its corresponding item (EventList will be parsed to match items). Here is a prototype that grabs the first event from a page and sends it to iCal:

tell application "Safari"
	set EventList to paragraphs of text of document 1 whose font is "Verdana-Bold"
	set PageContent to text of document 1
end tell



set TextBlocks to paragraphs of PageContent

set theLocations to {}
set theTimes to {}


repeat with ThisBlock in TextBlocks
	if ThisBlock contains "Date:" then set theDate to ThisBlock as string
	
	if ThisBlock contains "Location:" then copy ThisBlock as string to the end of theLocations
	
	if ThisBlock contains "Time:" then copy (date (theDate & (ThisBlock as string))) to the end of theTimes
end repeat

set ThisTime to date (item 1 of theTimes as string)
set TheName to (item 1 of EventList as string)


tell application "iCal"
	try
		set this_calendar to (the first calendar whose title is "Events Cal")
		tell this_calendar to set this_event to make new event at the end of events with properties {start date:ThisTime, summary:TheName, description:(item 1 of EventList as string), location:(item 1 of theLocations as string)}
	on error
		make new calendar at end of calendars with properties {title:"Events Cal"}
		display dialog "Calendar was created; run script again"
	end try
end tell

Still to come… grab all events on page, repeat entire process on next page :smiley:

SC

Thanks guys, you are awesome! :smiley: I am extremely impressed with your skill (and generosity!)

Good thought, sitcom; your script seems to work great! If you get around to making it repeat with the next event/date, here are a few other little changes/additions I noticed:

  1. The location field is not always present for every event on the site; see March 3, 2005 for an example: http://intranet.risd.edu/ContentAdministrator/calendar/view/day.asp?month=3&day=3&year=2005. When the script runs on a page like that, it gives the first event the second event’s location in iCal, because the second event’s “Location” is the first one it finds on the page. Maybe the script could look for “Location” relative to the “Date” field, so that if there is no “Location” right before the “Date,” it would just skip it and leave the location for that iCal event blank. Also on this note, sometimes the word “location,” “date,” or “time” appears in the description of an event, so I don’t know whether the script would find that instead and get confused.

  2. Currently the script makes the “Notes” field of the event in iCal the same as the “Title.” Maybe another variable could be added to the script that would take the text of the description of the event from the website and put it in the “Notes” field in iCal.

  3. The end time is currently created with the same time as the start time in iCal, so it looks like the event lasts no time at all! I think you mentioned this in your last post, sitcom, but the language was starting to get a little complicated for a newbie like me :rolleyes:!
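For point 1, one way to pair each location with the right event is to treat the “Date:” line as the end of an event. A “Location:” line then only counts if it has appeared since the previous “Date:”. Below is a minimal Python sketch of that idea. It assumes the page text arrives as a list of lines in the order Location, Date, Time; the sample lines are hypothetical. Matching only lines that begin with a label also addresses the worry about those words appearing inside descriptions. It would still be fooled by a description line that happens to begin with “Date:”, though.

```python
def group_events(lines, no_location="To be announced"):
    """Group page-text lines into events, using each 'Date:' line to close an event.

    A 'Location:' line only applies to the event whose 'Date:' line follows it,
    and the stored location is cleared afterwards so it cannot leak into the next event.
    """
    events = []
    location = None
    for raw in lines:
        line = raw.strip()
        # Only lines that begin with a label count, so a description that merely
        # mentions the word "location" or "date" is not mistaken for a field.
        if line.startswith("Location:"):
            location = line[len("Location:"):].strip()
        elif line.startswith("Date:"):
            events.append({"date": line[len("Date:"):].strip(),
                           "location": location or no_location})
            location = None  # reset for the next event
        elif line.startswith("Time:") and events:
            events[-1]["time"] = line[len("Time:"):].strip()
    return events

# Hypothetical page text: Event B has no location line
page_lines = [
    "Event A",
    "Location: Main Hall",
    "Date: 3/4/2005",
    "Time: 10:00 AM - 11:00 AM",
    "Event B",
    "A talk about the location of things.",
    "Date: 3/4/2005",
    "Time: 12:00 PM - 01:00 PM",
]
for event in group_events(page_lines):
    print(event)
```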

Even though you are a newb, you can still participate in the development process, and by the time we’re done you won’t be a newb anymore!
Off to work with me…
SC

Understand, it’s not the RE that’s inherently unsafe, it’s the whole HTML scraping process. (HTML scraping is an evil kludge. Google around for discussions of the issues involved.) A 20-line vanilla AS solution will be just as ugly and fragile; the difference is that the RE is a lot quicker and simpler to write (took me one minute to write that one), and if it breaks because something in the HTML data isn’t quite as anticipated (see below) then it will be much quicker and simpler to fix as well. REs are great for this sort of quick-n-dirty work because they help you concentrate all this nastiness into a smaller percentage of your overall script.

Being certain your script handles every corner case correctly is one of the main hassles in HTML scraping. Anyway, that particular problem is easily dealt with:

tell application "TextCommands"
	set allMatches to search htmlTxt for "
			<b>(.*?)</b> # title
			.*?
			<td\\swidth=\"500\"><pre>
			<font\\sface=\"verdana\"\\ssize=\"2\">(.*?)</font> # description
			.*?
			(?:<span\\sclass=\"gray\">Location:.*?</span>(.*?)<br>
			.*?)? # location (optional)
			<span\\sclass=\"gray\">Date:.*?</span>(.*?)<br> # date
			.*?
			<span\\sclass=\"gray\">Time:.*?</span>(.*?)&nbsp;-&nbsp;(.*?)<br> # start and end times
			" with regex and comments allowed -- (RE reformatted for better readability)
end tell

Sorry sitcom, I realize that sounded too presumptuous and was directed specifically at you. I didn’t mean to imply that, nor that I wasn’t fiddling around with this myself. My apologies.

I’ve been fooling around with hhas’s script, but I’m trying to figure out how, if I get the HTML via curl, I can set it to a variable (htmlTxt in hhas’s example) so it can be searched with TextCommands. Also, does that very simple “(optional)” really make TextCommands skip other instances of “Location” in the HTML?

Then, leaving the HTML world and going back to sitcom’s script, I tried adding something that would copy nothing (“”) to theLocations if ThisBlock “does not contain “Location:”,” but it seemed to copy nothing every time, not only when there was no location listed. I guess I need some kind of “else” command instead, but I don’t know what that is. I also tried adding a line just like the second line, in which the EventList variable is set, but with a “NoteList” variable defined by the font Verdana (not bold like the titles). Unfortunately, besides finding the correct text, it also finds a bunch of blank paragraphs for all the line breaks, both before and among the text of the descriptions on the site. So making the iCal description field item 1 of NoteList leaves me with two blank lines. Somehow I’ll have to consolidate that text… I can see how TextCommands could come in handy there, but I don’t know how to tell it which empty lines to get rid of and which to keep. I’ll keep working on that…
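On the “else” question: AppleScript does have one (`if … then … else … end if`). For the blank-paragraph problem, the usual move is to drop paragraphs that are empty or whitespace-only and join what is left. Below is a Python sketch of that filter with a hypothetical NoteList-style input. It assumes every blank paragraph is junk, so it would also remove intentional spacing between description paragraphs.

```python
def consolidate(paragraphs):
    """Drop empty or whitespace-only paragraphs and join the rest with line breaks."""
    kept = []
    for paragraph in paragraphs:
        text = paragraph.strip()
        if text:  # blank paragraphs left over from line breaks strip to ""
            kept.append(text)
    return "\n".join(kept)

# Hypothetical NoteList-style input with blank paragraphs mixed in
note_list = ["\n", "\n", "First paragraph of the description.\n", "\n", "Second paragraph.\n"]
print(consolidate(note_list))
```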

findSubPageLinks()

on findSubPageLinks()
	set theHTML to do shell script "curl http://intranet.risd.edu/ContentAdministrator/calendar/view/calendar.asp"
	tell application "TextCommands"
		set foundLinks to search theHTML for "/ContentAdministrator/calendar/view/view_event\\.asp\\?event=\\d+" with regex -- extract URLs (quick-n-dirty)
	end tell
	repeat with linkRef in foundLinks
		set theURL to "http://intranet.risd.edu" & linkRef
		processSubPage(do shell script "curl " & quoted form of theURL, theURL) -- quote the URL so the shell leaves the "?" alone
	end repeat
end findSubPageLinks


on processSubPage(theHTML, theURL)
	tell application "TextCommands"
		set allMatches to search theHTML for "
<b>(.*?)</b>
.*?
<td.*?><pre>
<font.*?>(.*?)</font>
.*?
(?:Location:&nbsp;&nbsp;</span>(.*?)<br>
.*?)?
Date:&nbsp;&nbsp;</span>(.*?)<br>
.*?
Time:&nbsp;&nbsp;</span>(.*?)&nbsp;-&nbsp;(.*?)<br>
" with regex and comments allowed -- extract entries (quick-n-dirty)
	end tell
	if allMatches is {} then
		log "Warning: couldn't extract entries from " & theURL
	end if
	repeat with matchRef in allMatches
		set {theTitle, theDescription, theLocation, theDate, startTime, endTime} to matchRef
		processEntry(theTitle, theDescription, theLocation, theDate, startTime, endTime)
	end repeat
end processSubPage


on processEntry(theTitle, theDescription, theLocation, theDate, startTime, endTime)
	-- cleanup HTML (quick-n-dirty)
	tell application "TextCommands"
		set theTitle to decode HTML theTitle
		set theDescription to decode HTML theDescription
		set theLocation to decode HTML theLocation
	end tell
	-- build date objects (quick-n-dirty)
	set startDate to date (theDate & " " & startTime)
	set endDate to date (theDate & " " & endTime)
	-- TEST
	log {theTitle, theDescription, theLocation, startDate, endDate}
	log "----" -- separator between entries
end processEntry

That’s just a comment (everything from the ‘#’ symbol up to the next linefeed character). The bit that does the actual work is ‘(?:…)?’: the ‘(?:…)’ marks a non-capturing group, and the ‘?’ after it makes the whole group optional. The regular expression language used by TextCommands is documented here:

http://python.org/doc/2.3.5/lib/re-syntax.html
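Since that is Python's `re` dialect, the optional group can be tried out directly in Python. This sketch keeps only the title, location and date parts of hhas's pattern. `re.VERBOSE` plays the role of “comments allowed”: whitespace is ignored and `#` starts a comment, hence `\s` for the spaces inside tags. `re.DOTALL` is an assumption so that `.*?` can cross line breaks. The sample markup is hypothetical. An entry without a location still matches; its location group just comes back empty.

```python
import re

# VERBOSE ignores literal whitespace and treats '#' to end-of-line as a comment,
# hence '\s' for the spaces inside tags. DOTALL lets '.*?' run across line breaks.
ENTRY_RE = re.compile(r"""
    <b>(.*?)</b>                                   # title
    .*?
    (?:<span\sclass="gray">Location:.*?</span>(.*?)<br>
    .*?)?                                          # location (optional)
    <span\sclass="gray">Date:.*?</span>(.*?)<br>   # date
""", re.VERBOSE | re.DOTALL)

# Hypothetical markup: Event B has no Location span at all
sample = (
    '<b>Event A</b> <span class="gray">Location:&nbsp;&nbsp;</span>Main Hall<br>\n'
    '<span class="gray">Date:&nbsp;&nbsp;</span>3/4/2005<br>\n'
    '<b>Event B</b>\n<span class="gray">Date:&nbsp;&nbsp;</span>3/4/2005<br>\n'
)

for title, location, day in ENTRY_RE.findall(sample):
    print(repr(title), repr(location), repr(day))  # location is '' when the group was skipped
```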

BTW, one other suggestion: ask the site maintainers whether they could also publish their events calendar in the industry-standard .ics format, which calendar apps such as iCal can subscribe to directly. That would save you, and no doubt other folks, quite a bit of hassle.
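For a sense of what that format involves, here is a minimal Python sketch that writes events as an iCalendar (.ics, RFC 5545) file. It covers only a handful of properties and uses floating local times. It also does not fold lines longer than 75 octets, as the specification recommends. The UID scheme and PRODID value are hypothetical placeholders. The event is the August 7th “Artist on Call” entry from earlier in the thread.

```python
from datetime import datetime

def ics_escape(text):
    """Escape a TEXT value per RFC 5545: backslash, semicolon, comma, newline."""
    return (text.replace("\\", "\\\\").replace(";", "\\;")
                .replace(",", "\\,").replace("\n", "\\n"))

def make_ics(events, stamp):
    """Build a minimal VCALENDAR string; `stamp` is the UTC creation time."""
    fmt = "%Y%m%dT%H%M%S"
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//Hypothetical//Calendar Scraper Sketch//EN"]
    for event in events:
        lines += [
            "BEGIN:VEVENT",
            "UID:" + event["uid"],
            "DTSTAMP:" + stamp.strftime(fmt) + "Z",
            "DTSTART:" + event["start"].strftime(fmt),  # "floating" local time, no time zone
            "DTEND:" + event["end"].strftime(fmt),
            "SUMMARY:" + ics_escape(event["summary"]),
            "LOCATION:" + ics_escape(event["location"]),
            "END:VEVENT",
        ]
    lines.append("END:VCALENDAR")
    return "\r\n".join(lines) + "\r\n"  # RFC 5545 content lines end in CRLF

calendar_text = make_ics(
    [{"uid": "event-1@example.invalid",  # hypothetical UID scheme
      "summary": "Artist on Call: Rachel Paiewonsky",
      "location": "The RISD Museum",
      "start": datetime(2005, 8, 7, 13, 30),
      "end": datetime(2005, 8, 7, 14, 30)}],
    stamp=datetime(2005, 8, 1, 12, 0),
)
print(calendar_text)
```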

I have no idea what you’re talking about. :stuck_out_tongue:
Well, I did have a nicely returned list of lists for your variables (now including the end time of each event, with the line breaks removed). Then we found a page that doesn’t fit the format! I still think parsing the text rather than the HTML is OK for this particular task. The final limitation is the page formatting in either case: either way produces a routine that looks for consistent formatting. Without that, it’s back to the HTML soup…
Variables to be passed to iCal in a list of lists:


tell application "Safari"
	set InitialEventList to (paragraphs of text of document 1 whose font is "Verdana-Bold")
	set PageContent to text of document 1
end tell



set EventList to {}

repeat with TheEvent in InitialEventList
	if (TheEvent as string) is not (ASCII character 10) then copy (TheEvent as string) to the end of EventList
end repeat

set TextBlocks to paragraphs of PageContent

set theLocations to {}
set theTimes to {}
set EndTimes to {}


repeat with ThisBlock in TextBlocks
	if ThisBlock contains "Date:" then set theDate to ThisBlock as string
	
	if ThisBlock contains "Location:" then copy ThisBlock as string to the end of theLocations
	
	if ThisBlock contains "Time:" then
		copy (date (theDate & (ThisBlock as string))) to the end of theTimes
		set AppleScript's text item delimiters to "-"
		set EndTime to text item 2 of ThisBlock as string
		set AppleScript's text item delimiters to ""
		copy EndTime to the end of EndTimes
	end if
	
	
	
end repeat

return {theDate, EventList, theLocations, theTimes, EndTimes}


{"Date: 8/7/2005 ", {"Always on Sunday: This land is our land!
", "Artist on Call: Rachel Paiewonsky
", "Life in Our Eyes: The Art of Media and Social Chan
"}, {"Location: The RISD Museum ", "Location: The RISD Museum ", "Location: RISD Museum "}, {date "Sunday, August 7, 2005 2:30:00 PM", date "Sunday, August 7, 2005 1:30:00 PM", date "Sunday, August 7, 2005 1:30:00 PM"}, {" 03:30 PM ", " 02:30 PM ", " 3:30pm "}}

Now considering future pages with any missing variable…
SC

:lol:

!!! :o Could you post the URL of the page you found that doesn’t have the standard formatting?
I agree with avoiding the HTML route… I thought I’d get the most I could out of exploring it in the meantime!

hhas, thanks for the info. Believe me, I’d be more than happy if they’d just publish the calendar as an .ics! :slight_smile: I’m hoping to publish the calendar this script generates on icalx.com to save those other folks the hassle.

I was referring to the link you posted that has no “Location” in it. That forced me to take a different approach, which ended up working better than the original concept. Instead of building lists of events and their properties, I send each event and its property list to iCal. If there is no location, I use “To be announced”. (On that page the location is mentioned in the description text, but that’s far too random to parse.) The times became an issue when an event goes past midnight: I had to “build” the end date, otherwise you get an error. I scripted it to add a day, but that doesn’t handle the last day of the month (31 would become 32). How specific do you want to get?
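For the month-end case, a more robust approach is to build the end time on the start date first. If it comes out earlier than the start, add one day with date arithmetic instead of editing the day number in the string. Below is a Python sketch of that check; the times are hypothetical. In AppleScript the corresponding move would presumably be adding `1 * days` to the end date object, but that variant has not been tested here.

```python
from datetime import datetime, timedelta

def fix_end_time(start, end):
    """If `end` (built on the start's date) is earlier than `start`, the event
    runs past midnight, so push `end` to the next day."""
    if end < start:
        end += timedelta(days=1)  # datetime handles month and year boundaries
    return end

start = datetime(2005, 8, 31, 23, 0)                    # hypothetical 11:00 PM start on a month's last day
end = fix_end_time(start, datetime(2005, 8, 31, 1, 0))  # "01:00 AM" built on the same date
print(end)
```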
Play with this and tell me if you get errors… I had none. Once it’s a little more dialed in (I know you want to add more fields) we will add the final repeat that takes the script through all the calendar links.



--Extract the bold headings and the text of the document
tell application "Safari"
	set InitialEventList to (paragraphs of text of document 1 whose font is "Verdana-Bold")
	set PageContent to text of document 1
end tell

--Parse event list
set EventList to {}
repeat with TheEvent in InitialEventList
	if (TheEvent as string) is not (ASCII character 10) then copy (TheEvent as string) to the end of EventList
end repeat

--Build event data list
set TextBlocks to paragraphs of PageContent


set NoLocation to "To be announced"
set CurrentItem to 1
--Order of List is Event name, Location, Date and start time, End time

set CurrentEventData to {}
repeat with ThisBlock in TextBlocks
	--If no location is found before the date, thelocation is not set
	if ThisBlock contains "Location:" then set theLocation to (ThisBlock as string)
	
	--set Location routine. "Date" is the flag to start list since "Location" is not always present
	if (ThisBlock as string) contains "Date:" then
		--copy the  Event name
		copy (item CurrentItem of EventList) to the end of CurrentEventData
		--copy location or no location
		try
			copy theLocation to the end of CurrentEventData
		on error
			copy NoLocation to the end of CurrentEventData
		end try
		--Will grab next event with increment
		set CurrentItem to CurrentItem + 1
		--Create date string variable
		set theDateString to (ThisBlock as string)
	end if
	
	--Next is date
	if ThisBlock contains "Time:" then
		set theDate to (date (theDateString & (ThisBlock as string)))
		copy theDate to the end of CurrentEventData
		
		--Then the end time 
		set AppleScript's text item delimiters to "-"
		set StartTime to text item 1 of ThisBlock as string
		set EndTime to text item 2 of ThisBlock as string
		set AppleScript's text item delimiters to ""
		--The case of an event occurring past midnight
		if StartTime contains "PM" and EndTime contains "AM" then
			
			set AppleScript's text item delimiters to "/"
			
			set theMonth to (text item 1 of theDateString)
			set NewDay to (text item 2 of theDateString) + 1
			set theYear to (text item 3 of theDateString)
			
			set AppleScript's text item delimiters to ""
			--Correct the day in the datestring
			set theDateString to theMonth & "/" & NewDay & "/" & theYear
			
		end if
		
		set EndTime to (date (theDateString & (EndTime)))
		copy EndTime to the end of CurrentEventData
		copy "----" & return to the end of CurrentEventData
	end if
	
	set currentcount to (count items of CurrentEventData) as number
	--When the 5 properties fill the list, send them to ical for processing
	if currentcount is 5 then
		
		tell application "iCal"
			try
				set this_calendar to (the first calendar whose title is "Events Cal")
				tell this_calendar to set this_event to make new event at the end of events with properties ¬
					{summary:(item 1 of CurrentEventData), location:(item 2 of CurrentEventData), start date:(item 3 of CurrentEventData), end date:(item 4 of CurrentEventData)}
				
			on error
				make new calendar at end of calendars with properties {title:"Events Cal"}
				set this_calendar to (the first calendar whose title is "Events Cal")
				tell this_calendar to set this_event to make new event at the end of events with properties ¬
					{summary:(item 1 of CurrentEventData), location:(item 2 of CurrentEventData), start date:(item 3 of CurrentEventData), end date:(item 4 of CurrentEventData)}
			end try
		end tell
		set CurrentEventData to {}
	end if
	
end repeat

SC

Wow sitcom, truly beautiful! :smiley:
The only error I’m getting is still with the events without a location listed. I don’t understand what’s causing it, but instead of putting “To be announced” it seems to put the location of the previous event the script ran through. I ran it on March 4, 2005 (http://intranet.risd.edu/ContentAdministrator/calendar/view/day.asp?month=3&day=4&year=2005) and it listed the location of the second event, “Freshman Declare,” as “Wheeler Gym,” the location of the previous event. It works when the only event with no location is the first on the page (March 3), but if it comes after that, it uses the previous event’s location.
The only other field I want to include is the detailed description of the events from the website, in the notes field of the iCal event.
Very nice solution with the added date when it runs past midnight! Definitely don’t worry about the last day of the month; I’m sure people will be able to figure it out if that rare occasion arises. :slight_smile: Same goes for a location that’s listed in the event’s description. LOL, that would be ridiculous!

		tell application "iCal"
			try
				----iCal instructions
			end try
		end tell
		set CurrentEventData to {} --This cleared the info of each event on each repeat
		set theLocation to "" --I neglected to add this (which is why it grabbed the first location and kept it); it clears the location on each repeat
	end if

iCal notes: “Summary” is the name at the top (why not “Name”?), “Description” is what I was calling the summary and giving mixed fields.
This should save you a ton of typing! So then, prepare to be dazzled-


--Extract the bold headings, the text, and the attribute runs of the document
tell application "Safari"
	set InitialEventList to (paragraphs of text of document 1 whose font is "Verdana-Bold")
	set PageContent to text of document 1
	set InfoParagraphs to attribute run of text of document 1
end tell

--Parse event list
set EventList to {}
repeat with TheEvent in InitialEventList
	if (TheEvent as string) is not (ASCII character 10) then copy (TheEvent as string) to the end of EventList
end repeat

--Build event data list
set TextBlocks to paragraphs of PageContent


set NoLocation to "To be announced"
set CurrentItem to 1
--Order of List is Event name, Location, Date and start time, End time

set CurrentEventData to {}
set CurrentEventcount to 1

repeat with ThisBlock in TextBlocks
	--If no location is found before the date, thelocation is not set
	if ThisBlock contains "Location:" then set theLocation to (ThisBlock as string)
	
	
	--set Location routine. "Date" is the flag to start list since "Location" is not always present
	if (ThisBlock as string) contains "Date:" then
		--copy the  Event name
		copy (item CurrentItem of EventList) to the end of CurrentEventData
		--copy location or no location
		try
			copy theLocation to the end of CurrentEventData
		on error
			copy NoLocation to the end of CurrentEventData
		end try
		--Will grab next event with increment
		set CurrentItem to CurrentItem + 1
		--Create date string variable
		set theDateString to (ThisBlock as string)
	end if
	
	--Next is date
	if ThisBlock contains "Time:" then
		set theDate to (date (theDateString & (ThisBlock as string)))
		copy theDate to the end of CurrentEventData
		
		--Then the end time 
		set AppleScript's text item delimiters to "-"
		set StartTime to text item 1 of ThisBlock as string
		set EndTime to text item 2 of ThisBlock as string
		--The case of an event occurring past midnight
		set AppleScript's text item delimiters to ""
		if StartTime contains "PM" and EndTime contains "AM" then
			
			set AppleScript's text item delimiters to "/"
			
			set theMonth to (text item 1 of theDateString)
			set NewDay to (text item 2 of theDateString) + 1
			set theYear to (text item 3 of theDateString)
			
			set AppleScript's text item delimiters to ""
			--Correct the day in the datestring
			set theDateString to theMonth & "/" & NewDay & "/" & theYear
			
		end if
		
		set EndTime to (date (theDateString & (EndTime)))
		copy EndTime to the end of CurrentEventData
		
	end if
	
	set currentcount to (count items of CurrentEventData) as number
	--When the 4 properties fill the list, send them to ical for processing
	
	if currentcount is 4 then
		
		set CurrentInfo to {}
		
		set CurrentEvent to (item CurrentEventcount of EventList) as string
		set copyFlag to false
		
		
		repeat with ThisParagraph in InfoParagraphs
			
			if (ThisParagraph & (ASCII character 10)) contains CurrentEvent and (ThisParagraph as string) is not (ASCII character 10) then
				set copyFlag to true
			else if the copyFlag is true and ThisParagraph contains "Date:" then
				exit repeat
			end if
			
			if copyFlag is true then copy ThisParagraph as string to the end of CurrentInfo
		end repeat
		
		set CurrentEventcount to CurrentEventcount + 1
		
		copy (CurrentInfo as string) to the end of CurrentEventData
		
		tell application "iCal"
			try
				set this_calendar to (the first calendar whose title is "Events Cal")
				tell this_calendar to set this_event to make new event at the end of events with properties ¬
					{summary:(item 1 of CurrentEventData), location:(item 2 of CurrentEventData), start date:(item 3 of CurrentEventData), end date:(item 4 of CurrentEventData), description:(item 5 of CurrentEventData)}
			on error
				make new calendar at end of calendars with properties {title:"Events Cal"}
				set this_calendar to (the first calendar whose title is "Events Cal")
				tell this_calendar to set this_event to make new event at the end of events with properties ¬
					{summary:(item 1 of CurrentEventData), location:(item 2 of CurrentEventData), start date:(item 3 of CurrentEventData), end date:(item 4 of CurrentEventData), description:(item 5 of CurrentEventData)}
			end try
		end tell
		set CurrentEventData to {}
		set theLocation to ""
	end if
	
end repeat

The event name (Summary) and the location end up in the info (Description) along with the event information. I could parse further to get them out, but I think it’s OK to have the name and location listed in the info field too? Test the hell out of this to see if it produces errors on cases we haven’t found yet. It will still be better for you to edit a few fields on random errors than to type out all the dates. Once you’re satisfied, we will of course add the loop that takes it through the desired time periods.
SC