Parsing date and time in a block of text

Does anyone have any good ideas on how to search for a date and time in a large block of text, like an email? I have a reasonable script that recognizes standard Apple date formats as well as today, tomorrow, this friday, next thu, etc., so now I want it to be able to pick out the date from an arbitrary block of text.

I was thinking of trying this code on every group of 2 or 3 words in the file and, if it works, assuming that’s a date. Then I’d look for a “PM” or “AM” or “:” or “o’clock” or “at” in the 5 words or so before and after where the date was found. But if anyone has other ideas, that’d be much appreciated. I’d love to get something working like Gmail, where it recognizes dates in your text for adding to the calendar.

Here’s the date stuff I have so far:

property wdays : {"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}

set dateset to false
set pretext to ""
set todayd to my dayNumber((weekday of (current date)) as string)

repeat while not dateset
	set datet to text returned of (display dialog pretext & "What date do you want to set this event for?" default answer "tomorrow" with title "Event Date")
	set errdate to false
	set foundd to my dayNumber(datet)
	try
		set foundd2 to my dayNumber(word 2 of datet)
	on error
		set foundd2 to -1
	end try
	if datet is "tomorrow" then
		set mydate to (current date) + days
	else if datet is "today" then
		set mydate to (current date)
	else if foundd is not -1 then
		set mydate to (current date) + days * ((foundd - todayd + 7) mod 7)
	else if foundd2 is not -1 and word 1 of datet is "next" then
		set mydate to (current date) + days * (((foundd2 - todayd + 7) mod 7) + 7)
	else
		try
			set mydate to date datet
		on error
			set pretext to "That was not a valid date format. Try something like 9/18, tuesday, next friday, or Sept 18." & return
			set errdate to true
		end try
	end if
	set dateset to not errdate
end repeat
log mydate
set timet to text returned of (display dialog "What time do you want to set this event for?" default answer "12:00 PM" with title "Event Date")
set timed to (date ("9/9/09 " & timet))
log timed
set time of mydate to time of timed
log mydate

on dayNumber(day)
	set foundd to -1
	repeat with whichd from 1 to count of wdays
		ignoring case
			if day is item whichd of wdays then set foundd to whichd
		end ignoring
	end repeat
	if foundd > 7 then set foundd to foundd - 7
	log day & " = " & foundd
	return foundd
end dayNumber

That’s a formidable task because the emails will not necessarily use any reasonable format for a date and you’re not sure whether the date reference is an event to be recorded or something else:

  1. … if you recall, in last September’s meeting, we …
  2. … is next Tuesday ok for you?
  3. … Thank God it’s Friday. I couldn’t take another week like this one.
  4. … can we schedule a meeting sometime in the week before the October Sales Meeting? I really …
  5. … He’s away just now - left two days ago for an 18-day cruise.
  6. … It’s been a blue Monday. I hope …
  7. … Can we meet at Ruby Tuesday’s? It’s just outside the mall door.
  8. … Tuesday’s Child is full of grace.
  9. … When is Octoberfest this year?
  10. … No, there are only 30 days in June.

Thanks for the reply, Adam. Yes, that’s true, there may be a lot of other date references, but I only want to recognize them when they’re clear. The user invokes my script to create a new event in iCal, so the search is limited to text that must contain a date, and that date must be in the future.

Any ideas/help there?

As an indicator of what a task it would be, look at this for just one instance: a weekday preceded by next. Even then, it’s sort of wrong, because the average person referring to a Thursday this week wouldn’t say ‘next’, they would say ‘this’, but the script would work as we expected if the message was for next Monday. We’d have to add a check to see if Thursday was in this week after today. You see how it builds up? The handler just finds the next weekday of a given name after today, not the one in the next week. English can be rather imprecise.

property tDays : {Wednesday, Thursday, Friday, Saturday, Sunday, Monday, Tuesday} -- no quotes
property astid : AppleScript's text item delimiters

set emText to "Hi, John;
Sure, next Thursday at 2 is fine with me."

repeat with aWD in tDays
	if (aWD as text) is in emText then
		set AppleScript's text item delimiters to aWD as text
		if last word of text item 1 of emText is "next" then
			set theWkDay to getNextWkDay(aWD as text, current date)
			set AppleScript's text item delimiters to astid
			exit repeat
		end if
	end if
	set AppleScript's text item delimiters to astid
end repeat

to getNextWkDay(wkDay, theDate) -- adapted from a script by Nigel Garvey
	set keyDate to date "Wednesday, January 1, 1000 12:00:00 AM"
	repeat with k from 1 to 7
		if item k of tDays as string = wkDay then
			set wkd to item k of tDays
			exit repeat
		end if
	end repeat
	set tWD to keyDate + (k - 1) * days
	return theDate - (theDate - tWD) mod weeks + weeks
end getNextWkDay
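
The extra check mentioned above (is the named weekday still ahead in the current week?) could be sketched roughly as follows. This is only one reading of "this" versus "next"; the handler name resolveWeekday, the qualifier strings, and the assumption that a week runs Monday to Sunday are all hypothetical and not part of the script above.

```applescript
-- Hypothetical helper: resolve "this <weekday>" or "next <weekday>" relative to theDate.
-- Assumption: weeks run Monday to Sunday.
on resolveWeekday(targetDay, qualifier, theDate)
	-- weekday constants coerce to integers, Sunday = 1 ... Saturday = 7;
	-- shift them so that Monday = 0 ... Sunday = 6
	set todayPos to (((weekday of theDate) as integer) + 5) mod 7
	set targetPos to ((targetDay as integer) + 5) mod 7
	-- days until the coming occurrence of the weekday (0 means today)
	set ahead to (targetPos - todayPos + 7) mod 7
	if qualifier is "next" then
		-- the day is still ahead in this week, so "next" means the week after
		if targetPos > todayPos then set ahead to ahead + 7
		-- "next Wednesday" said on a Wednesday means a week from today
		if ahead = 0 then set ahead to 7
	end if
	return theDate + ahead * days
end resolveWeekday

resolveWeekday(Thursday, "next", (current date))
```

Called on a Wednesday, this would put "next Thursday" eight days out and "next Monday" five days out, which matches the usage described above. The time of day is carried over unchanged from theDate.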


Hey Adam,

Thanks for the code snippet. I’m not so worried about the semantics of “next thursday”. I can leave that up to the user, as they will be able to verify each detected date. I was more asking what everyone thought was a good way to go about detecting valid dates in a text block: either as you have done, by looking for weekdays, months, “/”s as in 9/24/06, etc., or by running the code I had above on all of the 1, 2, and 3 word groups from the text block until you find a valid date.
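
For what it's worth, the word-group idea might be sketched like this. It is a rough illustration, not a tested solution: the handlers findDateCandidates, looksLikeDate and stripPunctuation and the property dayWords are made-up names, and looksLikeDate is only a stand-in for the real recognizer posted earlier. It assumes Mac OS X 10.6 or later, since it uses a list of text item delimiters to split on whitespace. The date coercion is locale-dependent and can be lenient, so expect some false positives that the user would have to reject.

```applescript
property dayWords : {"today", "tomorrow", "monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday", "mon", "tue", "wed", "thu", "fri", "sat", "sun"}

-- Stand-in recognizer: day names, "this"/"next" plus a day name, or anything "date" accepts
on looksLikeDate(candidate)
	if dayWords contains candidate then return true
	if (count of words of candidate) = 2 then
		if word 1 of candidate is in {"this", "next"} and dayWords contains word 2 of candidate then return true
	end if
	try
		date candidate
		return true
	on error
		return false
	end try
end looksLikeDate

-- Drop trailing punctuation so "Thursday," is tested as "Thursday"
on stripPunctuation(w)
	repeat while (count of w) > 0 and last character of w is in {",", ".", ";", "?", "!", ")"}
		if (count of w) = 1 then return ""
		set w to text 1 thru -2 of w
	end repeat
	return w
end stripPunctuation

on findDateCandidates(theText)
	set savedDelims to AppleScript's text item delimiters
	-- split on whitespace only, so "9/24/06" stays in one piece
	set AppleScript's text item delimiters to {space, tab, return, linefeed}
	set rawTokens to text items of theText
	set AppleScript's text item delimiters to space -- used when joining groups below
	set tokens to {}
	repeat with t in rawTokens
		set cleaned to my stripPunctuation(contents of t)
		if cleaned is not "" then set end of tokens to cleaned
	end repeat
	set hits to {}
	set i to 1
	repeat while i <= (count of tokens)
		set matchedLen to 0
		-- longest group first, so "next friday" wins over "friday"
		repeat with groupLen from 3 to 1 by -1
			set j to i + groupLen - 1
			if j <= (count of tokens) then
				set candidate to (items i thru j of tokens) as text
				if my looksLikeDate(candidate) then
					set end of hits to candidate
					set matchedLen to groupLen
					exit repeat
				end if
			end if
		end repeat
		-- skip past a match so its words are not tested again
		if matchedLen > 0 then
			set i to i + matchedLen
		else
			set i to i + 1
		end if
	end repeat
	set AppleScript's text item delimiters to savedDelims
	return hits
end findDateCandidates

findDateCandidates("Hi, John; Sure, next Thursday at 2 is fine with me.")
```

Each returned phrase could then be shown to the user for confirmation, with the AM/PM/"at" search run on the few words around it. Filtering out results that fall before the current date would cover the "must be in the future" requirement.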