Using MULTILINE with Satimage.osax

mrtoner · September 11, 2013, 1:59am

Is there a bug in Satimage.osax or am I just not understanding something? I have this AppleScript:

set regex_flags to {"MULTILINE"}
set theContact to grep("^Client Name:\\s+(.*)\\n", theContent, "\\1", {})
set theContactEmail to grep("^Email:\\s+(.*)\\n", theContent, "\\1", {})
set thePackage to grep("has booked\\s+(.*)\\s+with", theContent, "\\1", {})
set theAddress to grep("at\\s+(.*)\\s+on", theContent, "\\1", {})
set theUser to grep("with\\s+(.*)\\s+at", theContent, "\\1", {})
set theTime to grep("on\\s+(.*)\\.\\s+$", theContent, "\\1", {})
set theNote to grep("^Email:\\s+(.*)\\n(.*)", theContent, "\\2", regex_flags)

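on grep(findThis, inThis, returnThis, regex_flags)
	try
		return find text findThis in inThis using returnThis regexpflag regex_flags with regexp and string result
	on error errMessage number errNumber
		-- Error -2763 is treated here as "no match": return an empty string.
		if errNumber is equal to -2763 then
			return ""
		end if
		-- Pass any other error on instead of silently returning nothing.
		error errMessage number errNumber
	end try
end grep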

This works fine except for theNote. I’m parsing an email message with content similar to this:

John Smith has booked Photo Package 1 with Jane at 123 Main St, Los Angeles 90001 on 09.12.13 3:00 PM.

Client Name: John Smith
Email: john@example.com
Lockbox Code: tenant Elmo will be there
Status: rental
Contacts Name: Sam
Contacts Number: 555-999-0000
Notes: Thank you:)

theNote is set to “Lockbox Code: tenant Elmo will be there” if I don’t use {“MULTILINE”}; when I do use that, however, theNote is “”. How do I get all the text following the Email line?

Nigel_Garvey · September 11, 2013, 9:06am

What do you want to get? The “Notes:” entry or everything after the “Email:” line? Your script returns just the entire “Notes:” line on my machine.

Edit: The “.*” grep pattern is normally “greedy”, so in “MULTILINE” mode, one would expect it to read right to the end of the text before back-tracking to find whatever’s written after it. I expect that your version of the text ends with a linefeed, so the “(.*)\n” in the offending grep skips over everything to the end of the text, then winds back to the linefeed at the end of it. The second memory capture then picks up the empty insertion point after this, which is what’s returned. My text doesn’t end with a linefeed, so after skipping to the end of the text, the regex process winds back to the linefeed between the last two lines and the last line is what’s captured and returned.

This returns everything after the “Email:” line:

set theNote to grep("^Email:\\s+[^\\n]*\\n(.*)", theContent, "\\1", regex_flags)

Or perhaps make the “.*” “lazy”:

set theNote to grep("^Email:\\s+.*?\\n(.*)", theContent, "\\1", regex_flags)
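
The greedy back-tracking described above can be reproduced without Satimage.osax. The following is only a rough sketch in Python, whose re module stands in for the osax here. It assumes that Satimage’s “MULTILINE” flag makes “.” match linefeeds (re.DOTALL in Python) and that the message text ends with a linefeed. The grep helper is a hypothetical stand-in for the AppleScript handler, not part of any library.

```python
import re

# Sample message from the thread. The trailing linefeed is an assumption.
content = (
    "John Smith has booked Photo Package 1 with Jane at 123 Main St, "
    "Los Angeles 90001 on 09.12.13 3:00 PM.\n"
    "\n"
    "Client Name: John Smith\n"
    "Email: john@example.com\n"
    "Lockbox Code: tenant Elmo will be there\n"
    "Status: rental\n"
    "Contacts Name: Sam\n"
    "Contacts Number: 555-999-0000\n"
    "Notes: Thank you:)\n"
)

# re.MULTILINE lets ^ match at the start of every line.
# re.DOTALL lets "." match linefeeds (assumed to mirror Satimage's "MULTILINE").
DOT_MATCHES_LF = re.MULTILINE | re.DOTALL


def grep(pattern, text, group, flags=re.MULTILINE):
    """Hypothetical helper: return the requested capture group, or "" if no match."""
    match = re.search(pattern, text, flags)
    return match.group(group) if match else ""


# "." stops at linefeeds: group 2 is just the line after "Email:".
single_line_note = grep(r"^Email:\s+(.*)\n(.*)", content, 2)

# "." matches linefeeds: the greedy first group runs to the end of the text,
# backs up to the final linefeed, and group 2 captures the empty remainder.
greedy_note = grep(r"^Email:\s+(.*)\n(.*)", content, 2, DOT_MATCHES_LF)

# Same pattern, but the text has no trailing linefeed: group 2 is the last line.
no_trailing_lf = grep(r"^Email:\s+(.*)\n(.*)", content.rstrip("\n"), 2, DOT_MATCHES_LF)

# "[^\n]*" cannot cross a linefeed, so group 1 is everything after the Email line.
fixed_note = grep(r"^Email:\s+[^\n]*\n(.*)", content, 1, DOT_MATCHES_LF)

print(repr(single_line_note))
print(repr(greedy_note))
print(repr(no_trailing_lf))
print(repr(fixed_note))
```

Under those assumptions, the character class in the last pattern is what keeps the first part of the match on the “Email:” line, leaving the greedy “(.*)” free to take the rest of the message.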

mrtoner · September 11, 2013, 7:51pm

Right, everything after the “Email” line (including the “Notes” line). I was only getting the first line after “Email”.

Greedy always gets me, but making it non-greedy (your second example) didn’t work. Your first example does, though – thanks a lot!