Extracting URL from Mail message

I regularly receive messages that contain a URL which changes from message to message. I’d like to capture it with AppleScript so I can process it further.

I’ve installed the SatImage OSAX so that I can use its “find text” command to find the line the URL is on; however, I’m stuck on actually capturing the URL once it’s found.

If anyone can assist, I’d be grateful.

Thanks,

Des

Hello.

You don’t need anything fancy. Set AppleScript’s text item delimiters to “http://” (or “https://”), and splitting the text gives you a list with more than one item whenever that protocol is present.

You then find the end of the URL by splitting the second item again, this time with delimiters such as a space, “<”, or whatever other character ends the URL in your text.

Take the first item from that second split, put the protocol back on the front, and voilà, you have the URL.

Handle one protocol at a time, and don’t forget to restore AppleScript’s text item delimiters when you are done.
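The two-split approach described above might look like this as a handler. This is a minimal sketch: the name `extractURL` and the set of end-of-URL characters are assumptions, and passing a list of delimiters requires AppleScript 2.1 (Mac OS X 10.6) or later.

```applescript
-- Hypothetical helper: returns the first URL beginning with theProtocol, or "" if there is none.
on extractURL(theText, theProtocol)
	set savedTIDs to AppleScript's text item delimiters
	try
		-- first split: the text after the first occurrence of the protocol is item 2
		set AppleScript's text item delimiters to theProtocol
		set firstSplit to text items of theText
		set foundURL to ""
		if (count firstSplit) > 1 then
			set afterProtocol to item 2 of firstSplit
			if afterProtocol is not "" then
				-- second split: cut at the first character assumed to end the URL
				set AppleScript's text item delimiters to {space, "<", ">", quote, return, linefeed}
				set foundURL to theProtocol & (text item 1 of afterProtocol)
			end if
		end if
		set AppleScript's text item delimiters to savedTIDs -- always restore them
		return foundURL
	on error errMsg number errNum
		-- restore the delimiters even if something went wrong, then pass the error on
		set AppleScript's text item delimiters to savedTIDs
		error errMsg number errNum
	end try
end extractURL
```

Call it once per protocol, for example `extractURL(theMsgContent, "https://")`, and try `"http://"` if the result is empty. Because “https://” does not contain “http://” as a substring, the two protocols don’t interfere with each other.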

Thanks for your reply. I’m a bit lost: my script can find the fixed part of the URL, but I can’t work out how to extract the variable part (basically everything after the /) so that I can concatenate the two.

I’d welcome any assistance on how best to achieve that.

Thanks,

Des

Hello.

Could you please show what you have so far? There are several ways to do this.

Thanks again for engaging with me on this.

I figured out a way to get what I wanted last night: I used “set theURL to (the offset of adURL in theMsgContent)” as the basis for extracting the URL, and it now works.
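Building on the `offset` approach, here is a minimal sketch that avoids assuming a fixed URL length: it starts at the known prefix and reads forward until it reaches a character assumed to end the URL (space, tab, line break, `<`, `>`, or a double quote). The handler name `urlFromPrefix` and the example prefix are hypothetical.

```applescript
-- Hypothetical helper: returns knownPrefix plus the characters that follow it,
-- stopping at the first character assumed to end a URL; returns "" if the prefix is absent.
on urlFromPrefix(theText, knownPrefix)
	set startPos to offset of knownPrefix in theText
	if startPos is 0 then return ""
	set terminators to {space, tab, return, linefeed, "<", ">", quote}
	set endPos to startPos + (length of knownPrefix) - 1
	set textLength to length of theText
	-- advance endPos one character at a time until the next character is a terminator
	repeat while endPos < textLength
		if (character (endPos + 1) of theText) is in terminators then exit repeat
		set endPos to endPos + 1
	end repeat
	return text startPos thru endPos of theText
end urlFromPrefix

-- usage with a made-up message body and prefix
set sampleText to "New listing: http://example.com/ad/12345 (see details)"
set theURL to urlFromPrefix(sampleText, "http://example.com/ad/")
```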

Thanks,

Des

I have a working script - almost.

When I first tested it by selecting a Mail message and running the script, it seemed to work exactly as I wanted. Then I realized it wasn’t running from the rule I’d created. Having researched further, I see that for this to work, an “on perform mail action” handler is required. I added the handler code before the “tell” and after the “end tell” commands; however, the script then fails on the first line after “end using terms from”, saying the variable “theURL” is not defined. It seems that variables set inside the handler are not passed to the rest of the script when the handler ends.

My working script is below. From what I’ve seen, I assume it needs some surgery to run inside a rule, but I’m stuck on where to adjust it. (I tried placing the end of the “on perform” handler at the end of the main script, just before the subroutines; however, it returned only “Caption:”.)

I’d appreciate it if one of the experts here could review and comment.

Thanks,

Des


set charcount_limit to 140
set login to "xxxxxx"
set api_key to "xxxxxxx"
set adURL to "xxxxxxx"
set adCaption to "Caption:"

tell application "Mail"
	
	set theMsgs to selection
	repeat with theMsg in theMsgs
		set theMsgContent to the content of theMsg
		set theURL to (the offset of adURL in theMsgContent)
		set theURL to the Unicode text theURL thru (theURL + 49) of theMsgContent
		set theCaption to ((the offset of adCaption in theMsgContent) + 21)
		set theCaption to the Unicode text theCaption thru (theCaption + 65) of theMsgContent
	end repeat
	
end tell

set theURL to encode_text(theURL, true, false)
set theCaption to trimWhiteSpace(theCaption)
set tid to AppleScript's text item delimiters
set AppleScript's text item delimiters to ASCII character 10 -- (a line feed)
set theCaption to text item 1 of theCaption -- not text of, text items of
set AppleScript's text item delimiters to tid -- whatever they were before - ALWAYS SET THEM BACK!

set bitly to "curl --stderr /dev/null \"http://api.bit.ly/v3/shorten?format=txt&longUrl=" & theURL & "&login=" & login & "&apiKey=" & api_key & "\""
set bitly to (do shell script bitly)
set tweet to theCaption & " - " & bitly

set charcount_tweet to (count characters of tweet)
if charcount_tweet ≤ charcount_limit then
	-- post to twitter
	set twitter_status to quoted form of tweet
	set twitter_parm to "status=" & twitter_status
	try
		set tweet_results to do shell script "twurl -d " & twitter_parm & " /1/statuses/update.xml"
	end try
end if

-- this sub-routine is used to encode text 
on encode_text(this_text, encode_URL_A, encode_URL_B)
	set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
	set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
	set the URL_B_chars to ".-_:"
	set the acceptable_characters to the standard_characters
	if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
	if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
	set the encoded_text to ""
	repeat with this_char in this_text
		if this_char is in the acceptable_characters then
			set the encoded_text to (the encoded_text & this_char)
		else
			set the encoded_text to (the encoded_text & encode_char(this_char)) as string
		end if
	end repeat
	return the encoded_text
end encode_text

on encode_char(this_char)
	set the ASCII_num to (the ASCII number this_char)
	set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
	set x to item ((ASCII_num div 16) + 1) of the hex_list
	set y to item ((ASCII_num mod 16) + 1) of the hex_list
	return ("%" & x & y) as string
end encode_char

on trimWhiteSpace(aString)
	if aString is not "" then
		-- setup for no delimiter
		set savedTextItemDelimiters to AppleScript's text item delimiters
		set AppleScript's text item delimiters to ""
		-- start with the tail end by reversing the list
		set these_items to reverse of (every text item of aString)
		-- keep peeling off the 1st space (stop if the list runs out)
		repeat while (these_items is not {}) and (item 1 of these_items is space)
			set these_items to rest of these_items
		end repeat
		-- flip the list, now do the leading characters
		set these_items to reverse of these_items
		repeat while (these_items is not {}) and (item 1 of these_items is space)
			set these_items to rest of these_items
		end repeat
		-- reconstruct the string
		set these_items to these_items as string
		-- restore and return
		set AppleScript's text item delimiters to savedTextItemDelimiters
		return these_items
	end if
	return aString -- empty input: nothing to trim
end trimWhiteSpace
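The “theURL is not defined” error described above fits how AppleScript scopes variables: anything set inside a handler such as `on perform mail action` is local to that handler, so it no longer exists on the lines after “end using terms from”. A minimal sketch of keeping the work inside the handler follows; the helper name `lineStartingWith` and the prefix "http://example.com/" are placeholders, not taken from the script above.

```applescript
using terms from application "Mail"
	on perform mail action with messages theMessages for rule theRule
		repeat with theMsg in theMessages
			tell application "Mail" to set theMsgContent to content of theMsg
			-- theURL is local to this handler, so use it here (or pass it to another handler),
			-- not on a line after "end using terms from"
			set theURL to my lineStartingWith(theMsgContent, "http://example.com/")
			if theURL is not "" then
				log theURL -- continue processing theURL here (encode, shorten, post)
			end if
		end repeat
	end perform mail action with messages
end using terms from

-- Hypothetical helper: returns the text from knownPrefix to the end of its line, or "" if absent
on lineStartingWith(theText, knownPrefix)
	set startPos to offset of knownPrefix in theText
	if startPos is 0 then return ""
	return paragraph 1 of (text startPos thru -1 of theText)
end lineStartingWith
```

The alternative is to have the handler call the rest of the script as another handler, passing theURL and theCaption as parameters.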

I’ve carried out further testing of my script, as well as the sample rule actions script.

The sample script (below) runs fine, displaying the dialog.

using terms from application "Mail"
	on perform mail action with messages theMessages for rule theRule
		tell application "Mail"
			set theText to "This AppleScript is intended to be used as an AppleScript rule action, but is also an example of how to write scripts that act on a selection of messages or mailboxes." & return & return & "To view this script, hold down the option key and select it again from the Scripts menu."
			repeat with eachMessage in theMessages
				set theSubject to subject of eachMessage
				try
					-- If this is not being executed as a rule action,
					-- getting the name of theRule variable will fail.
					set theRuleName to name of theRule
					set theText to "The rule named '" & theRuleName & "' matched this message:"
					set theText to theText & return & return & "Subject: " & theSubject
					display dialog theText
					set theText to ""
				end try
			end repeat
			if theText is not equal to "" then
				display dialog theText buttons {"OK"} default button 1
			end if
		end tell
	end perform mail action with messages
end using terms from

My script runs cleanly in the editor if I comment out the “using terms from” and “on perform” statements. If I uncomment them, create a rule, and send an email to test it, it doesn’t run.

I’ve tried to log any errors, but running from a rule doesn’t appear to write to the event log. Is there anything obvious that is causing it to fail?

using terms from application "Mail"
	
	on perform mail action with messages theMsgs
		
		tell application "Mail"
			
			set charcount_limit to 140
			set login to "xxxxx"
			set api_key to "xxxxxxxx"
			set adURL to "xxxxxxxxxxx"
			set adCaption to "Caption:"
			
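			-- theMsgs already holds the messages the rule passed in; reading the selection here would process whatever is selected instead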
			repeat with theMsg in theMsgs
				set theMsgContent to the content of theMsg
				set theURL to (the offset of adURL in theMsgContent)
				set theURL to the Unicode text theURL thru (theURL + 49) of theMsgContent
				set theCaption to ((the offset of adCaption in theMsgContent) + 21)
				set theCaption to the Unicode text theCaption thru (theCaption + 65) of theMsgContent
				
			end repeat
			
			try
				set theURL to my encode_text(theURL, true, false)
				set theCaption to my trimWhiteSpace(theCaption)
				set tid to AppleScript's text item delimiters
				set AppleScript's text item delimiters to ASCII character 10 -- (a line feed)
				set theCaption to text item 1 of theCaption -- not text of, text items of
				set AppleScript's text item delimiters to tid -- whatever they were before - ALWAYS SET THEM BACK!
				
				set bitly to "curl --stderr /dev/null \"http://api.bit.ly/v3/shorten?format=txt&longUrl=" & theURL & "&login=" & login & "&apiKey=" & api_key & "\""
				set bitly to (do shell script bitly)
				set tweet to theCaption & " - " & bitly
				
				set charcount_tweet to (count characters of tweet)
				if charcount_tweet ≤ charcount_limit then
					-- post to twitter
					set twitter_status to quoted form of tweet
					set twitter_parm to "status=" & twitter_status
					try
						set tweet_results to do shell script "twurl -d " & twitter_parm & " /1/statuses/update.xml"
					end try
				end if
			end try
		end tell
		
	end perform mail action with messages
	
end using terms from

-- this sub-routine is used to encode text 
on encode_text(this_text, encode_URL_A, encode_URL_B)
	set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
	set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
	set the URL_B_chars to ".-_:"
	set the acceptable_characters to the standard_characters
	if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
	if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
	set the encoded_text to ""
	repeat with this_char in this_text
		if this_char is in the acceptable_characters then
			set the encoded_text to (the encoded_text & this_char)
		else
			set the encoded_text to (the encoded_text & encode_char(this_char)) as string
		end if
	end repeat
	return the encoded_text
end encode_text

on encode_char(this_char)
	set the ASCII_num to (the ASCII number this_char)
	set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
	set x to item ((ASCII_num div 16) + 1) of the hex_list
	set y to item ((ASCII_num mod 16) + 1) of the hex_list
	return ("%" & x & y) as string
end encode_char

on trimWhiteSpace(aString)
	if aString is not "" then
		-- setup for no delimiter
		set savedTextItemDelimiters to AppleScript's text item delimiters
		set AppleScript's text item delimiters to ""
		-- start with the tail end by reversing the list
		set these_items to reverse of (every text item of aString)
		-- keep peeling off the 1st space (stop if the list runs out)
		repeat while (these_items is not {}) and (item 1 of these_items is space)
			set these_items to rest of these_items
		end repeat
		-- flip the list, now do the leading characters
		set these_items to reverse of these_items
		repeat while (these_items is not {}) and (item 1 of these_items is space)
			set these_items to rest of these_items
		end repeat
		-- reconstruct the string
		set these_items to these_items as string
		-- restore and return
		set AppleScript's text item delimiters to savedTextItemDelimiters
		return these_items
	end if
	return aString -- empty input: nothing to trim
end trimWhiteSpace
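In the rule script above, both `try` blocks end without an `on error` clause, so any failure is discarded silently, which may be part of why nothing shows up. A minimal sketch of catching the error and appending its number and message to a log file instead; the handler name `runWithErrorLog` and the file name mailrule.log are assumptions, and the deliberate `error` line stands in for the real rule body.

```applescript
-- Hypothetical wrapper: runs the body and records any error in ~/Library/Logs/mailrule.log
on runWithErrorLog()
	try
		-- stand-in for the real rule body; this line fails on purpose so the logging path runs
		error "demo failure" number 501
	on error errMsg number errNum
		set logPath to (POSIX path of (path to library folder from user domain)) & "Logs/mailrule.log"
		-- quoted form of keeps the shell from interpreting characters in the message
		do shell script "printf '%s\\n' " & quoted form of ("Error " & errNum & ": " & errMsg) & " >> " & quoted form of logPath
		return errNum
	end try
	return 0 -- reached only when the body finished without an error
end runWithErrorLog
```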

Hello.

Yes, you have to log to a file. Use the handler below, and call it with statements like:


my logit("starting up","mailrule")

You can then find mailrule.log in the sidebar of Console.app.


to logit(log_string, log_file)
	do shell script ¬
		"echo `date '+%Y-%m-%d %T: '`\"" & log_string & ¬
		"\" >> $HOME/Library/Logs/" & log_file & ".log"
end logit
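One caution about this handler: log_string ends up inside double quotes on the shell command line, where the shell still expands `$`, backticks, and backslashes. For example, "$80,000" would be logged as "0,000", because `$8` is an unset positional parameter and expands to nothing. A variant with the same name and arguments that single-quotes the text with `quoted form of` (a sketch, not tested against every shell edge case):

```applescript
to logit(log_string, log_file)
	-- timestamp first; do shell script strips the trailing newline from date's output
	set stamp to do shell script "date '+%Y-%m-%d %T:'"
	set logLine to stamp & " " & log_string
	-- quoted form of wraps the text in single quotes, so the shell leaves $, ` and \ alone
	do shell script "printf '%s\\n' " & quoted form of logLine & ¬
		" >> \"$HOME/Library/Logs/\"" & quoted form of (log_file & ".log")
end logit
```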

Excellent! Thank you so much - this helped enormously. The script works as posted - the problem was with the Twitter API (or, rather, my understanding of it).

Regards,

Des

Hmmm - maybe I spoke too soon.

I’m getting inconsistent results from the script. This morning, the script ran on receipt of a message and the following was logged:

When I run the script manually (with the “on perform” code commented out), it runs through and I see data at all the log points I added:

I therefore captured the email text and created a new message from one of my accounts, directed to the same address and with the same subject (which the rule uses). Again, it ran through to completion, creating log entries as above.

The log entries for the first (failed) run are out of order. Could the issue be that the script runs some steps in parallel rather than sequentially? I understood that AppleScript runs one statement at a time, but then why is “starting up” timestamped two seconds later than the statements that follow it?

Thanks,

Des

Hello.

I’m sorry I can’t help you further, but I don’t use Mail.app at the moment, for personal reasons; I have nothing against Mail.app.

If I were you, I’d add even more log statements to be absolutely sure that the “starting up” entry you see really comes from the first message and not the second.

As long as you haven’t added an “ignoring application responses” block, I think everything within a mail rule should execute sequentially for a given message.

I believe I’ve found the problem: some of the emails include HTML tags, and these seem to make the script fail in the repeat loop.

I got past the issue by truncating the part of the text that contains the tags (my script doesn’t need them). I’ve tested this, and it seems to have stabilized things.
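The thread doesn’t show the actual error, but one likely way for the extraction to fail is the fixed-length ranges (`thru (theURL + 49)`, `thru (theCaption + 65)`) running past the end of the text, or a marker that isn’t found at all. Assuming that is the cause, a guarded version might look like this; the handler name `textAfterMarker` is hypothetical.

```applescript
-- Hypothetical helper: like "text startPos thru (startPos + maxLength - 1) of theText",
-- but returns "" when the marker is missing and stops at the end of the text instead of failing.
on textAfterMarker(theText, theMarker, skipCount, maxLength)
	set markerPos to offset of theMarker in theText
	if markerPos is 0 then return ""
	set startPos to markerPos + skipCount
	set textLength to length of theText
	if startPos > textLength then return ""
	set endPos to startPos + maxLength - 1
	if endPos > textLength then set endPos to textLength
	return text startPos thru endPos of theText
end textAfterMarker
```

With the numbers from the original script, the caption line would become `set theCaption to textAfterMarker(theMsgContent, adCaption, 21, 66)`, which covers the same 66 characters when they are all present.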

I have found another issue, though ;(

Some of the emails include a dollar amount in the caption. When the script processes it, the $ and the digit immediately after it are stripped, so $80,000 appears as 0,000 in the resulting variable.

I’ve gone through Apple’s documentation on variables, but can’t see why the characters are being stripped.

Following up:

I’ve been trying to understand why the $ sign is being stripped; I came across this thread:

http://macscripter.net/viewtopic.php?id=37502

The odd thing is that I have no shell script activity, so I wonder whether the problem is related to how the mail message is encoded. I therefore set up a simple script containing a $ value and tested it: no problem at all; the dialog shows $64,000 as expected.
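The MacScripter thread points at the shell, and a value can reach the shell even when the extraction itself doesn’t use it: any `do shell script` that wraps the text in double quotes (a logging call, for instance) lets the shell expand `$8` to an empty string. Whether that is where the value is being inspected here is an assumption, but it is easy to check with a few lines:

```applescript
set price to "$80,000"
-- double quotes: the shell expands $8 (an unset positional parameter) to nothing
set unquotedResult to do shell script "echo \"" & price & "\""
-- quoted form of: single quotes, so the text reaches echo unchanged
set quotedResult to do shell script "echo " & quoted form of price
log unquotedResult
log quotedResult
```

If the double-quoted result matches what you are seeing, passing the value through `quoted form of` wherever it goes to the shell should keep the $ intact, and the message encoding is then unlikely to be the cause.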

tell application "Mail"
	set MsgContent to "This is a sentence. This is the $64,000 question.
	This is the next line"
	set adCaption to "This is the $64,000 question"
	set theCaption to (the offset of adCaption in MsgContent)
	set theCaption to the rich text theCaption thru (theCaption + 29) of MsgContent
	display dialog theCaption
end tell

So, if the encoding of the incoming email is the issue, is there a way to force it to an encoding that won’t cause the stripping?

Thanks,

Des