Decode text returned from OSX Mail Raw Source email

I have a script that extracts a section of text from the raw source of an email in OSX Mail. That works fine.

However, the text in the raw source is encoded (two examples below), and the script returns it in the same encoded form:

That&#39;s great thank you, I&#39;ve just replied

It hasn=E2=80=99t been available

Is there a way I can get the AppleScript to convert/decode the returned text?

Many thanks in advance.

To confirm the html entity character, use character id along with the number, like so:

character id 39
--> "'"
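As a small sketch of the same check in both directions (the variable names here are only illustrative), `id of` gives the number for a character, and `character id` gives the character for a number. The curly apostrophe that shows up as =E2=80=99 in the second example is U+2019, which is 8217 in decimal:

```applescript
-- Number to character
set straightQuote to character id 39 -- the plain apostrophe "'"
set curlyQuote to character id 8217 -- the curly apostrophe "’" (U+2019)

-- Character to number
set straightNumber to id of "'" -- 39
set curlyNumber to id of "’" -- 8217
```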

This first part of the script will replace the &#xx; strings with the appropriate character.

set reallyBadUTF to "That&#39;s great thank you, I&#39;ve just replied."

set bad to {"'", "&#39;"}

set text item delimiters to bad
set reallyBadUTF to text items of reallyBadUTF as text
--> "That's great thank you, I've just replied."
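The reason a single list of delimiters does both jobs: when text item delimiters is a list, every item is used for splitting, but only the first item is used when the pieces are joined back into text. So putting the wanted character first and the entity second swaps one for the other. A minimal sketch of the two steps separately (the sample text and variable names are just for illustration), which also restores the previous delimiters afterwards:

```applescript
set saved to AppleScript's text item delimiters
set AppleScript's text item delimiters to {"'", "&#39;"}

-- Splitting uses every delimiter in the list
set parts to text items of "I&#39;ve"
--> {"I", "ve"}

-- Joining uses only the first delimiter, "'"
set joined to parts as text
--> "I've"

set AppleScript's text item delimiters to saved
```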

This additionally deals with the =XX sequences (quoted-printable encoding) by turning them into percent encoding and then URL-decoding the result:

use framework "Foundation"
use scripting additions

set reallyBadUTF to "That&#39;s great thank you, I&#39;ve just replied.

It hasn=E2=80=99t been available."

set text item delimiters to {"'", "&#39;"}
set reallyBadUTF to text items of reallyBadUTF as text -- replace html entities
--> "That's great thank you, I've just replied.

-- It hasn=E2=80=99t been available." 

set text item delimiters to {"%", "="}
set badUTF to text items of reallyBadUTF as text -- replace url encoding 
set output to urlDecode(badUTF)
--> "That's great thank you, I've just replied.

-- It hasn’t been available."

-- url decode handler
on urlDecode(input)
	tell current application's NSString to set rawUrl to stringWithString_(input)
	set theEncodedURL to rawUrl's stringByRemovingPercentEncoding() -- %XX sequences are decoded as UTF-8
	return theEncodedURL as «class utf8»
end urlDecode
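One thing the script above does not cover: quoted-printable text in a raw message is usually wrapped, with a lone "=" at the end of a line marking a soft line break that should disappear on decoding. The handler below is only a sketch of how that could be stripped first; `removeSoftBreaks` is a made-up name, and it assumes the text has not yet had its "=" signs swapped for "%".

```applescript
-- Hypothetical helper: remove quoted-printable soft line breaks ("=" at end of line).
on removeSoftBreaks(qpText)
	set saved to AppleScript's text item delimiters
	set CR to character id 13
	set LF to character id 10
	-- Handle CRLF first so that no stray CR or LF is left behind.
	repeat with lineEnding in {CR & LF, LF, CR}
		set AppleScript's text item delimiters to "=" & (contents of lineEnding)
		set parts to text items of qpText
		set AppleScript's text item delimiters to ""
		set qpText to parts as text
	end repeat
	set AppleScript's text item delimiters to saved
	return qpText
end removeSoftBreaks

set unwrapped to removeSoftBreaks("It hasn=E2=80=99t been avail=" & (character id 10) & "able")
--> "It hasn=E2=80=99t been available"
```

Run it on the extracted text before the "=" to "%" step, otherwise the trailing "=" signs will already have been converted.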

If there are other problematic HTML entities (e.g. the ampersand), you can modify the first part so it looks like this, and it will cycle through the list and replace each one.

set reallyBadUTF to "That&#39;s great thank you, I&#39;ve just replied."

set badEntities to {"&#39;", "&amp;"}
set goodEntities to {"'", "&"}

repeat with bg from 1 to count of badEntities
	set text item delimiters to item bg of badEntities
	set reallyBadUTF to text items of reallyBadUTF
	set text item delimiters to item bg of goodEntities
	set reallyBadUTF to reallyBadUTF as text
end repeat
--> "That's great thank you, I've just replied"
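If the loop is going to be reused, it could be wrapped in a handler along these lines. This is a sketch rather than a tested library routine: `replaceEntities` and its parameter names are invented for the example. It restores the text item delimiters when it finishes, and it expects `&amp;` to come last in the list so that something like `&amp;#39;` is not decoded twice.

```applescript
-- Hypothetical handler: replace each item of badEntities with the matching item of goodEntities.
on replaceEntities(theText, badEntities, goodEntities)
	set saved to AppleScript's text item delimiters
	repeat with i from 1 to count of badEntities
		set AppleScript's text item delimiters to item i of badEntities
		set parts to text items of theText
		set AppleScript's text item delimiters to item i of goodEntities
		set theText to parts as text
	end repeat
	set AppleScript's text item delimiters to saved
	return theText
end replaceEntities

set cleaned to replaceEntities("Tom &amp; I&#39;ve replied", {"&#39;", "&amp;"}, {"'", "&"})
--> "Tom & I've replied"
```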

NB I found the really nice urlDecode handler thanks to chrillek on the DEVONthink forum:

(ePub and Copy with Source Link - #16 by chrillek - Feedback - DEVONtechnologies Community).


Hi.

Here’s a similar idea performed entirely in ASObjC. I’m not sure how generally effective it’ll be with mikeytttt’s e-mails, but it attempts to avoid some of the possible pitfalls.

The Xcode documentation’s rather ambiguous about which framework(s) to use for which NSAttributedString method, but this works for me:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
-- use scripting additions

set input to "That&#39;s great thank you, I&#39;ve just replied
I'm 100% certain — well, 99.9% certain anyway.
It hasn=E2=80=99t been available"

set |⌘| to current application
set input to |⌘|'s class "NSString"'s stringWithString:(input)
-- Replace any line breaks in the text with HTML versions.
set input to input's stringByReplacingOccurrencesOfString:("\\R") withString:("<br/>") ¬
	options:(|⌘|'s NSRegularExpressionSearch) range:({0, input's |length|()})
-- Convert the "HTML" string to data and thence to an attributed string.
set HTMLData to input's dataUsingEncoding:(|⌘|'s NSUTF8StringEncoding)
set attributedString to |⌘|'s class "NSAttributedString"'s alloc()'s initWithHTML:(HTMLData) ¬
	documentAttributes:(missing value)
-- Extract a string with the HTML entities interpreted.
set output to attributedString's |string|()
-- Percent encode any existing percent signs in the string.
set output to output's stringByReplacingOccurrencesOfString:("%") withString:("%25")
-- Convert any apparent "equals" encoding to percent encoding.
set output to output's stringByReplacingOccurrencesOfString:("=([0-9A-Fa-f]{2})") withString:("%$1") ¬
	options:(|⌘|'s NSRegularExpressionSearch) range:({0, output's |length|()})
-- Interpret the percent encoding and return.
return (output's stringByRemovingPercentEncoding()) as text