-- Working With Message Source to Archive HTML Emails

Hello MacScripters,

I’ve been working on a script that uses Mail’s AppleScript functionality to retrieve the raw source of messages and then archive HTML e-mails into Evernote.

Frankly, getting clean HTML out of Mail has been one of the more frustrating AppleScript projects I have ever undertaken, and I wanted to ask the community: how are others transferring rich HTML content from Mail to other applications through AppleScript? (FWIW, I’d really like to steer clear of GUI scripting and external scripting additions to maximize the portability of my code.)

Let me tell you what I’ve done so far: I’ve developed a simple HTML parser that extracts the HTML from a multipart e-mail message. It’s not perfect, but it is fairly reliable, and I feel that it will be even more so after I spend some time refining it. (I would really love to see how different people approach this, so feel free to share examples of other ways of doing it!)
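For comparison, here is a minimal sketch of the same extraction step done outside AppleScript, using Python's standard `email` package. It is not the parser described above. It assumes a Python 3 interpreter is available and that the raw source has already been retrieved as a string; the function name `extract_html` is hypothetical.

```python
import email
from email import policy
from typing import Optional


def extract_html(raw_source: str) -> Optional[str]:
    """Return the decoded text/html body of a raw message, or None if it has none."""
    # policy.default yields EmailMessage objects, which provide get_body().
    msg = email.message_from_string(raw_source, policy=policy.default)
    # Ask only for an HTML body; a plain-text-only message gives None.
    part = msg.get_body(preferencelist=("html",))
    if part is None:
        return None
    # get_content() undoes the transfer encoding and the charset for text parts.
    return part.get_content()
```

Because the library also undoes the transfer encoding of the part it returns, sequences such as “=E2=80=A2” should come back as ordinary characters.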

The most frustrating aspect of this for me has been dealing with encoded characters in the HTML. When I look at the raw source of many e-mail messages, I can see hex values for certain characters, which Mail marks with an “=” rather than the “%” that I’ve seen in URL-encoded strings (e.g., =20, =0A). Moreover, some characters are “multibyte” encoded characters, which look more like this: “=E2=80=A2”.

I have tried a few different approaches to URL decoding with little success. I found some examples that use shell scripting with Perl or PHP, but I could not get them to work reliably. As a temporary hack, I have been using AppleScript’s text item delimiters to swap out some of the encoded strings for their Unicode characters:

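-- Swap the encoded byte sequence for a bullet (U+2022) with the literal character
set savedDelimiters to AppleScript's text item delimiters
set AppleScript's text item delimiters to "=E2=80=A2"
set theSourceItems to text items of theSource
set AppleScript's text item delimiters to "•"
set theSource to theSourceItems as text
-- Restore the delimiters so later code is not affected
set AppleScript's text item delimiters to savedDelimiters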

This works, and most of my messages are now archiving perfectly into Evernote. However, I can’t imagine doing this type of substitution for the entire character set, and I feel like there must be a much easier way to do this!
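For what it's worth, the “=XX” sequences match the quoted-printable transfer encoding defined for MIME (RFC 2045), not URL encoding, which may be why URL decoders do not handle them. Below is a minimal sketch of decoding every sequence in one pass with Python's standard `quopri` module. It assumes a Python 3 interpreter is available and that the part is UTF-8 unless told otherwise; the function name `decode_quoted_printable` is hypothetical.

```python
import quopri


def decode_quoted_printable(text: str, charset: str = "utf-8") -> str:
    """Decode quoted-printable text such as '=E2=80=A2' into real characters."""
    # quopri works on bytes: each =XX becomes one byte, and a trailing '='
    # at the end of a line (a soft line break) is removed.
    raw_bytes = quopri.decodestring(text.encode(charset))
    # Multibyte characters only make sense once the bytes are decoded together.
    return raw_bytes.decode(charset, errors="replace")
```

Decoding the bytes as a group is what handles the multibyte case: “=E2=80=A2” is three bytes that together form a single UTF-8 bullet, so replacing one “=XX” at a time cannot work. The charset should really be taken from the part's Content-Type header instead of being assumed.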

I’m really hoping that, despite AppleScript’s known shortcomings when dealing with HTML code, there is a simple solution that I’ve somehow missed!

So how are you transferring rich HTML content from Mail through AppleScript?

Any help or insights you might offer would be deeply appreciated – and I thank you in advance for sharing your thoughts and your experiences.