How to get the formatted content of a Mail.app email?

johncatalano · January 21, 2023, 3:56am

I want to get the formatted content of an email message from Mail.app so that I can feed it to wkhtmltopdf or PrinceXML via stdin.
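For reference, handing an HTML string to wkhtmltopdf over stdin could look like the Python sketch below. It assumes wkhtmltopdf is installed and on `PATH`, and relies on wkhtmltopdf accepting `-` in place of an input file to read from stdin. `html_to_pdf` and `wkhtmltopdf_command` are hypothetical helper names, not part of any library.

```python
import shutil
import subprocess


def wkhtmltopdf_command(out_path, exe="wkhtmltopdf"):
    # "-" as the input argument makes wkhtmltopdf read the HTML from stdin.
    return [exe, "-", out_path]


def html_to_pdf(html, out_path, exe="wkhtmltopdf"):
    """Pipe an HTML string to wkhtmltopdf on stdin and write the PDF to out_path."""
    if shutil.which(exe) is None:
        raise FileNotFoundError(f"{exe} was not found on PATH")
    # check=True raises CalledProcessError if wkhtmltopdf exits with an error.
    subprocess.run(
        wkhtmltopdf_command(out_path, exe),
        input=html.encode("utf-8"),
        check=True,
    )
```

A call would then be something like `html_to_pdf(html, "message.pdf")`. Because the HTML is sent as UTF-8 bytes, it is safest for the HTML to declare `<meta charset="utf-8">` so the renderer does not have to guess the encoding.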

I know I can get the body of an email message using its “content” property (the text) or its “source” property (the raw message).

I think the best approach is to get the body as raw HTML. Is it possible to get the body of an email as HTML?

ccstone · January 22, 2023, 2:26am

Nope. You’d have to get the source and parse it, unless there’s some magic spell to be found in AppleScriptObjC.

@Shane_Stanley?

Shane_Stanley · January 22, 2023, 2:43am

Yeah, you’d need to parse the source, I think.
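As a rough illustration of what parsing the source involves, here is a Python sketch (not AppleScript) using the standard-library `email` package. It assumes you pass in the raw text of Mail's `source` property, for example after exporting it from a script; `html_body_from_source` is a hypothetical name. `get_body()` walks `multipart/alternative` and `multipart/related` structures, and `get_content()` undoes quoted-printable or base64 transfer encoding and decodes the declared charset.

```python
from email import policy
from email.parser import BytesParser


def html_body_from_source(source):
    """Return the text/html body of a raw RFC 822/MIME message, or None."""
    if isinstance(source, str):
        # Mail's "source" arrives as text; the parser works on bytes.
        source = source.encode("utf-8")
    msg = BytesParser(policy=policy.default).parsebytes(source)
    # Pick the best text/html part, descending into multipart containers.
    part = msg.get_body(preferencelist=("html",))
    if part is None:
        return None  # plain-text-only message
    # Decode the transfer encoding and the charset into a Python str.
    return part.get_content()
```

It returns `None` for plain-text-only messages, which would need to be wrapped in HTML some other way before going to a PDF tool.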

bwill · January 26, 2023, 2:21pm

Developer of Mail Archiver here; it archives emails to a database or to PDF. There is a lot of work involved in turning emails into HTML.

johncatalano · January 28, 2023, 2:44am

I’m curious… which app?
This one? Mail Archiver X Easy on the Mac App Store

Or this one? https://emailarchiverpro.com

bwill · January 28, 2023, 10:58am

I wasn’t sure whether I could mention my app, so I didn’t use the name. Mail Archiver X Easy can print to PDF, but it can’t create PDFs in batch; that can only be done with the main version, Mail Archiver X.

Making PDFs is not that hard once the HTML exists. Creating the HTML out of the email parts is the real problem.
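One concrete example of that work: HTML emails usually reference embedded images with `cid:` URLs that point at other MIME parts, so a renderer fed only the HTML shows broken images. One possible fix, sketched in Python under the assumption that the raw message source is available, is to replace each `cid:` reference with a base64 `data:` URI. This is an illustrative technique, not how Mail Archiver does it, and `inline_cid_images` is a hypothetical name.

```python
import base64
import re
from email import policy
from email.parser import BytesParser


def inline_cid_images(source):
    """Return the HTML body with cid: references replaced by data: URIs."""
    if isinstance(source, str):
        source = source.encode("utf-8")
    msg = BytesParser(policy=policy.default).parsebytes(source)
    body = msg.get_body(preferencelist=("html",))
    if body is None:
        return None
    html = body.get_content()

    # Map each Content-ID (without its angle brackets) to a data: URI.
    uris = {}
    for part in msg.walk():
        cid = part.get("Content-ID")
        if cid and part.get_content_maintype() != "multipart":
            data = part.get_payload(decode=True) or b""
            encoded = base64.b64encode(data).decode("ascii")
            uris[str(cid).strip().strip("<>")] = (
                f"data:{part.get_content_type()};base64,{encoded}"
            )

    def swap(match):
        # Leave unknown cid: references untouched.
        return uris.get(match.group(1), match.group(0))

    return re.sub(r"cid:([^\"'\s>)]+)", swap, html)
```

Data URIs keep the result a single HTML string, which suits piping it to a PDF tool over stdin; the trade-off is that large images make the HTML much bigger.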

ionah · January 28, 2023, 11:25am

I find this discussion very interesting.
I’m trying to make PDFs from HTML files with AppleScriptObjC, but what I’m getting is a total mess!

@johncatalano, are you using the tools you mentioned in your first post?
Are they reliable?

@bwill, do you mean you’re able to convert an HTML file to PDF using AppleScriptObjC or AppleScript?

bwill · January 28, 2023, 11:53am

I get data from Mail with AppleScript only where I have to, which means the accounts and the mailboxes. Because Gmail keeps all emails in a single mailbox, “All Mail”, I need to get the header data for all emails.

The rest of my app is written in Xojo. The PDF is created from the HTML with macOS functions after loading the HTML into an HTML viewer.

wkhtmltopdf works fine, but it’s only available for Intel, and an ARM version is unlikely. A while ago I found a Python library, but it creates only very simple PDFs. Other tools that create PDFs from HTML, like Prince, are super pricey.

KniazidisR · January 28, 2023, 1:00pm

Hello,

One solution would be to install a ready-made application like the one mentioned above.

For those who would rather work it out themselves, I can confirm Shane Stanley’s remark that the HTML of the message body can be obtained by plain AppleScript parsing. In the simplest case (a single-part HTML message), the message source consists of the headers, followed by a null line (i.e., two contiguous CRLFs), and then the HTML body. Multipart messages additionally wrap each part in MIME boundaries and often encode it as quoted-printable or base64.

That is, to get the HTML, your AppleScript must discard the headers and the null line that follows them.
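The split described above, sketched in Python rather than AppleScript. It assumes the source may use CRLF, bare LF, or bare CR line endings, since it is not guaranteed how a scripted export normalizes them; `split_headers_and_body` is a hypothetical name. For a multipart message, the returned body still contains the MIME boundaries and encoded parts.

```python
import re


def split_headers_and_body(source):
    """Split a raw message into (header block, body) at the first null line."""
    # The first blank line ends the headers; accept CRLF, LF, or CR endings.
    match = re.search(r"\r\n\r\n|\n\n|\r\r", source)
    if match is None:
        return source, ""  # headers only, no body
    return source[:match.start()], source[match.end():]
```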

Here is a link to the RFC 822 specification, which should be read carefully, especially the following passage, which explains how to distinguish the headers from the body (so they can then be removed from the message source):

B.2. SEMANTICS

    Headers occur before the message body and are terminated by
    a null line (i.e., two contiguous CRLFs).

    A line which continues a header field begins with a SPACE or
    HTAB character, while a line beginning a field starts with a
    printable character which is not a colon.

    A field-name consists of one or more printable characters
    (excluding colon, space, and control-characters). A field-name
    MUST be contained on one line. Upper and lower cases are not
    distinguished when comparing field-names.
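The header rules quoted above translate fairly directly into code. Below is a minimal Python sketch, using the hypothetical name `parse_header_fields`, that unfolds continuation lines (those starting with a space or tab) and compares field-names case-insensitively by lowercasing them. It collapses the folding whitespace to a single space, which simplifies RFC 822 unfolding, and it does not decode RFC 2047 encoded words.

```python
def parse_header_fields(header_block):
    """Parse an RFC 822 header block into {lowercase field-name: [values]}."""
    unfolded = []
    for line in header_block.splitlines():
        if line[:1] in (" ", "\t") and unfolded:
            # Continuation line: append it to the previous field
            # (folding whitespace collapsed to one space, a simplification).
            unfolded[-1] += " " + line.strip()
        elif line:
            unfolded.append(line)

    fields = {}
    for line in unfolded:
        name, sep, value = line.partition(":")
        if not sep:
            continue  # no colon, so not a valid field line
        # Lowercase the name because field-names are case-insensitive;
        # repeated fields such as Received keep every value.
        fields.setdefault(name.strip().lower(), []).append(value.strip())
    return fields
```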

ionah · January 28, 2023, 2:31pm

@bwill
Thanks! You’re confirming what I thought.
I think I’ll give Prince a try, as they have a free personal licence.