Safari apparently ignores the “as” parameter here entirely. It didn’t throw an error when I sent “save as 2,” among other things. What I end up with is a “.download” file which is in fact a plain text file containing the document’s page source.
On second thought, can anyone recommend a scriptable browser that’s capable of saving to the WebArchive format? I just downloaded OmniWeb, but it doesn’t seem to hook into the document/saving terminology in the Standard Suite at all. textutil knows how to convert HTML to webarchive, but that’s kinda useless without the embedded pictures and other resources.
My only solid lead so far:
http://tuvix.apple.com/documentation/Cocoa/Reference/WebKit/Classes/WebArchive_Class/Reference/Reference.html
I’ve never written a line of Cocoa in my life so that page is rather baffling. I guess I could go there if I had to.
Every ‘hit’ Google found seemed to relate to WebArchive as part of the system’s WebKit. I don’t think this is accessible from plain vanilla AppleScript; I think it requires AppleScript Studio in Xcode.
Thank you, Adam, for your response. The more time passes and the more I Google, the more I’m finding that webarchive probably isn’t the format I want for this project anyway. I’ll do better to write my own archiving system using the HTML source. I was just looking for the lazy way out with “save as,” but opened a can of worms instead. Ah well.
Except for database-driven sites, however, it’s not hard to parse the HTML for img tags and href=“…” links and collect them along with the rest. You could build your own complete file for a page that would open locally (not that I’ve tried doing it, but I certainly parse info out of web page sources that I use cURL to get).
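For what it’s worth, here is a minimal sketch of that parsing step in Python, using only the standard library’s html.parser. The names ResourceCollector and collect_resources are made up for illustration, and it assumes you already have the page source as a string (for example, from cURL). It only looks at img src and a/link href attributes, so it is a starting point rather than a complete archiver.

```python
from html.parser import HTMLParser


class ResourceCollector(HTMLParser):
    """Hypothetical helper: gathers img src values and a/link href values."""

    def __init__(self):
        super().__init__()
        self.images = []  # values of <img src="...">
        self.links = []   # values of <a href="..."> and <link href="...">

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])
        elif tag in ("a", "link") and attrs.get("href"):
            self.links.append(attrs["href"])


def collect_resources(html_source):
    """Return (images, links) found in the given HTML source string."""
    parser = ResourceCollector()
    parser.feed(html_source)
    parser.close()
    return parser.images, parser.links
```

The references come back exactly as written in the page, so relative ones still need to be resolved against the page’s URL before you fetch them.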
There seems to be some limited discussion on the web about AppleScript solutions for this, one of which (I haven’t tested it) is here. If that doesn’t get you anywhere, do a Google search for “webarchive safari applescript”, which might turn up more.
If this is for anyone but you, or if you plan on using different browsers with this content, note that webarchive is not a standard format. Other browsers (probably all that are not OS X native), such as Firefox, do not know what to do with a webarchive. If that’s of no concern to you, continue on. Also, after looking through the webarchive protocols, it seems like you’d have to parse the HTML source to find a list of all of the media and resources anyway, which makes writing your own solution nearly as appealing as figuring out webarchive. Doing it manually would allow you to stay in a language you know, and also give you more control over how and where data is stored. But it’s certainly not as clean as having one cute little file to open. Unfortunately, I have not messed with the WebArchive class at all, so I don’t know how deep the rabbit hole goes.
Aside from GUI scripting and writing your own offline archiving magic, there aren’t a lot of options. If you’ve never written any Objective-C, you might find the task of jumping into WebKit to be a bit unpleasant.
Good luck,
j
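On the point about controlling how and where the data is stored, a rough sketch of one way to do it in Python: resolve each reference found in the page against the page’s URL, then pick a local path for it. The function name local_path_for, the "archive" folder, and the one-folder-per-host layout are all assumptions for illustration, not anything webarchive itself does.

```python
import posixpath
from urllib.parse import urljoin, urlparse


def local_path_for(page_url, resource_ref, archive_dir="archive"):
    """Return (absolute_url, local_path) for a reference found in a page.

    Hypothetical layout: archive_dir/<host>/<file name>. Flattening to the
    file name alone means two resources with the same name on one host
    would collide; a real archiver would need to handle that.
    """
    # Resolve relative references like "img/logo.png" against the page URL
    absolute = urljoin(page_url, resource_ref)
    parsed = urlparse(absolute)
    # Fall back to index.html when the URL path ends in a slash
    name = posixpath.basename(parsed.path) or "index.html"
    return absolute, posixpath.join(archive_dir, parsed.netloc, name)
```

For example, a reference to img/logo.png on http://example.com/docs/page.html resolves to http://example.com/docs/img/logo.png and would be stored at archive/example.com/logo.png. The saved HTML would then need its references rewritten to those local paths so the page opens offline.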
Thank you, Jobu. Your response sums up what I found and how I feel about it. When I was first learning AppleScript I was a GUI fiend, but now that I’m learning to shell script I’m quickly becoming a GNU fiend.
The “learning” link points to a page of regex implementations. If you want to use regex in AppleScript, you can by installing the Satimage osax.
Thanks for the tip; I forgot about Satimage. That link is a little off-topic, since it’s from my personal blog. Back on topic, it appears that someone did in fact release a webarchive GUI script in the form of the Automator action Save Safari Web Archive.