Make scripting Safari pure fun with the help of DEVONtechnologies

Have you ever viewed a website in Safari and wanted to automatically download all linked PDF documents instead of manually clicking each link? Or grab the embedded web videos and images? For me, the answer is definitely YES.

Unfortunately, most (free) Mac web browsers simply don’t feature advanced AppleScript support. Of course, they let you create new documents and tabs or even execute JavaScript with Apple’s unique language of automation. But as soon as your script matures and wants to grab all GIF images of a website, you will be deeply disappointed by the few AppleScript commands most browsers offer.

And that’s sad, because web browsers are now such essential tools for getting large amounts of valuable information onto your desktop. Automation hooks could turn out to be a real competitive advantage.

At least Safari, which is quite scriptable, lets you get the source code of a website, which you can then painfully parse yourself. But in my opinion, that is like travelling economy class when you have a free first class ticket in your pocket. The browser has already parsed the HTML document, so it surely knows about all the objects it contains. So why parse it all again?

But knowledge workers, rejoice! There is no need to be desperate, as DEVONtechnologies comes to the rescue. Let me show you how you can use two of their products to make scripting Safari pure fun. Or at least less painful.

If you are like me and already own a copy of DEVONagent or DEVONthink, that is perfect. If not, I recommend downloading the demo so that you can test the AppleScripts presented in this article on your Mac.

All scripts were tested on Intel- and PowerPC-based Macs. They require an internet connection and Mac OS X 10.5 (simply because of the newly introduced Downloads folder). If you only have DEVONthink Pro available, you first need to open the scripts in Script Editor and replace the command «tell application “DEVONagent”» with «tell application “DEVONthink Pro”», as I wrote them for DEVONagent. The source code also contains hints on how to modify the code so that the scripts can be used with earlier incarnations of Mac OS X.

Both DEVONthink Pro and DEVONagent feature an extraordinary AppleScript dictionary that also contains some very powerful commands to process websites and RSS feeds. By combining them with Safari’s ability to return the source code of a website, you can quickly create convenient workflows that save you a lot of time and clicking.

Let’s have a look at the first example:

Example 1: Download all PDF files linked to in Safari’s frontmost document

As described at the beginning of the article, there are times when you want to download every linked PDF document of a website you are currently viewing in Safari. The Safari PDF Grabber below will do exactly this for you without the need to manually search and click the links. I actually created this script for mass downloading product brochures from the web into our supplier database in order to keep our R&D staff up-to-date.

Safari PDF Grabber (ca. 38 KB)

To use this script, please first download and unzip it. Then open a website in Safari that contains links to PDF files, just like this sample Google search for Berol surfactants from Akzo Nobel. Start the AppleScript with a double click and watch your Downloads stack.

Here is how it works: The script asks Safari for the URL and source code of the frontmost document. If successful, it passes these values on to DEVONagent, which extracts all PDF links from the corresponding website using its fantastic «get links of» command. The shortened AppleScript code segment looks like this:


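tell application "Safari"
	-- Ask Safari for the address and the HTML source of the frontmost document
	set websiteurl to URL of document 1
	set websitesource to source of document 1
end tell

tell application "DEVONagent"
	-- Let DEVONagent extract only the links that point to PDF files
	set websitelinks to get links of websitesource base URL websiteurl type "PDF"
end tell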

The PDF links are then processed by a Python script located inside the AppleScript bundle, which downloads the PDF documents to the Downloads folder using urllib. Unfortunately, AppleScript’s own built-in URL Access Scripting is just not reliable enough to accomplish the download task. The Python script will not overwrite existing files with the same file name, but appends a number to the new file name instead, much like the Finder does.
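
The download step described above can be sketched roughly as follows. This is not the script shipped in the bundle, just a minimal illustration written for Python 3 (`urllib.request`). The helper names `unique_path`, `filename_from_url` and `download_all` are made up for this example, as is the `name-1.pdf` numbering scheme and the `download.pdf` fallback name.

```python
import os
import urllib.request
from urllib.parse import urlsplit, unquote


def unique_path(folder, filename):
    """Return a path inside folder that does not exist yet.

    If filename is already taken, a number is appended before the
    extension (name-1.pdf, name-2.pdf, ...). The exact numbering
    scheme is an assumption made for this sketch.
    """
    stem, ext = os.path.splitext(filename)
    candidate = os.path.join(folder, filename)
    counter = 0
    while os.path.exists(candidate):
        counter += 1
        candidate = os.path.join(folder, f"{stem}-{counter}{ext}")
    return candidate


def filename_from_url(url, default="download.pdf"):
    """Derive a file name from the path part of a URL (hypothetical helper)."""
    name = os.path.basename(unquote(urlsplit(url).path))
    return name or default


def download_all(urls, folder):
    """Download every URL into folder without overwriting existing files."""
    saved = []
    for url in urls:
        target = unique_path(folder, filename_from_url(url))
        with urllib.request.urlopen(url) as response, open(target, "wb") as out:
            out.write(response.read())
        saved.append(target)
    return saved
```

Calling `download_all(websitelinks, os.path.expanduser("~/Downloads"))` with the list of links returned by DEVONagent would then save each file under a fresh name.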

It’s really great: You don’t need to parse the raw HTML source yourself. DEVONagent does the job just fine, and you can concentrate on finding more of those interesting PDF documents.

Example 2: Download all embedded image files in Safari’s frontmost document

Studying lots of PDF documents may improve your knowledge tremendously, but often an image can express so much more. So let’s focus on downloading images now. With only slight modifications to the previous script, we can easily create an AppleScript that will download all images contained in Safari’s frontmost document:

Safari IMG Grabber (ca. 43.4 KB)

The usage is the same as with the Safari PDF Grabber.

The script’s internal procedure is also almost the same as with the Safari PDF Grabber, except this time we are using a special command from DEVONagent’s AppleScript dictionary that extracts image links only: «get embedded images of».


tell application "Safari"
	set websiteurl to URL of document 1
	set websitesource to source of document 1
end tell

tell application "DEVONagent"
	set websitelinks to get embedded images of websitesource base URL websiteurl
end tell

If you want to download images of a certain file format only (PNG/JPEG/GIF), you can easily specify this as follows:


tell application "DEVONagent"
	set websitelinks to get embedded images of websitesource base URL websiteurl type "PNG"
end tell

Flickr, here we come!

Example 3: Disassembling an RSS feed in Safari’s frontmost document

The last example in this article is addressed to advanced web and AppleScript users and does not provide a complete script, but rather presents a method to process your fine collection of RSS feeds more efficiently.

We all know that RSS feeds are a great resource for your daily shot of interesting news. They are nicely structured and come without the clutter of website themes and other graphical bling. Just text, which can be easily scanned for subjects and guarantees a fast read.

So let’s take this one step further and write an AppleScript that automatically searches the articles of an RSS (or RDF/Atom) feed for certain keywords and opens the corresponding links in new tabs. The example code below uses Apple’s Hot News RSS feed and searches for articles related to ‘AppleScript’:


tell application "Safari"
	-- Open the feed in a new document and give Safari a moment to load it
	make new document with properties {URL:"feed://www.apple.com/main/rss/hotnews/hotnews.rss"}
	delay 3
	set feedurl to URL of document 1
end tell

tell application "DEVONagent"
	-- Fetch the raw feed markup and split it into individual items
	set feedsource to download markup from feedurl
	set feeditems to get items of feed feedsource
	repeat with feeditem in feeditems
		-- Open every article whose title or description mentions the keyword
		if |title| of feeditem contains "AppleScript" or description of feeditem contains "AppleScript" then
			tell application "Safari"
				tell window 1
					make new tab with properties {URL:link of feeditem}
				end tell
			end tell
		end if
	end repeat
end tell

As you can see from the code above, DEVONagent’s «get items of feed» command is very powerful. It returns a list of articles, and each article is a record with the following keys that can be inspected: title, link, date, description, content, author, and HTML code.
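
To make the keyword filter itself easier to follow outside of DEVONagent, here is a rough sketch of the same idea in Python using only the standard library. It handles plain RSS 2.0 only (not RDF or Atom), and both the function name `matching_links` and the sample feed are invented for this illustration. The comparison ignores case, because AppleScript’s «contains» does so by default.

```python
import xml.etree.ElementTree as ET

# Invented sample data, standing in for a downloaded RSS 2.0 feed
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>AppleScript tips</title>
      <link>http://example.com/applescript-tips</link>
      <description>Automating Safari.</description>
    </item>
    <item>
      <title>Other news</title>
      <link>http://example.com/other</link>
      <description>Nothing relevant here.</description>
    </item>
    <item>
      <title>Automation roundup</title>
      <link>http://example.com/roundup</link>
      <description>Includes an AppleScript example.</description>
    </item>
  </channel>
</rss>"""


def matching_links(feed_xml, keyword):
    """Return the links of all items whose title or description mentions keyword."""
    needle = keyword.lower()
    links = []
    for item in ET.fromstring(feed_xml).iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        # Case-insensitive match on either field, like the AppleScript test
        if needle in title.lower() or needle in description.lower():
            links.append(item.findtext("link", default=""))
    return links


for link in matching_links(SAMPLE_FEED, "AppleScript"):
    print(link)
```

The AppleScript version remains the more convenient one, since DEVONagent also copes with RDF and Atom feeds and hands back ready-made records.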

I guess you can already think of many ways to make use of this script snippet :)

That’s all for today, have fun scripting Safari!

Closing words:

I am fully aware that you can achieve the same results by using another programming language and regular expressions. But I am an AppleScript nut and I wanted to show how you can easily get acceptable results without parsing and studying the regex bible :)

Just wanted to say thanks for this. Looks like they might have a new customer in me. Anything that will solve a few AppleScript problems for me a year is welcome on my machine.

this is a great post, thanks!
just used it to download a lot of linked PDFs for my research, and it worked a treat in Lion.
cheers.
Joe Lafferty (Dundee, Scotland)

Model: MacBook Pro
AppleScript: latest? Lion
Browser: Safari 534.48.3
Operating System: Mac OS X (10.7)

A further question for additional help. Is it possible to script to download from a link like I have indicated below?
This is for a text or rich text file from the website below.
http://www.entertheworshipcircle.com/store/index.php?cPath=130_179_147&ref=1&filter=album&sort=2a&page=show_all

I’ve amended the script above from PDF to TXT, but when I run it I get the error
“Sorry an error occurred:
DEVONagent could not find any links in the HTML document.” then quotes the web page with (—) at the end.

Is this because this is a kind of quasi shopping site, even though the downloads are free?

thanks,

Joe
—example single links below

RTF
http://www.entertheworshipcircle.com/store/download_free.php?products_id=1384

TXT
http://www.entertheworshipcircle.com/store/download_free.php?products_id=906
