Get all image URLs in a Safari document

Nigel_Garvey · January 29, 2006, 9:25pm

Inspired by donquexada’s script “Download images and set to desktop background image”, and after some rummaging around in a JavaScript primer, here’s a handler that returns a list of all the image URLs in the front Safari document.

on getImageURLs()
	-- JavaScript: get all the image URLs in a Web document
	-- as a single, line-feed-delimited Unicode text.
	set js to "var imageURLtext = \"\";
var imageCount = document.images.length;
var lastImageIndex = (imageCount - 1);

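for (var i = 0; i < imageCount; i++) {
	imageURLtext = (i < lastImageIndex) ? (imageURLtext + document.images[i].src + \"\\n\") : (imageURLtext + document.images[i].src);
}
// The value of the last expression is what 'do JavaScript' hands back;
// a top-level 'return' is a syntax error in stricter JavaScript engines.
imageURLtext;"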
	
	-- Get all the image URLs from the front document as a delimited Unicode text.
	tell application "Safari" to set imageURLtext to (do JavaScript js in front document)
	
	-- Separate the URLs and edit out any duplicates.
	set imageURLs to {}
	if ((count imageURLtext) > 0) then
		set allImageURLs to paragraphs of imageURLtext
		considering case
			repeat with i from 1 to (count allImageURLs)
				set thisURL to item i of allImageURLs
				if (thisURL is not in imageURLs) then set end of imageURLs to thisURL
			end repeat
		end considering
	end if
	
	return imageURLs
end getImageURLs

getImageURLs()
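
For comparison, the duplicate removal in the handler above could also be done on the JavaScript side before the text is handed back to AppleScript. What follows is only a sketch under assumptions, not part of the handler: `uniqueImageURLs` and `fakeDocument` are hypothetical names, and the stand-in object exists purely so the code can be run outside Safari.

```javascript
// Hypothetical helper (not from the handler above): collect the src of every
// image in a document-like object, keeping the first occurrence of each URL.
function uniqueImageURLs(doc) {
  var seen = {};   // URLs collected so far, used as a lookup table
  var urls = [];
  for (var i = 0; i < doc.images.length; i++) {
    var src = doc.images[i].src;
    if (!Object.prototype.hasOwnProperty.call(seen, src)) {
      seen[src] = true;
      urls.push(src);
    }
  }
  return urls;
}

// Stand-in for a browser document (an assumption, for running outside Safari).
var fakeDocument = { images: [{ src: 'a.png' }, { src: 'b.png' }, { src: 'a.png' }] };

// Line-feed-delimited text, the same shape the handler expects to receive.
var joined = uniqueImageURLs(fakeDocument).join('\n');
```

In Safari, the function would presumably be given `document` instead of the stand-in, with the joined text as the last expression of the script.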

julifos · January 29, 2006, 10:27pm

Just for the record, the following version (very similar, except that the duplicate removal is done in the JavaScript itself) also supports frames and iframes:

set extractImages to "

var z = '';
for (i = 0; i < top.document.images.length; i++) processImg(top.document.images[i].src);
for (i = 0; i < top.frames.length; i++) mf(top.frames[i]);
z = z.split('%%%').join('');
z

function mf(obj){
	if (obj.frames.length == 0) { // extract images from this page
		try {
			obj.document.images; throw 'OK';
		} catch (e) {
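			// 'var q' keeps the counter local to each call of mf
			if (e=='OK') for (var q=0;q<obj.document.images.length;q++) processImg(obj.document.images[q].src);
		}
	} else { // recurse into the subframes
		// without 'var', the recursive call would reset this loop's counter
		for (var q=0;q<obj.frames.length;q++) mf(obj.frames[q]);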
	}
}

function processImg(img){
	petabyte = '\\r' + img + '%%%';
	if (z.indexOf(petabyte) == -1) z += petabyte;
}

"


tell application "Safari" to rest of paragraphs of (do JavaScript extractImages in document 1)
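
The frame walk in the script above can also be written as a single recursive function. This is a minimal sketch under assumptions, not julifos's code: `collectImageURLs` and `fakeTop` are hypothetical names, and unlike the script above it also collects images from a document that itself contains frames. The loop counter is declared with `var` so that each recursive call has its own.

```javascript
// Hypothetical sketch: walk a window-like object and all of its nested
// frames, collecting each image URL once.
function collectImageURLs(win, seen, urls) {
  var i;   // local counter, so recursive calls cannot disturb it
  try {
    var images = win.document.images;   // may throw if the frame is not accessible
    for (i = 0; i < images.length; i++) {
      var src = images[i].src;
      if (!Object.prototype.hasOwnProperty.call(seen, src)) {
        seen[src] = true;
        urls.push(src);
      }
    }
  } catch (e) {
    // The frame's document could not be read: skip its images.
  }
  for (i = 0; i < win.frames.length; i++) {
    collectImageURLs(win.frames[i], seen, urls);
  }
  return urls;
}

// Stand-in for a window with nested frames (an assumption, for running
// outside Safari). The last frame simulates an inaccessible document.
var fakeTop = {
  document: { images: [{ src: 'top.png' }] },
  frames: [
    { document: { images: [] }, frames: [
      { document: { images: [] }, frames: [] },
      { document: { images: [{ src: 'inner.png' }, { src: 'top.png' }] }, frames: [] }
    ] },
    { get document() { throw new Error('blocked'); }, frames: [] }
  ]
};

var found = collectImageURLs(fakeTop, {}, []);
```

In Safari, the starting point would presumably be `top`, with `found.join('\r')` as the last expression of the script.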

Nigel_Garvey · January 30, 2006, 12:41am

Cool! And about three times as fast as my effort. Thanks jj. I thought there must be some way to do the whole process in JavaScript, but I couldn’t find it. I’ll study the details of your script with interest!

Your version errors if there aren’t any image URLs to return, but that’s easily fixed in the AppleScript code at the end:

set extractImages to "

var z = '';
for (i = 0; i < top.document.images.length; i++) processImg(top.document.images[i].src);
for (i = 0; i < top.frames.length; i++) mf(top.frames[i]);
z = z.split('%%%').join('');
z

function mf(obj){
	if (obj.frames.length == 0) { // extract images from this page
		try {
			obj.document.images; throw 'OK';
		} catch (e) {
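			// 'var q' keeps the counter local to each call of mf
			if (e=='OK') for (var q=0;q<obj.document.images.length;q++) processImg(obj.document.images[q].src);
		}
	} else { // recurse into the subframes
		// without 'var', the recursive call would reset this loop's counter
		for (var q=0;q<obj.frames.length;q++) mf(obj.frames[q]);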
	}
}

function processImg(img){
	petabyte = '\\r' + img + '%%%';
	if (z.indexOf(petabyte) == -1) z += petabyte;
}

"

-- tell application "Safari" to rest of paragraphs of (do JavaScript extractImages in document 1)
tell application "Safari" to (do JavaScript extractImages in document 1)
if ((count result) is 0) then
	{}
else
	rest of paragraphs of result
end if
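
As far as I can tell, the reason the AppleScript side needs ‘rest of’ at all is that processImg puts a return in front of every URL, so the first paragraph of the returned text is always empty, and with no images the text is simply "". The sketch below rebuilds that text for a few made-up URLs; `buildDelimited` and the sample file names are hypothetical, and it uses `var` for its variables where the script uses globals.

```javascript
// Rebuild the delimited text the way processImg does, for a list of URLs.
function buildDelimited(urls) {
  var z = '';
  for (var i = 0; i < urls.length; i++) {
    // Leading return plus trailing marker, so that only a complete URL can
    // match: 'pic.gif' is not mistaken for part of 'mypic.gif'.
    var entry = '\r' + urls[i] + '%%%';
    if (z.indexOf(entry) == -1) z += entry;
  }
  return z.split('%%%').join('');   // strip the end markers
}

var some = buildDelimited(['mypic.gif', 'pic.gif', 'pic.gif']);
var none = buildDelimited([]);

// Splitting on the return gives an empty first element, which is the part
// that 'rest of paragraphs of' discards.
var someParts = some.split('\r');
```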

julifos · January 30, 2006, 12:55am

I’m an amateur JavaScripter (as well as AppleScripter), but this is a very fun (and very easy) language… If you study it, you’ll become a guru in a week, for sure (and you’ll also be an expert in lots of JavaScript-based languages).

BTW, this is my favorite JS reference:
http://www.devguru.com/Technologies/ecmascript/quickref/javascript_index.html

julifos · January 30, 2006, 1:00am

So, does this throw an error on your machine?

rest of paragraphs of ""

Nigel_Garvey · January 30, 2006, 1:15pm

Hi, jj. Thanks for the JavaScript link. It looks like a very good reference source.

No. But where “” is the result of Safari’s ‘do JavaScript’ command, I get the error: “Can’t get rest of {}.” (!) It turns out to be due to a curious AppleScript bug in Jaguar, which doesn’t occur in Tiger.

rest of paragraphs of ("" as Unicode text)
--> AppleScript error: Can't get rest of {}.

Where “” is an empty Unicode text, getting its paragraphs or characters returns an empty list, but that list has no ‘rest’! A convenient work-round seems to be a coercion to ‘item’:

rest of paragraphs of ("" as Unicode text as item)

-- Or, in your script:
tell application "Safari" to rest of paragraphs of ((do JavaScript extractImages in document 1) as item)

There are a couple of related anomalies that still exist in AS 1.10.3:

items of "" --> {}

rest of items of "" -->  AppleScript error: Can't get rest of {}.

items of ("" as Unicode text) --> AppleScript error: Can't get every item of "".

items of ("" as Unicode text as item) --> {}

rest of items of ("" as Unicode text as item) -->  AppleScript error: Can't get rest of {}.