Saving image from web page

Coltrane · January 10, 2010, 4:08pm

Hi

I am following a comic at http://www.hs.fi/fingerpori/1135231647953 and would like to download all strips to my hard drive. The strips are standard image files that I can save, but the URL of each file seems to be based on some random number, so I have to click a link (“Seuraava”, Finnish for “Next”) to get the URL of the next image. As far as I understand, this rules out using curl on its own, which would otherwise be the easiest option.

So, I would like to write an AppleScript that right-clicks the image, selects “Save image to Downloads”, clicks the link to get the next strip, and then loops. I have already figured out the link-clicking part, but I don’t know how to do the image-saving part.

Any help would be appreciated. Of course, if there is an easier way to achieve the same result, I would be glad to hear it!

StefanK · January 10, 2010, 5:06pm

Hi,

try this


set destFolder to POSIX path of ((path to desktop as Unicode text) & "WebImages:")
do shell script "/bin/mkdir -p ~/Desktop/WebImages"
tell application "Safari" to set numberOfPictures to do JavaScript "document.images.length" in document 1

set {TID, text item delimiters} to {text item delimiters, "/"}
repeat with i from 1 to numberOfPictures
	tell application "Safari" to set picID to do JavaScript "document.images[" & ((i - 1) as string) & "].id" in document 1
	if picID starts with "strip" then
		tell application "Safari" to set picURL to do JavaScript "document.images[" & ((i - 1) as string) & "].src" in document 1
		set fName to last text item of picURL
		do shell script "curl -o " & quoted form of (destFolder & fName) & space & quoted form of picURL
	end if
end repeat
set text item delimiters to TID

Coltrane · January 10, 2010, 11:45pm

I tried your script but got the following error:

error “The variable picID is not defined.” number -2753 from “picID”

(It points to the line: if picID starts with "strip" then)

StefanK · January 11, 2010, 8:15am

Hm, I tested the script with your link above and it worked fine.
The page with the comic must be the current (frontmost) document in Safari.
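
If Safari returns no value from do JavaScript (for example because the front document is a different page or has not finished loading), the variable is never assigned, and that would produce exactly this "not defined" error. Below is a minimal sketch of a guard, assuming that is the cause here. The name jsResult and the dialog texts are only illustrative.

```applescript
-- Sketch: read the id of the first image on the page, falling back to ""
-- when Safari returns no value. jsResult is a hypothetical variable name.
set picID to ""
try
	tell application "Safari"
		set jsResult to do JavaScript "document.images.length > 0 ? document.images[0].id : ''" in document 1
	end tell
	-- If jsResult was never assigned (or is not text), this line raises an
	-- error, which the try block catches.
	set picID to jsResult as text
on error
	set picID to ""
end try

if picID starts with "strip" then
	display dialog "Found the strip image."
else
	display dialog "No strip image found. Is the comic page the front Safari document?"
end if
```

The same try block could be wrapped around the picID lines inside the repeat loop, so that one bad image entry is skipped instead of stopping the script.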

Coltrane · January 12, 2010, 9:05am

Thank you very much for all your help! I have now got the script working. I still got the same error a couple of times without changing anything in the script, which is very strange…

So, below is the whole script. The repeat count of 5 is just for testing; for the final version I must increase it to a very large number because I don’t know the total number of strips. Maybe it would be possible to end the repeat when the “Seuraava” link no longer exists (i.e. the latest strip is showing) and have the script show a message dialog, for example. Of course, it also stops when it encounters an error.


set destFolder to POSIX path of ((path to desktop as Unicode text) & "WebImages:")
do shell script "/bin/mkdir -p ~/Desktop/WebImages"

repeat 5 times
	
	-- Count the images on every pass, since the count can differ from page to page
	tell application "Safari" to set numberOfPictures to do JavaScript "document.images.length" in document 1
	set {TID, text item delimiters} to {text item delimiters, "/"}
	repeat with i from 1 to numberOfPictures
		tell application "Safari" to set picID to do JavaScript "document.images[" & ((i - 1) as string) & "].id" in document 1
		if picID starts with "strip" then
			tell application "Safari" to set picURL to do JavaScript "document.images[" & ((i - 1) as string) & "].src" in document 1
			set fName to last text item of picURL
			do shell script "curl -o " & quoted form of (destFolder & fName) & space & quoted form of picURL
		end if
	end repeat
	set text item delimiters to TID
	
	delay 2
	tell application "System Events" to tell UI element "Seuraava" of group 10 of UI element 1 of scroll area 1 of group 3 of window 1 of application process "Safari"
		repeat until exists
			delay 0.2
		end repeat
		click
	end tell
	
end repeat
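
One way to end the loop once the “Seuraava” link is gone is to ask the page for the link's address through JavaScript and stop when nothing comes back. This is only a rough sketch. It assumes the link is an ordinary a element whose text contains “Seuraava”, and the handler names linkURL and waitForPageLoad are made up for illustration.

```applescript
-- Hypothetical helper: address of the first link whose text contains
-- linkText in the front Safari document, or "" if there is no such link.
on linkURL(linkText)
	set js to "(function(){var a=document.getElementsByTagName('a');for(var i=0;i<a.length;i++){if(a[i].textContent.indexOf('" & linkText & "')!==-1){return a[i].href;}}return '';})()"
	try
		tell application "Safari"
			set theResult to do JavaScript js in document 1
		end tell
		return theResult as text
	on error
		-- Safari returned no value, or no document is open
		return ""
	end try
end linkURL

-- Hypothetical helper: wait up to about 10 seconds for the page to load.
on waitForPageLoad()
	delay 1 -- give Safari a moment to start loading the new page
	repeat 50 times
		try
			tell application "Safari" to set readyState to do JavaScript "document.readyState" in document 1
			if readyState is "complete" then return true
		end try
		delay 0.2
	end repeat
	return false
end waitForPageLoad

set pagesVisited to 0
repeat
	-- (download the strip on the current page here, as in the script above)
	set pagesVisited to pagesVisited + 1
	set nextURL to linkURL("Seuraava")
	if nextURL is "" then exit repeat -- no "Seuraava" link: latest strip reached
	tell application "Safari" to set URL of document 1 to nextURL
	if not waitForPageLoad() then exit repeat
end repeat
display dialog "Stopped after " & pagesVisited & " page(s)."
```

Setting the document's URL directly avoids the System Events click, which depends on the exact position of the link in Safari's UI element hierarchy.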

macman_al · January 17, 2010, 3:03am

Here is a solution to your request. This script starts from today’s comic strip and works its way backwards for the number of pages you set in the repeat. You only need to change the repeat count to the number of days you want to download. I tested it with 50 and it worked fine, running Mac OS X 10.6.2. (I couldn’t resist the challenge.)

set destFolder to POSIX path of ((path to desktop as Unicode text) & "WebImages:")
do shell script "/bin/mkdir -p ~/Desktop/WebImages"
set Start_URL to "http://www.hs.fi/fingerpori/"
set PageSource to do shell script "curl " & quoted form of Start_URL

repeat 50 times
	
	--  Get date of comic strip --
	set x to the offset of ("<h1>Fingerpori</h1>") in PageSource
	set tmpTXT to text (x) thru (x + 50) of PageSource
	set TMP_Start to the (offset of ("<p>") in tmpTXT) + 3
	set TMP_End to the (offset of ("</p>") in tmpTXT) - 1
	set FileName to (text TMP_Start thru (TMP_End) of tmpTXT) & ".gif"
	
	--  Download the comic strip --
	set x to the offset of ("display: block;") in PageSource
	set tmpTXT to text (x) thru (x + 150) of PageSource
	set TMP_Start to the offset of ("http:") in tmpTXT
	set TMP_End to the (offset of ("align") in tmpTXT)
	set PictURL to text (TMP_Start) thru (TMP_End - 3) of tmpTXT
	do shell script "curl -o " & quoted form of (destFolder & FileName) & space & quoted form of PictURL
	
	-- Get the URL of the previous strip's page (the "Edellinen" link) --
	set x to the offset of ("Edellinen") in PageSource
	set tmpTXT to text (x - 120) thru (x) of PageSource
	set TMP_Start to the offset of ("http:") in tmpTXT
	set TMP_End to the offset of ("class") in tmpTXT
	set full_URL to text (TMP_Start) thru (TMP_End - 3) of tmpTXT
	set PageSource to do shell script "curl " & quoted form of full_URL
	
end repeat
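
The fixed character windows used above (x + 50, x + 150, x - 120) depend on the exact page layout. A small helper that returns the text between two markers may be a little more forgiving. This is a sketch only: textBetween is a hypothetical handler name, and the sample HTML string is invented for illustration rather than copied from the site.

```applescript
-- Hypothetical helper: text between the first startMarker and the next
-- endMarker after it, or "" when either marker is missing.
on textBetween(sourceText, startMarker, endMarker)
	set startPos to offset of startMarker in sourceText
	if startPos is 0 then return ""
	set contentStart to startPos + (length of startMarker)
	if contentStart > (length of sourceText) then return ""
	set tailText to text contentStart thru -1 of sourceText
	set endPos to offset of endMarker in tailText
	if endPos < 2 then return "" -- end marker missing, or nothing between the markers
	return text 1 thru (endPos - 1) of tailText
end textBetween

-- Invented sample input, standing in for part of the page source
set sampleSource to "<h1>Fingerpori</h1><p>10.1.2010</p>"
set stripDate to textBetween(sampleSource, "<p>", "</p>")
-- stripDate is "10.1.2010"
```

An empty result can also serve as a stop condition: if the section around “Edellinen” yields no URL, the loop can exit instead of passing an empty address to curl.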