I am following a comic at http://www.hs.fi/fingerpori/1135231647953 and would like to download all the strips to my hard drive. The strips are standard image files that I can save, but the URL of each file seems to be based on some random number, so I have to click a link (“Seuraava”) to get the URL of the next image. As far as I understand, this precludes using curl directly, which would otherwise be the easiest option.
So I would like to write an AppleScript that right-clicks the image, selects “Save image to Downloads”, clicks the link to get the next strip, and then loops. I have already figured out the link-clicking part, but I don’t know how to do the image-saving part.
Any help would be appreciated. Of course, if there is an easier way to achieve the same result, I would be glad to hear it!
-- Create the destination folder on the Desktop if it does not exist yet
set destFolder to POSIX path of ((path to desktop as Unicode text) & "WebImages:")
do shell script "/bin/mkdir -p ~/Desktop/WebImages"
tell application "Safari" to set numberOfPictures to do JavaScript "document.images.length" in document 1
set {TID, text item delimiters} to {text item delimiters, "/"}
-- JavaScript image indexes start at 0, hence (i - 1)
repeat with i from 1 to numberOfPictures
    tell application "Safari" to set picID to do JavaScript "document.images[" & ((i - 1) as string) & "].id" in document 1
    -- Download only images whose id starts with "strip"
    if picID starts with "strip" then
        tell application "Safari" to set picURL to do JavaScript "document.images[" & ((i - 1) as string) & "].src" in document 1
        -- The file name is the last path component of the image URL
        set fName to last text item of picURL
        -- Quote the URL as well, so characters such as & or ? cannot break the shell command
        do shell script "curl -o " & quoted form of (destFolder & fName) & space & quoted form of picURL
    end if
end repeat
set text item delimiters to TID
Thank you very much for all your help! I have the script working now. I still got the same error a couple of times without changing anything in the script, which is very strange…
So, below is the whole script. The repeat count of 5 is just for testing; for the final version I will have to set it to a very large number, because I don’t know the total number of strips. Maybe it would be possible to add a check so that when the “Seuraava” link no longer exists (i.e. the latest strip is showing) the repeat ends and the script shows, for example, a message dialog. Of course, it also stops when it encounters an error.
set destFolder to POSIX path of ((path to desktop as Unicode text) & "WebImages:")
do shell script "/bin/mkdir -p ~/Desktop/WebImages"
repeat 5 times
    -- Count the images on every page, since the number can differ from page to page
    tell application "Safari" to set numberOfPictures to do JavaScript "document.images.length" in document 1
    set {TID, text item delimiters} to {text item delimiters, "/"}
    repeat with i from 1 to numberOfPictures
        tell application "Safari" to set picID to do JavaScript "document.images[" & ((i - 1) as string) & "].id" in document 1
        -- Download only images whose id starts with "strip"
        if picID starts with "strip" then
            tell application "Safari" to set picURL to do JavaScript "document.images[" & ((i - 1) as string) & "].src" in document 1
            set fName to last text item of picURL
            do shell script "curl -o " & quoted form of (destFolder & fName) & space & quoted form of picURL
        end if
    end repeat
    set text item delimiters to TID
    -- Click the "Seuraava" (next) link via GUI scripting, waiting until it exists
    tell application "System Events" to tell UI element "Seuraava" of group 10 of UI element 1 of scroll area 1 of group 3 of window 1 of application process "Safari"
        repeat until exists
            delay 0.2
        end repeat
        click
    end tell
    -- Crude wait so the next page can load before its images are queried
    delay 2
end repeat
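One possible way to add the stop condition mentioned above is to let JavaScript look for the “Seuraava” link and end the loop when it is missing. This is only a sketch: it assumes the link is an ordinary <a> element whose text contains “Seuraava” and whose href is a normal page URL, which I have not checked against the actual page. The variable names findNextLinkJS and nextURL are made up for this example.

```applescript
-- Assumption: the "next" link is an <a> element whose text contains "Seuraava"
set findNextLinkJS to "(function () {
    var links = document.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
        if (links[i].textContent.indexOf('Seuraava') !== -1) { return links[i].href; }
    }
    return '';
})();"

repeat
    -- ... download the strip on the current page here ...
    tell application "Safari" to set nextURL to do JavaScript findNextLinkJS in document 1
    if nextURL is missing value or nextURL is "" then
        display dialog "No Seuraava link found: the latest strip has been reached."
        exit repeat
    end if
    -- Load the next page directly instead of clicking through GUI scripting
    tell application "Safari" to set URL of document 1 to nextURL
    delay 2 -- crude wait for the page to load
end repeat
```

Because the loop ends on its own, the repeat count no longer has to be guessed. If the link turns out to be a JavaScript handler rather than a plain URL, this approach would need adjusting.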
Here is a solution to your request. This script starts from “today’s” comic strip and works its way backwards for as many pages as you set in the repeat loop. You only need to change the repeat count to the number of days you want to download. I tested it with 50 and it worked fine, running 10.6.2. (I couldn’t resist the challenge.)
set destFolder to POSIX path of ((path to desktop as Unicode text) & "WebImages:")
do shell script "/bin/mkdir -p ~/Desktop/WebImages"
set Start_URL to "http://www.hs.fi/fingerpori/"
set PageSource to do shell script "curl " & quoted form of Start_URL
repeat 50 times
    -- Get date of comic strip (used as the file name) --
    set x to the offset of ("<h1>Fingerpori</h1>") in PageSource
    set tmpTXT to text (x) thru (x + 50) of PageSource
    set TMP_Start to the (offset of ("<p>") in tmpTXT) + 3
    set TMP_End to the (offset of ("</p>") in tmpTXT) - 1
    set FileName to (text TMP_Start thru (TMP_End) of tmpTXT) & ".gif"
    -- Download the comic strip --
    set x to the offset of ("display: block;") in PageSource
    set tmpTXT to text (x) thru (x + 150) of PageSource
    set TMP_Start to the offset of ("http:") in tmpTXT
    set TMP_End to the (offset of ("align") in tmpTXT)
    set PictURL to text (TMP_Start) thru (TMP_End - 3) of tmpTXT
    do shell script "curl -o " & quoted form of (destFolder & FileName) & space & quoted form of PictURL
    -- Get the URL of the next page to fetch (the "Edellinen", i.e. previous, strip) --
    set x to the offset of ("Edellinen") in PageSource
    set tmpTXT to text (x - 120) thru (x) of PageSource
    set TMP_Start to the offset of ("http:") in tmpTXT
    set TMP_End to the offset of ("class") in tmpTXT
    set full_URL to text (TMP_Start) thru (TMP_End - 3) of tmpTXT
    set PageSource to do shell script "curl " & quoted form of full_URL
end repeat
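The script above repeats the same "find a marker, cut out the text after it" pattern three times, and it will stop with an error if a marker is missing (for example, when there is no “Edellinen” link on the oldest strip). As a rough sketch, that pattern could be wrapped in a handler that returns missing value instead of failing. The handler name textBetween and the sample string are made up for illustration, and the date format in the sample is only a guess at what the page contains.

```applescript
-- Hypothetical helper: returns the text between startMarker and endMarker,
-- or missing value if either marker cannot be found
on textBetween(sourceText, startMarker, endMarker)
    set startPos to offset of startMarker in sourceText
    if startPos is 0 then return missing value
    set afterStart to startPos + (length of startMarker)
    if afterStart > (length of sourceText) then return missing value
    set remainder to text afterStart thru -1 of sourceText
    set endPos to offset of endMarker in remainder
    if endPos is 0 then return missing value
    if endPos is 1 then return ""
    return text 1 thru (endPos - 1) of remainder
end textBetween

-- Made-up sample resembling the header markup the script searches for
set sample to "<h1>Fingerpori</h1><p>1.2.2010</p>"
textBetween(sample, "<p>", "</p>") -- returns "1.2.2010"
```

Inside the repeat loop, a result of missing value could then be used to exit the loop cleanly instead of letting the script stop on an error.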