Saving Images from the Internet

vpatibandla5 · January 14, 2009, 2:12am

Hey guys, I need some help. I am new to the Mac and to AppleScript, and I don't know whether this is possible in AppleScript or not. Please help me out.

I have an account with Snapfish and I want to save some 300 pictures at once from one link. Every time I want to save an image, I have to click its link, open the image, save it, and finally go back. I have to follow the same procedure for every picture. Is there a way to automate or script this procedure?

Thanks for your help
Venkata

Jerome · January 14, 2009, 3:12am

It might be possible if the link to the photo is a link to download it. If you Control-click the link, can you download the image through the contextual menu, or do you just get the thumbnail? If you can download the actual image, then the link is there, and you should be able to get the links from the web page and download them to a specific folder using URL Access Scripting. Something like this works in a script that I have written to do something similar:

tell application "URL Access Scripting" to download ImageLink to (DownloadFolder & FileName) replacing yes without progress

mark_hunte · January 14, 2009, 10:36am

If, as Jerome says, the link goes straight to the image, then this script should work.

set DownloadFolder to POSIX file "/Users/USERNAME/Desktop/IMAGES/"

tell application "Safari"
	set thecounter to 0
	(* find links in page *)
	set UrlCount to (do JavaScript "document.links.length" in document 1)
	
	--set UrlCount to (do JavaScript "document.images.length" in document 1)
	
	(* repeat once for each link found *)
	repeat UrlCount times
		
		try
			
			--set urlLink to (do JavaScript "document.images[" & thecounter & "].src" in document 1)
			
			(* get link target *)
			set urlLink to do JavaScript "document.links[" & thecounter & "].href" in document 1
			(* check for jpeg *)
			if urlLink contains ".jpg" or urlLink contains ".jpeg" then
				
				(* get file name *)
				set theImage_Url to do shell script "basename " & quoted form of urlLink
				(* download file  *)
				tell application "URL Access Scripting" to download urlLink to (DownloadFolder & theImage_Url as string) replacing yes without progress
			end if
			
		end try
		set thecounter to thecounter + 1
	end repeat
	
end tell

I have left two commented-out lines in the script that count the images on the page and download the images shown on the page (not the link targets), just to show what can be done.

Nigel_Garvey · January 14, 2009, 12:43pm

!

if urlLink contains ".jpg" or urlLink contains ".jpeg" then

ralph.lindenfeld · January 14, 2009, 8:54pm

Another method I’ve been using to download images is via curl


do shell script "curl http://www.path/to/file/imagename.jpg -O"

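The -O option saves the file under its remote name in the PWD (present working directory). Each ‘do shell script’ call runs in its own shell, so a ‘cd’ in a separate call has no effect on the next one; I put the ‘cd’ at the start of the same call:

do shell script "cd new/path && curl -O http://www.path/to/file/imagename.jpg"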

-Ralph

mark_hunte · January 15, 2009, 12:46am

Oops. It worked in my tests, but I did not have any “jpeg” links; it would have failed on the “jpeg” part.
I always forget to repeat the “urlLink contains” for the second operand of the OR operator.

(Amended above)

Thanks

Jerome · January 15, 2009, 1:23pm

Ralph,

Thanks. I think I tried curl at one point but couldn’t get it to work the way I wanted, and as far as I recall, your shell command examples would fix the problem I had. I will try them to see how they work. I don’t know a lot about shell scripting, but where I have been able to figure out how to use it (usually with the aid of this forum), I have found it works as well as, and often faster than, a straight AppleScript method.

Nigel_Garvey · January 15, 2009, 1:32pm

Sorry, it was the most intelligent contribution I could make to this thread. I know it’s possible to write a single JavaScript to return all the relevant URLs (see this topic, for example), which would reduce the number of ‘do JavaScript’ commands sent to Safari from some 301 to just 1. But not knowing enough about Web page structure generally, and not knowing anything about SnapFish in particular, I couldn’t offer code tailored to Venkata’s requirements.
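
The single-call idea can be sketched as follows. This is only a minimal sketch, not code tailored to SnapFish: the function name collectJpegLinks is hypothetical, and it assumes the hrefs are already absolute and that a JPEG link's path ends in .jpg or .jpeg. It returns one URL per line, so the AppleScript side could split the result of one ‘do JavaScript’ call with ‘paragraphs of’.

```javascript
// Hypothetical helper: keep only hrefs that look like JPEG links, de-duplicated.
function collectJpegLinks(hrefs) {
  const seen = new Set();
  for (const href of hrefs) {
    // Ignore any query string or fragment when testing the extension.
    const path = href.split(/[?#]/)[0].toLowerCase();
    if (path.endsWith('.jpg') || path.endsWith('.jpeg')) {
      seen.add(href); // a Set drops repeated URLs and keeps first-seen order
    }
  }
  // One URL per line, so the caller can split the result into paragraphs.
  return Array.from(seen).join('\n');
}

// In a browser page, the input would come from the page's links, e.g.:
//   collectJpegLinks(Array.from(document.links, (link) => link.href));
```

Passing the hrefs in as a plain array keeps the filtering logic separate from the page, so it can be tried outside Safari before being wrapped in a ‘do JavaScript’ string.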

ralph.lindenfeld · January 15, 2009, 2:17pm

Jerome, are you behind a firewall?

Jerome · January 15, 2009, 4:25pm

The standard Mac firewall is turned on here at home, behind an AirPort Extreme router (or whatever the latest N release is); I should probably improve on that. They had a firewall where I worked as well. It’s been about a year since I worked on that part of it, so I don’t recall the exact problems I encountered. URL Access Scripting worked the way I wanted for the most part, with only a few glitches on one specific site, which we just lived with, downloading manually if it errored out.

It looks like I might have the opportunity to use this again on something that might make me some money, which is good, since this economy has recently put both me and my wife out of work. So as I look into reworking it for this new application, I will try again to get the curl method to work and see if that clears up some of the problems that I was having.

mark_hunte · January 15, 2009, 11:30pm

I would love to use curl at work for some of the tasks I need to get done, but getting curl to work with the proxy server has always been a problem.

ralph.lindenfeld · January 16, 2009, 4:03pm

Try:

curl http://path/to/file/name.jpg -O -x http.proxy.abc.com:12345

where

http.proxy.abc.com stands for your proxy's host name
12345 is the port number

-O writes the download to a file named after the remote file
-x sends the request through the given proxy server

Your sysadmin should be able to give you those values…

Nigel_Garvey · January 22, 2009, 12:14am

The script below is the sort of thing I had in mind. I think it works, but it’s difficult to test it meaningfully on a dial-up connection. The JavaScript code returns all the relevant URLs at once, formatted so that the JPEGs from any particular folder can all be retrieved with a single ‘curl’ command.

Thus only one ‘do JavaScript’ command is needed, and only as many ‘do shell script’ commands as there are folders containing the JPEGs. Specifying several files in one ‘curl’ command allows ‘curl’ to try to retrieve them all through the same connection, which (according to the “man” page) saves a lot of faffing around with handshakes. The script (if it works for others besides me) should handle Venkata’s 300 files quite a bit faster than Mark’s script above. The files are downloaded to a folder on the Desktop, into a subfolder hierarchy that reflects the hierarchy on the server.

The caveats are that there’s no progress display and I don’t know if there’s a clean way to abort the process before it’s completed… JavaScript or shell script experts are welcome to improve it.
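
To make the URL format concrete, here is a small standalone sketch of the grouping step in present-day JavaScript. It is an illustration only, not the code used in the script: the name groupForCurl is hypothetical, and it assumes absolute URLs whose file names contain no commas or braces.

```javascript
// Hypothetical helper: group absolute URLs by folder into curl's brace syntax,
// e.g. 'http://host/dir/{a.jpg,b.jpg}'.
function groupForCurl(urls) {
  const byFolder = new Map();
  for (const url of urls) {
    const cut = url.lastIndexOf('/');
    const folder = url.slice(0, cut + 1); // everything up to and including the last slash
    const file = url.slice(cut + 1);      // the file name after the last slash
    if (!byFolder.has(folder)) byFolder.set(folder, []);
    byFolder.get(folder).push(file);
  }
  // One combined URL per folder, in the order the folders were first seen.
  return Array.from(byFolder, ([folder, files]) => folder + '{' + files.join(',') + '}');
}
```

Each resulting string can be handed to one ‘curl’ call. In the -o argument, ‘curl’ replaces #1 with the current string from the first brace set, which is how each file keeps its own name.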

on main()
	tell application "Safari"
		set js to " // All variables global except function parameters!
Safari1 = ('" & version & "'.split('.')[0] == '1') ;
if (Safari1) { // Safari 1.0's JavaScript doesn't always resolve relative URLs.
	// If running in Safari 1.0, prepare a look-up table to help this script do it itself.
	baseURLLookup = parent.document.URL.split('/').slice(0,-1)
	baseURL = baseURLLookup.join('/') + '/' ;
	for (i = baseURLLookup.length -1 ; i > -1 ; i--) { baseURLLookup[i] = baseURLLookup.slice(0,i).join('/') + '/' ; }
	for (i = 0, j = baseURLLookup.length - 1 ; i < j ; i++, j--) { // Safari 1.0's JavaScript doesn't have 'reverse'.
		x = baseURLLookup[i] ;
		baseURLLookup[i] = baseURLLookup[j] ;
		baseURLLookup[j] = x ;
	}
}

// This URL-gathering process is based on a script by JJ.
URLs = '' ;
for (i = 0; i < top.document.links.length; i++) { processEntry(top.document.links[i].href) ; }
for (i = 0; i < top.frames.length; i++) { mf(top.frames[i]) ; }
URLs = URLs.split('%%%').slice(0,-1).sort() ; // Sorting groups together URLs that reference the same folders.
return combineURLs() ; 

function mf(thisFrame) {
	if (thisFrame.frames.length == 0) { // extract links from this page
		try {
			for (q = 0 ; q < thisFrame.document.links.length ; q++) { processEntry(thisFrame.document.links[q].href) ; }
		} catch (e) {
		}
	} else { // rotate again
		for (q = 0 ; q < thisFrame.frames.length ; q++) { mf(thisFrame.frames[q]) ; }
	}
}

function processEntry(thisURL) { // If this link is to a JPEG, append it to 'URLs'.
	if ((thisURL.indexOf('.jpg') > -1) || (thisURL.indexOf('.jpeg') > -1)) {
		if (Safari1) { thisURL = absURL(thisURL) ; }
		entry = thisURL + '%%%' ;
		if (URLs.indexOf(entry) == -1) { URLs += entry ; }
	}
}

function absURL(thisURL) { // If this URL is relative, get its absolute form. (Not necessary with Safari 2.0 or later.)
	if (thisURL.indexOf('://') > -1) { return thisURL ; } // The URL's already complete.
	if (thisURL.indexOf('..') == -1) { return baseURL + thisURL ; } // It's relative to this document's folder.
	
	URLbits = thisURL.split('/') ;
	for (x = 0 ; URLbits[x] == '..' ; x++) {} ;
	return baseURLLookup[x] +  thisURL.substring(x * 3,thisURL.length) ; // It's relative to some folder up-hierarchy.
}

function combineURLs() { // Combine URLs to files that are in the same folders, ie. 'http://path/to/folder/{file,file,file.}'.
	URLcombos = '' ;
	currentFolderURL = '' ;
	for (i = 0 ; i < URLs.length ; i++) {
		URLbits = URLs[i].split('/') ;
		thisFileName = URLbits[URLbits.length -1] ;
		thisFolderURL = URLbits.slice(0,-1).join('/') ;
		if (thisFolderURL == currentFolderURL) {
			URLcombos += ( ',' + thisFileName) ;
		} else {
			if (currentFolderURL != '') { URLcombos += '}\\r' ; }
			URLcombos += (thisFolderURL + '/{' + thisFileName) ;
			currentFolderURL = thisFolderURL ;
		}
	}
	if (URLcombos > '') { URLcombos += '}' ; }
	return URLcombos ;
}"
		
		set urlCombos to paragraphs of (do JavaScript js in front document)
	end tell
	
	if ((count urlCombos) > 0) then
		set downloadFolderPath to POSIX path of file ((path to desktop as Unicode text) & "Downloads folder:")
		set dataFile to ((path to temporary items as Unicode text) & "curl URL combos.txt") as file specification
		set dataFilePath to quoted form of POSIX path of dataFile
		
		set astid to AppleScript's text item delimiters
		set AppleScript's text item delimiters to "/" as Unicode text
		repeat with thisCombo in urlCombos
			-- Save this URL combo to file to be read by the shell script below.
			-- (It may be too long to go into the shell script directly.)
			set fRef to (open for access dataFile with write permission)
			try
				set eof fRef to 0
				write thisCombo as string to fRef
			end try
			close access fRef
			
			set currentDownloadPath to quoted form of (downloadFolderPath & text from text item 3 to (offset of "/{" in thisCombo) of thisCombo)
			do shell script "mkdir -p " & currentDownloadPath & " ; cat " & dataFilePath & " | xargs curl -o " & currentDownloadPath & "#1"
		end repeat
		set AppleScript's text item delimiters to astid
	end if

	beep 2 -- Finished!
end main

main()