I recently needed to write a script to download all PDF files linked from the frontmost Safari window. The site requires me to log in to access the files, and the links seem to redirect to another server.
I have no experience with AppleScript, so I guess you could say I’m in over my head. Someone else was able to help me out a bit earlier, but the result only downloaded 4 KB files that wouldn’t open.
Here’s what I have so far.
tell application "Safari"
    set nLinks to do JavaScript "document.links.length" in document 1
    repeat with i from 0 to nLinks - 1
        set strURL to do JavaScript "document.links[" & i & "].href" in document 1
        if strURL ends with ".pdf" then my download(strURL)
    end repeat
end tell

to download(strURL)
    set fileName to getFileName(strURL)
    do shell script "curl -o /Desktop/" & fileName & " " & strURL
end download

to getFileName(strPath)
    return last item of split(strPath, "/")
end getFileName

to split(str, delim)
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to delim
    set str to every text item of str
    set AppleScript's text item delimiters to astid
    return str
end split
If anyone could PLEASE help me figure this one out, I would be greatly appreciative! I’m kinda in a jam at the moment and I really don’t want to have to download all these files manually.
I made some slight modifications, but the only one that probably matters is adding the tilde before /Desktop, which makes the shell expand the path to your own Desktop folder so that curl saves the files there. You could add some more error checking with “quoted form of …” to ensure that the paths passed to the shell are properly escaped, though this probably isn’t a huge problem if the page you are parsing is valid HTML. Anyway, this code works for me (be sure you are testing on a page that actually has PDF links):
Hey, thanks for the help! I tried the script with the modifications that you made, but there were a few problems. The main one was that it downloaded all the PDFs, but again each was only 4 KB in size and would not open.
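For reference, the shell side of those two changes (the home-folder path and the quoting) can be sketched as follows. This is an illustration rather than the modified script itself; the names pdf_name and fetch_pdf, the CURL override variable and the example URL are assumptions made up for the sketch.

```shell
# Sketch only: pdf_name, fetch_pdf and the CURL variable are hypothetical names.

# Return the last path component of a URL (the job getFileName does in AppleScript).
pdf_name() {
    printf '%s\n' "${1##*/}"
}

# Download one URL into the current user's Desktop folder.
# "$HOME/Desktop" is used instead of "~/Desktop" because the shell does not
# expand a tilde that sits inside quotes, and the quotes are needed so that
# file names containing spaces survive.
fetch_pdf() {
    url=$1
    "${CURL:-curl}" -o "$HOME/Desktop/$(pdf_name "$url")" "$url"
}

# Usage (performs a real download):
#   fetch_pdf 'http://example.com/docs/report.pdf'
```

Setting CURL=echo before calling fetch_pdf prints the arguments instead of downloading, which is a cheap way to check the quoting.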
I’m still confused on that part because I don’t know if it’s because I have to login to this particular site to access these files, or if there’s some redirecting going on that I don’t know about.
The reason I say that is because I opened one of the files with TextEdit and saw the message “Found. This file can be found here.” As far as the HTML goes, it seems valid enough. The only difference between the page and the links is the URL where they are located, i.e.:
Any ideas what the problem could be? If so, what would I have to do to correct it? I’m really stumped on this one, probably because I’ve never used AppleScript before.
When you say you need to log in, do you mean you log in to get to the page with the links on it, or that you need to log in when you click on a link to download the PDFs? If it is a problem with a server redirect, you can add ‘-L’ to your curl string and curl will follow any redirects the server sends in its response headers. If you’re getting a file that says “this page has been moved here”, that’s the default message some servers send along with a redirect, and it means either they’ve moved the pages or you’ve got the address wrong. Without more specifics, like the page source code, anything more would be pure speculation.
I log in to get to the page with the links, not to download them. As far as adding “-L” to the curl string, would that be in place of the “-o” or with it? I’ll also take a look at the page’s source code to see if I notice anything.
I added “-L” to the curl string like you suggested. Still no luck, but here’s an example of what I see going on in the Event Log.
NOTE: I replaced the site, directory and file name with generic info so as not to identify its source and possibly violate any agreements that I may have.
I think that part of the problem is that the server may be checking that I am logged in before actually allowing me to download the files. That would explain why the link inside the 4 KB files directs me to a portion of the site that the public can access.
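On the redirect point, “-L” is an extra option that sits next to “-o” rather than replacing it. A minimal shell sketch, where fetch_following_redirects, the CURL override variable and the URL are illustrative assumptions:

```shell
# Sketch only: the function name and the CURL variable are hypothetical.

# Download url to out, following any redirect the server answers with.
# -L makes curl re-request the address given in the Location header;
# -o still names the output file, so the two options are used together.
fetch_following_redirects() {
    url=$1
    out=$2
    "${CURL:-curl}" -L -o "$out" "$url"
}

# To see where a link actually leads without saving anything, the headers of
# each hop can be printed instead (-s silent, -I headers only):
#   curl -sIL 'http://example.com/docs/report.pdf'
```

If the final hop in that header listing is a login or public page, the redirect is probably the server rejecting the request rather than a moved file, and following it will not produce the PDF.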
Isn’t there a way to tell the Script to pass pre-defined Login information, or establish a session when accessing a particular site? Let me know if you can make any sense of it. Thanks again for all your help!
Actually it was I who suggested using -L. But, using -L will not help you in this situation.
I’ve been writing an app to do something very similar to this for a slightly different problem. I’m guessing that being “logged in” is really just based on the existence of a cookie. Check your cookies file and see if there is a cookie from the site you’re accessing. If so, you also need to send the server the cookies it is looking for so it knows you’re not the “ENEMY”. Getting the cookie requires logging in via the form. Once you have it, you can tell curl to send that cookie along with your request. I recently posted code similar to this for downloading an image; here it is modified to pass cookies to the server and retrieve them from it.
(* The URL to get the PDF from *)
set theFileUrl to "http://www.thesecretgarden.net/theHidingPlace/eyes_only/snapDragon1.pdf"
(* Set the path to the cookies file *)
set pathToCookies to POSIX path of "Mac HD:abUsers:Secret Location:cookies.txt"
set quotedCookies to quoted form of pathToCookies --> Quote the path so that spaces survive the shell
(* Get the PDF using curl. The ~ is left unquoted so the shell expands it to the home folder *)
try
    set theCurlCommand to "curl -o ~/Desktop/snapDragon1.pdf -b " & quotedCookies & " -c " & quotedCookies & " " & quoted form of theFileUrl
    do shell script theCurlCommand
end try
The -b and -c parameters tell curl where to read existing cookies from (-b) and where to save any new cookies the server sets (-c).
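The same two options work for a whole list of links. As a rough shell sketch, assuming a cookie file that curl can read already exists; fetch_all_with_cookies, the CURL override variable and the paths are hypothetical names chosen for illustration:

```shell
# Sketch only: function name, CURL variable and paths are hypothetical.

# Read one URL per line from standard input and download each into dest_dir,
# sending cookies from jar (-b) and saving any new ones back to it (-c).
fetch_all_with_cookies() {
    jar=$1
    dest_dir=$2
    while IFS= read -r url; do
        [ -n "$url" ] || continue          # skip blank lines
        "${CURL:-curl}" -b "$jar" -c "$jar" -o "$dest_dir/${url##*/}" "$url"
    done
}

# Usage (performs real downloads):
#   fetch_all_with_cookies "$HOME/cookies.txt" "$HOME/Desktop" < pdf_urls.txt
```

Reusing one cookie file for every request keeps the session alive across the loop, which is the point of passing the same path to both options.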
Ideally (imo) you would use AppleScript to log in and download the PDFs. Then you could log in, write your own cookies file, and get the PDFs without the assistance of a browser. With the code posted above, you’ll need to log in with your browser, and then access the browser’s cookie file for any cookies you need. It must be a “netscape-style” cookies file. More info about passing cookies with curl can be found here and here.
Without more specifics, there’s little else I can do. The whole process could most likely be automated, but you’ll have to at least give the URL of the login page (in private if desired) and help us see some more of the details. I don’t think any of us need any more top-secret PDFs anyway.
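The browser-free login could look roughly like this in shell. It is only a sketch under loud assumptions: the login URL and the form field names “username” and “password” are invented and have to be read from the site’s actual login form, and login_and_save_cookies and the CURL override variable are hypothetical names.

```shell
# Sketch only: the function name, CURL variable, login URL and the form field
# names "username" and "password" are hypothetical.

# Submit the login form and store whatever cookies the server sets in jar.
# --data-urlencode makes curl send a POST and escapes special characters;
# -c writes the received cookies to jar in the Netscape format curl reads back.
login_and_save_cookies() {
    login_url=$1
    jar=$2
    user=$3
    pass=$4
    "${CURL:-curl}" -c "$jar" -L -o /dev/null \
        --data-urlencode "username=$user" \
        --data-urlencode "password=$pass" \
        "$login_url"
}

# Usage (performs a real request):
#   login_and_save_cookies 'http://example.com/login' "$HOME/cookies.txt" me secret
```

The file written this way can then be handed to -b for the PDF requests. Note that a password passed on the command line is visible to other local users in the process list while curl runs.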