Downloading a PDF that uses JavaScript in its link

rdlsmith · November 7, 2010, 1:27am

I’m a member of www.investors.com (the online home of Investor’s Business Daily, IBD), and as such I like to download the paper every day it’s available. I’d like to be able to script this. (Because a spring-loaded web page just isn’t good enough.)

At first I thought authentication would be the issue, but I found I can authenticate through the browser (Safari), save the password to the keychain, and I’m in. I’d like other options for authentication, but for now, to keep this focused, I’ll limit this thread to getting the download right.

Because the Safari browser stores my password I can load the following link and go straight to where I need to download my nightly PDF:

http://www.investors.com/MyIBD/RecentIssues.aspx

If I context-click the most recent link and choose “Copy Link”, as of this writing I get the following:

http://eibd.investors.com/pdf/eIBD110810.pdf#page=1

Using this code:



-- Example URL: http://eibd.investors.com/pdf/eIBD110810.pdf

-- Ask for the URL to download
display dialog "Download URL" default answer (return & return) buttons {"Cancel", "Download"} default button 2
set theURL to text returned of the result
-- Use everything after the last "/" as the default file name
set theName to text -((offset of "/" in (reverse of characters of theURL) as text) - 1) thru -1 of theURL

-- Ask where to save the file, then download it
set theFile to (choose file name default name theName default location (path to desktop folder))
tell application "URL Access Scripting" to download theURL to theFile replacing yes

-- Alternative using curl (the URL is quoted so the shell does not interpret it):
--do shell script "/usr/bin/curl " & quoted form of theURL & " -o " & quoted form of POSIX path of theFile
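One detail about the file name step: taking everything after the last "/" keeps any "#page=1" suffix if the copied link is pasted unchanged. As a rough illustration of the same idea with the fragment dropped, here is a minimal Python sketch (not AppleScript). The function name filename_from_url is made up for this example.

```python
from posixpath import basename
from urllib.parse import urlsplit


def filename_from_url(url):
    """Return the last path component of a URL, ignoring any ?query or #fragment.

    Hypothetical helper for illustration only.
    """
    # strip() removes stray whitespace or newlines from a pasted URL
    path = urlsplit(url.strip()).path
    return basename(path)


print(filename_from_url("http://eibd.investors.com/pdf/eIBD110810.pdf#page=1"))
# prints: eIBD110810.pdf
```

The same stripping could be done in AppleScript by cutting the URL at the first "#" before looking for the last "/".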

I feed it the link (http://eibd.investors.com/pdf/eIBD110810.pdf) and it does indeed save a file, but not all of it: something like 4KB when it should be much larger. My browser is open and I am authenticated to the page. I do get an error message when that is not the case.

I tried the do shell script (curl) version as well, with the same results.
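A file of about 4KB that will not open may be an HTML page (a login or error page, say) saved under a .pdf name. That is only a guess, not something confirmed here. One way to check is to look at the first bytes of the saved file, since a PDF normally begins with "%PDF-". A minimal Python sketch, with the function name describe_download invented for this example:

```python
import sys


def describe_download(path):
    """Classify a saved file as 'pdf', 'html', or 'unknown' from its first bytes.

    Hypothetical helper: it only inspects the start of the file and does not
    validate the whole document.
    """
    with open(path, "rb") as f:
        head = f.read(512)
    if head.startswith(b"%PDF-"):
        return "pdf"
    lowered = head.lower()
    if b"<html" in lowered or b"<!doctype" in lowered:
        return "html"
    return "unknown"


if __name__ == "__main__":
    # Usage: python check_download.py ~/Desktop/eIBD110810.pdf
    for path in sys.argv[1:]:
        print(path, "->", describe_download(path))
```

If the result is "html", opening the saved file in a text editor should show what the server actually sent back.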

This might be of use. Here’s the page source around the link:

                <li>                    
                    Monday - 11/08/2010
                     ( 
                    <a href="http://eibd.investors.com/pdf/eIBD110810.pdf#page=1" target="_blank" onclick="javascript:__doPostBack('PDF', ''); return true;">PDF</a>
                     )
                </li>
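In that markup the href is a plain URL, and the onclick handler returns true, so the browser still follows the link after calling __doPostBack. That suggests a script could read the PDF address straight from the page source instead of clicking anything. A minimal Python sketch of pulling the PDF links out of such a snippet; the names PdfLinkParser and pdf_links are invented for this example, and it assumes the page source has already been fetched while logged in:

```python
from html.parser import HTMLParser


class PdfLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at .pdf files (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Ignore any "#page=1" style fragment when checking the extension
        if href.split("#", 1)[0].lower().endswith(".pdf"):
            self.links.append(href)


def pdf_links(html):
    """Return all PDF links in the order they appear in the HTML."""
    parser = PdfLinkParser()
    parser.feed(html)
    return parser.links


SAMPLE = """<li>
    Monday - 11/08/2010
    ( <a href="http://eibd.investors.com/pdf/eIBD110810.pdf#page=1" target="_blank" onclick="javascript:__doPostBack('PDF', ''); return true;">PDF</a> )
</li>"""

print(pdf_links(SAMPLE))
# prints: ['http://eibd.investors.com/pdf/eIBD110810.pdf#page=1']
```

Whether the first link on the real page is always the most recent issue is an assumption that would need checking against the live page.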

If, in Safari, I manually context-click that link, I can use the “Download Linked File As…” menu option and download the PDF. If I just click the link, it opens the PDF in the browser. I never use the second option, as I prefer to download the file.

So I’m looking for options. If I could build an Automator workflow that loads the page and downloads the most recent issue every night, that would be great, but I can’t get it to work: Automator blows up (exits record mode) whenever I click that link. I know less about Automator than AppleScript, and I only tried it hoping I could save some actions as a script and learn from them.

Even better, I’d like to write an AppleScript application, as I think I would understand that better. I have many hurdles to overcome here, but the first is just downloading the file. Technically the script does create a file, but I have no idea whether it’s really downloading anything: the file is reported as an invalid format when I try to open it.

I appreciate any efforts to help me resolve this issue.