Web scrape foiled by button

Hi all,

I’m trying to scrape election results from the Florida Division of Elections site. There is a button to download the results, but when I use the URL that it refers to, I get an incomplete file: basically just the headers in tab-delimited form.

I think the URL is requesting a cookie that triggers the full download of the data.

Here’s the link to the download page: http://enight.dos.state.fl.us/EnightResultsDownload.html
The button links to: http://enight.dos.state.fl.us/ResultsExtract.Asp

Let me say first that I don’t know anything about ASP.

The relevant text from the page source:

When selecting the 'Download' button below, a tab-delimited file of election results will be downloaded to your machine. The format of this file can be opened and further processed in most spreadsheet and database programs. The table below provides a description of the fields comprising the data extract file.

Is there a way to AppleScript something that will make this button work properly, or am I outta luck?

Thanks in advance

The problem is not a cookie. The URL that you see in the browser after clicking that button does not carry all of the data needed to replicate the request your browser made to the server. When you click the button, the data from that form (the single visible button plus several hidden inputs) is submitted to the server over the same connection, in addition to (parts of) the URL.

I wrote up this code that uses curl to submit the form data (from the hidden form input fields) and save the result to a chosen file:

-- The address the download form submits to
set theUrl to "http://enight.dos.state.fl.us/ResultsExtract.Asp"
-- Values of the form's hidden inputs and its button, already URL-encoded
set formDataURLEncoded to "ElectionDate=01%2F29%2F2008&OfficialResults=N&PartyRaces=Y&DataMode=E&FormsButton2=Download"
-- Ask where to save the downloaded file
set outputPathname to choose file name with prompt "Save Election Data To File" default name "Florida 2008-01-29.tsv" default location (path to desktop folder)

-- Build the curl command; --data sends the form fields in the body of a POST request
set shellCmd to "/usr/bin/curl --output " & quoted form of POSIX path of outputPathname & " --data " & quoted form of formDataURLEncoded & " " & quoted form of theUrl

do shell script shellCmd

It is not generic in that the form data is pre-encoded in the script, but it was enough to let me download some data in the case you mentioned. Hopefully you should be able to adapt it for any other uses you come across.
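
If you want something a little less hard-coded, here is a rough sketch of the same request as a plain shell function, letting curl do the encoding. It assumes a curl that has --data-urlencode (added in 7.18.0, as far as I know, so the stock curl on older systems may not have it). The field names and URL are the ones from the script above; the function name fetch_results and the CURL override variable are just names I made up for illustration.

```shell
#!/bin/sh
# Sketch only: field names and URL are taken from the AppleScript above;
# fetch_results and the CURL override are hypothetical names.

# fetch_results DATE OUTFILE
#   DATE     election date as MM/DD/YYYY (curl percent-encodes the slashes)
#   OUTFILE  where to save the tab-delimited results
fetch_results() {
    # ${CURL:-curl} lets you substitute another command (e.g. echo) for a dry run.
    ${CURL:-curl} --output "$2" \
        --data-urlencode "ElectionDate=$1" \
        --data "OfficialResults=N" \
        --data "PartyRaces=Y" \
        --data "DataMode=E" \
        --data "FormsButton2=Download" \
        "http://enight.dos.state.fl.us/ResultsExtract.Asp"
}
```

You would call it as fetch_results '01/29/2008' "$HOME/Desktop/Florida 2008-01-29.tsv". Prefixing the call with CURL=echo prints the arguments instead of contacting the server, which is handy for checking what would be sent.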

Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari 3.0.4 (523.12)
Operating System: Mac OS X (10.4)

Thanks! That makes sense.

I’ve tried to hard-code the location and to set it with a string, and in both cases I get a “can’t make "MacHD:Users:newsgraphics:Desktop:" folder into type alias” error. Am I missing something obvious?


set theUrl to "http://enight.dos.state.fl.us/ResultsExtract.Asp"
set formDataURLEncoded to "ElectionDate=01%2F29%2F2008&OfficialResults=N&PartyRaces=Y&DataMode=E&FormsButton2=Download"
set outputPathname to choose file name with prompt "Save Election Data To File" default name "Florida 2008-01-29.tsv" default location (my (path to desktop folder) as string)

set shellCmd to "/usr/bin/curl --output " & quoted form of POSIX path of outputPathname & " --data " & quoted form of formDataURLEncoded & " " & quoted form of theUrl

do shell script shellCmd

Even if I can’t get this done for tonight’s election, I appreciate the help. I haven’t played much with shell commands. Thanks again.

I ended up using a single URL in Automator that looked like this:

http://enight.dos.state.fl.us/ResultsExtract.Asp?ElectionDate=01%2F29%2F2008&OfficialResults=N&PartyRaces=Y&DataMode=E&FormsButton2=Download

Worked like a charm. Scraped all night. Thanks again for your help.
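
For anyone who would rather stay in the shell than use Automator, here is a rough sketch of fetching that same URL repeatedly with curl. The URL is the one above; the function name poll_results, the fetch count, the delay, and the output file naming are all made up for illustration, and I have only reasoned through it, not run it against the live site.

```shell
#!/bin/sh
# Sketch only: poll_results and its parameters are hypothetical.

# poll_results COUNT DELAY OUTDIR
#   COUNT   how many times to fetch the results
#   DELAY   seconds to wait between fetches
#   OUTDIR  existing directory that receives results-1.tsv, results-2.tsv, ...
poll_results() {
    # Single quotes keep the shell from interpreting the & characters.
    url='http://enight.dos.state.fl.us/ResultsExtract.Asp?ElectionDate=01%2F29%2F2008&OfficialResults=N&PartyRaces=Y&DataMode=E&FormsButton2=Download'
    i=1
    while [ "$i" -le "$1" ]; do
        # ${CURL:-curl} lets you substitute another command (e.g. echo) for a dry run.
        ${CURL:-curl} --silent --output "$3/results-$i.tsv" "$url"
        # Wait between fetches, but not after the last one.
        if [ "$i" -lt "$1" ]; then
            sleep "$2"
        fi
        i=$((i + 1))
    done
}
```

For example, poll_results 12 300 "$HOME/Desktop" would fetch twelve snapshots five minutes apart. Each snapshot goes to its own file so that an incomplete download does not overwrite an earlier good one.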