Curl Workarounds

I’m starting to see how incredibly awesome curl commands are (thanks Stefan!). I saw someone mention in an older post that with curl you can generally access anything your browser can, but occasionally some creative workarounds are required for certain pages.

Well, I’m in need of a workaround for my new script.


set phoneNum to "408-886-9012"
set whitePage to "http://www.411.com/search/ReversePhone?phone="
-- quote the URL so the shell does not try to interpret the "?" in it
set theDate to do shell script "curl " & quoted form of (whitePage & phoneNum)
-- build the output path once and reuse it below
set theFile to (":Users:MYUSERNAME:Documents:" & "whitepage_" & phoneNum & ".txt")
try
	set ff to open for access theFile with write permission
	set eof of ff to 0 -- clear any contents left from an earlier run
	write theDate to ff
	close access ff
on error
	-- make sure the file is not left open if the write fails
	try
		close access theFile
	end try
end try

tell application "TextEdit"
	open theFile
end tell


This script returns some HTML, but not the entire webpage, and I see this in the output:

Forbidden you don’t have permission to access this server

Any suggestions on a workaround to this?

Use this; it's tested and works:

set theDate to do shell script "curl -A \"Mozilla/4.0\" -s " & whitePage & phoneNum

From the man page:

-A/--user-agent
(HTTP) Specify the User-Agent string to send to the HTTP server.
Some badly done CGIs fail if its not set to “Mozilla/4.0”. To
encode blanks in the string, surround the string with single
quote marks. This can also be set with the -H/--header option
of course.
If this option is set more than once, the last one will be the
one that’s used.
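
For anyone who wants to try this straight from Terminal before wrapping it in do shell script, here is a rough sketch of the two forms the man page mentions (-A, and -H with an explicit User-Agent header). The function names fetch_with_agent and fetch_with_header are made up for illustration, and the default of "Mozilla/4.0" is simply the value quoted above, not something the site is known to require.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative and not part of curl.

# Fetch a URL, sending a custom User-Agent with -A.
# $1 = URL, $2 = User-Agent string (assumed default: "Mozilla/4.0")
fetch_with_agent() {
    curl -s -A "${2:-Mozilla/4.0}" "$1"
}

# Same request, but setting the header explicitly with -H.
fetch_with_header() {
    curl -s -H "User-Agent: ${2:-Mozilla/4.0}" "$1"
}

# Usage (needs network access, so it is left commented out):
#   fetch_with_agent 'http://www.411.com/search/ReversePhone?phone=408-886-9012'
```

Note the single quotes around the URL in the usage line: they keep the shell from treating the "?" as a wildcard, which is the same job quoted form of does on the AppleScript side.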

In most cases, a blank user-agent will also work:

set phoneNum to "408-886-9012"
set whitePage to "http://www.411.com/search/ReversePhone?phone="
set theDate to do shell script "curl -A '' " & whitePage & phoneNum

If a user-agent isn’t specified, then curl identifies itself (e.g. curl 7.13.1); some sites block connections from such user-agents.
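
To see roughly what curl calls itself on your machine, something like the sketch below may help. It assumes the first line of curl --version starts with "curl <version>", and that the default User-Agent begins with "curl/<version>" (older builds, as far as I know, append platform and library details after that). The function name default_curl_agent is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the prefix of the User-Agent curl sends by default.
# Assumes the first line of `curl --version` looks like "curl 7.13.1 (...) ...".
default_curl_agent() {
    curl --version | awk 'NR == 1 { print "curl/" $2 }'
}

default_curl_agent

# To see the headers actually sent to a site (needs network access),
# run curl with -v and look for the "> User-Agent:" line:
#   curl -v -o /dev/null 'http://www.411.com/'
```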

Thank you, that worked.