Curl to search MacScripter

Hi everyone,

I am building an app similar to “MacScripterWatcher” that has some additional
functionality.

One thing I want to accomplish is to perform a search using curl and get the
resulting HTML. When I do this now, the resulting HTML contains a “You do not have permission.
Not logged in” error in the body even though I am currently logged in.

Getting the HTML from a specific page works normally; it only fails when performing a search.

Is there a way around this?

Thanks for your help.

Craig

Here is what I am using.


set the_url to "http://bbs.macscripter.net/search.php?action=search&keywords=attach+drawer&author=&forum=3&sort_by=5&sort_dir=DESC&show_as=topics&search=Submit"

set x to my get_URL_source(the_url)

on get_URL_source(the_url)
	if class of the_url = string then
		set the_script to "curl " & (quoted form of the_url)
	else if class of the_url = list then
		set the_script to "curl"
		repeat with i from 1 to count of the_url
			if class of (item i of the_url) = string then
				set the_script to the_script & " " & (quoted form of (item i of the_url))
			else
				return "error: the submitted parameter is not a string or list of strings."
			end if
		end repeat
	else
		return "error: the submitted parameter is not a string or list of strings."
	end if
	return do shell script the_script
end get_URL_source

You have to be logged in to search here. However, curl won’t have the same cookies as Safari, so you are not logged in.

Hi Craig,

I don’t want any competition to MacScripterWatcher! ;)

I know your problem, but I have no solution for it.
If you try to access a page in a restricted area, you have to log in somehow.
Already being logged in with a browser is not sufficient, and neither is using the -u flag with a username and password.
curl can handle cookies, though, so it might be possible to use that.

Thanks guys. I will get right on the cookie option.

Regards,

Craig

The main idea is to use the --cookie-jar option and log in to the web site by having curl POST the login form information as if it had been filled out in a normal browser. Then, for the search queries, use the --cookie option to supply the login cookie. The following seems to work for me:


set un to "user"
set pw to "password"

set cjpp to getCookieJarPOSIXPath()
set l to Login_MacScripterBBS(un, pw, cjpp)
set s to Search_MacScripterBBS("http://bbs.macscripter.net/search.php?action=search&keywords=attach+drawer&author=&forum=3&sort_by=5&sort_dir=DESC&show_as=topics&search=Submit", cjpp)
{l, s}

to Search_MacScripterBBS(search, cjpp)
	(* Salient bit:
	Use the --cookie option, giving it the path to the previously written cookie jar file so that the required login cookie (punbb_cookie) will be supplied to (e.g.) search.php.
		Alternatively, supply the cookie itself (maybe extracted from the file written to by Login_MacScripterBBS, or copied from some cookie tool): --cookie punbb_cookie=value
	Adjust the rest as needed.
	*)
	do shell script "
 /usr/bin/curl \\
 --verbose \\
 --cookie " & quoted form of cjpp & " \\
 " & quoted form of search & " \\
 2>&1"
end Search_MacScripterBBS

to Login_MacScripterBBS(un, pw, cjpp)
	(* Salient bits:
	Use --data options to provide form data to login.php.
	Use --cookie-jar to have curl record the cookie that login.php sets.
	Use --verbose and capture stderr (2>&1) for debugging/fun.
	*)
	do shell script "
 /usr/bin/curl \\
 --verbose \\
 --data form_sent=1 \\
 --data redirect_url=index.php \\
 --data req_username=" & quoted form of encode_form_value(un) & " \\
 --data req_password=" & quoted form of encode_form_value(pw) & " \\
 --data login=Login \\
 --cookie-jar " & quoted form of cjpp & " \\
 http://bbs.macscripter.net/login.php?action=in \\
 2>&1"
end Login_MacScripterBBS

to getCookieJarPOSIXPath()
	path to temporary items folder from user domain
	POSIX path of result
	result & "curl-cookies-XXXXXXXXXX"
	do shell script "/usr/bin/mktemp " & quoted form of result
end getCookieJarPOSIXPath

to encode_form_value(str)
	encode_text(str, true, true) -- This is not exactly what the spec calls for (http://www.w3.org/MarkUp/html-spec/html-spec_toc.html#SEC8.2.1), but it might work anyway.
end encode_form_value

(* URL encoding from: http://www.apple.com/applescript/sbrt/sbrt-08.html "Text Encoding | Decoding" *)

-- A sub-routine for encoding high-ASCII characters:
on encode_char(this_char)
	set the ASCII_num to (the ASCII number this_char)
	set the hex_list to {"0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "A", "B", "C", "D", "E", "F"}
	set x to item ((ASCII_num div 16) + 1) of the hex_list
	set y to item ((ASCII_num mod 16) + 1) of the hex_list
	return ("%" & x & y) as string
end encode_char

-- this sub-routine is used to encode text 
on encode_text(this_text, encode_URL_A, encode_URL_B)
	set the standard_characters to "abcdefghijklmnopqrstuvwxyz0123456789"
	set the URL_A_chars to "$+!'/?;&@=#%><{}[]\"~`^\\|*"
	set the URL_B_chars to ".-_:"
	set the acceptable_characters to the standard_characters
	if encode_URL_A is false then set the acceptable_characters to the acceptable_characters & the URL_A_chars
	if encode_URL_B is false then set the acceptable_characters to the acceptable_characters & the URL_B_chars
	set the encoded_text to ""
	repeat with this_char in this_text
		if this_char is in the acceptable_characters then
			set the encoded_text to (the encoded_text & this_char)
		else
			set the encoded_text to (the encoded_text & encode_char(this_char)) as string
		end if
	end repeat
	return the encoded_text
end encode_text

-- A sub-routine for decoding a three-character hex string:
on decode_chars(these_chars)
	copy these_chars to {indentifying_char, multiplier_char, remainder_char}
	set the hex_list to "123456789ABCDEF"
	if the multiplier_char is in "ABCDEF" then
		set the multiplier_amt to the offset of the multiplier_char in the hex_list
	else
		set the multiplier_amt to the multiplier_char as integer
	end if
	if the remainder_char is in "ABCDEF" then
		set the remainder_amt to the offset of the remainder_char in the hex_list
	else
		set the remainder_amt to the remainder_char as integer
	end if
	set the ASCII_num to (multiplier_amt * 16) + remainder_amt
	return (ASCII character ASCII_num)
end decode_chars

-- this sub-routine is used to decode text strings 
on decode_text(this_text)
	set flag_A to false
	set flag_B to false
	set temp_char to ""
	set the character_list to {}
	repeat with this_char in this_text
		set this_char to the contents of this_char
		if this_char is "%" then
			set flag_A to true
		else if flag_A is true then
			set the temp_char to this_char
			set flag_A to false
			set flag_B to true
		else if flag_B is true then
			set the end of the character_list to my decode_chars(("%" & temp_char & this_char) as string)
			set the temp_char to ""
			set flag_A to false
			set flag_B to false
		else
			set the end of the character_list to this_char
		end if
	end repeat
	return the character_list as string
end decode_text

Note that your MacScripter BBS username and password will be exposed on the command line and as such may be visible to other users on the same machine (through ps and the like). If you need better security, this can be avoided by using --data’s ability to read from a file instead of the command line.
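
A minimal sketch of the read-from-a-file idea, not tested against the BBS: the form data is written to a private temporary file with AppleScript's own file commands (so it never appears in any command line), and curl reads it through --data @file. The handler name Login_FromFile and the temp-file prefix are made up for illustration, and formData is assumed to be the already-encoded form string (for example, built with encode_form_value from the script above).

```applescript
-- Sketch only: Login_FromFile is a hypothetical name, not part of the script above.
-- formData is the already-encoded form string, e.g.
-- "form_sent=1&redirect_url=index.php&req_username=...&req_password=...&login=Login"
to Login_FromFile(formData, cjpp)
	-- mktemp creates an empty file that only the current user can read.
	set tmpDir to POSIX path of (path to temporary items folder from user domain)
	set dataPath to do shell script "/usr/bin/mktemp " & quoted form of (tmpDir & "curl-login-XXXXXXXXXX")
	try
		-- Write with AppleScript's file commands so the credentials never
		-- show up in the arguments of any process.
		set fileRef to open for access (POSIX file dataPath) with write permission
		try
			write formData to fileRef
			close access fileRef
		on error m number n
			close access fileRef
			error m number n
		end try
		-- "--data @file" makes curl read the POST body from the file.
		set r to do shell script "/usr/bin/curl --verbose --data @" & quoted form of dataPath & " --cookie-jar " & quoted form of cjpp & " 'http://bbs.macscripter.net/login.php?action=in' 2>&1"
	on error m number n
		-- Remove the credentials file even when something fails.
		do shell script "/bin/rm -f " & quoted form of dataPath
		error m number n
	end try
	do shell script "/bin/rm -f " & quoted form of dataPath
	return r
end Login_FromFile
```

Only the path of the temporary file is visible in curl's arguments. The file is deleted whether or not the login succeeds.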

In practice you might not have to do the login transaction each time. Instead you might do it once and keep the cookie jar in a predictable, reusable place (not in a temp directory with a randomized filename). That way the login cookie could be reused across multiple searches. Alternatively, if you normally use Firefox (maybe other browsers, too) to log in to the MacScripter BBS, you could point curl to the browser’s cookie file (i.e. ~/Library/Application Support/Firefox/Profiles/something/cookies.txt). Safari stores cookies in a plist file that is incompatible with curl’s --cookie option.
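
One way the reuse could look, as a rough sketch: a fixed cookie jar path (the file name is only an example), a fetch that reads and updates that jar, and a login only when the result contains the “Not logged in” text quoted at the start of the thread. getSavedCookieJarPOSIXPath, fetchWithJar and Search_WithSavedLogin are hypothetical names; Login_MacScripterBBS is the handler from the script above and is assumed to be in the same script.

```applescript
-- Sketch only: a predictable cookie jar location (the file name is an example).
to getSavedCookieJarPOSIXPath()
	return (POSIX path of (path to preferences from user domain)) & "macscripter-curl-cookies.txt"
end getSavedCookieJarPOSIXPath

to fetchWithJar(search, cjpp)
	-- --cookie sends the saved cookies; --cookie-jar writes back any updates.
	return do shell script "/usr/bin/curl --silent --cookie " & quoted form of cjpp & " --cookie-jar " & quoted form of cjpp & " " & quoted form of search
end fetchWithJar

to Search_WithSavedLogin(search, un, pw)
	set cjpp to getSavedCookieJarPOSIXPath()
	set s to fetchWithJar(search, cjpp)
	-- Assumption: the error page contains this text and normal results do not.
	if s contains "Not logged in" then
		Login_MacScripterBBS(un, pw, cjpp)
		set s to fetchWithJar(search, cjpp)
	end if
	return s
end Search_WithSavedLogin
```

The jar holds the login cookie, so it should live somewhere other users cannot read.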

Model: iBook G4 933
AppleScript: 1.10.7
Browser: Safari Version 3.1 (4525.13)
Operating System: Mac OS X (10.4)

Edit History: Add full path to mktemp call. Fix typo and clarify comments in code.

Hi chrys,

Brilliant, works perfectly, and thank you very much for the explanation.
FYI: The new version of Firefox also uses an SQL database file for its cookies,
but Camino still creates the plain text file.

For the URL encoding stuff you can also use a Perl one-liner:

to encode_form_value(str)
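	-- Pass str as an argument (quoted form) so that quotes, $ or @ in it cannot break the Perl code.
	do shell script "perl -MURI::Escape -e 'print uri_escape($ARGV[0])' " & quoted form of str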
end encode_form_value

chrys,

Thank you so much! Works great.

Regards,

Craig

Hey guys,

I spoke too soon.

The first time I ran the script everything worked fine.
Since then I get this error.

I tried a restart just in case my computer was acting funny.

Thanks again!

Craig

Error:

The part that is highlighted in SD.

It is odd that it would work once but not again. It worked many times as I was testing it. All my testing was with 10.4.11 and Script Editor. I do not know of a reason that the shell would start considering the unquoted newline to be a command name (this is what I have inferred from the error message). Try removing the newline and space before /usr/bin/curl. Really, you could take out all the trailing backslashes, embedded newlines, and extra spaces. That would put it all on one line like most of the shorter do shell script command lines.

I originally broke it up that way to try to make it a little easier to read. The backslashes should be at the very end of the line (no trailing spaces!) and are doubled so that the string that the shell gets has an actual, single backslash at the end of every line that is in the middle of the command. This causes the shell to “continue” the line by ignoring the backslash and the following newline.

Without the initial newline:

do shell script "/usr/bin/curl \\
 --verbose \\
 --data form_sent=1 \\
 --data redirect_url=index.php \\
 --data req_username=" & quoted form of encode_form_value(un) & " \\
 --data req_password=" & quoted form of encode_form_value(pw) & " \\
 --data login=Login \\
 --cookie-jar " & quoted form of cjpp & " \\
 http://bbs.macscripter.net/login.php?action=in \\
 2>&1"

Without embedded newlines:

do shell script "/usr/bin/curl --verbose --data form_sent=1 --data redirect_url=index.php --data req_username=" & quoted form of encode_form_value(un) & " --data req_password=" & quoted form of encode_form_value(pw) & " --data login=Login --cookie-jar " & quoted form of cjpp & " http://bbs.macscripter.net/login.php?action=in 2>&1"

Edits: Clarify and explain about newline as command name. Clarify description of suggestion. Grammar, typos, punctuation.

chrys,

That fixed it. I had to change it in both sections where there were backslashes, but then it
worked. I ran several tests using different searches and they all worked.

Thanks again!

Craig

Glad to hear that getting rid of the backslashes and line breaks got it working. I would be interested to know about it if you ever discover what caused it to stop working.

Software updates or installation? Change in ~/.MacOSX/environment.plist? Some other change in the environment inherited by the shell started by do shell script? Was there some change to /bin/sh that corresponds to the change in behavior?

It is not important, but it is something that has left me puzzled.

Hmm.

to tryIt(cmd)
	try
		do shell script cmd
	on error m number n
		{|ERROR message|:m, |ERROR number|:n}
	end try
end tryIt

{|LF|:tryIt(ASCII character 10), |CR|:tryIt(ASCII character 13)}
(* -> {|LF|:"", |CR|:{|ERROR message|:"sh: line 1: 
: command not found", |ERROR number|:127}} *)

Do you suppose it is possible that the newlines in the string literal in the script text were carriage returns instead of line feeds? Maybe the breakage happened after copying and pasting through some program that translated the line breaks into carriage returns? Was the script saved as a plain text file at some point?

This has become a lesson for me to avoid embedded line breaks in string literals if the difference between CR and LF is important.
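
If the multi-line layout is worth keeping, one possible safeguard is to convert any carriage returns to line feeds just before the string goes to do shell script. This is only a sketch; replaceText and withLineFeeds are names invented here, and it assumes the command contains no carriage returns that are meant to be kept.

```applescript
-- Sketch only: replaceText and withLineFeeds are hypothetical helper names.
to replaceText(findText, replaceWith, sourceText)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set parts to text items of sourceText
	set AppleScript's text item delimiters to replaceWith
	set newText to parts as string
	set AppleScript's text item delimiters to savedTIDs
	return newText
end replaceText

to withLineFeeds(cmd)
	set cr to ASCII character 13
	set lf to ASCII character 10
	-- Handle CR+LF pairs first so they do not turn into two line feeds.
	set cmd to replaceText(cr & lf, lf, cmd)
	return replaceText(cr, lf, cmd)
end withLineFeeds

-- The carriage return after the backslash becomes a line feed,
-- so the shell sees an ordinary continued line.
do shell script withLineFeeds("echo one \\" & (ASCII character 13) & " two")
```

Wrapping each multi-line command as do shell script withLineFeeds("...") would make it behave the same whichever kind of line break ends up in the script text.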

I haven’t followed this closely. I just wanted to point out a different way to break up long lines:

do shell script "/usr/bin/curl " & ¬
	" --verbose" & ¬
	" --data form_sent=1" & ¬
	" --data redirect_url=index.php" & ¬
	" --data req_username=" & quoted form of encode_form_value(un) & ¬
	" --data req_password=" & quoted form of encode_form_value(pw) & ¬
	" --data login=Login" & ¬
	" --cookie-jar " & quoted form of cjpp & ¬
	" http://bbs.macscripter.net/login.php?action=in" & ¬
	" 2>&1"

Hey chrys,

Did you get my email? Just checking.

Also, I figured out why it was not working in Xcode even though it worked in SD.

I was using the same handler that formats the search words to format my user name.
It puts a “+” between words, as in “Craig+Williams”.

Once that was changed the searches work.
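
For anyone who hits the same thing, here is roughly what the separation looks like. joinWords is just a name used for this sketch, not the actual handler from my project; the point is that only the search keywords get the “+” treatment, while the user name goes to the login handler unchanged.

```applescript
-- Sketch only: joinWords is a hypothetical helper name.
to joinWords(theText, separator)
	set savedTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to space
	set theWords to text items of theText
	set AppleScript's text item delimiters to separator
	set joined to theWords as string
	set AppleScript's text item delimiters to savedTIDs
	return joined
end joinWords

-- Search keywords: spaces become "+" for the keywords= part of the search URL.
set keywords to joinWords("attach drawer", "+") --> "attach+drawer"

-- User name: left alone here, because Login_MacScripterBBS already runs it
-- through encode_form_value, which would turn a literal "+" into "%2B".
set un to "Craig Williams"
```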

Bruce -

Thanks for your suggestion. BTW, are you running Leopard?

Regards,

Craig

I got your email a little while ago. I started looking through the code, but I had not yet come up with anything conclusive.
Good to hear that you got it working. The reason and the fix make sense. More detail in an email followup.

Yes.