curl question

I am writing this script as part of a larger project and I can’t seem to understand an issue I am having with it:


set rsslist to {"http://www.nytimes.com/services/xml/rss/nyt/Magazine.xml"}

repeat with r from 1 to count of every item of rsslist
	set rsscontent to do shell script "curl " & item r of rsslist
	set contentlist to parsecode(rsscontent, "link") as list
	repeat with c from 1 to count of every item of contentlist
		set thisurl to item c of contentlist
	set thiscontent to do shell script "curl " & thisurl as text
		return thiscontent
	end repeat
	
end repeat



on parsecode(code, tag)
	set opentag to "<" & tag & ">"
	set closetag to "</" & tag & ">"
	set itemlist to {}
	set AppleScript's text item delimiters to opentag
	set taglist to every text item of code as list
	set childtaglist to {}
	repeat with x from 2 to count of every item of taglist
		copy item x of taglist to end of childtaglist
	end repeat
	
	repeat with thisitem in childtaglist
		set AppleScript's text item delimiters to closetag
		copy text item 1 of thisitem to end of itemlist
		set AppleScript's text item delimiters to opentag
	end repeat
	
	set AppleScript's text item delimiters to {""} -- restore the default delimiters
	return itemlist
end parsecode


In this format it works fine: when you return thiscontent, it is indeed the content of the URL. However, the first link in most of the feeds I have tried is just the RSS URL I originally parsed, so I want to skip it.

When I rewrite the script as below, adding a conditional that checks the value of c and skips the body unless it is greater than 1 (for some reason I get the same link twice at the start of the NYT Magazine feed), thiscontent comes back with no value. If I hard-code the URL instead of using the variable, it works fine, like this:

set thiscontent to do shell script "curl http://www.nytimes.com/2008/12/07/magazine/07cuba-t.html?partner=rss&emc=rss"

Is there some formatting issue I am missing? I have tried coercing thisurl to string and to text, but I get the same results.
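As an aside, the unquoted & in the query string may be worth ruling out: do shell script hands the command to sh, and sh treats an unquoted & as a command separator, so curl would only ever see the URL up to the &. A quick sketch in plain sh (the URL here is made up):

```shell
#!/bin/sh
# A made-up article URL with a query string, like the NYT links.
url='http://example.com/article?partner=rss&emc=rss'

# Unquoted: sh ends the command at '&' and backgrounds it, so only
# the part of the URL before the '&' reaches the command.
sh -c "echo $url"      # prints: http://example.com/article?partner=rss

# Quoted (AppleScript's "quoted form of" produces exactly this kind
# of single-quoted string): the full URL survives.
sh -c "echo '$url'"    # prints: http://example.com/article?partner=rss&emc=rss
```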



set rsslist to {"http://www.nytimes.com/services/xml/rss/nyt/Magazine.xml"}

repeat with r from 1 to count of every item of rsslist
	set rsscontent to do shell script "curl " & item r of rsslist
	set contentlist to parsecode(rsscontent, "link") as list
	repeat with c from 1 to count of every item of contentlist
		if c > 1 then
			set thisurl to item c of contentlist
			set thiscontent to do shell script "curl " & thisurl as text
			return thiscontent
		end if
	end repeat
	
end repeat



on parsecode(code, tag)
	set opentag to "<" & tag & ">"
	set closetag to "</" & tag & ">"
	set itemlist to {}
	set AppleScript's text item delimiters to opentag
	set taglist to every text item of code as list
	set childtaglist to {}
	repeat with x from 2 to count of every item of taglist
		copy item x of taglist to end of childtaglist
	end repeat
	
	repeat with thisitem in childtaglist
		set AppleScript's text item delimiters to closetag
		copy text item 1 of thisitem to end of itemlist
		set AppleScript's text item delimiters to opentag
	end repeat
	
	set AppleScript's text item delimiters to {""} -- restore the default delimiters
	return itemlist
end parsecode

I am at least getting to the page: curl -i returns the header below. The URL is a redirect that takes you to the URL I am passing in my script.

"HTTP/1.1 301 Moved Permanently
Server: Sun-ONE-Web-Server/6.1
Date: Wed, 10 Dec 2008 15:14:35 GMT
Content-length: 0
Content-type: text/html
Location: http://www.nytimes.com/glogin?URI=http://www.nytimes.com/2008/12/07/magazine/07cuba-t.html&OQ=_rQ3D1Q26partnerQ3Drss&OP=276883e7Q2FQ7CsHZQ7CYlQ20e3ll@bQ7CbyyQ23Q7C6bQ7CyQ22Q7C1k,k-MQ3AHQ7CyQ22Q208Zkv@Q60Q3B@1n

Hi,

Use cURL with the -L option to follow redirects.

Change the second "do shell script" line in your script to:

set thiscontent to do shell script "curl -L " & thisurl as text

Best wishes

John Maisey

I can’t seem to pull anything up in Terminal either.

I tried that, and it keeps locking up Script Editor.

Perhaps it’s a network issue? Your second script works for me with the modification.

A network problem here wouldn’t be the first time.

I’m going to see if I can find another feed to test on and then revisit this site. Thanks for the help.

You could use the -m option with cURL. You’d have to handle failed calls, though.
From the cURL manual:

-m, --max-time <seconds>
Maximum time in seconds that you allow the whole operation to take. This is useful for preventing your batch jobs from hanging for hours due to slow networks or links going down.
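A sketch of how the timeout and the failure handling might fit together in plain sh (the 30-second cap and the error message are illustrative, not from the thread; the feed URL is the one from the original post):

```shell
#!/bin/sh
# -L follows redirects, -s hides the progress meter, and -m caps the
# whole transfer at 30 seconds so a slow or dead server cannot hang
# the calling script indefinitely.
fetch() {
    curl -L -s -m 30 "$1"
}

if content=$(fetch 'http://www.nytimes.com/services/xml/rss/nyt/Magazine.xml'); then
    printf '%s\n' "$content"
else
    # curl exits non-zero on failure (28 means the -m limit was hit).
    echo "curl failed with exit status $?" >&2
fi
```

In AppleScript, a failed curl makes do shell script throw an error, so the equivalent there would be wrapping the call in a try block.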

Best wishes
John Maisey