Saving full content of feed

crosspost: http://www.mac-forums.com/forums/showthread.php?t=120528
The NetNewsWire RSS reader can save webpages (the web pages that open after clicking, say, “read more” in an RSS feed) viewed through its internal browser. Is there a way to automate saving all of these webpages?
If not with NetNewsWire, can it be done with any other feed reader?

Hi chris2,

As far as I know NetNewsWire offers support for AppleScript and Automator, so chances are good that you can save webpages automatically. I don’t have the application installed, but you can look at its AppleScript dictionary by dropping NetNewsWire’s application icon onto the Script Editor.

But you can also use curl to download RSS feeds and then process them according to your very own requirements:


set feedurl to "http://images.apple.com/main/rss/hotnews/hotnews.rss"
set command to "curl " & quoted form of feedurl
set feedsource to do shell script command
set feedlines to paragraphs of feedsource
repeat with feedline in feedlines
	if feedline begins with "<link>" then
		set newsurl to (characters 7 through -8 of feedline) as Unicode text
		-- more code goes here
	end if
end repeat
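
The characters 7 through -8 part simply strips the surrounding <link> and </link> tags from a feed line. A quick illustration with a made-up link line:

set feedline to "<link>http://www.apple.com/hotnews/</link>"
set newsurl to (characters 7 through -8 of feedline) as Unicode text
-- newsurl now contains "http://www.apple.com/hotnews/"

Note that this only works as long as each <link> element sits on its own line in the feed source.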

DEVONagent (and DEVONthink) also offers ways to process feeds and websites:


set feedurl to "feed://images.apple.com/main/rss/hotnews/hotnews.rss"
tell application "DEVONagent"
	set feedsource to download markup from feedurl
	set feeditems to get items of feed feedsource
	repeat with feeditem in feeditems
		set articleurl to link of feeditem
		set articlesource to download markup from articleurl
		-- more code goes here
	end repeat
end tell

Thanks for the help. DEVONagent is not free.

set feedurl to "http://images.apple.com/main/rss/hotnews/hotnews.rss"
set command to "curl " & quoted form of feedurl
set feedsource to do shell script command
set feedlines to paragraphs of feedsource
repeat with feedline in feedlines
	if feedline begins with "<link>" then
		set newsurl to (characters 7 through -8 of feedline) as Unicode text
		-- more code goes here
	end if
end repeat

-- more code goes here
What code should I add here?

If there were a way to do it through Mail it would be great, because curl is all command line, and that scares me.

As far as I understand it, you want to save the webpages into files. But I do not know where you want to save them, which naming scheme you want to use, and so on. Therefore you will have to add code there that saves the source of the webpage into a file on your Mac. For example, you could tell Safari to open the URL and save it for you, as Safari is quite scriptable. It’s up to you :wink:
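
Just to give you an idea, here is a rough, untested sketch (assuming a folder named "feed" already exists in your Downloads folder and that numbered file names are fine for you) of how the curl loop from above could save each article, reusing the feedlines variable from that script:

set savefolder to (POSIX path of (path to home folder)) & "Downloads/feed/"
set articlecount to 0
repeat with feedline in feedlines
	if feedline begins with "<link>" then
		set newsurl to (characters 7 through -8 of feedline) as Unicode text
		set articlecount to articlecount + 1
		-- let curl write the page straight into the feed folder
		set savepath to savefolder & "article" & articlecount & ".html"
		do shell script "curl -o " & quoted form of savepath & " " & quoted form of newsurl
	end if
end repeat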

Mail currently does not offer AppleScript support for managing feeds. But if the command line scares you, then you can also opt for URL Access Scripting, which is only a little bit more work:


on run
	set tmpfilepath to my gettmpfilepath()
	tell application "URL Access Scripting"
		set feedsource to download "http://images.apple.com/main/rss/hotnews/hotnews.rss" to tmpfilepath
	end tell
	set filecont to read file tmpfilepath
	-- deleting the temp file
	set command to "rm " & quoted form of POSIX path of tmpfilepath
	do shell script command
	set feedlines to paragraphs of filecont
	repeat with feedline in feedlines
		if feedline begins with "<link>" then
			set newsurl to (characters 7 through -8 of feedline) as Unicode text
			-- more code goes here
		end if
	end repeat
end run

on gettmpfilepath()
	set tmpfolderpath to (path to temporary items folder from user domain) as Unicode text
	repeat
		set randnum to random number from 10000 to 99999
		set tmpfilename to randnum & ".html"
		set tmpfilepath to tmpfolderpath & tmpfilename
		try
			set tmpfilealias to tmpfilepath as alias
		on error
			exit repeat
		end try
	end repeat
	return tmpfilepath
end gettmpfilepath
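
The "-- more code goes here" part could then again use URL Access Scripting to fetch the single articles. Another rough, untested sketch, assuming an existing "feed" folder in your Downloads folder and numbered file names:

-- put this once before the repeat loop:
set articlecount to 0
-- and use this in place of "-- more code goes here":
set articlecount to articlecount + 1
set articlepath to (path to home folder as text) & "Downloads:feed:article" & articlecount & ".html"
tell application "URL Access Scripting"
	download newsurl to articlepath
end tell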

Thanks Martin.

I used the URL Access Scripting script. I replaced (path to temporary items folder from user domain) with “Users/chris/Downloads/feed” and did not change anything else in the script.

I get the error “Bad name for file. Users/chris/Downloads/feed13377.html”

Hi,

AppleScript works only with colon separated paths, starting with the name of the (startup) volume, for example

"Mac HD:Users:chris:Downloads:feed:"

or, regardless of the user name

((path to home folder as text) & "Downloads:feed:")

Note: If you use a POSIX (slash separated) path, it must start with a slash
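
A quick illustration of the two path styles (the folder does not even have to exist for the conversions themselves):

-- HFS path (colon separated), what AppleScript's file commands expect:
set hfsfolder to (path to home folder as text) & "Downloads:feed:"
-- the same folder as a POSIX path (slash separated), what the shell expects:
set posixfolder to POSIX path of hfsfolder
-- and back again from a POSIX path to an HFS path:
set hfsagain to (POSIX file posixfolder) as text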

Thanks StefanK.
In “more code goes here” I wanted to add some code that could download the web pages into my /users/chris/downloads/feed folder… so I thought of first seeing whether they open properly before making the script save them into the feed folder.

I added this in “more code goes here”:

tell application "Safari"
	open feedline
end tell

Some 20 weird URLs like
file:///%3Clink%3Ehttp/::www.apple.com:iphone:softwareupdate:%3Fsr=hotnews%3Fsr=hotnews.rss%3C:link%3E
file:///%3Clink%3Ehttp/::www.apple.com:trailers:independent:theluckyones:%3Fsr=hotnews%3Fsr=hotnews.rss%3C:link%3E
opened up in Safari, each with an error.
(The address is different in each one, obviously.)

:lol:

AppleScript works only with HFS paths (colon separated), but of course literal URLs won’t be changed, for example

www.apple.com/iphone/softwareupdate/

PS: The shell works only with POSIX paths (slash separated)
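
That is also why the scripts above convert the path of the temporary file before handing it to rm, for example:

set tmpfilepath to (path to temporary items folder as text) & "13377.html"
set command to "rm " & quoted form of POSIX path of tmpfilepath
-- command now holds the rm call with a slash separated, safely quoted path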

In simple words, you mean the above script is not the solution yet?

It’s probably the solution, but please don’t mix up HFS paths (like the path to your download folder) and slash separated paths (like a URL).

This works on my machine, except for iTunes Store links:


on run
	set tmpfilepath to my gettmpfilepath()
	tell application "URL Access Scripting"
		set feedsource to download "http://images.apple.com/main/rss/hotnews/hotnews.rss" to tmpfilepath
	end tell
	set filecont to read file tmpfilepath
	-- deleting the temp file
	set command to "rm " & quoted form of POSIX path of tmpfilepath
	do shell script command
	set feedlines to paragraphs of filecont
	repeat with feedline in feedlines
		if feedline begins with "<link>" then
			open location (characters 7 through -8 of feedline) as Unicode text -- this opens each URL with the default browser
		end if
	end repeat
end run
.

I googled around and saw this: http://docs.info.apple.com/article.html?path=AppleScript/2.1/en/as208.html
and accordingly did this:

I get the error:

I’m sorry if I’m annoying you… maybe I will learn some more AppleScript and then come back.

You need the handler, which I skipped to save space and indicated with “.”.
Here’s the whole script:


on run
	set tmpfilepath to my gettmpfilepath()
	tell application "URL Access Scripting"
		set feedsource to download "http://images.apple.com/main/rss/hotnews/hotnews.rss" to tmpfilepath
	end tell
	set filecont to read file tmpfilepath
	-- deleting the temp file
	set command to "rm " & quoted form of POSIX path of tmpfilepath
	do shell script command
	set feedlines to paragraphs of filecont
	repeat with feedline in feedlines
		if feedline begins with "<link>" then
			open location (text 7 through -8 of feedline) -- this opens each URL with the default browser
		end if
	end repeat
end run

on gettmpfilepath()
	set tmpfolderpath to (path to temporary items folder as text)
	repeat
		set randnum to random number from 10000 to 99999
		set tmpfilename to randnum & ".html"
		set tmpfilepath to tmpfolderpath & tmpfilename
		try
			set tmpfilealias to tmpfilepath as alias
		on error
			exit repeat
		end try
	end repeat
	return tmpfilepath
end gettmpfilepath

Note: path to temporary items is used only for the temporary file, which will be deleted afterwards.