Visible text

My script fetches HTML files using curl.

The problem is that I need only the visible text, and curl returns the full HTML file.

I need something like this but without Safari:

tell application "Safari" to get text of tab 1 of window 1

Cirno;

Before Arc90’s Readability plugin for Safari was released, they offered a JavaScript bookmarklet that did roughly the same thing. I still use it for NetNewsWire. You might be able to modify it to suit your needs: it replaces the page with a text-only copy of the text in it, but I’ve long forgotten how it does that.

(*
The following JavaScript from Arc90 "makes reading on the web more enjoyable by removing the clutter around what you're reading" and the Readability site allows you to set Style, Size and Margin in the material. It is normally used by dragging a link from the Readability page to your Bookmarks Bar, and that puts the JavaScript below as the content of the bookmark. The page thus created in your browser has a link on it to return to the original source.

Visit "http://lab.arc90.com/experiments/readability/" for your own copy.

Because I'm an avid Quicksilver fan (but there are lots of alternatives), I prefer to set up a trigger to run the script below so I can enhance the readability of an article I'm looking at with a simple key combination. Although it cannot render every site I visit, it does remarkably well.
-- Note that the script below has been altered for readability. The original is all one string.
*)

set JS to "readStyle='style-novel';
	readSize='size-large';
	readMargin='margin-wide';
	_readability_script=document.createElement('SCRIPT');
	_readability_script.type='text/javascript';
	_readability_script.src='http://lab.arc90.com/experiments/readability/js/readability.js?x='+(Math.random());
	document.getElementsByTagName('head')[0].appendChild(_readability_script);
	_readability_css=document.createElement('LINK');
	_readability_css.rel='stylesheet';
	_readability_css.href='http://lab.arc90.com/experiments/readability/css/readability.css';
	_readability_css.type='text/css';
	_readability_css.media='screen';
	document.getElementsByTagName('head')[0].appendChild(_readability_css);
	_readability_print_css=document.createElement('LINK');
	_readability_print_css.rel='stylesheet';
	_readability_print_css.href='http://lab.arc90.com/experiments/readability/css/readability-print.css';
	_readability_print_css.media='print';
	_readability_print_css.type='text/css';
	document.getElementsByTagName('head')[0].appendChild(_readability_print_css);"

tell application "System Events" to set last_app to item 1 of (get name of processes whose frontmost is true)
if last_app is "Safari" then
	tell document 1 of application "Safari" to do JavaScript JS
else if last_app is "NetNewsWire" then
	tell document 1 of application "NetNewsWire" to do JavaScript JS
else
	beep 3
end if

Hello.

I also think that the command-line utility textutil is up to converting the text contents to plain text.

Maybe you could also open the file with TextEdit and then save it as plain text afterwards?

Just pointing to some alternatives.

Thanks

Your example script uses Safari and NetNewsWire. How can I tell my own AppleScript to run this JavaScript?

I don’t think you can, I’m sorry to say. The do JavaScript command must address a tab in Safari or NetNewsWire, as I’ve discovered: here

Hello. If you do it like this, you’ll end up with the text.

From a terminal window:

textutil -convert rtf your.html
textutil -convert txt your.rtf

Thanks. It would be super cool if there were a way to remove all the extra stuff from a webpage using AppleScript, the same way Readability does. Reading webpages is easier that way.

Hi,

As McUsr mentioned above, textutil can convert HTML-formatted text to plain text.
Try this:


do shell script "curl 'http://macscripter.net/viewtopic.php?pid=157045' | textutil -stdin -stdout -format html -convert txt -encoding UTF-8" -- convert html to txt; the URL is quoted so the shell does not treat the ? as a wildcard


Thanks. I tried it and I can use it in one of my scripts, but I have another script that needs output that is easier to read. The textutil output is too rough for that script.
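
If the roughness is mostly stray indentation and long runs of empty lines, a small filter on the end of the textutil pipeline might already help. This is only a sketch: the name tidy_text is made up for illustration, and it assumes a cat that supports -s (the macOS and GNU versions both do).

```shell
#!/bin/sh
# tidy_text is a hypothetical helper name, not an existing tool.
# Reads plain text on stdin and writes a tidier copy to stdout:
#   1. strips trailing whitespace from every line
#   2. strips leading whitespace from every line
#   3. squeezes runs of empty lines down to a single empty line (cat -s)
tidy_text() {
    sed -e 's/[[:space:]]*$//' \
        -e 's/^[[:space:]]*//' |
    cat -s
}
```

It would go at the end of the pipeline, for example: textutil -stdin -stdout -format html -convert txt < your.html | tidy_text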

Then you need curl in conjunction with sed.
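
One way the curl-plus-sed idea might look, as a rough sketch rather than a real HTML parser. The function name html_to_text is made up for illustration. It assumes lowercase script and style tags that either open and close on the same line or open and close on separate lines, and it decodes only a handful of common entities, so real pages will need extra rules.

```shell
#!/bin/sh
# html_to_text is a hypothetical helper name, not an existing tool.
# Reads HTML on stdin and writes rough visible text to stdout.
# The sed rules, in order:
#   - remove <script>/<style> elements that open and close on one line
#   - delete multi-line <script>/<style> blocks (whole lines are dropped)
#   - strip any remaining tags
#   - decode a few common entities (&amp; last, to avoid double decoding)
#   - trim leading whitespace and drop lines that end up empty
html_to_text() {
    sed -e 's/<script[^>]*>.*<\/script>//g' \
        -e 's/<style[^>]*>.*<\/style>//g' \
        -e '/<script/,/<\/script>/d' \
        -e '/<style/,/<\/style>/d' \
        -e 's/<[^>]*>//g' \
        -e 's/&nbsp;/ /g' \
        -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g' \
        -e 's/^[[:space:]]*//' \
        -e '/^$/d'
}
```

From a terminal it would be used along the lines of: curl -s 'http://macscripter.net/viewtopic.php?pid=157045' | html_to_text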