html to text

Hi,

The Notes app returns a note’s ‘body’ property as HTML. I can’t remember how to convert the HTML to plain text or RTF. Here’s a script for getting the body:

display dialog "Search:" default answer "word1 word2"
set user_text to text returned of result
set search_words to words of user_text
set found_notes to {}
tell application "Notes"
	repeat with this_word in search_words
		set temp_notes to (name of every note of folder "Scripts" whose body contains this_word)
		repeat with this_note in temp_notes
			if this_note is not in found_notes then
				set end of found_notes to (contents of this_note)
			end if
		end repeat
	end repeat
	-- process found_notes
	if found_notes is not {} then
		set script_as_html to (body of note (item 1 of found_notes))
	end if
	-- etc.
end tell

The “Scripts” folder in my Notes app contains scripts and I’m trying to find a script that contains any word entered in the dialog.

I don’t think it was curl.

Edited: this is just a rough draft. Still thinking about it.

Thanks,
kel

Hi,

/usr/bin/textutil can do it

Hi Stefan,

That sounds familiar. I’ll look at the man page.

Thanks a lot,
kel

Hi Stefan,

That wasn’t it. textutil only converts files if I read the man page right.

I know there was a simple way to convert HTML text to plain text.

Thanks,
kel

textutil can also read from stdin


set htmlText to "<html><body>This is a text</body></html>"
-- pipe the HTML into textutil and read the converted plain text back from stdout
set plainText to do shell script "echo " & quoted form of htmlText & " | textutil -stdin -stdout -format html -convert txt -encoding UTF-8"


Wow, that’s amazing! I can’t see how you’d know from the man page that you need all those options. That still doesn’t look familiar, but it works.

Edited: oh I see now. The ‘html -convert txt -encoding UTF-8’ is formatting.

Thanks a lot,
kel

One last question. If you wanted the output encoding to be rtf, then what would you use besides UTF-8 and how do you know what all the output encodings are?

Thanks,
kel

the man page says

-encoding IANA_name | NSStringEncoding

http://www.iana.org/assignments/character-sets/character-sets.xhtml

The NSStringEncoding representations are described in the NSString Class Reference

PS:

rtf is a format, not an encoding
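
A minimal sketch of that difference, assuming the -stdin/-stdout invocation used earlier in this thread. The variable names (base_command, as_txt, as_rtf) are made up for illustration. -convert picks the output format, while -encoding only picks the character encoding of txt or html output:

```applescript
set html_text to "<html><body>This is a text</body></html>"
set base_command to "echo " & quoted form of html_text & " | textutil -stdin -stdout -format html"

-- -convert chooses the output format; -encoding chooses the character encoding
set as_txt to do shell script base_command & " -convert txt -encoding UTF-8"

-- for rtf output, change the format with -convert; there is no rtf "encoding"
set as_rtf to do shell script base_command & " -convert rtf"
```

Here as_txt should hold the plain text of the body and as_rtf the RTF source as a string.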

I found on the internet:

txt, html, rtf, rtfd, doc, wordml, or webarchive

These don’t look like parameters. I don’t know why they didn’t mention UTF-8 (maybe because it is the default). I’ll check out NSString Class Reference.

Thanks a lot,
kel

read the man page, encoding and format are two different things

-convert fmt    Convert the specified files to the indicated format and write each one back to the file system.

.

fmt is one of: txt, html, rtf, rtfd, doc, docx, wordml, odt, or webarchive

I can’t find the site I got that list from. Maybe it’s from a different unix. The main thing is getting the text though.

Thanks,
kel

So what you’re saying is that you need to write to file in order to convert to rtf?

no, instead of

-convert txt

use

-convert rtf

textutil -stdin -stdout -format html -convert txt -encoding UTF-8 means:

read text from stdin in html format, convert it to txt format using UTF-8 encoding, and write it to stdout
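
The same breakdown as a rough sketch of a reusable handler, with one option per line. The handler name convert_html and its parameters are hypothetical, not part of textutil or AppleScript, and only the txt and rtf formats tried in this thread are shown:

```applescript
-- convert_html is a hypothetical helper; output_format is passed straight to -convert
on convert_html(html_text, output_format)
	set textutil_options to "-stdin" -- read the document from standard input, not from a file
	set textutil_options to textutil_options & " -stdout" -- write the result to standard output
	set textutil_options to textutil_options & " -format html" -- treat the input as html
	set textutil_options to textutil_options & " -convert " & quoted form of output_format -- output format, e.g. txt or rtf
	set textutil_options to textutil_options & " -encoding UTF-8" -- character encoding for txt or html output
	return do shell script "echo " & quoted form of html_text & " | textutil " & textutil_options
end convert_html

set plain_text to convert_html("<html><body>This is a text</body></html>", "txt")
set rich_text to convert_html("<html><body>This is a text</body></html>", "rtf")
```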

Hi Stefan,

I see it now! One other thing I found is that the encoding name can be written in several ways. I tried it with lowercase utf8 and it works, which seems to fit the encodings in the NSString Class Reference.

I was wondering why your code was in that order. Now I understand.

Thanks,
kel

It works!

set htmlText to "<html><body>This is a text</body></html>"
-- same pipeline as before, but with -convert rtf the result is RTF source
set rtfText to do shell script "echo " & quoted form of htmlText & " | textutil -stdin -stdout -format html -convert rtf -encoding UTF-8"

Thanks a lot Stefan!

normally the options are treated as key/value pairs so the order doesn’t matter

Ah, they’re key/value pairs. I get that mixed up with flags.

Here’s the rough script (hardly any error checking) if anybody wants to use it.

display dialog "Search:" default answer "word1 word2"
set user_text to text returned of result
set search_words to words of user_text
set found_notes to {}
tell application "Notes"
	repeat with this_word in search_words
		set temp_notes to (name of every note of folder "Scripts" whose body contains this_word)
		repeat with this_note in temp_notes
			if this_note is not in found_notes then
				set end of found_notes to (contents of this_note)
			end if
		end repeat
	end repeat
	-- process found_notes
	if found_notes is not {} then
		set script_html to (body of note (item 1 of found_notes))
		set script_text to do shell script "echo " & quoted form of script_html & " | textutil -stdin -stdout -format html -convert txt -encoding utf8"
	end if
end tell

Note that it only searches in a folder called “Scripts” in the Notes app and, as is, only reads the first note.
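
To go past the first note, something like the following rough sketch might work. It assumes found_notes holds note names as in the script above; the two names shown are placeholders, and html_to_text is a hypothetical helper that wraps the same textutil call. Notes that can't be read are simply skipped:

```applescript
-- html_to_text is a hypothetical helper wrapping the textutil call used above
on html_to_text(html_text)
	return do shell script "echo " & quoted form of html_text & " | textutil -stdin -stdout -format html -convert txt -encoding utf8"
end html_to_text

-- found_notes would come from the search loop above; these names are placeholders
set found_notes to {"First script", "Second script"}
set script_texts to {}
repeat with this_name in found_notes
	try
		tell application "Notes"
			set note_html to body of note (contents of this_name) of folder "Scripts"
		end tell
		-- convert outside the tell block so the handler call needs no "my"
		set end of script_texts to html_to_text(note_html)
	on error
		-- skip any note that is missing or can't be converted
	end try
end repeat
```

Afterwards script_texts holds one plain-text string for each note that was read successfully.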

Thanks to Stefan!

gl,
kel