Parsing webpage text and making a list

I want to parse text from a web page and create a list delimited by paragraph returns. I have most of it done after finding various snippets of code on this site.

There is one thing I have not been able to sort out, though: I cannot get the list to work. It doesn’t split on the returns; it just makes a list with all the text as a single item. I know the rest of my code is okay, because if I build the list using any other character from the text as the delimiter, it works.

Here is my code:

set returnVar to (ASCII character 13)
set temp to ((path to desktop as Unicode text) & "html_test.txt")
set Ptemp to quoted form of POSIX path of temp
do shell script "curl http://www.herpjournal.com/ -o " & Ptemp
do shell script "textutil -format html -inputencoding iso-8859-1 -convert txt -encoding UTF-16 " & Ptemp
set theText to (read file temp as Unicode text)
set AppleScript's text item delimiters to {returnVar}
set theList to text items of theText as list
set AppleScript's text item delimiters to {""}

Any ideas?

Thanks!!!

I did just notice that when I watch the result window in Script Debugger the returns show up as “\n”, not “\r”. I suppose that is the problem…

Researching now…

Edit - That did it, sorry for the premature post.
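
For anyone landing here later, the change was presumably along these lines: split on the linefeed (ASCII 10) instead of the carriage return (ASCII 13). This is only a minimal sketch; the sample string stands in for the text read from the converted file, and lineFeedVar and oldDelims are names made up for the example.

```applescript
-- Sample text standing in for the converted page (an assumption, not the real file).
set lineFeedVar to (ASCII character 10) -- "\n", the ending Script Debugger displayed
set theText to "first line" & lineFeedVar & "second line"

-- Remember the current delimiters so they can be restored afterwards.
set oldDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to {lineFeedVar}
set theList to text items of theText
set AppleScript's text item delimiters to oldDelims

-- theList is now a two-item list, one item per line.
return theList
```

Restoring the saved delimiters, rather than hard-coding {""}, keeps the snippet from disturbing whatever the rest of a longer script expects.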

AppleScript treats both \n and \r as paragraph breaks when you ask for the paragraphs of a piece of text, so you could do this:

set x to "testing\nOne\rTwo\nThree\rFour"
set y to every paragraph of x as list

That way you don’t have to worry about what kind of line endings you have.
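
Applied to text pulled from a web page, the same idea might look like the sketch below. It assumes you also want to drop the blank lines that a converted page tends to contain. The sample string is a stand-in for the real file contents, and thisLine is just an illustrative name.

```applescript
-- Sample text with mixed line endings and one blank line (a stand-in for the real page).
set theText to "Heading\n\nFirst paragraph\rSecond paragraph"

set theList to {}
repeat with p in paragraphs of theText
	-- p is a reference into the list, so take its contents before testing it
	set thisLine to contents of p
	if thisLine is not "" then set end of theList to thisLine
end repeat

-- theList should now hold only the three non-empty lines.
return theList
```

Because paragraphs handles the splitting, there is no need to touch the text item delimiters at all in this version.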

Schweeet!!

That fixed the new problem that popped up right after my last post.

Thanks!!