Reading HTML text with line breaks

What I would like to do is to modify the text selected in a browser window. (I’m using Firefox 30.0.)

My issue is that, in the text that gets copied to the clipboard, any number of sequential line-break tags are treated as a single carriage return character (id 13). (I don’t have any control over which tags are used on the page.)

For example, with the selected HTML text

[b]Text

more text[/b]

(where the blank line comes from line-break tags in the page source), if I write
tell application "System Events" to keystroke "c" using command down
delay 0.03
set theText to text of (the clipboard)

then character 5 of theText is “{carriage return}” and character 6 of theText is “m”, whereas I would like character 6 of theText to be that second carriage return (or line feed).
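
To make the check concrete, here is a minimal sketch of how the run of line-break characters after "Text" could be counted. The sample string is a hypothetical stand-in for what Firefox currently puts on the clipboard, and the variable names theIDs and breakCount are my own; it only illustrates the test I want to make, not a fix.

```applescript
-- Hypothetical sample standing in for the clipboard text described above
set theText to "Text" & return & "more text"

-- For multi-character text, "id of" returns a list of code points
set theIDs to id of theText

-- Count consecutive line-break characters (LF = 10, CR = 13) starting at character 5
set breakCount to 0
repeat with i from 5 to (count of theIDs)
	if (item i of theIDs) is in {10, 13} then
		set breakCount to breakCount + 1
	else
		exit repeat
	end if
end repeat

-- With the sample above, only one break character is found
breakCount
```

What I am after is a way to get the clipboard text such that breakCount reflects the number of line breaks in the page, so that the two scenarios below give different counts.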

I need to be able to distinguish between Scenarios 1 and 2 below.

I notice that the situation is different in Chrome, where copying text to the clipboard results in one carriage return for each line-break tag. I need a solution that works for all browsers.

Any assistance much appreciated.


Scenario 1 (a single line-break tag between the two lines in the page source):
Line 1

Line 2

which renders as
Line 1
Line 2


Scenario 2 (two sequential line-break tags between the two lines in the page source):
Line 1


Line 2

which renders as
[b]Line 1

Line 2[/b]