How do you get the second occurrence of a word or paragraph?

The Word documents I have been given have three separate offers in them, divided by page breaks. I have been getting the text with:

		set firstPart to (get every paragraph whose third word is "interest")
		set secondPart to (get every paragraph whose first word is "Whatever")
		set thirdPart to (get every paragraph whose first word is "client")
		set fourthPart to (get last paragraph)

But now the information appears three times, though the information in each occurrence is slightly different, depending on the offer. How do I tell TextEdit to get the second or third occurrence of the paragraph instead of every paragraph?

Are you scripting TextEdit to do this or reading the text into an AppleScript and doing it?

Try something like this, skip:

tell application "TextEdit" to tell document 1
	set firstPart to third paragraph whose third word is "interest"
	set secondPart to first paragraph whose first word is "Whatever"
	set thirdPart to second paragraph whose first word is "client"
	set fourthPart to last paragraph
end tell

Or:

tell application "TextEdit" to tell document 1
	set firstPart to paragraph 3 whose word 3 is "interest"
	set secondPart to paragraph 1 whose word 1 is "Whatever"
	set thirdPart to paragraph 2 whose word 1 is "client"
	set fourthPart to paragraph -1
end tell

Sorry Adam - didn’t realise you were here when I threw in that response. Because of the conditional filters, I figured the former. :slight_smile:

Kai, thank you very much, that works fantastic. I thought I was sunk when the copyediting crew threw me that curve.

Hi,

Darn. I just spent a couple of hours trying to figure out why this wouldn’t work.

Here’s the text in the TextEdit front document:

The rain in Spain.

The quick brown fox.

Here’s the script:

tell application "TextEdit"
	first paragraph of front document where second word of it is "quick"
end tell

Here’s the puzzle. Why doesn’t this work?

gl,

Nobody? Here’s a solution I just thought of.

tell application "TextEdit"
	first paragraph of front document whose words is not {} and second word is "quick"
end tell

gl,

No, that doesn’t work either, if a paragraph before the target paragraph contains one word.

The rain in Spain
and
The quick brown fox

Back to the drawing board.

I got this to work with this text in TE:

set ks to {}
tell application "TextEdit" to tell document 1
	repeat with k from 1 to count of paragraphs of it
		try -- so blank lines don't choke it.
			if words of paragraph k contains "quick" then
				set end of ks to k
			end if
		end try
	end repeat
end tell
ks --> {3, 7}

Also, if you put the try block in, your script works with the last line above changed to “I’m quick off the mark”:

tell application "TextEdit"
	try
		second paragraph of front document where second word is "quick"
	end try
end tell
--> "I'm quick off the mark"

This even works for the first text above:

tell application "TextEdit"
	try
		second paragraph of front document where second word is "quick" or third word is "quick"
	end try
end tell

:cool: I guess the rule is that if any of the lines are blank, then a try block is essential
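As an alternative to the try block, here is a minimal sketch that works on plain AppleScript text rather than on a TextEdit reference. It checks each paragraph's word count before comparing, so blank lines are skipped instead of raising an error. The handler name nthParagraphWithWord and its parameters are made up for illustration; they are not part of TextEdit's dictionary.

```applescript
-- Hypothetical helper: returns the Nth paragraph of theText whose word
-- number wordIndex equals targetWord, or missing value if there is none.
on nthParagraphWithWord(theText, wordIndex, targetWord, occurrence)
	set hitCount to 0
	repeat with paraRef in paragraphs of theText
		set paraText to contents of paraRef
		set paraWords to words of paraText
		-- Blank or short paragraphs have too few words; skip them.
		if (count of paraWords) >= wordIndex then
			if item wordIndex of paraWords is targetWord then
				set hitCount to hitCount + 1
				if hitCount is occurrence then return paraText
			end if
		end if
	end repeat
	return missing value
end nthParagraphWithWord

-- Sample text with a blank line between the first and third paragraphs.
set sampleText to "The rain in Spain." & return & return & "The quick brown fox." & return & "A quick reply."
nthParagraphWithWord(sampleText, 2, "quick", 2) --> "A quick reply."
```

To use it with a document, the text would first have to be fetched, for example with something like: tell application "TextEdit" to set docText to text of document 1.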

Hi guys.

The main problem here may have more to do with text content, rather than with any particular scripting approach.

For example, text copied from a web page like this one, and then pasted into TextEdit, might look innocuous enough. But without manually retyping it (or crucial parts of it) in TextEdit, I get similar results to kel from his example - and Adam’s routine returns {1} instead of his {3, 7}.

Don’t you just love a mystery? :wink:

Looking at this a little more closely, I suspect the confusion is probably caused by the use of line separators instead of line endings.

While line separators may have a similar appearance to paragraph separators (LF/CR), they’re not treated as equivalent. They don’t, for instance, feature in AppleScript’s paragraph delimiters. So if they’re the only separator in town, the text will be considered to contain only a single paragraph (consisting of the entire text):

tell application "TextEdit" to count paragraphs of front document --> 1

Because of this, a conditional filter for matching paragraphs will generally return an empty list (unless, that is, the entire text matches the condition). So, having copied kel’s original text to TextEdit, I get this result:

tell application "TextEdit" to paragraphs of front document whose second word is "quick" --> {}

An attempt to isolate a single paragraph that meets the same condition will therefore not return a result - because no match exists:

tell application "TextEdit" to first paragraph of front document whose second word is "quick" --> [no result]

Of course, if we change the condition so that “rain” is the required match, we’ll get a result from this example - because that’s the second word of the entire text:

tell application "TextEdit" to first paragraph of front document whose second word is "rain" --> "The rain in Spain [etc...]"

How can we get around this?

The most effective approach might be to replace every line separator with a line ending (since that’s what we’re expecting anyway). We should then be able to carry out any further operations in TextEdit as normal:

tell application "TextEdit" to tell front document
	set characters where it is «data utxt2028» to return
	first paragraph whose second word is "quick"
end tell

--> "The quick brown fox."

:slight_smile:
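The same replacement can also be sketched on a plain string, outside TextEdit, using text item delimiters. This assumes AppleScript 2.0 or later (for character id and Unicode-aware delimiters); the handler name hardenLineSeparators is hypothetical.

```applescript
-- Hypothetical helper: replaces every Unicode line separator (U+2028)
-- in theText with a return, so each line becomes its own paragraph.
on hardenLineSeparators(theText)
	set lineSeparator to character id 8232 -- U+2028
	set savedDelimiters to AppleScript's text item delimiters
	-- Split on the line separator...
	set AppleScript's text item delimiters to lineSeparator
	set textParts to text items of theText
	-- ...and join the pieces again with a hard return.
	set AppleScript's text item delimiters to return
	set fixedText to textParts as text
	set AppleScript's text item delimiters to savedDelimiters
	return fixedText
end hardenLineSeparators

set sampleText to "The rain in Spain." & (character id 8232) & "The quick brown fox."
set fixedText to hardenLineSeparators(sampleText)
count of paragraphs of fixedText --> 2
```

Restoring the saved delimiters at the end keeps the handler from affecting any later text coercions in the same script.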

Hi Adam,

I guess that’s the only way to do it. It seems that TextEdit cannot deal with all types of references in the filter reference form. One thing I’m stuck on is why this doesn’t work:

tell application "TextEdit"
	tell front document
		set x to words of paragraph 2
		first paragraph whose ¬
			words is not {} and ¬
			words is not x and ¬
			second word is "quick"
	end tell
end tell
end tell

on text:

the rain in spain
hello
the quick brown fox

It would be good if you could use something like:

first word is not last word

This is similar to the example in AppleScriptLanguageGuide.PDF under the filter reference form section for the Scriptable Text Editor.

Have a good day,

I guess I don’t understand, Kai. When I copied the 4 lines at the top of my post directly from the web page, pasted them into a clean page in TE, and ran

tell application "TextEdit"
	try
		second paragraph of front document where second word is "quick" or third word is "quick"
	end try
end tell

it returned “I was quick off the mark” for me.

I can see that if the text has been “soft” returned by the window of the document, then neither my solution nor yours will get the first word, because to TextEdit it’s all one long string terminated by a return.

I am not able to repeat what you found, however, so haven’t been able to find a case where your last script worked - perhaps some instruction on how you got the text into TextEdit will help.

My TE defaults were set to rich text and, from kel’s original description, I assumed we were experiencing the same behaviour. Your script works as advertised, of course, with the defaults set to plain text. Apologies if I added to the confusion. :slight_smile:

But, but, but — so are my TE defaults! My “New Document Attributes” are set to rtf and the default text encodings are set to UTF-8 for both open and save. It still works. Are we having a 10.4.5, 10.4.6 moment?

Actually, it looks like another Safari issue (with a little TextEdit thrown in for good measure).

I can quite happily copy from Camino to a rich text document - and a script can see the individual paragraphs without a hitch.

When copying from Safari, however, a rich text document in TextEdit consistently returns a single paragraph to the script. Subsequently converting the TE document from rtf to plain doesn’t improve matters, either. The text needs to be either pasted into a new plain text document, or re-pasted (from source) into a converted document. (Copy methods so far have included keystrokes, main menu, contextual menu and drag & drop. Maybe I just haven’t tried hard enough to find the right alternative…)

Now tell me you’ve been using Safari… :wink:

Nope. :slight_smile: :smiley:

BUT - Interesting (at least to me): the problem originates with the engine Safari uses, because Shiira, which uses the same engine, has the same problem.

Interestingly too, however, Shiira doesn’t have the "favicon-dragged-to-desktop = screwed up .webloc file" problem that Safari has. They must have fixed it.

As a final word, I don’t normally use Shiira, but I keep it around and test stuff in it occasionally because it uses the same engine but not the same GUI implementation as Safari does.

To add further mystery, in the Sunrise Browser, both of these work when I copy the four lines from earlier:

tell application "TextEdit" to tell front document
	set characters where it is «data utxt2028» to return
	first paragraph whose second word is "quick"
end tell

--> "The quick brown fox.
"
tell application "TextEdit"
	try
		first paragraph of front document where second word is "quick"
	end try
end tell --> "The quick brown fox
"

Note that they both leave a return at the end of the line.

That was the whole point of the set characters… statement in my script; to replace soft returns with hard ones. (If there are no soft returns, the line does nothing.)

Both scripts do essentially the same thing. The first one should fix soft returns if it finds them, the second one won’t. The second one has a try statement (although I’m not sure what error it’s intended to trap). After that, what’s left in each case is a statement to return some text element matching given criteria. In spite of the syntactical variations, the event in both cases amounts to simply this: get paragraph 1 of document 1 whose word 2 = “quick”. If such a paragraph exists, it’s returned. If it doesn’t, nothing’s returned.

That’s TextEdit. While AppleScript doesn’t include return characters in a paragraph delimited list, some applications consider them as part of the ‘owning’ paragraph.
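If the trailing return is unwanted, it can be trimmed from the returned paragraph afterwards. A minimal sketch on a plain string follows; the handler name stripTrailingBreak is hypothetical, and the linefeed constant assumes AppleScript 2.0 or later.

```applescript
-- Hypothetical helper: removes any returns or linefeeds from the end of
-- theText, as found on paragraphs that an application hands back.
on stripTrailingBreak(theText)
	repeat while theText is not "" and last character of theText is in {return, linefeed}
		-- A lone line break leaves nothing behind once it is removed.
		if (count of theText) is 1 then return ""
		set theText to text 1 thru -2 of theText
	end repeat
	return theText
end stripTrailingBreak

stripTrailingBreak("The quick brown fox." & return) --> "The quick brown fox."
```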

You know, I’ve just been looking again at kel’s original question and script (message #6). If the returns in TextEdit are soft, his original script doesn’t work. If the returns are hard, it does. When I run my suggestion here, it fixes the soft return problem - and then goes on to do the paragraph stuff. So I guess I really don’t get what the problem was…

Ah well - time to call it a day, methinks. :slight_smile:

The try was necessary because blank lines (two returns in a row) errored - there were no words.

Agree we’ve worked this to death, but for me, it’s been an education. Thanks for the explanation.

AB