How do I get a word in TextEdit that isn't in the same place every time

skipdidthis · March 28, 2006, 11:04pm

I have a working AppleScript that finds a paragraph of copy based on the first word of that paragraph.
This works fine.

tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to (get every paragraph whose first word is "something")
	end tell
end tell

But I have one line of text I need to get that always begins with PQ- (as in PQ-FLW or PQ-MINT) and its position changes from document to document, as does its number of characters, so I can’t get it by its location.
This script seems logical and compiles, but it doesn’t return anything.

tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to get every word whose characters 1 thru 3 is "PQ-"
	end tell
end tell

I’ve tried it using “every word” and “every paragraph”. Can someone shed some light on this for me?

Bruce_Phillips · March 28, 2006, 11:27pm

Try something like this:

tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to paragraphs whose (first word is "PQ") and (third character is "-")
	end tell
end tell

kel · March 28, 2006, 11:45pm

Aren’t hyphenated words one word? If Bruce’s doesn’t work, then there are several other ways. You can use a containment operator such as ‘begins with’.

tell application "TextEdit"
	tell front document
		set the_paragraph to first paragraph where first word begins with "PQ"
	end tell
end tell

Adam_Bell · March 28, 2006, 11:55pm

How is a line distinguished from a paragraph - do only paragraphs have returns as their last character and the lines are just wrapped by TextEdit? You use the word “line”, but I don’t know what a “line” is in your text.

Adam_Bell · March 29, 2006, 12:00am

If you just want to isolate any paragraph containing “PQ-” anywhere and there will be only one, then offset works:

set Text1 to "This is several lines of text
with a PQ-Something embedded in it.
PQ-other is another example
The text goes on."

set Text2 to "This text doesn't contain the
magic characters."

set char to "PQ-" as string

offset of char in Text1 --> 38

offset of char in Text2 --> 0

kai · March 29, 2006, 12:44am

You could try:

tell application "TextEdit" to tell document 1 to set textToCopy to paragraphs where it starts with "PQ-"

Adam_Bell · March 29, 2006, 1:11am

The challenge, though, is that if a paragraph (a group of sentences wrapped on the screen but containing only one return character - i.e. a big string) is part of a group of paragraphs and contains the characters PQ-ArbitraryText, what’s the easiest way to identify that paragraph? As I interpreted his quandary, the paragraph doesn’t necessarily start with those letters. The first part of his post is a red herring - it works. It seemed to me that the second is the question I’ve posed. TIDs could find out, but there must be a simpler way. If it truly is just a string, then contains works. I’ll await clarification.

kel · March 29, 2006, 1:38am

Yes, you may be right Adam.

You could get the paragraph that contains a word beginning with PQ- with something like this:

set t to "hello
The rain in Spain falls mainly in the plain.
The quick brown fox PQ-Mint jumped over the lazy dog.
bye"

set o to offset of "PQ-" in t
set sub_t to text 1 thru o of t
set c to (count paragraphs of sub_t)
set the_paragraph to paragraph c of t

Of course, to get the text in TextEdit:

tell application "TextEdit" to set t to text of front document

Edited: no that wouldn’t work because “PQ-” may not be the first 3 characters.

hmmm

kel · March 29, 2006, 1:47am

Ok this should do it if the word may not be the first word.

tell application "TextEdit"
	tell front document
		set the_word to first word where it begins with "PQ-"
	end tell
	set t to text of front document
end tell
set o to offset of the_word in t
set sub_t to text 1 thru o of t
set c to (count paragraphs of sub_t)
set the_paragraph to paragraph c of t

kai · March 29, 2006, 1:51am

The inference that I (and apparently others) drew differs slightly. But I agree with your final point - we can speculate about different interpretations until the cows come home (or the OP clarifies the position). I was merely offering a suggestion that followed an existing line of reasoning but had not yet been specifically covered.

kel · March 29, 2006, 2:01am

My last script still wasn’t right. Now I’ve got it. This gets the paragraph that contains some word beginning with “PQ-”

tell application "TextEdit"
	set the_word to first word of front document where it begins with "PQ-"
	set p to first paragraph of front document where some word of it is the_word
end tell

Edited: now to crunch it up:

tell application "TextEdit"
	set p to first paragraph of front document where some word of it begins with "PQ-"
end tell

Adam_Bell · March 29, 2006, 4:31am

So Kai solves the problem neatly if the paragraph starts with “PQ-”, and Kel does if the problem is as I saw it. Neat both ways, but I think Kel solves it no matter what. Nice.

kai · March 29, 2006, 7:31pm

Not here, strangely enough. On my machine, AppleScript appears to regard hyphens as word separators in Unicode text, and as connectors in plain or international text. Since my text comes from TextEdit as Unicode text, the script sees no words starting with “PQ-”, but will readily identify any that are “PQ”.

Nevertheless, Kel’s script evidently works for both Kel and Adam, so I need to do some more homework to identify why my setup (Mac OS X 10.4.5, AS 1.10.3, TextEdit 1.4) should behave differently. In this situation, though, a reasonably effective method is to pull the text into AppleScript and perform any extractions there, instead:

on words_started by s from t
	set d to text item delimiters
	set text item delimiters to s
	set t to rest of t's text items
	repeat with i in t
		tell i's word 1 to if it is i's text 1 thru word 1 then
			set i's contents to {""} & it as Unicode text
		else
			set i's contents to false
		end if
	end repeat
	set text item delimiters to d
	t's every Unicode text
end words_started

words_started by "PQ-" from text of document 1 of application "TextEdit"

Adam_Bell · March 29, 2006, 7:33pm

I misled you, Kai. I didn’t try it with text from TextEdit - I just used an AppleScript variable with a bunch of dummy stuff and PQ-stuff stuck in the middle. Sorry. Didn’t test it with Unicode text.

skipdidthis · March 30, 2006, 4:34am

Thanks for all the help; it’s a little overwhelming. Allow me to clarify my quest.
My text document contains info for each customer’s letter and two-sided insert. So when I pull the copy I need into the insert I don’t need all the copy, just a couple of paragraphs and a code that is PQ- (for all customers) followed by the unique client code. So, the code is PQ-FLW or PQ-MINT, etc. The code is on different lines due to variances in the amount of text per client.
The way I am locating the text is by having the copywriter put a full return before every paragraph as well as the code. The code is on a line by itself, but not the same line from document to document.

My solution is to have the copywriter put the code as the very last line of text; I can get that.

I am still interested, however, in being able to locate that darned code by the PQ-, which is common to all codes, and then select the entire text on that line. Then it won’t matter where the code falls.
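
Since the code sits on a line by itself, one way to sidestep the question of how words are split at hyphens is to pull the text into AppleScript and test each paragraph with "begins with". This is only a minimal sketch: the sample string and the variable names docText, allParagraphs and codeLine are hypothetical stand-ins, and in practice docText would come from something like "text of document 1" of TextEdit.

```applescript
-- Hypothetical sample text; in real use, fetch the text of the TextEdit document instead
set docText to "Pay to the order of someone" & return & "PQ-MINT" & return & "Whatever comes next"

set codeLine to "" -- stays empty if no paragraph starts with "PQ-"
set allParagraphs to paragraphs of docText
repeat with i from 1 to (count of allParagraphs)
	set thisParagraph to item i of allParagraphs
	-- compare the start of the whole paragraph, so word boundaries never matter
	if thisParagraph begins with "PQ-" then
		set codeLine to thisParagraph
		exit repeat
	end if
end repeat

codeLine --> "PQ-MINT"
```

Because the test runs on whole paragraphs, the position of the code line in the document and the length of the client code should not matter.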

Adam_Bell · March 30, 2006, 2:49pm

Can’t you dummy up a sample of the text you start with for us? (by dummy, I mean remove sensitive info).

kel · March 30, 2006, 4:06pm

That’s strange. I can’t get TextEdit to return text that treats hyphens as word separators. It must be the version 1.2 unless it’s some setting on my computer for default text encoding.

Edited: if I run this in the Script Editor:

set t to "hello-bye" as Unicode text
words of t

I get one word of Unicode text.

Krioni · March 30, 2006, 4:35pm

I never trust “words” to return the same thing, or even what I expect. I view the words command as something you use when you don’t care about precise work, just returning a simple bit of text whose precision is unimportant. In other words, I almost never use it.

What I use when I want to extract a substring is a custom handler I wrote called getTextBetween. It takes some source text, looks for the first occurrence of some starting text, and returns everything from there up to some ending text. One important note: it does NOT return the “starting text” as part of the result; you’d have to add that back on yourself. In the example I included, if you wanted “Contact:Jeff Robertson” you would need to put the “Contact:” back on to the beginning of what this returns.

Also in my example, note that there are multiple places where there is text between “:” and a return. The handler lets you specify which occurrence to get. Since what the handler does is split the text up into pieces, the first “between” text is the 2nd text item, which is the default. So, if you wanted to get “856-321-4567” in my example, you would ask for the 3rd text item by adding textItemNum:3 to the record. The basic call looks like this:


set someText to "Contact:Jeff Robertson" & return & "Phone:856-321-4567" & return & "Fax:856-321-9999"

getTextBetween({sourceText:someText, beforeText:":", afterText:return})
--> "Jeff Robertson"

on getTextBetween(prefs)
	-- version 1.4, Daniel A. Shockley <http://www.danshockley.com>
	-- gets the text between specified occurrence of beforeText and afterText in sourceText
	-- the default textItemNum should be 2
	set defaultPrefs to {textItemNum:2}
	
	if (class of prefs is not list) and (class of prefs is not record) then
		error "getTextBetween FAILED: parameter should be a record or list. If it is multiple items, just make it into a list to upgrade to this handler." number 1024
	end if
	if class of prefs is list then
		if (count of prefs) is 4 then
			set textItemNum of defaultPrefs to item 4 of prefs
		end if
		set prefs to {sourceText:item 1 of prefs, beforeText:item 2 of prefs, afterText:item 3 of prefs}
	end if
	set prefs to prefs & defaultPrefs -- add on default preferences, if needed
	set sourceText to sourceText of prefs
	set beforeText to beforeText of prefs
	set afterText to afterText of prefs
	set textItemNum to textItemNum of prefs
	try
		set oldDelims to AppleScript's text item delimiters
		set AppleScript's text item delimiters to the beforeText
		set the prefixRemoved to text item textItemNum of sourceText
		set AppleScript's text item delimiters to afterText
		set the finalResult to text item 1 of prefixRemoved
		set AppleScript's text item delimiters to oldDelims
		
	on error errMsg number errNum
		set AppleScript's text item delimiters to oldDelims
		-- 	tell me to log "Error in getTextBetween() : " & errMsg
		set the finalResult to "" -- return nothing if the surrounding text is not found
	end try
	
	return finalResult
	
end getTextBetween

You can also call it with a simple list of three items as the parameter:
getTextBetween({someText, ":", return})

where the list is {source, before, after}
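
The same delimiter technique can be applied inline to the PQ- code from the original question. The following is a sketch under stated assumptions, not part of the handler above: the sample text and the names docText, afterPrefix and clientCode are hypothetical, and it uses "paragraph 1" rather than a return delimiter so that it does not depend on which line-ending character the text uses.

```applescript
-- Hypothetical sample text; in real use this would be the text of the TextEdit document
set docText to "Contact:someone" & return & "PQ-FLW" & return & "More copy"

set oldDelims to AppleScript's text item delimiters
try
	set AppleScript's text item delimiters to "PQ-"
	-- text item 2 is everything after the first "PQ-"
	set afterPrefix to text item 2 of docText
	-- keep only the rest of that line, then put the prefix back on
	set clientCode to "PQ-" & paragraph 1 of afterPrefix
on error
	set clientCode to "" -- "PQ-" was not found, or nothing followed it
end try
set AppleScript's text item delimiters to oldDelims

clientCode --> "PQ-FLW"
```

Note that this finds the first "PQ-" anywhere in the text, so it would also match one in the middle of a line.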

skipdidthis · March 31, 2006, 4:41pm

That becomes a little too troublesome for what I need, so I chose to have my copywriter place the elusive text at the end of the Word document and I can get it as “last paragraph”, clean and sweet. Here’s my script.


--this gets the word file and flows the text into the text box--
tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to (get every paragraph whose first word is "Pay")
	end tell
end tell

--this opens the Quark file and flows the text into the text box--
tell application "Finder"
	activate
	open document file "test_file.qxd" of folder "Desktop" of folder "me" of folder "Users" of startup disk
end tell
tell application "QuarkXPress"
	tell document 1
		set story 1 of text box 1 to textToCopy
	end tell
end tell

tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to (get every paragraph whose first word is "Whatever")
	end tell
end tell

tell application "QuarkXPress"
	tell document 1
		set story 1 of text box 2 to textToCopy
	end tell
end tell

--This particular text field doesn't get printed on press, but it identifies this piece by the customer code on the print-outs on page 1 (there are over 100 customers). The customer code doesn't appear by itself anywhere in the Word document but it does appear in the email address. So I flow the email address into the Quark text box and have Quark delete the email info that precedes the customer code number --
tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to (get every paragraph whose first word is "clientname")
	end tell
end tell
 
tell application "QuarkXPress"
	tell document 1
		set story 1 of text box 3 to textToCopy
		delete (characters 1 thru 11) of story 1 of text box 3
	end tell
end tell

--here is where I get the "PQ-blahblah" code that I had the copywriter move to the last line so I could find it.
tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to (get last paragraph)
	end tell
end tell

tell application "QuarkXPress"
	tell document 1
		tell page 2
			set story 1 of text box 1 to textToCopy
		end tell
	end tell
end tell

--this text is where I insert the actual email address. I use it below again just to separate out the customer code--
tell application "TextEdit"
	activate
	tell document 1
		set textToCopy to (get every paragraph whose first word is "clientname")
	end tell
end tell

--This particular text field doesn't get printed on press, but it identifies this piece by the customer code on the print-outs on page 2. The customer code doesn't appear by itself anywhere in the Word document but it does appear in the email address. So I flow the email address into the Quark text box and have Quark delete the email info that precedes the customer code number --
--this opens the Quark file and flows the text into the text box
tell application "QuarkXPress"
	tell document 1
		tell page 2
			set story 1 of text box 2 to textToCopy
		end tell
	end tell
end tell
tell application "QuarkXPress"
	tell document 1
		tell page 2
			set story 1 of text box 3 to textToCopy
			delete (characters 1 thru 11) of story 1 of text box 3
		end tell
	end tell
end tell
tell application "TextEdit"
	close document 1
end tell

--this action locates the Quark file opened above for Quark to access
tell application "Finder"
	activate
	open document file "Test_file.qxd" of folder "Desktop" of folder "dskipworth" of folder "Users" of startup disk
end tell

--this action opens a dialog box for the user to name the file. Then it saves the file in the appropriate folder and closes it
tell application "QuarkXPress"
	activate
	get document 1
	set doc to document 1
	display dialog ("save doc as") default answer "8483_" with icon note
	set nom to text returned of result
	save doc in ("Macintosh HD:Users:me:Desktop:FileTarget:" & nom)
	close document 1
end tell

Now, here’s my next dilemma. I am using drag & drop to drag each Microsoft Word document onto the script.
In order for TextEdit to get the text and flow it into Quark I need to get it into plain text. I can either save the text to a new TextEdit document, or put it on the clipboard and pull it off in a script and put it into the Quark doc without creating an intermediary TextEdit document.

Here’s what I have so far. I can’t seem to get the Word doc text and do anything with it once I select it and get it highlighted. I think I need to send it to the clipboard, but this script returns nothing.


(*on open some_items
	tell application "TextEdit"
		open every item of some_items
	end tell
end open*)

--I need a script here to tell Microsoft Word to get this file which was drag & dropped thru the script above--

tell application "Microsoft Word"
	launch
	activate
	do Visual Basic "Selection.WholeStory"
	--this script to set the text to the clipboard returns nothing--
	set the clipboard to selection as string
end tell
tell application "TextEdit"
	set the clipboard to «class ktxt» of ((the clipboard as text) as record)
	return (the clipboard)
	set document 1 to the clipboard
end tell
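
One possible way to avoid both the clipboard and an intermediary TextEdit document is to convert each dropped Word file to plain text with the textutil command-line tool. This is only a sketch and rests on assumptions: that textutil is installed (it is believed to ship with Mac OS X 10.4 and later) and that it can read the Word files in question. The names droppedItems, wordFile and plainText are hypothetical.

```applescript
on open droppedItems
	repeat with i from 1 to (count of droppedItems)
		set wordFile to item i of droppedItems
		-- "-convert txt" asks for plain text; "-stdout" sends the result to
		-- standard output instead of writing a new file next to the original
		set plainText to do shell script "textutil -convert txt -stdout " & quoted form of POSIX path of wordFile
		-- plainText now holds the document as plain text and can be searched
		-- in AppleScript or handed to QuarkXPress, without opening Word or TextEdit
	end repeat
end open
```

Saved as an application, the script runs the open handler once per drop, with droppedItems holding the files that were dragged onto it.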

Adam_Bell · March 31, 2006, 6:00pm

I think Microsoft Word uses its own Clipboard rather than the System’s. I don’t have the latest Word, so can’t see how to get at theirs.