Find & Replace script problem with TextEdit

Hi there

This is my first post in a scripting forum… evah!

So… I’m still on 10.3, and I tried to make a script that helps me translate certain recurring expressions & words from English texts into Romanian. I managed to solve only half of the problem. Here is the (abridged) script:

-- TRANSLATOR HELPER 4 TEXTEDIT.

tell application "TextEdit"
	tell text of document 1
		 
		set every word where it is "The publication" to "Publicarea"
		set every word where it is "the publication" to "publicarea"
		set every word where it is "publication" to "publicare"
		
		beep
		display dialog "            Uh!... I replaced all I could :P"
		
	end tell
end tell

Unfortunately, I can’t get it to take into account a string of two or more words (that is, The publication) before the single word (publication). I tried replacing ‘word’ with ‘string’, then with ‘text’, but to no avail. I have no idea. What can I do? :expressionless:

bogdan

Model: eMac
Operating System: Mac OS X (10.3.9)

I think you’d be better off with text item delimiters:


set T to "This is the publication of choice."
considering case
	set AppleScript's text item delimiters to "the publication"
	set temp to text items of T --> {"This is ", " of choice."}
	set AppleScript's text item delimiters to "publicaria"
	set N to (temp as string)
	set AppleScript's text item delimiters to ""
end considering
N --> "This is publicaria of choice."

For a large number of such replacements, something like this handler:


set tt to "This isn't a test."
findAndReplace("isn't", "is", tt)

on findAndReplace(toFind, toReplace, theText)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to toFind
	set textItems to theText's text items
	set AppleScript's text item delimiters to toReplace
	tell textItems to set editedText to beginning & ({""} & rest) -- rejoins with toReplace; text is left unchanged when toFind is absent
	set AppleScript's text item delimiters to astid
	return editedText
end findAndReplace

Which leads to:



set theText to "The publication is new
but it is the publication of choice
for a publication of this sort"


considering case
	set findList to {"The publication", "the publication", "a publication"}
	set replaceList to {"Publicarea", "publicaria", "publicare"}
	repeat with k from 1 to count findList
		set theText to findAndReplace(item k of findList, item k of replaceList, theText)
	end repeat
end considering
theText
(*"Publicarea is new
but it is publicaria of choice
for publicare of this sort"*)

on findAndReplace(toFind, toReplace, theText)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to toFind
	set textItems to theText's text items
	set AppleScript's text item delimiters to toReplace
	tell textItems to set editedText to beginning & ({""} & rest) -- rejoins with toReplace; text is left unchanged when toFind is absent
	set AppleScript's text item delimiters to astid
	return editedText
end findAndReplace

edit: Ahhh, I see someone has already beat me to it with a much better answer…

I experimented with this and couldn’t get the one-line approach to work. It seems like you can reference a single character, a single word, or a single paragraph, but a string of several words can’t be referenced directly.

I tried to make something work. I got this far:

tell application "TextEdit"
	tell text of document 1
		repeat with x from 1 to the count of words
			try
				if word x = "The" then
					if word (x + 1) = "publication" then
						set word x to "Publicarea"
						set word (x + 1) to ""
					end if
				end if
			end try
			try
				if word x = "the" then
					if word (x + 1) = "publication" then
						set word x to "publicarea"
						set word (x + 1) to ""
					end if
				end if
			end try
		end repeat
		
		set every word where it is "publication" to "publicare"
		
		repeat with y from 1 to the count of characters
			try  --this gets rid of the double spaces resulting from the above
				if character y = " " then
					if character (y + 1) = " " then
						set character y to ""
					end if
				end if
			end try
		end repeat
		
	end tell
end tell

The only problem is that it doesn’t seem to recognize that “The” is different from “the”. Maybe you could add another check so that, after it finds “The”, it looks at the first character and checks it against its Unicode or ASCII number to confirm that it is a capital T.

Or maybe someone else out there has a simpler solution…
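For what it’s worth, a plain AppleScript comparison ignores case unless it is wrapped in a considering case block, which may be all the extra check that is needed. Here is a minimal sketch on an ordinary string (the variable names are made up for illustration, and no TextEdit is involved):

```applescript
-- Hypothetical stand-in for a word fetched from the document.
set candidate to "The"

-- Outside a "considering case" block, text comparison ignores case.
set looseMatch to (candidate is "the")

considering case
	-- Inside the block, "The" and "the" are different strings.
	set matchesCapital to (candidate is "The")
	set matchesLower to (candidate is "the")
end considering

{looseMatch, matchesCapital, matchesLower} --> {true, true, false}
```

Wrapping the two if tests in the loop above in the same kind of block should let them tell “The” from “the”, though I haven’t tried it against a TextEdit document.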

Model: G5 Tower (not Intel) - Script Editor ver 2.1.1
AppleScript: 1.10.7
Browser: Safari 419.3
Operating System: Mac OS X (10.4)

Sorry for the late reply… I’ve been pretty busy this last week. :smiley:

Thanks for the tips, but unfortunately the proposed scripts gave me an error…:confused:

Anyway, I realized the power of find & replace with grep in TextWrangler, so I got past this problem. I’m trying the same idea of scripting with TW, and it’s working much better.
The only remaining problem is convincing AppleScript to work with two specific Romanian Unicode characters found in the Latin Extended-B section/Additions to Romanian section on page 7 in the pdf file found here: http://www.unicode.org/charts/PDF/U0180.pdf.

I’ll take the same example:

-- TRANSLATOR HELPER 4 TEXTEDIT.

tell application "TextEdit"
   tell text of document 1
       
       set every word where it is "and" to "[Ux0219]i"
       
   end tell
end tell

I tried [Ux0219], [U+0219], [Utxt0219], [\0219], and some other variations, to no avail. (Yeah, I searched the help and forums also).

What’s the trick to make this character (and the others) appear? Please enlighten me.

Or is AppleScript in Tiger better in this regard (i.e. no tricks needed, just plain input from the keyboard)?

bogdan

Model: eMac
Browser: Firefox 1.5.0.6
Operating System: Mac OS X (10.3.9)

This seems to display the correct symbol (not in the Script Editor) in a dialog box


property andSymbol : «data utxt0219» as Unicode text
display dialog andSymbol

It does not work in TextEdit however - there a question mark is displayed.

This works fine here, in TextEdit:

tell application "TextEdit" to tell document 1 to set every word where it is "and" to «data utxt02190069»

What font/encoding preferences are you using?
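On Leopard (10.5) and later, where AppleScript 2.0 treats all text as Unicode, the same character can also be built from its code point with character id. This is only a small sketch, and it assumes AppleScript 2.0 or newer, so it won’t help on Panther or Tiger. The number 537 is just hexadecimal 0219 written in decimal, and the variable names are made up:

```applescript
-- Assumes AppleScript 2.0 or later (Mac OS X 10.5+).
set sComma to character id 537 -- U+0219, small s with comma below
set romanianAnd to sComma & "i" -- the two-character replacement for "and"

id of sComma --> 537
```

The resulting romanianAnd could then be used wherever «data utxt02190069» appears above.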

Didn’t think of that approach - I was trying to copy andSymbol to the clipboard and then paste it. I don’t use TextEdit, so I’m not familiar with its syntax. Glad to see it here. :slight_smile:

There shouldn’t be a problem with pasting Unicode text per se, although if you were using text item delimiters to manipulate the text before pasting, that would explain the behaviour. There’s a bug that currently causes problems when using text item delimiters with Unicode-only characters. To avoid this issue, a workaround (such as the technique used in the translate_phrases handler below) is necessary.

I suppose one might also consider a hybrid routine here, using text item delimiters to handle phrases and case-sensitive text, and TextEdit to search and replace individual words. However, the example below doesn’t attempt to deal with issues like capitalisation at the beginning of sentences and paragraphs (which is left as an exercise for the reader).

I hope there aren’t too many folks around here who speak Romanian, or there’s gonna be a whole lotta wincing goin’ on…

property phrase_list : {¬
	{"The publication", "Publicarea"}, ¬
	{"the publication", "publicarea"}, ¬
	{"a publication", "publicare"}, ¬
	{"of choice", ("de preferin" as Unicode text) & «data utxt01630103»}}

property word_list : {¬
	{"and", «data utxt02190069»}, ¬
	{"new", "nou"}, ¬
	{"for", "pentru"}, ¬
	{"of", "de"}, ¬
	{"it", "el"}, ¬
	{"is", "se afla"}, ¬
	{"this", "acest"}, ¬
	{"sort", "gen"}}

to translate_phrases from t
	set tid to text item delimiters
	considering case
		repeat with p in phrase_list
			set {text item delimiters, r} to p
			tell t's text items to set {t, l} to {beginning, rest}
			repeat with i in l
				set t to t & r & i
			end repeat
		end repeat
		set text item delimiters to tid
	end considering
	t
end translate_phrases

set t to "The publication is new and it is the publication of choice for a publication of this sort."

tell application "TextEdit"
	activate
	make new document with properties {text:return & t}
	display alert "Ready to translate this document?"
	tell document 1
		set its text to my (translate_phrases from its text)
		repeat with i in word_list
			set (every word where it is i's beginning) to i's end
		end repeat
	end tell
end tell

Well, I’ve been hammering away at the translation part of the script (more than a thousand phrases and words already down ;)), but I want the script to be more “interactive” (i.e., to search only for the equivalents of the words present in the text, not for all 11200 words beginning with “a”, all the 7612 beginning with “b”, and so on). So I managed to come up with this script runner, which loads another script that translates all the words beginning with, say, “gre” (Great Britain, great, greenhouse, greenish, green, etc.) if such a word is found in the text.

So, I have the text “Green apples are my favorites. Great Britain is a constitutional monarchy.” loaded in a TextWrangler window, then I run this script:

tell application "TextWrangler"
	
	set blocText to the text of document 1
	set nrWords to (count words of blocText)
	set currentWord to word 1 of blocText --that's "green"
	set next_word to (a reference to word after currentWord)
	
	set myList to (every text item of blocText)
	set first3 to (characters 1 thru 3 of currentWord) --that's "gre"
	
	repeat with i from 1 to nrWords
		set first3ch to load script alias (((path to desktop as string) & "script-uri de lucru:" & first3 as string) & ".scpt")
		tell first3ch to run
	end repeat
	
	display dialog ("Initial text had " & nrWords & " words" as string) with icon 1
end tell

The text changes to “[adj]verde apples are my favorites. [GP]Marea>Britanie is a constitutional monarchy.”

So when it meets the word “green”, it gets the first 3 characters of it and loads the already prepared script titled “gre.scpt” (stored in Desktop/script-uri de lucru) that translates every word that begins with “gre”, including “Great Britain” (by the way, I need to tag the words with an identifier – [adj] or [noun] or [vb] – so that I can juggle them with grep later).

The trouble is that I can’t make it jump to the next word (“apples” in this example) so that it loads “app.scpt”, then jump to the one after that, and so on. Anybody wanna help with this (reference to word after currentWord) thing?

I would have the same question even if I were making queries to a CoreData db. (I have even less experience with that :/… beginner here :D)
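One way to step from word to word is to loop over the list of words itself instead of chasing a “word after” reference. Below is a rough sketch on a plain string, outside any application tell block. The names sampleText, thisWord and prefixes are invented for the example, and the load script step is left out; it only collects the three-letter prefixes that would be used to pick a script file:

```applescript
set sampleText to "Green apples are my favorites."
set prefixes to {}

repeat with w in words of sampleText
	set thisWord to contents of w -- dereference the loop item to plain text
	-- Words shorter than three characters have no three-letter prefix, so skip them.
	if (count thisWord) ≥ 3 then
		set end of prefixes to text 1 thru 3 of thisWord
	end if
end repeat

prefixes --> {"Gre", "app", "are", "fav"}
```

Each prefix could then be turned into a file name inside the loop, the same way first3 is used in the script above.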

Heck, I never figured out a way to do search-and-replace using AppleScript directly, so you did better than I right off the bat! I ended up UI-ing my way around it with this routine:


--find a phrase in TextEdit (use in TextEdit tell block)
property small_delay : 0.2 --seconds to wait for the Find dialog; try 0.4 if 0.2 is too short

on dataGrabber(search_string)
	tell application "System Events"
		tell process "TextEdit"
			-- move cursor to start of file
			keystroke (ASCII character 30) using {command down}
			
			--initiate Find
			keystroke "f" using {command down}
			delay small_delay --allow time for dialog to appear
			keystroke search_string --enter what to find
			keystroke return --start Find
			delay small_delay -- wait for find to complete
			keystroke "." using {command down} --close Find dialog
		end tell
	end tell
end dataGrabber

I generally use either 0.2 or 0.4 for the value of small_delay. Even on a G5, leaving them out causes the script to act whacko.

Then inside any TextEdit tell block:


set some_variable to my dataGrabber("Any phrase you want here")

This setup was used for parsing a form e-mail.

Not sure if that helps or not…

Well, I kept reading the threads in the forums, learning, and I figured it out at last:

tell application "TextWrangler"
	
	set textBlock to the text of document 1
	set nrWords to (count words of textBlock) --gets the number of words in the text
	set myTID to ""
	set wordList to words of textBlock -- gives a list of all the words
	
	set AppleScript's text item delimiters to myTID
	set foundNr to "" --default, so the exclusion list below still works when no number is found
	
	repeat with myItem in wordList
		set nrRepeats to 1
		
		set searchNr to "([0-9]+)"
		set s_options to {search mode:grep, starting at top:true, wrap around:false, backwards:false, case sensitive:false, match words:false, extend selection:false}
		if found of (find searchNr searching in text window 1 options s_options with selecting match) then
			set foundNr to contents of selection
			set numerale to load script alias ((path to desktop as string) & "script-uri de lucru:primele3:03. numere.scpt")
			tell numerale to run
		end if
		
		if myItem is not in {"a", "à", "á", "â", "ã", "ä", "å", "ā", "ą", "ă", "æ", "â", "b", "c", "d", "e", "é", "f", "g", "h", "i", "î", "j", "k", "l", "m", "n", "o", "ö", "p", "q", "r", "s", "ş", "ș", "t", "ţ", "ț", "u", "û", "ü", "ū", "v", "w", "x", "y", "z", "$", "¢", "£", "¥", "฿", "€", "%", "?", "!", "&", "AM", "an", "as", "at", "do", "in", "is", "no", "of", "on", "PM", "so", "to", foundNr} then
			set first3 to (characters 1 thru 3 of myItem)
			set first3ch to load script alias (((path to desktop as string) & "script-uri de lucru:primele3:" & first3 as string) & ".scpt")
			tell first3ch to run
		end if
	end repeat
	
	set adjsubst to load script alias ((path to desktop as string) & "script-uri de lucru:primele3:02. adjective & substantive.scpt")
	tell adjsubst to run
	
	set ff to load script alias ((path to desktop as string) & "script-uri de lucru:primele3:04. finisare finala.scpt")
	tell ff to run
	
	display dialog ("Initial text had " & nrWords & " words" as string) with icon 1 giving up after 120
end tell

Woo-hoo! Macscripter is really helpful. Thank you all!:slight_smile: The only remaining problem is that it searches for a “bri.scpt” (from “Marea>Britanie”), and I haven’t yet figured out how to skip that.

I have yet another question: Perl comes standard on all Macs. Is it possible to use Perl commands for grepping the text with TextEdit, even though it doesn’t support grep by itself? The only shortcoming of TextWrangler is that it loses the formatting of the text (all the bold, italics and size differences are gone). I’m already pretty good with regular expressions, so what do you think? Which is the best route?

p.s. I found out why some of the scripts suggested at the beginning of this thread didn’t work on my Mac. I was still on Panther, that’s why. Sorry :confused:
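On the Perl question, one possible route is do shell script, which hands plain text to the perl that ships with the system and returns whatever it prints. This is only a sketch with made-up variable names and sample text. It works on a plain string, so it does nothing to keep bold or italics, and how it behaves with rich TextEdit documents is not something I can vouch for:

```applescript
-- Hypothetical sample; in practice this would be text fetched from a document.
set sourceText to "the publication of 2006"

-- "quoted form of" protects the text from the shell; the doubled backslashes
-- reach perl as \b (word boundary) in the regular expression.
set perlCommand to "printf %s " & quoted form of sourceText & " | perl -pe 's/\\bthe publication\\b/publicarea/g'"

set resultText to do shell script perlCommand
resultText --> "publicarea of 2006"
```

The returned string would still have to be written back to the document by the script, and that step is where the formatting would have to be dealt with.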