relative text selection with TextEdit or Tex-Edit Plus

Hi All,

First post here!

Anyhow, I’m trying to select text relative to a string I search for: once I find the string in the document, I want to select the n characters that are l lines down and s spaces to the right. That kind of thing. I’ve “accomplished” this using UI scripting in TextEdit, but the delays and failure rate are killing me, plus it takes forever if you have a lot of things to find, and you can’t send it to the background while the script’s running. Here’s an email I’ve sent to Tom, the maker of Tex-Edit Plus. No response yet, so in the meantime I thought I’d share it with you…

From email:

In a file full of potentially thousands of segments that look like the following example, I need to copy the bold bits to four variable lists (rank/signature, first eigenvalue, second eigenvalue, third eigenvalue).

I’ve accomplished this using UI scripting with TextEdit (the eigenvalues are always 7 lines down from the CP #, and start five places to the right, etc…), but that can take a long time depending on how many values you are collecting, and it also has a high failure rate depending on the speed of the computer running the script and the number of background processes on it (which I corrected for by having the script check the last copied data and restart with a longer delay between System Events commands).

I want to use Tex-Edit to do this because using built-in scriptability should eliminate the need for checking the data before moving to the next data, and also allow the script to run while Tex-Edit is in the background, allowing me to do other things while the data is being fetched.

So far, the (tiny) script I’ve written for Tex-Edit will open the file, convert it from unix to mac, and search for the first CP to find. After that, I can’t seem to figure out how to change the selection to where I want. From there, I’m not exactly sure how to set the n’th value of a list to the selection.

Lastly, on the chance I want all of the CP information, not just the eigenvalues, I wrote another script that just copies the line of the search result (CP # 423) and the next 32 lines to a variable by way of UI scripting.

So what I’m asking is how to change the selection to relatively known objects (sometimes lines, sometimes strings of text) in the file (I also tried identifying the line # of the search result, but I’m just no good), and from there how to directly set the value of a variable to the selected text (I assume “set variable_ to selection as text”, or whatever).

Thanks,

Tim Wilson

PS I’d also like to do this using Pages, Word, TextEdit, or any other scriptable text editor or word processor.

Here’s an example of the kind of text I’m working with:

CP # 423
(RANK,SIGNATURE): (3,-1)

CP COORDINATES: 3.273403 0.072249 1.034378

EIGENVALUES OF HESSIAN MATRIX:

 -0.6777485E+00  -0.6772667E+00   0.5110175E+00

EIGENVECTORS (ORTHONORMAL) OF HESSIAN MATRIX (COLUMNS):

  0.9029738E+00   0.2339380E+00  -0.3604321E+00
 -0.4256971E+00   0.6011940E+00  -0.6762749E+00
  0.5848318E-01   0.7640934E+00   0.6424492E+00

VALUES OF SOME FUNCTIONS AT CP(a.u.):

             Rho =   0.2617350E+00
    |GRAD(Rho)| =    0.2565964E-13
     GRAD(Rho)x =   -0.2297099E-13
     GRAD(Rho)y =    0.1136114E-13
     GRAD(Rho)z =    0.1294321E-14
      Laplacian =   -0.8439977E+00

(-1/4)Del**2(Rho)) = 0.2109994E+00

HESSIAN MATRIX:

 -0.5232880E+00   0.2898309E+00  -0.2751837E+00
                 -0.1338949E+00  -0.5162645E+00
                                 -0.1868148E+00

Hello!

If I were you, I’d check out TextWrangler, and have a look at its dictionary and at scripts written for it and for BBEdit.
It might be easier than processing text in TextEdit, as pure ASCII, that is. :slight_smile:

In TextWrangler you have access to regular expressions, and the ability to set markers in your text, which may prove helpful.

Awesome. Will do. If I were you, I’d already have known about TextWrangler, and then I’d be in a better place! Namely, Norway! Vikings rule!!

But you see my intention, right? I will always know the relative position of the text I want vs the text I can search for. Will TextWrangler let me make a selection based on relative location?

-The lowly anglo-saxon

Hello

My idea was to enter a search pattern for your “anchors” in the text.
Then you would process relative to that marker. Say for instance you searched for “HESSIAN MATRIX” and marked those, then you could iterate over the collection of markers, and get the text relative to the markers position.

I think TextWrangler is smart enough to update marker and position as you insert and delete text, so you would always be able to act relatively to that marker.

Of course it isn’t quite that easy; you’ll have to figure out the difference between hard lines and soft lines and such. But I think you would be far better off getting your position from a marker first and then working from there (looping over the markers) than moving through runs of characters in TextEdit.

I also think recording scripts in TextWrangler should work, but not totally sure about that.

The problems I have understanding this query are:

  1. The bold bits in the example are not the eigenvalues seven lines down from the CP #.
  2. In some places it says the aim is to select things in the text (which depends on the text application used), in others that it’s to assign them to variables (which is just text parsing and doesn’t require an application at all).
  3. There’s no indication of whether the search will be for a particular CP entry or all such in the text.

Not sure if you like it in the winter though! say no more. :smiley:

But you see my intention right? I will always know the relative position of the text I want vs the text I can search for.

As I have understood it, you have lots of the same heading, with data below. The data are positioned relative to the heading, so you go to the heading, and harvest!
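To make the "go to the heading and harvest" idea concrete, here is a minimal sketch in plain AppleScript that needs no text application at all. It assumes the text is already in a variable; linesBelowAnchor and sampleText are hypothetical names invented for this illustration, not part of TextWrangler's or TextEdit's dictionary.

```applescript
-- Sketch only: linesBelowAnchor is a hypothetical helper, and sampleText
-- stands in for the contents of the real output file.
on linesBelowAnchor(theText, anchorText, lineOffset)
	set found to {}
	set theParagraphs to paragraphs of theText
	set paragraphCount to (count theParagraphs)
	repeat with i from 1 to paragraphCount
		-- Keep the line that sits lineOffset lines below each anchor line, if it exists.
		if (item i of theParagraphs begins with anchorText) and (i + lineOffset ≤ paragraphCount) then
			set end of found to item (i + lineOffset) of theParagraphs
		end if
	end repeat
	return found
end linesBelowAnchor

set sampleText to "CP # 423" & linefeed & "(RANK,SIGNATURE): (3,-1)" & linefeed & "CP # 424" & linefeed & "(RANK,SIGNATURE): (3,+1)"
linesBelowAnchor(sampleText, "CP # ", 1)
--> {"(RANK,SIGNATURE): (3,-1)", "(RANK,SIGNATURE): (3,+1)"}
```

Indexing a plain list item by item gets slow on very large files, so treat this as an illustration of the relative-position idea rather than a tuned solution.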

Nigel,

I accidentally made the wrong numbers bold.

Currently, using TextEdit, I can accomplish what I want through UI scripting. With the TextEdit window in focus, I send a command-f, set the clipboard to “CP # 423”, and send a command-v in the find input. After it is found I press Esc to dismiss the find bar, and at this point the searched text “CP # 423” is selected. I then send a down arrow 7 times, putting the cursor on the same line as the eigenvalues of the Hessian. Then I send a right arrow 5 times to place the cursor at the beginning of the first eigenvalue, “-0.6777485E+00”, and send a shift-right arrow 11 times to select the first eigenvalue. At this point I send a command-c to copy the eigenvalue to the clipboard, and from there set item i of a firstEigenValue variable to the clipboard. From there I move the cursor to the right, send shift-right arrow 11 more times, copy the second eigenvalue, and set item i of a secondEigenValue variable to the clipboard.

This is what I mean in my query by “assign values to a variable”. I have a list of maybe 100 CP numbers, and for each I wish to select certain data of those CP’s, which is always in the same relative position to the CP number in the text file containing all the desired information.

As you can (painfully) see in the attached script, I want to produce list variables containing just the eigenvalues.

When doing this with UI scripting in TextEdit, however, there is a high error rate that can only be remedied by increasing the delay between subsequent commands. The time required to fetch all the information I want then becomes quite large for long lists of CP numbers.

Here’s a script I use specifically to get the Hessian matrix eigenvalues (my god, it’s a monster!):

on getOutputEiganValues(outFile, numberString)
	set path_ to path to scripts folder
	set genLib to (path_ as string) & "genLib.scpt" as text
	set genLib to (load script file genLib)
	if (count of numberString) > 25 then
		display dialog "This could take a while, so be patient." giving up after 5 with icon note
	end if
	set startTime to {}
	set done_ to 0
	set errorCount to 0
	set updateMsgFreq to 15
	set delayMultiplier to 13
	copy numberString to cpTypes
	copy numberString to firstEiganValues
	copy numberString to secondEiganValues
	copy numberString to thirdEiganValues
	tell application "TextEdit"
		activate
		open outFile
	end tell
	set outputContents to read outFile
	repeat with i_ from 1 to count of numberString
		if not (outputContents contains item i_ of numberString) then
			tell application "TextEdit"
				activate
				display dialog "Error: " & (item i_ of numberString) & ¬
					" doesn't exist in .out file. Please check that CP number and" & ¬
					" .out file are correct and try again." with icon stop
			end tell
			return "Error"
		end if
	end repeat
	set delay_ to 0.15
	set restart_ to true
	repeat until restart_ = false
		set restart_ to false
		set j_ to 1
		repeat with i_ from 1 to count of numberString
			set end of startTime to current date
			if done_ mod updateMsgFreq = 0 and done_ ≠ 0 then
				set remaining to ((count of numberString) - i_) as text
				tell genLib to set elapsedTime to timeElapsed(startTime)
				tell genLib to set remainingTime to timeDelayRemaining(remaining, updateMsgFreq, delay_, delayMultiplier)
				if remaining = "1" then
					display dialog remaining & " number to go. Thanks for waiting.
Time elapsed: " & elapsedTime & "
Time remaining (appx): " & remainingTime & "
Number of restarts: " & (errorCount as text) with icon note giving up after 3
				else
					display dialog remaining & " numbers to go. Thanks for waiting.
Time elapsed: " & elapsedTime & "
Time remaining (appx): " & remainingTime & "
Number of restarts: " & (errorCount as text) with icon note giving up after 3
				end if
			end if
			if not item i_ of numberString = "Skip" then
				set the clipboard to item i_ of numberString
				delay delay_ / 2
				tell application "TextEdit"
					activate
					tell application "System Events"
						key code 53
						delay delay_ / 2
						key code 53
						delay delay_ / 2
						keystroke "f" using [command down] -- open find bar
						delay delay_ / 2
						keystroke "a" using [command down]
						key code 51
						keystroke "v" using [command down] -- paste cp number
						delay delay_
						key code 36 -- press return key
						if j_ = 1 then
							delay delay_ + 1.2
						else
							delay delay_ + 0.5
						end if
						set j_ to j_ + 1
						key code 53
						delay delay_
						key code 125 -- move cursor down
						repeat 22 times -- move cursor to the right
							key code 124
						end repeat
						repeat 6 times -- selecting CP rank and signature
							key code 124 using [shift down]
						end repeat
						delay delay_
						keystroke "c" using [command down]
						delay delay_
						set item i_ of cpTypes to the clipboard
						if item i_ of cpTypes = "(3,-3)" then
							tell application "TextEdit"
								display dialog item i_ of cpTypes & ¬
									" is an atomic CP, so no eigan values are available.
Moving to next value." with icon caution giving up after 4
							end tell
							set item i_ of numberString to "Skip"
						else if not ((item i_ of cpTypes starts with "(3,") and ¬
							(item i_ of cpTypes ends with ")")) then
							set delay_ to delay_ + 0.05
							set errorCount to errorCount + 1
							set restart_ to true
							exit repeat
						else
							tell application "TextEdit" to activate
							key code 123 using [command down]
							repeat 6 times -- move cursor down
								key code 125
							end repeat
							repeat 5 times -- move cursor to the right
								key code 124
							end repeat
							repeat 14 times -- selecting first eigan value
								key code 124 using [shift down]
							end repeat
							delay delay_
							keystroke "c" using [command down]
							delay delay_
							set item i_ of firstEiganValues to the clipboard
							if not ((offset of "E" in (item i_ of firstEiganValues)) = 11) then
								set delay_ to delay_ + 0.05
								set errorCount to errorCount + 1
								set restart_ to true
								exit repeat
							end if
							tell application "TextEdit" to activate
							repeat 3 times -- move cursor to the right
								key code 124
							end repeat
							repeat 14 times -- selecting second eigan value
								key code 124 using [shift down]
							end repeat
							delay delay_
							keystroke "c" using [command down]
							delay delay_
							set item i_ of secondEiganValues to the clipboard
							if not ((offset of "E" in (item i_ of secondEiganValues)) = 11) then
								set delay_ to delay_ + 0.05
								set errorCount to errorCount + 1
								set restart_ to true
								exit repeat
							end if
							tell application "TextEdit" to activate
							repeat 3 times -- move cursor to the right
								key code 124
							end repeat
							repeat 14 times -- selecting third eigan value
								key code 124 using [shift down]
							end repeat
							delay delay_
							keystroke "c" using [command down]
							delay delay_
							set item i_ of thirdEiganValues to the clipboard
							if not ((offset of "E" in (item i_ of thirdEiganValues)) = 11) then
								set delay_ to delay_ + 0.05
								set errorCount to errorCount + 1
								set restart_ to true
								exit repeat
							end if
						end if
						tell application "TextEdit" to activate
					end tell
				end tell
			end if
			set done_ to done_ + 1
		end repeat
		set j_ to j_ + 1
	end repeat
	tell application "TextEdit" to close document 1 saving no
	return {cpTypes, firstEiganValues, secondEiganValues, thirdEiganValues}
end getOutputEiganValues

Hi. You really shouldn’t have to resort to GUI scripting. This needs further work, but it will get you started.


set AppleScript's text item delimiters to space

tell application "TextWrangler"'s document 1
	zap gremlins it nulls 1 controls 1 non ASCII characters 1 --purify the text
	
	repeat with lineTarget in (get (lines whose contents begins with "CP # ")'s startLine) --identify each CP #'s line (index values)
		set {A, B, C, D, E} to (line ((lineTarget's contents) + 7) as text)'s text items --obtain the eigenvalues, 7 lines down
		return {C, D, E} --> {"-0.6777485E+00", "-0.6772667E+00", "0.5110175E+00"} --return is here as a visual stop, showing values; A & B are empty texts
	end repeat
	
end tell

set AppleScript's text item delimiters to ""	

That looks just right. I’ll bet I can fiddle with that real good!
I don’t understand how someone like you could look at the TextWrangler dictionary and figure that kind of functionality out. I have minimal programming experience, as is evident in my code, so I can only imagine that you and others have experience that translates well to AppleScript.

Thanks a lot for the starting point though. I’ll share the finished product here once it gets there. I’ve big plans for this set of scripts. There are tons of things I have to do with highly predictable files, and a multitude of applications that I want to receive the data being fetched. The script the above handler’s a part of uses over 15 handlers to get the job done. I’m really enjoying myself, but there are times when I wonder how far it can really go without some genuine programming knowledge, which I don’t really plan on attaining with any level of direction.

Thanks again,

Tim

And McUsr, Norway rules! It gets to -20 degrees C here in Denver. And super hot in the summer. A few days of 40 degrees C already this year. And we have no Vikings at all.

Hi, Tim.

Thanks for your clarification in post #7. Sorry I couldn’t reply yesterday. My Internet connection was down for most of the day. I see Marc’s provided a solution, although it’s somewhat obfuscated, returns prematurely, and doesn’t give the correct result with my de-gremlined version of the text. :wink:

Here’s what I was working on. It doesn’t require any application at all, but if you’re not sure what sort of text the file contains (e.g. 8-bit, UTF-8 or UTF-16 Unicode, or RTF), you could use TextEdit to open and interpret it for you. You pass the main handler the HFS path to your text file and a list of the CP numbers of interest. (Or leave the list empty to mean “every CP number in the file”.) It returns a list of records, each record containing a “CP” number, the corresponding “(RANK,SIGNATURE)” value, and a list of the “EIGENVALUES OF HESSIAN MATRIX”. I’ve taken it on trust that your data are reliably formatted as described.

-- Pick out the entries in a given line.
-- Assumes the spaces between them are spaces or tabs, but can easily be adapted.
on parseLine(theLine)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {space, tab}
	set theTextItems to theLine's text items
	set AppleScript's text item delimiters to astid
	
	repeat with i from 1 to (count theTextItems)
		if (item i of theTextItems is "") then set item i of theTextItems to missing value
	end repeat
	
	return theTextItems's every text
end parseLine

-- Given the HFS path to a suitable text file and a list of "CP" numbers, return the (RANK,SIGNATURE) and EIGENVALUES OF HESSIAN MATRIX values given under those numbers in the text.
-- If the passed numbers list is empty, return all such values in the file.
-- The output is a list of records: one record per CP number.
-- Written on the assurance that the relevant lines are always the first and seventh respectively after the CP # line.
on getRankSigAndEigenValues(filePath, CPNumbers)
	set fileText to (read file filePath from 1 as «class utf8») -- or 'as string' or 'as Unicode text', whichever's relevant.
	-- Alternatively:
	-- tell application "TextEdit" to set fileText to text of (open file filePath)
	
	set gettingAll to (CPNumbers is {}) -- Pre-test the emptiness or otherwise of the passed numbers list.
	set outputRecords to {} -- Initialise the output list.
	
	-- Split the text at every instance of "CP # " in it.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "CP # "
	set CPSections to fileText's text items 2 thru -1
	set AppleScript's text item delimiters to astid
	
	-- With each section of text, if the data from that section are required, make up a record containing the CP number, the (RANK,SIGNATURE) value, and a list of the EIGENVALUES OF HESSIAN MATRIX, and append the record to the output list.
	repeat with CPSection in CPSections
		set numberString to first word of CPSection
		if ((gettingAll) or (CPNumbers contains numberString)) then
			set rankSignature to end of parseLine(paragraph 2 of CPSection)
			set eigenvalues to parseLine(paragraph 8 of CPSection)
			set end of outputRecords to {CP:numberString, rankSignature:rankSignature, eigenvalues:eigenvalues}
		end if
	end repeat
	
	return outputRecords
end getRankSigAndEigenValues

getRankSigAndEigenValues((path to desktop as text) & "Test.txt", {"423"})
--> {{CP:"423", rankSignature:"(3,-1)", eigenvalues:{"-0.6777485E+00", "-0.6772667E+00", "0.5110175E+00"}}}

Thanks Nigel,

I’ll be back in the lab today so I’ll try that out.

Thanks,

Tim

Hi, Nigel.
Obfuscation is an eye of the beholder issue. :slight_smile: I did note that the return was merely a visual stop to demonstrate the result being obtained. It should be removed in the final version, or else the loop ends.

I’m curious as to what was returned in your result; I don’t see any version-specific statements, and it tested as working. Your getRankSigAndEigenValues handler actually returned an error on my machine:

“Can’t get text items 2 thru -1 of "CP # 423
(RANK,SIGNATURE): (3,-1)”…

I haven’t had a chance to try working with either solution provided here yet.

However, Here’s a longer bit of a file to work with for testing.

Just to reiterate, I’m looking to have, by the time this particular handler finishes, lists or records (or anything really) that contain the desired information (rank and signature: (3, -1), and the three eigenvalues of the Hessian matrix as separate items so they can be put into Excel easily, -0.1603863E+01 as one, and -0.1603863E+01 and -0.1603863E+01).

Furthermore, note that in the script I showed above, there’s a check for (3, -3) CPs (an atom point in the molecule, rather than a bond, ring, or cage point), and if one is found, then that CP # is skipped over, since there are no eigenvalues for atomic CPs. This gives me an error when I try to use that data, because the lists end up the wrong length once the atom CPs are excluded. I haven’t yet put time into fixing this, but I’d like to.
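One possible way around the list-length problem, sketched here under assumptions, is to keep a placeholder entry for every atomic CP instead of dropping it, so the three eigenvalue lists always have one entry per CP number. The handler name alignedEigenvalueLists, the placeholder argument, and the sample records (including the CP number "1") are made up for illustration; the record labels follow the {CP, rankSignature, eigenvalues} shape used elsewhere in this thread.

```applescript
-- Sketch only: alignedEigenvalueLists is a hypothetical helper.
-- A CP without three eigenvalues contributes the placeholder instead of being skipped.
on alignedEigenvalueLists(theRecords, placeholder)
	set firstValues to {}
	set secondValues to {}
	set thirdValues to {}
	repeat with i from 1 to (count theRecords)
		set theValues to eigenvalues of (item i of theRecords)
		if (count theValues) is 3 then
			set {v1, v2, v3} to theValues
		else
			set {v1, v2, v3} to {placeholder, placeholder, placeholder}
		end if
		set end of firstValues to v1
		set end of secondValues to v2
		set end of thirdValues to v3
	end repeat
	return {firstValues, secondValues, thirdValues}
end alignedEigenvalueLists

-- Made-up sample data: the first record imitates an atomic (3,-3) CP with no eigenvalues.
set sampleRecords to {{CP:"1", rankSignature:"(3,-3)", eigenvalues:{}}, {CP:"423", rankSignature:"(3,-1)", eigenvalues:{"-0.6777485E+00", "-0.6772667E+00", "0.5110175E+00"}}}
alignedEigenvalueLists(sampleRecords, "")
--> {{"", "-0.6777485E+00"}, {"", "-0.6772667E+00"}, {"", "0.5110175E+00"}}
```

An empty string is used as the placeholder here because it pastes into a spreadsheet as a blank cell; missing value would work just as well inside AppleScript.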

You can use this list as the search material:

and this as the text to be searched, so you would extract the Eigen values from here:

Well. There’s an obvious observation to make here about “the eigenvalues are always 7 lines down from the CP #”. :expressionless:

Here’s an adjusted version of my script:

-- Pick out the entries in a given line.
-- Assumes the spaces between them are spaces or tabs, but can easily be adapted.
on parseLine(theLine)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {space, tab}
	set theTextItems to theLine's text items
	set AppleScript's text item delimiters to astid
	
	repeat with i from 1 to (count theTextItems)
		if (item i of theTextItems is "") then set item i of theTextItems to missing value
	end repeat
	
	return theTextItems's every text
end parseLine

-- Given the HFS path to a suitable text file and a list of "CP" numbers, return the (RANK,SIGNATURE) and EIGENVALUES OF HESSIAN MATRIX values given under those numbers in the text.
-- If the passed numbers list is empty, return all such values in the file.
-- The output is a list of records: one record per CP number.
-- Written on the assurance that the relevant lines are always the first and seventh respectively after the CP # line.
on getRankSigAndEigenValues(filePath, CPNumbers)
	set fileText to (read file filePath from 1 as «class utf8») -- or 'as string' or 'as Unicode text', whichever's relevant.
	-- Alternatively:
	-- tell application "TextEdit" to set fileText to text of (open file filePath)
	
	set gettingAll to (CPNumbers is {}) -- Pre-test the emptiness or otherwise of the passed numbers list.
	set outputRecords to {} -- Initialise the output list.
	
	-- Split the text at every instance of "CP # " in it.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to "CP # "
	set CPSections to fileText's text items 2 thru -1
	set AppleScript's text item delimiters to astid
	
	-- With each section of text, if the data from that section are required, make up a record containing the CP number, the (RANK,SIGNATURE) value, and a list of the EIGENVALUES OF HESSIAN MATRIX, and append the record to the output list.
	repeat with CPSection in CPSections
		set numberString to first word of CPSection
		if (((gettingAll) or (CPNumbers contains numberString)) and (CPSection contains "EIGENVALUES OF HESSIAN MATRIX")) then
			set rankSignature to end of parseLine(paragraph 2 of CPSection)
			set eigenvalues to parseLine(paragraph 8 of CPSection)
			set end of outputRecords to {CP:numberString, rankSignature:rankSignature, eigenvalues:eigenvalues}
		end if
	end repeat
	
	return outputRecords
end getRankSigAndEigenValues

getRankSigAndEigenValues((path to desktop as text) & "Test.txt", {})
--> {{CP:"222", rankSignature:"(3,-1)", eigenvalues:{"-0.1603863E+01", "-0.1556503E+01", "0.1338003E+01"}}, {CP:"223", rankSignature:"(3,+1)", eigenvalues:{"-0.1058508E-02", "0.1230207E-02", "0.1155629E-01"}}, {CP:"235", rankSignature:"(3,-1)", eigenvalues:{"-0.6310378E+00", "-0.6095227E+00", "0.5037408E+00"}}, {CP:"315", rankSignature:"(3,+1)", eigenvalues:{"-0.8892249E-03", "0.3160164E-02", "0.7322449E-02"}}, {CP:"325", rankSignature:"(3,-1)", eigenvalues:{"-0.1621305E+01", "-0.1574477E+01", "0.1345618E+01"}}}

To quote myself:

But that looks awesome!!! Even in the presence of flat smiley faces!!

Say I also wanted to take Rho (17 lines down from the CP # for bond, ring, and cage, and 7 lines down for atoms).

I would use something like:

set rho to parseLine(text item 3 of paragraph 18 of CPSection) -- for bond, ring, cage

and

set rho to parseLine(text item 3 of paragraph 8 of CPSection) -- for atom

Or something like that?

-Tim

Ahh. Item 3 of parseLine()… :smiley:
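For anyone following along, the correction is to parse the whole line first and only then pick the third entry. A minimal sketch, assuming the Rho line looks like the one in the sample above; parseLine is copied from Nigel's script so the snippet runs on its own, and rhoLine is a stand-in for "paragraph 18 of CPSection" (or paragraph 8 for an atomic CP).

```applescript
-- parseLine as posted earlier in the thread, repeated so this sketch is self-contained.
on parseLine(theLine)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {space, tab}
	set theTextItems to theLine's text items
	set AppleScript's text item delimiters to astid
	repeat with i from 1 to (count theTextItems)
		if (item i of theTextItems is "") then set item i of theTextItems to missing value
	end repeat
	return theTextItems's every text
end parseLine

-- Assumption: rhoLine stands in for the relevant paragraph of a CP section.
set rhoLine to "             Rho =   0.2617350E+00"
set rho to item 3 of parseLine(rhoLine) -- the parsed entries are "Rho", "=", and the value
--> "0.2617350E+00"
```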

Cool. This works awesome. And so fast!!

I’m gonna work in a check and notification if a (3,-3) CP is in the number list, and add the ability for the user to select which information to include in the output by changing the arguments of the handler.

This is totally awesome…

Thanks Nigel, you rule.

You rule too Mark, but less than Nigel in this case…

I rule the least…

-Tim

One last question!

I have another script that just takes all of the CP information from one of these output files for the CPs being searched. Right now it’s still UI scripting, but it just copies 32 lines starting with the CP # 123 part, if it’s a (3,+3), (3,+1) or (3,-1) CP, and 9 lines if it’s a (3,-3) CP.

I can see how to do a lot of things after looking to the script you bestowed upon me, but how would I set a variable to something like:

repeat with CPSection in CPSections
set CPInfo to (text of paragraph 1 though paragraph 32 of CPSection)
end repeat

-Tim

If you just want the data as blocks of text, the relevant code to insert will be:

repeat with CPSection in CPSections
	set numberString to first word of CPSection
	if ((gettingAll) or (CPNumbers contains numberString)) then
		set rankSignature to end of parseLine(paragraph 2 of CPSection)
		if (rankSignature is "(3,-3)") then
			set end of outputRecords to "CP # " & text 1 thru paragraph 9 of CPSection
		else if (rankSignature is in "(3,+3) (3,+1) (3,-1)") then
			set end of outputRecords to "CP # " & text 1 thru paragraph 32 of CPSection
		end if
	end if
end repeat

… assuming that no additional checks are needed for the number of available lines… :wink:
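If a check on the number of available lines does turn out to be needed (for example, when the last CP section in the file is shorter than expected), something along these lines might do. It is only a sketch: firstParagraphs is a hypothetical helper, not part of the script above, and it simply clamps the requested count to what the section actually contains.

```applescript
-- Sketch only: firstParagraphs is a hypothetical helper that returns up to
-- maxParagraphs paragraphs from the start of theText, or fewer if the text is shorter.
on firstParagraphs(theText, maxParagraphs)
	set paragraphCount to (count paragraphs of theText)
	if paragraphCount is 0 then return ""
	if paragraphCount < maxParagraphs then set maxParagraphs to paragraphCount
	return text from paragraph 1 to paragraph maxParagraphs of theText
end firstParagraphs

set shortSection to "one" & linefeed & "two" & linefeed & "three"
set firstTwo to firstParagraphs(shortSection, 2) -- "one" and "two", with the line break between them
set clamped to firstParagraphs(shortSection, 32) -- the whole text, since only three paragraphs exist
```

In the loop above, the idea would be to use firstParagraphs(CPSection, 9) or firstParagraphs(CPSection, 32) in place of the fixed-range expressions.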