Format A Marked Text

I am working with a file that contains a very simple markup language. The file needs to be proofread against the original document, and I would like the digital copy to emulate the hard copy as closely as possible. Thus, I need to duplicate the file (which I’ve figured out easily enough), but I also need to format certain aspects. I’ve run into a wall on formatting the superscript and subscript. Here is what I’ve come up with thus far:

tell application "Tex-Edit Plus"
	-- Duplicate Document to Preserve Original
	
	if not (window 1 exists) then
		beep
	else
		duplicate window 1 to front
	end if
	
	-- Format Superscript
	
	tell window 1
		set convert_Supscr to search looking for "<supscr>^*</supscr>" as string
		copy (size of word in front of convert_Supscr) / 2 to selSize
		copy ((baseline ascent of word in front of convert_Supscr) + selSize) to selVertShift
		
		if selSize < 9 then
			set selSize to 9
		end if
		
		set size of convert_Supscr to selSize
		set baseline ascent of convert_Supscr to selVertShift
	end tell
	
end tell

I need to figure out how to have the search loop through the document, so that upon finding the search criteria it converts the hit to the desired format, then moves on to the next occurrence. Any help or pointers would be appreciated. Thanks!

P.S. The file I am working with is an .rtf file.

This works for superscript in Tex-Edit Plus:


on applySuperscript()
	-- This code is in a script supplied with Tex-Edit Plus and is attributed to T. Bender.
	tell window 1 of application "Tex-Edit Plus"
		copy (size of word in front of selection) / 2 to selSize
		copy ((baseline ascent of word in front of selection) + selSize) to selVertShift
		
		if selSize < 9 then
			set selSize to 9
		end if
		
		set size of selection to selSize
		set baseline ascent of selection to selVertShift
	end tell
end applySuperscript

set tag1 to "<supscr>"
set afterTag1 to (count tag1) + 1
set tag2 to "</supscr>"
set beforeTag2 to -1 - (count tag2)

tell application "Tex-Edit Plus"
	tell window 1
		repeat while (search looking for tag1 & "^*" & tag2) -- Find and select each occurrence 
			set selText to contents of selection
			set newSelText to text afterTag1 thru beforeTag2 of selText
			my applySuperscript()
			set contents of selection to newSelText -- This has to be done last as it cancels the selection.
		end repeat
	end tell
end tell

And you’ll need something similar but different for subscript.

If your file’s an RTF one as you say (and as long as your markup tags themselves aren’t formatted), a simpler approach would be to replace the markup tags with the RTF equivalents in the duplicate file. But for some reason, Tex-Edit Plus (4.9.9 Beta) displays the text of duplicate files (created as below) as plain text with tags rather than as formatted text, even when the duplicate file’s identical to the original. TextEdit does OK though.


on searchNreplace(txt, searchStr, replaceStr)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchStr
	set txt to txt's text items
	set AppleScript's text item delimiters to replaceStr
	set txt to txt as text
	set AppleScript's text item delimiters to astid
	
	return txt
end searchNreplace

set dsktp to (path to desktop)
set docs to (path to documents folder)

set originalText to (read (choose file of type "rtf" default location docs) as string)

set newText to searchNreplace(originalText, "<supscr>", "\\super ")
set newText to searchNreplace(newText, "</supscr>", "\\nosupersub ")
set newText to searchNreplace(newText, "<subscr>", "\\sub ")
set newText to searchNreplace(newText, "</subscr>", "\\nosupersub ")

set editedFile to (choose file name default location dsktp default name "Edited file.rtf")
set fRef to (open for access editedFile with write permission)
try
	set eof fRef to 0
	write newText to fRef as string
end try
close access fRef

tell application "TextEdit" to open editedFile
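
As a quick check of what the replacement step does, here is a minimal sketch that runs the same delimiter-based search and replace on an in-memory string instead of a file. The sample text is made up for illustration, and the handler is repeated only so that the snippet runs on its own. Note that "\\sub " in the script source becomes a single backslash followed by "sub" and a space once AppleScript has processed the escape.

```applescript
-- Same handler as above, repeated so this snippet is self-contained.
on searchNreplace(txt, searchStr, replaceStr)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchStr
	set txt to txt's text items
	set AppleScript's text item delimiters to replaceStr
	set txt to txt as text
	set AppleScript's text item delimiters to astid
	
	return txt
end searchNreplace

-- Hypothetical sample text, not taken from a real file.
set sample to "H<subscr>2</subscr>O"
set sample to searchNreplace(sample, "<subscr>", "\\sub ")
set sample to searchNreplace(sample, "</subscr>", "\\nosupersub ")
-- sample now holds: H\sub 2\nosupersub O
```

The trailing space in each replacement string is what separates the RTF control word from the text that follows it.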

Thanks for the help. I understand what you’ve done with making the handler, but I am curious about the tag variables you’ve assigned. I commented out “+ 1” and “-1 -”, and I see that the results are chomped. How did you know (or what prompted you) to account for the offset?

Hi, James.

I’m not sure what you’re asking, but in these four lines ...

set tag1 to "<supscr>"
set afterTag1 to (count tag1) + 1
set tag2 to "</supscr>"
set beforeTag2 to -1 - (count tag2)

... I’ve assigned your mark-up tags to the variables ‘tag1’ and ‘tag2’ in order to work with them and (possibly) to make some of the rest of the code reusable for handling subscript. Each match found and selected by the ‘search’ command will begin with “<supscr>” and end with “</supscr>”. We just want to keep the text between them, i.e. text ((count tag1) + 1) thru -((count tag2) + 1) of the found match. I’ve precalculated the range indices and stored them in the variables ‘afterTag1’ and ‘beforeTag2’ before entering the repeat, although it’s something of a false economy here. If you’re definitely not going to reuse the code for anything else and the markup code for “superscript” is never likely to change, you could do the calculation in your head and write the numbers directly into the script:

set newSelText to text 9 thru -10 of selText

This doesn’t allow for the possibility that there may be no text at all between the tags. But since it’s for proofreading, you could leave it as it is and let a superscripted “><” in the result alert you to a superfluous mark-up in the original.
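
To see the index arithmetic in isolation, here is a small sketch that works on a plain string rather than on a Tex-Edit Plus selection. The handler name innerText is hypothetical. It also returns an empty string when nothing sits between the tags, which is an optional guard and not part of the script above.

```applescript
-- innerText is a hypothetical helper name used only for this illustration.
on innerText(selText, tag1, tag2)
	-- A match consisting of just the two tags has nothing to keep.
	if (count selText) <= (count tag1) + (count tag2) then return ""
	-- First character after the opening tag thru the last one before the closing tag.
	return text ((count tag1) + 1) thru (-1 - (count tag2)) of selText
end innerText

-- With these tags the range works out to text 9 thru -10.
set oneChar to innerText("<supscr>2</supscr>", "<supscr>", "</supscr>")
set emptyCase to innerText("<supscr></supscr>", "<supscr>", "</supscr>")
-- oneChar is "2"; emptyCase is ""
```

Without the guard, the empty case asks for text 9 thru -10 of a 17-character string, where the start index lies after the end index. AppleScript swaps the two, which is where the stray “><” comes from.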

Thanks, Nigel. Your explanation was most helpful.

Thank you for abstracting the supscr tag by assigning it to a variable. These are exactly the sort of things that enable my scripting capabilities to grow and mature.

I have a subset question, and if you don’t want to answer that’s fine. It probably merits a post of its own.

Let’s say I am working with a selection of text. Let’s say this selection is: “V-88-oII> 403Æ 410 538 Eus.Tht.Cyr.”

Now, some items in this list need special formatting. I’ve gotten my script to the point of returning the selection as words, thus creating: {“V”, “88”, “oII”, “403Æ”, “410”, “538”, “Eus.Tht.Cyr.”}.

Here is what I would like my desired string to look like: V-88-oII 403Æ 410 538 Eus. Tht. Cyr.

Any insights you can afford are welcomed. Thanks!

**Disregard the above. I figured out how to process a string.
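
For anyone landing here with the same question, one way to get from the sample selection to the desired string is plain text-item-delimiter replacement on the whole string, rather than on the list of words. This is only a sketch under two assumptions read off the single example above: the stray “>” is simply dropped, and every full stop is followed by a space except at the very end. The handler name replaceText is made up for the illustration.

```applescript
-- replaceText is a hypothetical helper; it swaps every searchStr for replaceStr.
on replaceText(txt, searchStr, replaceStr)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to searchStr
	set parts to txt's text items
	set AppleScript's text item delimiters to replaceStr
	set txt to parts as text
	set AppleScript's text item delimiters to astid
	return txt
end replaceText

set sel to "V-88-oII> 403Æ 410 538 Eus.Tht.Cyr."
set sel to replaceText(sel, ">", "") -- assumption: the ">" is unwanted
set sel to replaceText(sel, ".", ". ") -- assumption: abbreviations get a space after each full stop
-- The final full stop gained a trailing space too, so trim it.
if sel ends with " " then set sel to text 1 thru -2 of sel
```

If the input could already contain a full stop followed by a space, this would double the space, so a real script would need an extra clean-up pass.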