Pages: Replace character in font A w/ char in font B?

Thanks to the very helpful messages in this post:

http://macscripter.net/viewtopic.php?id=23896

I’ve figured out how to find and replace specific characters in Pages. I have a slightly more complex problem for which I would be grateful for any help.

I am trying to update some older files that used specially-modified fonts to display non-Roman characters. For example, these ancient files used an old non-Unicode font named Times New Roman Sanskrit, in which the keystroke that normally inserts the è character instead inserted (for example) the Sanskrit इ character. The Sanskrit character इ appeared only in printed text, though: if I copied the plain text to some other application, the other application saw the è character, not the इ character.

Now, using Pages, I want to replace every instance of (for example) the character è in a font named “Times New Roman Sanskrit” with the Sanskrit character इ. Of course I want that Sanskrit character formatted not in “Times New Roman Sanskrit” but in whatever font is used in the surrounding text (for example, the Unicode-based OS X version of Times New Roman). It would be sufficient to remove all character formatting from the Sanskrit character इ; the goal is to get it into a Unicode-based font in the same size and weight as the surrounding text.

I hope I’ve made the problem clear. I’m not at all certain this is even possible. But I would be grateful for any help.

Further to this, I see that the Pages dictionary includes Character Style, which includes the property Font Name, but I don’t see how to work that into an AppleScript. Any help would be gratefully received.

I think it can be done, but I would need more details.

There is no way to guess the correspondence between the code used to ‘describe’ a character in the old font and the Unicode value describing it in modern fonts.

Here is my mail address :
koenigyvan (at) mac (period) com

replace " (at) " by @
replace " (period) " by “.”

Could you send me a copy of this old font, which I don’t have and am not familiar with, along with a sample Pages document to process?
With the font available, I think I will be able to build a conversion table that lets a script do the job.
At this time I’m running a script doing this kind of task. It’s awfully slow.

Yvan KOENIG (VALLAURIS, France) Monday, 2 May 2011, 18:30:29

Thank you, Yvan. I will prepare a document that illustrates the problem and will send it along later today.

With Yvan’s help, I’ve put together a solution to this problem. Here’s a further explanation. Back in the 1980s and early 1990s, romanized transcripts of Sanskrit texts often used a freely-available font called Sanskrit Times NewRoman GE. This was a non-Unicode font that placed romanized Sanskrit characters where normal fonts placed different characters from the Latin-1 character set.

The problem: how to convert documents that used Sanskrit Times NewRoman GE into modern documents that use standard Unicode-based fonts. One solution is this (note that the list of characters and replacement characters is still INCOMPLETE):

property sanList : {"š", "¡", "ž", "¸", "ı", "Â", "Á", "Ò", "ë", "¯", "˜", "˜", "'"} -- still incomplete list of characters in Sanskrit font as they appear on screen in Pages (which displays the character that matches the code point in the font)
property uniList : {"Ā", "ā", "ī", "ṃ", "ṇ", "ū", "ṛ", "ṭ", "ō", "Ś", "ś", "˜", "'"} -- still incomplete list of Unicode equivalents

on run
	tell application "Pages"
		activate
		with timeout of 300 seconds --5 minutes
			tell body text of document 1
				repeat with i from 1 to the count of sanList
					-- collect the positions of this character wherever it is set in the Sanskrit font
					considering case
						set offsetList to (character offset of every character whose (contents is (the contents of item i of sanList)) and (font name is "SanskritTimesNewRomanGE"))
					end considering
					-- each replacement is one character for one character, so the remaining offsets stay valid
					repeat with anOffset in offsetList
						set font name of character (anOffset) to "Times New Roman"
						set character (anOffset) to the contents of item i of uniList
					end repeat
				end repeat
			end tell
		end timeout
		-- switch any text still set in the Sanskrit font to Times New Roman
		set font name of (every character of body text of document 1 whose font name is "SanskritTimesNewRomanGE") to "Times New Roman"
	end tell
end run

After replacing the characters that occur only in Sanskrit Times NewRoman GE, the script switches all remaining text set in Sanskrit Times NewRoman GE to Times New Roman, for consistent appearance and high portability.
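
Because the script relies on two parallel lists, and because it records offsets first and replaces afterwards, the table only works if both lists have the same length and every entry is exactly one character. A minimal sketch of a sanity check follows. The handler name checkTable is hypothetical and not part of the script above; it uses plain AppleScript only and does not talk to Pages.

```applescript
-- Hypothetical helper (an assumption, not from the original script):
-- reports problems that would stop two parallel lists from working
-- as a one-character-for-one-character conversion table.
on checkTable(sourceList, targetList)
	set problems to {}
	if (count of sourceList) is not (count of targetList) then
		set end of problems to "lists differ in length"
	end if
	-- every source entry must be a single character, or the
	-- "whose contents is ..." test can never match it
	repeat with i from 1 to (count of sourceList)
		if (length of (item i of sourceList)) is not 1 then
			set end of problems to "source item " & i & " is not a single character"
		end if
	end repeat
	-- every replacement must be a single character, or the offsets
	-- collected before replacing would drift
	repeat with i from 1 to (count of targetList)
		if (length of (item i of targetList)) is not 1 then
			set end of problems to "target item " & i & " is not a single character"
		end if
	end repeat
	return problems
end checkTable
```

Calling checkTable(sanList, uniList) at the top of the run handler and stopping if the result is not an empty list would catch a table entry that was damaged in transit (for example by a text-encoding problem) before any text is changed.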

Do not blame Yvan for the faults in this code - I used his ideas, but developed a different script. The script is obviously unfinished; a final version would let you select a file to open in Pages, or work with a file or files dropped on the script itself.

This script is very slow, and if anyone can suggest speed improvements, I’ll be grateful.
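
One possible direction, offered only as a sketch: much of the time is probably spent asking Pages to filter every character, once per table entry. If the whole body text can be fetched as a single string in one request (an assumption I have not verified, as is the assumption that positions in that string line up with Pages’ character offsets), the candidate positions could be found locally and only those characters would need their font checked. The handler name offsetsOf is hypothetical, and ch is assumed to be a single character.

```applescript
-- Hypothetical helper (an assumption, not from the original script):
-- returns the 1-based positions of the single character ch in theText,
-- computed locally without sending any request to Pages.
on offsetsOf(ch, theText)
	set found to {}
	set savedDelims to AppleScript's text item delimiters
	considering case
		-- split the text on ch; each split point marks one occurrence
		set AppleScript's text item delimiters to ch
		set pieces to text items of theText
	end considering
	set AppleScript's text item delimiters to savedDelims
	set pos to 0
	repeat with i from 1 to (count of pieces) - 1
		-- the occurrence sits right after the i-th piece
		set pos to pos + (length of (item i of pieces)) + 1
		set end of found to pos
	end repeat
	return found
end offsetsOf
```

The font test would still have to go through Pages for each candidate position, so whether this is actually faster depends on how many candidates there are; it is worth timing on a real document before relying on it.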