Timeout in long Pages document

Here is my solution.

It cycles through the pages and uses text item delimiters to split each page’s text on the desired character. It then uses the length of each text item to work out the index of each match and changes the font/colour of that character. If the match were multi-character, it would likely require a change to the set ch to ch … line. I’m not sure whether multi-byte characters would be affected. I change the colour to make the results more noticeable.

As @nickpassmore notes, if the different font ends up moving text to another page while that page is being processed, then it will fail when it seeks the moved sample. I thought of reversing the order but it’s more of a pain with my method so I’ll simply note the issue although I may revisit this.

My mac is kinda slow. I’m working with a 66 page, 160k text (with 24k words) that has 128 matches and it takes about 20 seconds to complete the change of font and colour. Using post23’s script took 1:55 for me, as a reference.

If you manually set the value of ‘pct’ in the repeat line, then you can limit the pages processed so you can test it on a smaller sample.

set black to {0, 0, 0}
set red to {64764, 10794, 7196}
set origFont to "Helvetica Neue"
set nextFont to "Arial"
set delim to "»"

tell application "Pages"
	activate
	
	set pageList to pages of document 1
	set pct to count of pageList
	-- cycle pages
	repeat with pp from 1 to pct
		tell page pp of document 1
			set AppleScript's text item delimiters to ""
			set sheep to body text
			set AppleScript's text item delimiters to delim
			set sheepList to my pageBreakdown(sheep) -- list of delimited text of page
			
			-- reset for each page
			set sheepLength to {} -- length of each text item
			set ch to 0 -- index of matching character 
			
			-- cycle text items to get each length
			repeat with x in sheepList
				set end of sheepLength to count of x
			end repeat
			
			-- cycle text items 
			repeat with l from 1 to ((count of sheepLength) - 1) -- exclude trailing text item
				set ch to ch + (item l of sheepLength) + 1 -- index of matching character
				-- set chr to character ch of body text -- matching character, used for logging
				set color of character ch of body text to red -- {64764, 10794, 7196}, {0, 0, 0}
				set font of character ch of body text to nextFont
			end repeat
			
		end tell
	end repeat
end tell
set AppleScript's text item delimiters to ""

-- break page's text into list, splitting on desired string
on pageBreakdown(eachpage)
	set sheepList to a reference to (get text items of eachpage)
end pageBreakdown
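
On the multi-character point above, here is a minimal sketch of how the running index could be advanced when the match is longer than one character. The handler name matchOffsets and the sample text are my own, not part of the script above, and it is plain AppleScript with no Pages involved. The only real change from the single-character arithmetic is that the tally skips the full length of the match rather than 1.

```applescript
-- Hypothetical helper (not from the original script): returns the start index
-- of every occurrence of matchText in sourceText, for a match of any length.
on matchOffsets(sourceText, matchText)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to matchText
	set pieces to text items of sourceText
	set AppleScript's text item delimiters to savedDelims
	
	set offsetsList to {}
	set consumed to 0 -- characters accounted for so far
	repeat with i from 1 to ((count of pieces) - 1) -- no match follows the last piece
		set matchStart to consumed + (length of item i of pieces) + 1
		set end of offsetsList to matchStart
		set consumed to matchStart + (length of matchText) - 1 -- skip the whole match
	end repeat
	return offsetsList
end matchOffsets

matchOffsets("one -> two -> three", "->") --> {5, 12}
```

In Pages you would then presumably style the range from matchStart thru (matchStart + (length of matchText) - 1) of the body text rather than a single character; I have not tested that part.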

That’s my thinking as well.

I found it odd though that it both moved the text above the page being processed —even when going backwards— and did so regardless of whether w&o was set.

Quirky that it didn’t error out when unchecked. The couple of pages there must have been optimally bad.

Here’s a combination of our two approaches. This minimises what Pages has to do and takes about 36 seconds to make >2000 changes in 66k words.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use scripting additions

tell application "Pages"
	activate
	tell front document
		set BodyText to body text
	end tell
end tell

set TextLength to length of BodyText

set AppleScript's text item delimiters to {"»"}

set AllTextItems to every text item of BodyText
set CountKeep to TextLength

repeat with i from (count of items of AllTextItems) to 2 by -1 -- stop at 2: no delimiter precedes item 1
	set NewLength to (count of characters of item i of AllTextItems)
	set NewPosition to (CountKeep - (NewLength))
	ChangeCharacter(NewPosition) of me
	set CountKeep to NewPosition - 1
end repeat


on ChangeCharacter(z)
	tell application "Pages"
		tell front document
			set font of character z of body text to "Times-Roman"
		end tell
	end tell
end ChangeCharacter

It works well but is slower for me.

For my test file, it took 28 seconds whereas my earlier script took 16.

That’s interesting. When I run yours on my Dubliners text I get an error where it cannot find character x of page y.

If I trap the error and carry on it seems to work but the last few pages do not get processed. These are pages that don’t exist before the script is run but are created because the text gets longer when the font is changed — I am changing Helvetica to Times and the chevron character is wider.

I assume this would be fixed by reversing the change order though.
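
A rough sketch of what reversing the change order could look like for the per-page approach: collect the match indices as before, but hand them back last-first, so that each change can only disturb text after the matches still waiting to be processed. The handler name reversedOffsets is made up for illustration, and this assumes a single-character delimiter and that reflow only pushes text forward.

```applescript
-- Hypothetical helper (the name is not from the thread): returns the index of
-- every occurrence of a single-character delimiter, last match first.
on reversedOffsets(sourceText, delim)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set pieces to text items of sourceText
	set AppleScript's text item delimiters to savedDelims
	
	set offsetsList to {}
	set idx to 0 -- running tally of characters counted so far
	repeat with i from 1 to ((count of pieces) - 1) -- no match follows the last piece
		set idx to idx + (length of item i of pieces) + 1
		set end of offsetsList to idx
	end repeat
	return reverse of offsetsList
end reversedOffsets

reversedOffsets("a»b»c", "»") --> {4, 2}
```

In the page loop you would also count down (repeat with pp from pct to 1 by -1) and walk the reversed list for each page. I have not run that combination against Pages, so treat it as a starting point.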

Wow. I did not expect this much help.

The last version posted by nickpassmore worked well on my document. It took about two minutes, which is fine with me.

I now have two scripts that work well. This is great.

Thank you all, Mockman, nickpassmore, robertfern, VikingOSX, Nigel_Garvey, and paulskinner!

By the way, the idea to use delimiters seems brilliant to me.

I think you’re probably right. Maybe I’ll take a look at it tonight.

I think I fixed my script…

use scripting additions

set charList to {"?", "!"}
set theFont to "BookAntiqua-Bold"
set theColor to "blue"
set fontSize to 24

tell application "Pages"
	activate
	tell front document
		set pageCount to count pages
		set my progress description to "Changing Font (" & name & ")"
		set my progress total steps to pageCount
		repeat with pageIdx from 1 to pageCount
			set my progress additional description to "Page " & pageIdx & " of " & pageCount
			set my progress completed steps to pageIdx
			tell body text of page pageIdx
				repeat with achar in charList
					set achar to contents of achar
					repeat with fChar in (characters of it where it is achar)
						try
							set font of fChar to theFont
							set color of fChar to theColor
							set size of fChar to fontSize
						end try
					end repeat
				end repeat
			end tell
		end repeat
	end tell
end tell

This version doesn’t use Pages to locate the characters. Instead it uses AS TIDs to locate each character’s index on each page, and it accurately updates all targeted characters in the entire document regardless of text reflow caused by font or font-size changes. This code should also handle multiple target characters more efficiently than looping through them.

Duration .73 seconds for “War and Peace.pages” with no matching characters ( “«” and “»” ) to “Impact”.
Duration 6.39 seconds for “War and Peace.pages” updating 1 "["s to “Impact”.
Duration 248.48 seconds for “War and Peace.pages” updating 154 "è"s font to “Impact”.

Duration seems driven primarily by Pages reflowing the subsequent text.

Duration 81.25 seconds for “War and Peace.pages” updating 1876 "?"s font to “Arial”.
Duration 10.15 seconds for “War and Peace.pages” updating 1876 "?"s to font “Arial” again.
Duration 331.32 seconds for “War and Peace.pages” updating 1876 "?"s font to “Impact”
Duration 10.71 seconds for “War and Peace.pages” updating 1876 "?"s font to Impact again.
Duration 25.3 seconds for “War and Peace.pages” updating 1876 "?"s colors.

--Running under AppleScript 2.8, macOS 15.5, Pages version 14.2
tell application "Pages"
	set theCharactersList to {"«", "»"}
	tell document 1
		activate
		repeat with pageIndx from 1 to length of (get pages)
			tell page pageIndx
				set characterLocationlist to my characterOffsets(body text, theCharactersList)
				repeat with i in characterLocationlist
					try
						tell character i of body text
							set font to "Impact"
							--set its color to {4387, 36250, 65437}
							--set size to 77
						end tell
					on error e
						--if a font or font-size change pushes a located target character off the current page, it will be located on a later page.
					end try
				end repeat
			end tell
		end repeat
	end tell
end tell



on characterOffsets(theText, theCharactersList)
	set {previousDelimiter, AppleScript's text item delimiters} to {AppleScript's text item delimiters, theCharactersList}
	set {theTextItems, AppleScript's text item delimiters} to {text items of theText, previousDelimiter}
	if length of theTextItems > 1 then
		set {offsetsList, totalOffset} to {{}, 0}
		repeat with thisTextItem in (items 1 thru -2 of theTextItems)
			set totalOffset to totalOffset + (length of text of thisTextItem) + 1
			set the end of offsetsList to totalOffset
		end repeat
		return offsetsList
	else
		return {}
	end if
end characterOffsets

robertfern: It worked! It took about three-and-a-half minutes. Thanks for your good work.

I now have three scripts that work. 🙂

Edited: I see that you corrected the typo.

paulskinner: It worked! I believe that your script is the fastest at about a minute-and-a-half.

This version doesn’t use Pages to locate the characters, using AS TIDs to locate each characters index on each page

Would you please explain what TIDs are and what a character index is?

I now have four scripts that work. You all are geniuses. I could not have done any of this without you. Occasionally over the last twenty-five years, or however long AppleScript has been available, I have tried to learn some of it, but have not made any progress. So I am very grateful for your help.

Melvin,

The character index is just the integer position of the character within the text, counting from 1.
So in this case the character whose index is 9 is an “r”.

set sourceText to "Four score and seven years ago"
character 9 of sourceText --> "r"

By TIDs I’m referring to AppleScript’s Text item delimiters. To explain, I think it might be easiest to just demonstrate.

set sourceText to "Four score and seven years ago"

AppleScript's text item delimiters -->{""} is the default
text items of sourceText -->{"F", "o", "u", "r", " ", "s", "c", "o", "r", "e", " ", "a", "n", "d", " ", "s", "e", "v", "e", "n", " ", "y", "e", "a", "r", "s", " ", "a", "g", "o"}

set AppleScript's text item delimiters to " "
text items of sourceText -->{"Four", "score", "and", "seven", "years", "ago"}

set AppleScript's text item delimiters to "and"
text items of sourceText -->{"Four score ", " seven years ago"}

--so to find the index of every occurrence of a character we can...

set AppleScript's text item delimiters to "a"
set theTextItems to text items of sourceText -->{"Four score ", "nd seven ye", "rs ", "go"}
--After each item in theTextItems there was previously an "a"

set theCharacterList to {}
set currentCharacterIndex to 0 --> this will be the running tally of all previously counted characters.

--we can loop through theTextItems ( disregard the last one because there is not an "a" following it. ) and count their characters to determine where the "a"s are in the sourceText 
repeat with thisTextItem in (items 1 thru -2 of theTextItems)
	set theLengthOfThisTextItem to length of thisTextItem
	set currentCharacterIndex to theLengthOfThisTextItem + 1 + currentCharacterIndex
	set the end of theCharacterList to character currentCharacterIndex of sourceText
end repeat

theCharacterList -->{"a", "a", "a"}

In my latest script I am just grabbing the body text and using TIDs like this to locate the indices for the characters we need to update. This is faster than asking Pages to locate them. Also, TIDs can be a list, so I can locate every instance of multiple characters without needing to loop through them.

set AppleScript's text item delimiters to {"a", "e", "i", "o", "u"} --" "
text items of sourceText -->{"F", "", "r sc", "r", " ", "nd s", "v", "n y", "", "rs ", "g", ""}

Hope that helps.

From the Language Guide:

AppleScript provides the text item delimiters property for use in processing text. This property consists of a list of strings used as delimiters by AppleScript when it coerces a list to text or gets text items from text strings. When getting text items of text, all of the strings are used as separators. When coercing a list to text, the first item is used as a separator.

In essence, you can set a delimiter (or a list of them) and then split or separate a string on it, creating a list of the resulting text items.

In this scenario, the scripts are splitting the text on the chevron character and creating a list of the resulting strings.

You can also rejoin items in a list around the delimiter by coercing the list with as text. That does not occur in this scenario since the purpose of the split is simply to get the length of each string rather than to alter it.

Here is a simple example to demonstrate changing every instance of the left chevron to a right chevron.

set str to "first second « third fourth « fifth sixth"

set text item delimiters to "«"

set strList to text items of str
--> {"first second ", " third fourth ", " fifth sixth"}

set text item delimiters to "»"

set str to strList as text
--> "first second » third fourth » fifth sixth"
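
If you do this often, it may be worth wrapping the split-and-rejoin in a handler that saves the current delimiters and puts them back afterwards, the way the characterOffsets handler earlier in the thread does. A minimal sketch follows; the handler name replaceText and its parameter names are my own invention.

```applescript
-- Hypothetical helper: replaces every occurrence of findText in sourceText
-- with replaceWith, leaving text item delimiters as it found them.
on replaceText(sourceText, findText, replaceWith)
	set savedDelims to AppleScript's text item delimiters
	try
		set AppleScript's text item delimiters to findText
		set pieces to text items of sourceText -- split on findText
		set AppleScript's text item delimiters to replaceWith
		set resultText to pieces as text -- rejoin around replaceWith
	on error errMsg number errNum
		-- restore the delimiters before passing the error along
		set AppleScript's text item delimiters to savedDelims
		error errMsg number errNum
	end try
	set AppleScript's text item delimiters to savedDelims
	return resultText
end replaceText

replaceText("first second « third fourth « fifth sixth", "«", "»")
--> "first second » third fourth » fifth sixth"
```

Restoring the delimiters matters because the setting persists for the rest of the script run, so a later "as text" coercion elsewhere could otherwise pick up a stale delimiter.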

An index is simply the position of an ordered element within its grouping. For example, the letter ‘e’ is the 5th character of the string ‘apple’, so its index is 5.

character 5 of "apple"
--> "e"

You can read more about index here: Reference forms

Thank you for the explanation!

Here’s an example using RegEx

Getting the locations is fast.
Changing the font / color / size is slow

use AppleScript version "2.7"
use framework "Foundation"
use scripting additions


-- classes, constants, and enums used
property NSRegularExpression : a reference to current application's NSRegularExpression
property NSMutableArray : a reference to current application's NSMutableArray
property NSString : a reference to current application's NSString

--property thePattern : "(«|»)"
property thePattern : "(\\?)"
property theRegEx : missing value
property newFormatCharRef : missing value

set theRegEx to NSRegularExpression's regularExpressionWithPattern:thePattern options:0 |error|:(missing value)
set newFormatCharRef to {font:"Impact", color:{4386, 36250, 65437}, size:11.0}


tell application "Pages"
	
	tell document 1
		activate
		
		set pageIndex to 0
		set thePages to a reference to every page
		
		repeat with aPage in thePages
			set pageIndex to pageIndex + 1
			--	log {"pageIndex:", pageIndex}
			-- note: inside "tell document 1" this is the document's body text, not aPage's
			set theText to (a reference to body text)
			set characterLocationlist to my getLocationsInText(theText)
			
			repeat with i in characterLocationlist
				try
					set properties of character i of theText to newFormatCharRef
				on error e
					--if a font or font-size change pushes a located target character off the current page, it will be located on a later page.
				end try
			end repeat
			
		end repeat
	end tell
end tell


on getLocationsInText(theText)
	set theString to NSString's stringWithString:theText
	set regexResults to theRegEx's matchesInString:theString options:0 range:{location:0, |length|:theString's |length|()}
	set theRanges to (regexResults's valueForKey:"range")
	set theLocations to NSMutableArray's new()
	
	repeat with aRange in theRanges
		set aRange to aRange as record
		set aLocation to aRange's location
		set aLocation to aLocation + 1
		--(theLocations's addObject:aLocation)
		(theLocations's insertObject:aLocation atIndex:0) -- add locations in reverse order
	end repeat
	
	return theLocations as list
end getLocationsInText



Thanks, technomorph!

I received four scripts from you all that worked. But not one of them changes the double chevrons (guillemets) in footnotes.

Would anyone care to rise to that challenge? 🙂

Unfortunately, it looks as if footnotes are not exposed to scripting in Pages. Short of attempting to do this with GUI scripting — which would be laborious and pretty fragile — I think you may be out of luck.

I know this is of no immediate help but, if you need to manipulate the content of documents like this regularly, Pages is probably not the right tool. Adobe InDesign, for instance, is scriptable to a pretty remarkable degree and even without resorting to scripting its find & replace and its paragraph and character styles are very powerful and could easily have solved this problem.

InDesign is very much not free though!

Thanks, Nick.

I’ve used InDesign. I won’t use Adobe products anymore. I can change the fonts in the footnotes by hand without much effort. It is only one document.