Apply character style to numbers in Pages 09

I need to apply a character style to all numbers in a Pages 09 document. Any help with how to do this would be appreciated.

Hi ercross. Welcome to MacScripter.

The main problems are of course identifying numbers in the text and doing it fast enough to make it worthwhile using a script!

In the script below, since I don’t know exactly what you mean by a number, a number is a “word” (as understood by both Pages and AppleScript) which can be coerced to a number. That means ordinals aren’t included because of their suffixes, but individual numbers in times (e.g. “12:30:45”) are recognised as separate entities because colons don’t count as parts of words.

To avoid the huge time overheads involved in scripting the minutiae in Pages, the data are dumped into script variables in comparatively large units (i.e. paragraphs) and are parsed as AppleScript objects rather than as Pages ones. It may be possible (though I haven’t tried it) to fine-tune the number search to exclude things like times, but it depends on being able to match the results to particular parts of the text in Pages.

local o -- So that the script object and its property values are discarded when the script finishes.

script
	property paras : missing value
	property wrds : missing value
end script
set o to result

-- Dump all the paragraphs of the Pages document into a variable in the script. (It's faster to work through this list than through the paragraphs in the document itself.) The "paragraphs" Pages returns are actually lists, each containing the text of a paragraph.
tell application "Pages" to set o's paras to (get paragraphs of body text of document 1)

considering case
	-- Iterate through the list of "paragraphs".
	repeat with p from 1 to (count o's paras)
		-- Extract the words (as understood by AppleScript) of each paragraph.
		set o's wrds to words of ((item p of o's paras) as text)
		-- Iterate through the list of words.
		repeat with w from 1 to (count o's wrds)
			set thisWord to (item w of o's wrds)
			-- If a word begins with a digit character, it may be a number. Try coercing it to one. If there's no error, set the character style of the corresponding word in the corresponding paragraph in the Pages document.
			if (character 1 of thisWord is in "0123456789") then
				try
					thisWord as number
					tell application "Pages"
						tell document 1
							set character style of word w of paragraph p of body text to character style "Strikethrough" -- Or whatever your required style's called.
						end tell
					end tell
				end try
			end if
		end repeat
	end repeat
end considering
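
The number test used in the script above can be tried on its own, without Pages. This is only a minimal sketch: the handler name isNumberWord and the sample sentence are made up for illustration, and how the sample splits into words depends on AppleScript's word-breaking rules on your system.

```applescript
-- Hypothetical helper (not part of the original script): returns true if a word
-- starts with a digit AND can be coerced to a number, the same test the script uses.
on isNumberWord(thisWord)
	if (character 1 of thisWord is not in "0123456789") then return false
	try
		thisWord as number -- Errors if the word isn't a valid number (e.g. "1st").
		return true
	on error
		return false
	end try
end isNumberWord

-- Made-up sample text containing a time, an ordinal and a plain number.
set sample to "At 12:30:45 the 1st of 3 trains left."
set hits to {}
repeat with thisWord in (words of sample)
	if isNumberWord(thisWord's contents) then set end of hits to thisWord's contents
end repeat
return hits -- The words that the script would restyle.
```

Running this in Script Editor shows which words of the sample would have the character style applied, which is a quick way to check whether the definition of "number" suits your text before running the full script.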

Nigel,

That is simply amazing and really fast.
Can I ask one more thing? Is it possible to have the script only select numbers from paragraphs that have a particular paragraph styling applied?

I don’t think it’s possible to get the paragraphs’ paragraph styles in bulk. The only way is to get them individually. Fortunately, it doesn’t seem to take too long:

local o -- So that the script object and its property values are discarded when the script finishes.

script
	property paras : missing value
	property wrds : missing value
end script
set o to result

-- Dump all the paragraphs of the Pages document into a variable in the script. (It's faster to work through this list than through the paragraphs in the document itself.) The "paragraphs" Pages returns are actually lists, each containing the text of a paragraph.
tell application "Pages" to set o's paras to (get paragraphs of body text of document 1)

considering case
	-- Iterate through the list of "paragraphs".
	repeat with p from 1 to (count o's paras)
		-- Only examine paragraphs with paragraph style "Body" (say).
		tell application "Pages" to set rightParagraphStyle to (name of paragraph style of paragraph p of body text of front document is "Body")
		if (rightParagraphStyle) then
			-- Extract the words (as understood by AppleScript) of each paragraph.
			set o's wrds to words of ((item p of o's paras) as text)
			-- Iterate through the list of words.
			repeat with w from 1 to (count o's wrds)
				set thisWord to (item w of o's wrds)
				-- If a word begins with a digit character, it may be a number. Try coercing it to one. If there's no error, set the character style of the corresponding word in the corresponding paragraph in the Pages document.
				if (character 1 of thisWord is in "0123456789") then
					try
						thisWord as number
						tell application "Pages"
							tell document 1
								set character style of word w of paragraph p of body text to character style "Strikethrough" -- Or whatever your required style's called.
							end tell
						end tell
					end try
				end if
			end repeat
		end if
	end repeat
end considering

Hello Nigel

(1) I’m puzzled.
What’s the need for “considering case” in a script treating only digit values?

(2) I guess you know this already, but some users may not: the predefined style names are localized.
So searching for paragraphs whose style is “Body” would return nothing on a non-English system.

It’s easy to get the localized value.


tell application "Pages"
	set Body_loc to localized string "Body"
end tell

Yvan KOENIG (VALLAURIS, France) Thursday 14 March 2013 10:01:06

Hi Yvan.

It’s an attempt to speed things up, though I doubt it makes much difference here. String comparisons are faster when ‘considering case’ because the default ‘ignoring case’ routine has to check to see if different characters are in fact different cases of the same letter. When ‘considering case’, characters are either the same or not (at least as far as case is concerned), which obviously takes less time to decide. So if you know in advance that you won’t need to make allowances for case, you may as well consider it!
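
For anyone unsure what the clause changes, here is a small sketch of its effect on a comparison. It doesn't measure speed, only the difference in results, and the variable names are just for illustration. One side effect worth keeping in mind, as far as I understand it, is that a style-name test made inside the ‘considering case’ block will also need the name in exactly the right case.

```applescript
-- By default, text comparisons ignore case.
set defaultMatch to ("Body" is "body") -- true

considering case
	-- Here the two strings differ, because "B" and "b" are different characters.
	set strictMatch to ("Body" is "body") -- false
end considering

return {defaultMatch, strictMatch} -- {true, false}
```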

The style names I’ve used are just placeholders for ercross to replace with whatever his styles are called.

Nigel,

Thanks again, the script works like a charm. This will be a big help to my current workflow.

Pastor Ed Cross

Thanks, I read that once but forgot it.

As for the style, I had imagined the question was about skipping text with non-standard style(s) and applying the change only to text in the body style.

Yvan KOENIG (VALLAURIS, France) Thursday 14 March 2013 14:30:18

Nigel,

Here’s one more challenge if you’ve got the time. I will be more than willing to make a donation.

Is it possible for the script to ignore numbers that are within parentheses at the beginning of the paragraph?

So in this example, I want it to ignore (Eph 3:1-5). It would have to ignore this only at the beginning of a paragraph, because the 4 later in the paragraph is also between parentheses and it mustn’t ignore that one.

(Eph 3:1-5) “1 For this cause I Paul, the prisoner of Jesus Christ for you Gentiles, 2 If ye have heard of the dispensation of the grace of God which is given me to you-ward: 3 How that by revelation he made known unto me the mystery; (as I wrote afore in few words, 4 Whereby, when ye read, ye may understand my knowledge in the mystery of Christ) 5 Which in other ages was not made known unto the sons of men, as it is now revealed unto his holy apostles and prophets by the Spirit;”

local o -- So that the script object and its property values are discarded when the script finishes.

script
	property paras : missing value
	property wrds : missing value
end script
set o to result

-- Dump all the paragraphs of the Pages document into a variable in the script. (It's faster to work through this list than through the paragraphs in the document itself.) The "paragraphs" Pages returns are actually lists, each containing the text of a paragraph.
tell application "Pages" to set o's paras to (get paragraphs of body text of document 1)

considering case
	-- Iterate through the list of "paragraphs".
	repeat with p from 1 to (count o's paras)
		-- Only examine paragraphs with paragraph style "Body" (say).
		tell application "Pages" to set rightParagraphStyle to (name of paragraph style of paragraph p of body text of front document is "Body")
		if (rightParagraphStyle) then
			-- Extract the words (as understood by AppleScript) from each paragraph.
			set thisPara to (item p of o's paras) as text
			set o's wrds to thisPara's words
			-- By default, begin the word checks at the first word in the paragraph ...
			set startWord to 1
			-- ... but if the paragraph begins with "(", start at the word after the corresponding ")". If there's no ")" at all, offset returns 0 and the default is kept.
			if (thisPara begins with "(") then
				set closeOffset to (offset of ")" in thisPara)
				if (closeOffset > 0) then set startWord to startWord + (count (words of (text 1 thru closeOffset of thisPara)))
			end if
			-- Iterate through the list of words, starting at the appropriate one.
			repeat with w from startWord to (count o's wrds)
				set thisWord to (item w of o's wrds)
				-- If a word begins with a digit character, it may be a number. Try coercing it to one. If there's no error, set the character style of the corresponding word in the corresponding paragraph in the Pages document.
				if (character 1 of thisWord is in "0123456789") then
					try
						thisWord as number
						tell application "Pages"
							tell document 1
								set character style of word w of paragraph p of body text to character style "Strikethrough" -- Or whatever your required style's called.
							end tell
						end tell
					end try
				end if
			end repeat
		end if
	end repeat
end considering
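
The start-word calculation in the script above can be checked separately from Pages. The following is only a sketch under assumptions: the sample paragraph is a shortened, made-up version of the verse quoted earlier, the variable names closeOffset and checkedWords are illustrative, and the test for a missing ")" is included as a precaution.

```applescript
-- Shortened sample paragraph beginning with a parenthesised reference.
set thisPara to "(Eph 3:1-5) 1 For this cause I Paul, (as I wrote afore, 4 Whereby) 5 Which in other ages"

-- By default, checking starts at the first word.
set startWord to 1
if (thisPara begins with "(") then
	-- Find the first ")" and skip every word up to and including it.
	set closeOffset to (offset of ")" in thisPara)
	if (closeOffset > 0) then set startWord to startWord + (count (words of (text 1 thru closeOffset of thisPara)))
end if

-- The words that would still be examined for numbers.
set checkedWords to items startWord thru -1 of (words of thisPara)
return {startWord, checkedWords}
```

The returned list shows the index of the first word that will be tested and the words from there on. The later parenthesised passage is still included, since only a "(" at the very start of the paragraph triggers the skip.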

Nigel,

You’re the man!! Script works great. Can I make a donation to MacScripter or to you?

Is there any way this can be ported to iBooks Author, seeing that there is so much integration between Pages and iBooks Author?