How to get list of page numbers from Word document?

I need to get the list of page numbers from a Word document.

Word seems to have a totally different approach from InDesign and Quark in this area.

I think I have found out how to get the first and last page number of each section, which solves the problem for documents with numerical page numbers:


tell application "Microsoft Word"
	-- first page of section: collapsed range at the very start of section 1
	set myRange to create range active document start (start of content of text object of section 1 of active document) end (start of content of text object of section 1 of active document)
	set firstPage to get range information myRange information type active end adjusted page number

	-- last page of section: range spanning the same section; its active end lies on the last page
	set myRange to create range active document start (start of content of text object of section 1 of active document) end (end of content of text object of section 1 of active document)
	set lastPage to get range information myRange information type active end adjusted page number
end tell
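To cover every section rather than a single one, the same two lookups could be wrapped in a loop. This is only an untested sketch that reuses the calls shown above; the variable names `pageRanges` and `i` are mine, and it assumes `sections` can be counted on the active document.

```applescript
tell application "Microsoft Word"
	set pageRanges to {}
	repeat with i from 1 to (count of sections of active document)
		-- collapsed range at the start of section i
		set myRange to create range active document start (start of content of text object of section i of active document) end (start of content of text object of section i of active document)
		set firstPage to get range information myRange information type active end adjusted page number
		
		-- range spanning section i; its active end lies at the end of the section
		set myRange to create range active document start (start of content of text object of section i of active document) end (end of content of text object of section i of active document)
		set lastPage to get range information myRange information type active end adjusted page number
		
		-- one {first, last} pair per section
		set end of pageRanges to {firstPage, lastPage}
	end repeat
end tell
return pageRanges
```

The result is a list with one {first, last} pair per section, still as plain numbers.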


However, this method always returns numeric values, even if the pages of a section are numbered A, B, C, etc.

How can I get a list of non-numerical page numbers?

Thanks for any help,
Leo

Hello.

I have only looked into Word’s dictionary, and not actually tried anything.

I see there is a number style in the page number options pane. If you have tried to read the page number from a page but can’t get it in its displayed format, then you’ll have to generate the displayed form yourself from the number and that style: a Roman numeral, a letter, or a number of i’s.

This is really a much better way to have things organized than having to access pages by their displayed page number.

Roman numerals are easy:

on romanNumeralString for aNumber
	local s
	set s to ""
	repeat while aNumber > 0
		if aNumber ≥ 1000 then
			set {s, aNumber} to {s & "M", aNumber - 1000}
		else if aNumber ≥ 900 then
			set {s, aNumber} to {s & "CM", aNumber - 900}
		else if aNumber ≥ 500 then
			set {s, aNumber} to {s & "D", aNumber - 500}
		else if aNumber ≥ 400 then
			set {s, aNumber} to {s & "CD", aNumber - 400}
		else if aNumber ≥ 100 then
			set {s, aNumber} to {s & "C", aNumber - 100}
		else if aNumber ≥ 90 then
			set {s, aNumber} to {s & "XC", aNumber - 90}
		else if aNumber ≥ 50 then
			set {s, aNumber} to {s & "L", aNumber - 50}
		else if aNumber ≥ 40 then
			set {s, aNumber} to {s & "XL", aNumber - 40}
		else if aNumber ≥ 10 then
			set {s, aNumber} to {s & "X", aNumber - 10}
		else if aNumber ≥ 9 then
			set {s, aNumber} to {s & "IX", aNumber - 9}
		else if aNumber ≥ 5 then
			set {s, aNumber} to {s & "V", aNumber - 5}
		else if aNumber ≥ 4 then
			set {s, aNumber} to {s & "IV", aNumber - 4}
		else if aNumber ≥ 1 then
			set {s, aNumber} to {s & "I", aNumber - 1}
		end if
	end repeat
	return s
end romanNumeralString

The handlers you’ll need for A-Z and such are equally easy. This should get you started, should you need it:

set a to "A"'s id -- code point of "A", which is 65
set b to character id (a + 1) -- the next letter, "B"
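Building on those two lines, a letter handler might look like the sketch below. The handler name `letterString` is made up to match the Roman numeral one, and it assumes that after Z the letter is simply repeated (27 gives AA, 28 gives BB), which is how I understand Word to format lettered page numbers; check that against a real document before relying on it.

```applescript
on letterString for aNumber
	local s, theLetter
	-- page numbers start at 1; anything lower has no letter form
	if aNumber < 1 then return ""
	-- position within the alphabet: 1 -> "A", 26 -> "Z", 27 -> "A" again
	set theLetter to character id (("A"'s id) + ((aNumber - 1) mod 26))
	-- assumption: the letter is repeated once per completed pass through A-Z
	set s to ""
	repeat (((aNumber - 1) div 26) + 1) times
		set s to s & theLetter
	end repeat
	return s
end letterString

letterString for 28 -- "BB"
```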

Edit
For the hell of it, or in case you have to convert the other way around, here is one that converts a Roman numeral string to a number:

on romanNumeralToInteger for aRomanNumeral
	local i, aNumber, l
	set {i, aNumber, l} to {1, 0, length of aRomanNumeral}
	repeat while i ≤ l
		if (offset of "M" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 1000, i + 1}
		else if (l - i + 1) ≥ 2 and (offset of "CM" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 900, i + 2}
		else if (offset of "D" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 500, i + 1}
		else if (l - i + 1) ≥ 2 and (offset of "CD" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 400, i + 2}
		else if (offset of "C" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 100, i + 1}
		else if (l - i + 1) ≥ 2 and (offset of "XC" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 90, i + 2}
		else if (offset of "L" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 50, i + 1}
		else if (l - i + 1) ≥ 2 and (offset of "XL" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 40, i + 2}
		else if (offset of "X" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 10, i + 1}
		else if (l - i + 1) ≥ 2 and (offset of "IX" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 9, i + 2}
		else if (offset of "V" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 5, i + 1}
		else if (l - i + 1) ≥ 2 and (offset of "IV" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 4, i + 2}
		else if (offset of "I" in (text i thru -1 of aRomanNumeral)) = 1 then
			set {aNumber, i} to {aNumber + 1, i + 1}
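		else
			-- without this branch an unexpected character would loop forever
			error "Not a Roman numeral character: " & (character i of aRomanNumeral)
		end if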
	end repeat
	return aNumber
end romanNumeralToInteger

Thanks for the detailed reply McUsrII!

I’ll save this conversion routine, as I’m sure it will prove to be valuable.

However, my goal is simpler and doesn’t require this conversion at this point.

I don’t need to access pages by their number.

I just need a list of document page numbers to use outside Word.

Something like {A, B, C, 5, 6, 7}

I know that in InDesign I’d need a short statement like this to get just what I need:



name of pages of active document


But in Word I just can’t find a way to simply grab the numbers of every page.

Any ideas would be appreciated!

Thanks,
Leo
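One possible way to assemble such a list outside Word, as a rough sketch: if the first page, last page and numbering style of each section are already known, the labels can be generated in plain AppleScript. Everything named here is hypothetical and not part of Word’s dictionary: the record keys `firstPage`, `lastPage` and `numberStyle`, the style names "letter" and "arabic", and the handlers `pageLabel` and `pageLabelsForSections`. It also assumes lettered pages repeat the letter after Z (AA, BB, and so on).

```applescript
-- Hypothetical helper: formats one page number in the given style.
on pageLabel(aNumber, numberStyle)
	if numberStyle is "letter" then
		-- assumption: 1 -> "A" ... 26 -> "Z", 27 -> "AA", 28 -> "BB"
		set theLetter to character id (("A"'s id) + ((aNumber - 1) mod 26))
		set s to ""
		repeat (((aNumber - 1) div 26) + 1) times
			set s to s & theLetter
		end repeat
		return s
	end if
	-- any other style falls back to plain digits
	return aNumber as text
end pageLabel

-- Hypothetical helper: sectionInfo is a list of records, one per section.
on pageLabelsForSections(sectionInfo)
	set labels to {}
	repeat with aSection in sectionInfo
		repeat with n from (firstPage of aSection) to (lastPage of aSection)
			set end of labels to pageLabel(n, numberStyle of aSection)
		end repeat
	end repeat
	return labels
end pageLabelsForSections

pageLabelsForSections({{firstPage:1, lastPage:3, numberStyle:"letter"}, {firstPage:5, lastPage:7, numberStyle:"arabic"}})
-- {"A", "B", "C", "5", "6", "7"}
```

The section records still have to be filled in from Word itself; this only covers the formatting step.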

The thing with Quark and InDesign is that both start from blank pages, onto which you add text boxes, lines and picture boxes (assuming that InDesign’s document structure is still a copy of Quark’s). Word is a text editor: you start with an empty text, and the pages you see are nothing more than a presentation, based on the page settings and markup, of how the text will look in print. My point is that the architecture of a Word document is completely different from that of a Quark document (PowerPoint is closer to Quark). This means there is no page with an A and B side like in Quark.

Thanks for the clarification - I hope to get to the bottom of it one day!

Leo

The other complication is that the number of pages depends on the printer and page size at the time you ask, so it’s a moving target.