Consolidate Individual Pages and Page Ranges

I’m looking for a way to consolidate page-number input, so that sequential individual pages are made into page ranges or added to existing page ranges. For example:

-- AppleScript Input (a string)
"1 2 3 4 6-8 10 12 13 14 17 18-20"

-- Desired AppleScript Output (a string)
"1-4 6-8 10 12-14 17-20"

With significant help from Google AI, I wrote a solution. As presented, it’s poorly written, and I stopped before doing further work on it (including understanding the Google AI code), because I wondered whether an entirely different approach might be simpler. If not, I’ll clean up the existing script and use that. The script takes only a millisecond to run, so speed is not an issue.

Thanks for any suggestions.

set oldPageString to "1 2 3 4 6-8 10 12 13 14 17 18-20 21-22"
set text item delimiters to space
set oldPageList to text items of oldPageString

--Make page ranges into individual page numbers
set tempPageList to {}
repeat with aPageItem in oldPageList
	if aPageItem does not contain "-" then
		set end of tempPageList to aPageItem as integer
	else
		set text item delimiters to "-"
		set tempRangeList to {}
		repeat with i from (text item 1 of aPageItem) to (text item 2 of aPageItem)
			set end of tempPageList to i
		end repeat
	end if
end repeat

--Most of the following is from Google AI
set rangeList to {}
set currentRangeStart to item 1 of tempPageList
set currentRangeEnd to item 1 of tempPageList

repeat with i from 2 to count of tempPageList
	set currentNum to item i of tempPageList
	set previousNum to item (i - 1) of tempPageList
	--Check if the current number is sequential to the previous
	if currentNum is equal to (previousNum + 1) then
		set currentRangeEnd to currentNum
	else
		-- If not sequential, finalize the previous range and start a new one
		addToRangeList(rangeList, currentRangeStart, currentRangeEnd)
		set currentRangeStart to currentNum
		set currentRangeEnd to currentNum
	end if
end repeat

--Add the last range after the loop finishes
addToRangeList(rangeList, currentRangeStart, currentRangeEnd)

set text item delimiters to space
set newPageList to rangeList as text
set text item delimiters to ""
return newPageList

on addToRangeList(rangeList, startNum, endNum)
	if startNum is equal to endNum then
		copy (startNum as string) to the end of rangeList
	else
		copy (startNum as string) & "-" & (endNum as string) to the end of rangeList
	end if
end addToRangeList

Hi peavine.

Here’s one possibility:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

set oldPages to "1 2 3 4 6-8 10 12 13 14 17 18-20 21-22"

set indexSet to current application's class "NSMutableIndexSet"'s indexSet()
repeat with thisEntry in splitText(oldPages, space)
	if (thisEntry contains "-") then
		set {a, z} to splitText(thisEntry, "-")
		(indexSet's addIndexesInRange:({a as integer, z - a + 1}))
	else
		(indexSet's addIndex:(thisEntry as integer))
	end if
end repeat
set theDescription to indexSet's |description|()
set substringRange to theDescription's rangeOfString:("(?<=\\()[^\\)]++(?=\\)\\])") options:(current application's NSRegularExpressionSearch) range:({0, theDescription's |length|()})
set newPages to (theDescription's substringWithRange:(substringRange)) as text

on splitText(txt, delim)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set output to txt's text items
	set AppleScript's text item delimiters to astid
	return output
end splitText

Thanks Nigel. That works great and also takes about a millisecond to run. I’ll study that later today.

Or a simpler alternative:

set theDescription to indexSet's |description|() as text
set newPages to splitText(theDescription, {"(", ")]"})'s item -2

Thanks Nigel. I worked through your script suggestion, and everything made sense. NSMutableIndexSet has several characteristics that make it ideal for my purpose. Its use would never have occurred to me.

Just for learning purposes, I rewrote Nigel’s excellent suggestion almost entirely in ASObjC. I changed the string input to demonstrate a few useful characteristics of the NSMutableIndexSet class.

use framework "Foundation"
use scripting additions

set oldPages to "4 6-8 10 12 13 14 17 18-20 21-22 2 3 4 1" --consider page order and page repeat
set oldPages to current application's NSString's stringWithString:oldPages
set oldPagesArray to (oldPages's componentsSeparatedByString:space)

set indexSet to current application's class "NSMutableIndexSet"'s indexSet()
repeat with thisEntry in oldPagesArray
	if ((thisEntry's containsString:"-") is true) then --a range of page numbers
		set theRange to (thisEntry's componentsSeparatedByString:"-")
		set rangeStart to (theRange's objectAtIndex:0)'s integerValue()
		set rangeEnd to (theRange's objectAtIndex:1)'s integerValue()
		(indexSet's addIndexesInRange:({rangeStart, (rangeEnd - rangeStart + 1)}))
	else --an individual page number
		(indexSet's addIndex:(thisEntry's integerValue()))
	end if
end repeat

set theDescription to indexSet's |description|() --an NSString that contains the desired data
set newPages to (theDescription's stringByReplacingOccurrencesOfString:"^.*\\((.*)\\).*$" withString:"$1" options:1024 range:{0, theDescription's |length|()}) as text --option 1024 is regex -->"1-4 6-8 10 12-14 17-22"
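For comparison, here is a minimal shell sketch of the same set-style behavior (pages sorted, repeats dropped, neighbors merged). It is only an illustration: the function name consolidate_as_set is made up for this example, and it assumes bash plus the standard sort and awk tools rather than anything from the scripts above.

```shell
#!/bin/bash

# Hypothetical helper (not from the thread): treats the input as a SET of pages,
# so order is ignored and repeats are dropped, much like NSMutableIndexSet.
consolidate_as_set() {
  local items item i
  read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    if [[ $item == *-* ]]; then
      # Expand a range such as 6-8 into one page number per line.
      for (( i=${item%%-*}; i<=${item#*-}; i++ )); do echo "$i"; done
    else
      echo "$item"
    fi
  done | sort -nu | awk '
    NR == 1        { start = prev = $1 + 0; next }
    $1 == prev + 1 { prev = $1 + 0; next }
                   { printf "%s%s ", start, (prev > start ? ("-" prev) : "")
                     start = prev = $1 + 0 }
    END            { if (NR) printf "%s%s\n", start, (prev > start ? ("-" prev) : "") }'
}

consolidate_as_set "4 6-8 10 12 13 14 17 18-20 21-22 2 3 4 1"
```

With the input string from the ASObjC script, this should print the same "1-4 6-8 10 12-14 17-22". The sort -nu step is what supplies the ordering and de-duplication that the index set gives for free.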

And because I enjoy trying out different coding ideas, here’s another take on the almost entirely vanilla version: :slightly_smiling_face:

use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions

set oldPages to "1 2 3 4 6-8 10 12 13 14 17 18-20 21-22"

set indexSet to current application's class "NSMutableIndexSet"'s indexSet()
repeat with thisEntry in splitText(oldPages, space)
	set {a, z} to splitText(thisEntry, "-") & 1
	if (z ≠ 1) then set z to z - a + 1 -- If z is integer 1, it was appended to a single-item list above. If it's text, the list had two items.
	(indexSet's addIndexesInRange:({a as integer, z}))
end repeat
set theDescription to indexSet's |description|() as text
set newPages to splitText(theDescription, {"(", ")]"})'s item -2

on splitText(txt, delim)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set output to txt's text items
	set AppleScript's text item delimiters to astid
	return output
end splitText

It turns out that Nigel’s excellent solution will not work with one script (a shortcut actually), which extracts pages and page ranges from a PDF. The issue is that page order and duplicates have to be retained. For example:

-- AppleScript Input (a string)
"12 2 3 4 6 8 8 9-10 11 12 14"

-- Desired AppleScript Output (a string)
"12 2-4 6 8 8-12 14"

-- NOTES
-- A page item in the format "14-12" will be tested for and will cause an error.
-- Sequential pages and page ranges are combined from left to right.
-- Consecutive individual pages or page ranges are allowed and should be returned.
-- There’s no reason anyone would enter the above input, but I want to account for all possibilities.
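The reversed-range test mentioned in the notes is not shown anywhere in the thread, so here is one way it might look in a shell script. This is a sketch under assumptions: the function name validate_page_items and the error messages are invented for illustration, and it assumes bash (the =~ operator and BASH_REMATCH).

```shell
#!/bin/bash

# Hypothetical validator (not from the thread): rejects a reversed range such as
# "14-12" and anything that is not a page number or a first-last range.
validate_page_items() {
  local items item
  read -r -a items <<< "$1"
  for item in "${items[@]}"; do
    if [[ $item =~ ^([0-9]+)-([0-9]+)$ ]]; then
      # 10# forces base 10 so that a page typed as "08" is not read as octal.
      if (( 10#${BASH_REMATCH[1]} > 10#${BASH_REMATCH[2]} )); then
        echo "Invalid range: $item" >&2
        return 1
      fi
    elif [[ ! $item =~ ^[0-9]+$ ]]; then
      echo "Invalid page item: $item" >&2
      return 1
    fi
  done
}

validate_page_items "12 2 3 4 6 8 8 9-10 11 12 14" && echo "input OK"
validate_page_items "1 14-12 3" || echo "input rejected"
```

Repeated pages such as "8 8" pass the check, since the notes say they are allowed; only the shape of each item is tested.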

Just by chance, my AppleScript in post 1 does what I want. I thought I’d post just in case anyone has any ideas (either AppleScript or shell script). The edited AppleScript included below works fine and is plenty fast (0.4 milliseconds) but is so very long and involved.

BTW, I checked with Google AI for a bash script that will do what I want. The suggestion basically took the same approach as the AppleScript and was perhaps even more convoluted (plus it didn’t seem to work). It would run faster in a shortcut, though. Thanks for any suggestions.

set oldPages to "12 2 3 4 6 8 8 9-10 11 12 14" --user input
set individualPages to getIndividualPages(oldPages) --a list with expanded page ranges
set newPages to getNewPages(individualPages) --a list with sequential pages and page ranges combined
set text item delimiters to " "
set newPages to newPages as text
set text item delimiters to ""
return newPages -->"12 2-4 6 8 8-12 14"

on getIndividualPages(oldPages)
	set text item delimiters to " "
	set oldPageList to text items of oldPages
	set text item delimiters to "-"
	set individualPageList to {}
	repeat with aPageItem in oldPageList
		if aPageItem does not contain "-" then
			set end of individualPageList to aPageItem as integer
		else
			set tempRangeList to {}
			repeat with i from (text item 1 of aPageItem) to (text item 2 of aPageItem)
				set end of individualPageList to i
			end repeat
		end if
	end repeat
	return individualPageList
end getIndividualPages

on getNewPages(individualPages)
	set newPages to {}
	set startPage to item 1 of individualPages
	set endPage to item 1 of individualPages
	
	repeat with i from 2 to (count individualPages)
		set aPriorPage to item (i - 1) of individualPages
		set aCurrentPage to item i of individualPages
		if (aPriorPage + 1) is equal to aCurrentPage then --part of a range (do not add)
			set endPage to aCurrentPage
		else --not part of a range (add individual page or page range)
			if startPage is equal to endPage then --an individual page
				set end of newPages to (startPage as text)
			else --a page range
				set end of newPages to (startPage as text) & "-" & (endPage as text)
			end if
			set startPage to aCurrentPage
			set endPage to aCurrentPage
		end if
	end repeat
	
	if startPage is equal to endPage then --add last page
		set end of newPages to (startPage as text) --an individual page
	else
		set end of newPages to (startPage as text) & "-" & (endPage as text) --a page range
	end if
	
	return newPages
end getNewPages

Hi peavine.

This version of your script has a slightly different take on the getNewPages() handler:

set oldPages to "12 2 3 4 6 8 8 9-10 11 12 14" --user input
set individualPages to getIndividualPages(oldPages) --a list with expanded page ranges
set newPages to getNewPages(individualPages) --a list with sequential pages and page ranges combined
set text item delimiters to " "
set newPages to newPages as text
set text item delimiters to ""
return newPages -->"12 2-4 6 8 8-12 14"

on getIndividualPages(oldPages)
	set text item delimiters to " "
	set oldPageList to text items of oldPages
	set text item delimiters to "-"
	set individualPageList to {}
	repeat with aPageItem in oldPageList
		if aPageItem does not contain "-" then
			set end of individualPageList to aPageItem as integer
		else
			--set tempRangeList to {}
			repeat with i from (text item 1 of aPageItem) to (text item 2 of aPageItem)
				set end of individualPageList to i
			end repeat
		end if
	end repeat
	return individualPageList
end getIndividualPages

on getNewPages(individualPages)
	set newPages to {}
	set pageCount to (count individualPages)
	set i to 1
	repeat until (i > pageCount)
		set startPage to item i of individualPages
		set k to i
		repeat with j from (i + 1) to pageCount
			if ((item j of individualPages) - startPage ≠ j - i) then exit repeat
			set k to j
		end repeat
		if (k = i) then
			set end of newPages to startPage as text
		else
			set end of newPages to (startPage as text) & "-" & item k of individualPages
		end if
		set i to k + 1
	end repeat
	
	return newPages
end getNewPages

I was able to get the Google AI shell script working. I much prefer the AppleScript solution, though.

Combine Pages and Page Ranges.shortcut (22.3 KB)

Thanks Nigel. That works great and is much cleaner. The timing result remains less than a millisecond.

I implemented Nigel’s AppleScript from post 9 and my Bash script in post 10 in the mentioned shortcut. I then ran timing tests that extracted 34 individual pages and page ranges from a PDF that contained 159 pages. I expected the Bash solution to be faster, because shortcuts generally do a poor job of running AppleScripts. Surprisingly, the AppleScript solution was 20 milliseconds faster. I guess that’s why testing important scripts and shortcuts is a good idea.

Eat your heart out, Google! :wink:

set oldPages to "12 2 3-6 8 8 9-10 11 12"

set newPages to (do shell script ¬
	"individualPages=$(eval echo $(sed -E 's/([0-9]+)-([0-9]+)/{\\1..\\2}/g' <<<'" & oldPages & "'))
	read -a pageArray <<< \"$individualPages\"
	pageCount=${#pageArray[@]}
	output=\"\"
	
	i=0
	until [[ $i == $pageCount ]] ; do
		startPage=${pageArray[i]}
		k=$i
		for (( j=$((i + 1)) ; $j < $pageCount && $((${pageArray[j]} - startPage)) == $((j - i)) ; j++ )) ; do
			k=$j
		done		
		if [[ $k == $i ]] ; then 
			output+=\"$startPage \"
		else
			output+=\"$startPage-${pageArray[k]} \"
		fi
		i=$((k + 1))
	done
	echo  $output")

Call me obsessive. :face_with_spiral_eyes: This solution just parses the given figures rather than expanding the ranges first:

set oldPageString to "1 2 3 4 6-8 8 10 12 13 2 14 17 18-20 21-22 23"
set newPageString to consolidateRanges(oldPageString)

on consolidateRanges(oldPageString)
	set inputList to splitText(oldPageString, space)
	set outputList to {}
	
	-- Initialise range start/end variables to the page number(s) from the input's first entry.
	set {rangeStart, rangeEnd} to splitText(inputList's beginning, "-")'s {beginning, end}
	repeat with i from 2 to (count inputList)
		-- Get the page number(s) from each subsequent entry.
		set {pn1, pn2} to splitText(inputList's item i, "-")'s {beginning, end}
		-- If the (first) number follows on from the current range end, update the range end to the (other) number.
		-- Otherwise output the current range and start a new one with this entry's figures.
		if (pn1 as integer = rangeEnd + 1) then
			set rangeEnd to pn2
		else
			if (rangeEnd ≠ rangeStart) then set rangeEnd to rangeStart & "-" & rangeEnd
			set outputList's end to rangeEnd
			set {rangeStart, rangeEnd} to {pn1, pn2}
		end if
	end repeat
	-- Output the range in progress when the repeat ended.
	if (rangeEnd ≠ rangeStart) then set rangeEnd to rangeStart & "-" & rangeEnd
	set outputList's end to rangeEnd
	
	return joinText(outputList, space)
end consolidateRanges

on splitText(txt, delim)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set output to txt's text items
	set AppleScript's text item delimiters to astid
	return output
end splitText

on joinText(lst, delim)
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to delim
	set output to lst as text
	set AppleScript's text item delimiters to astid
	return output
end joinText

Nigel. Thanks for the new suggestion, which works great.

My original interest in this topic was primarily academic, but I ended up incorporating one solution in a much-used shortcut (here). Because the solution is used in a shortcut, a shell script is preferred over an AppleScript; execution speed is important but not critical; and code brevity is more desirable than might normally be the case. For these reasons, the solution I used is:

Combine Consecutive Sequential Pages.shortcut (21.8 KB)

The actual shell script (which incorporates your most-recent shell script suggestion) is:

#!/bin/bash

# Expand page ranges.
IFS=' ' read -r -a page_items <<< "12 1 2 3 5 7 8-9 11"
for item in "${page_items[@]}"; do
  if [[ $item == *-* ]]; then
    for ((i=${item%%-*}; i<=${item#*-}; i++)); do
      pages+=($i)
    done
  else
    pages+=($item)
  fi
done

# Combine consecutive sequential pages and page ranges.
page_count=${#pages[@]}
i=0
until [[ $i == $page_count ]]; do
  start_page=${pages[i]}
  k=$i
  for (( j=$((i + 1)); $j < $page_count && \
    $((${pages[j]} - start_page)) == \
    $((j - i)); j++ )); do
    k=$j
  done		
  if [[ $k == $i ]]; then 
    output+="$start_page "
  else
    output+="$start_page-${pages[k]} "
  fi
  i=$((k + 1))
done

printf '%s' "${output% }"

All of the AppleScript solutions took less than a millisecond to run in Script Geek. The timing result of the above shell script was 2.8 milliseconds, and the timing result of my original shell script (which incorporated a Google AI suggestion) was 4.2 milliseconds.


Hi peavine.

Here’s a shell version of my AppleScript in post 14. Like that one, this dispenses with the array of individual numbers.

set oldPageString to "1 2 3 4 6-8 8 10 12 13 2 14 17 18-20 21-22 23 99 100 101"

do shell script "read -a input_array <<<" & quoted form of oldPageString & "
input_count=${#input_array[@]}
output=''

# Truly combine consecutive sequential pages and page ranges
# without generating an array of all the individual numbers.
item=${input_array[0]}
range_start=${item%%-*}
range_end=${item#*-}
for (( i=1 ; $i < $input_count ; i++ )) ; do
	item=${input_array[i]}
	pn1=${item%%-*}
	pn2=${item#*-}
	if [[ $((range_end + 1 )) == $pn1 ]] ; then
		range_end=$pn2
	else
		new_item=$range_start
		if [[ $range_end != $range_start ]] ; then { new_item+=\"-$range_end\" ; } fi
		output+=\"$new_item \"
		range_start=$pn1
		range_end=$pn2
	fi
done
new_item=$range_start
if [[ $range_end != $range_start ]] ; then { new_item+=\"-$range_end\" ; } fi
output+=\"$new_item\"
printf '%s' \"$output\""

Thanks Nigel. That works great.

I tested your new script against the script in the prior post and both took about 2.7 milliseconds using the input string in your new script. I also tested these same scripts with user input of “1-50 100-150”. Your script took 2.3 milliseconds and the script in the prior post took 3.3 milliseconds. The difference is obviously not important, but it does demonstrate the penalty associated with separately expanding the page ranges.
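To put rough numbers on that penalty, here is a small sketch that counts how many values each approach has to handle for the same input. It assumes bash, and the variable names are illustrative only; it measures the amount of work, not elapsed time.

```shell
#!/bin/bash

input="1-50 100-150"
read -r -a items <<< "$input"

# Count the pages an expand-first script would have to append to its array.
expanded=0
for item in "${items[@]}"; do
  first=${item%%-*}
  last=${item#*-}
  expanded=$(( expanded + last - first + 1 ))
done

echo "items parsed directly: ${#items[@]}"
echo "pages after expansion: $expanded"
```

For this input, the direct parse looks at 2 items, while the expand-first approach builds an array of 101 page numbers before it starts combining them.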

BTW, I was curious as to the purpose of curly braces in the following line. The documentation states that this format is normally used to group commands, but that wouldn’t appear to be the case here.

{ new_item+="-$range_end" ; } fi

Hi peavine.

The Bash “man” page shows the syntax for ‘if’ as:

if list; then list; [ elif list; then list; ] ... [ else list; ] fi

Similarly with the various repeats. In some languages, the actions in an ‘if’ or repeat statement would be enclosed in braces:

if [[ $range_end != $range_start ]] ; then {
    new_item+=\"-$range_end\" ;
} fi

… so I guessed (whether correctly or not I don’t know) that this was what the manual meant by list. In Bash, the braces turn out to be optional in both forms; what a brace-free one-liner does need is a semicolon (or a newline) to end each list before ‘then’, ‘else’ or ‘fi’.
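For what it’s worth, a quick check suggests the braces are simply a group command around the list and can be dropped, even on one line, provided the last command is ended by a semicolon or newline. A minimal sketch, assuming bash and using made-up variable names for the two results:

```shell
#!/bin/bash

range_start=8
range_end=12

# One-liner with a brace group, in the style of the script above.
with_braces=$range_start
if [[ $range_end != $range_start ]] ; then { with_braces+="-$range_end" ; } fi

# The same one-liner without braces; the semicolon before fi ends the list.
without_braces=$range_start
if [[ $range_end != $range_start ]] ; then without_braces+="-$range_end" ; fi

echo "$with_braces"
echo "$without_braces"
```

Both lines should print 8-12, so the choice between the two forms looks like a matter of style rather than necessity.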
