I’m looking for a way to consolidate page-number input, so that sequential individual pages are made into page ranges or added to existing page ranges. For example:
With significant help from Google AI, I wrote a solution. As presented, it’s poorly written, but I stopped working on it (further work would include understanding the Google AI code) because I had to wonder whether there’s an entirely different approach that might be simpler. If not, I’ll clean up the existing script and use that. The script takes only about a millisecond to run, so speed is not an issue.
Thanks for any suggestions.
set oldPageString to "1 2 3 4 6-8 10 12 13 14 17 18-20 21-22"
set text item delimiters to space
set oldPageList to text items of oldPageString
--Make page ranges into individual page numbers
set tempPageList to {}
repeat with aPageItem in oldPageList
    if aPageItem does not contain "-" then
        set end of tempPageList to aPageItem as integer
    else
        set text item delimiters to "-"
        repeat with i from (text item 1 of aPageItem) to (text item 2 of aPageItem)
            set end of tempPageList to i
        end repeat
    end if
end repeat
--Most of the following is from Google AI
set rangeList to {}
set currentRangeStart to item 1 of tempPageList
set currentRangeEnd to item 1 of tempPageList
repeat with i from 2 to count of tempPageList
    set currentNum to item i of tempPageList
    set previousNum to item (i - 1) of tempPageList
    --Check if the current number is sequential to the previous
    if currentNum is equal to (previousNum + 1) then
        set currentRangeEnd to currentNum
    else
        -- If not sequential, finalize the previous range and start a new one
        addToRangeList(rangeList, currentRangeStart, currentRangeEnd)
        set currentRangeStart to currentNum
        set currentRangeEnd to currentNum
    end if
end repeat
--Add the last range after the loop finishes
addToRangeList(rangeList, currentRangeStart, currentRangeEnd)
set text item delimiters to space
set newPageList to rangeList as text
set text item delimiters to ""
return newPageList -->"1-4 6-8 10 12-14 17-22"
on addToRangeList(rangeList, startNum, endNum)
    if startNum is equal to endNum then
        copy (startNum as string) to the end of rangeList
    else
        copy (startNum as string) & "-" & (endNum as string) to the end of rangeList
    end if
end addToRangeList
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions
set oldPages to "1 2 3 4 6-8 10 12 13 14 17 18-20 21-22"
set indexSet to current application's class "NSMutableIndexSet"'s indexSet()
repeat with thisEntry in splitText(oldPages, space)
    if (thisEntry contains "-") then
        set {a, z} to splitText(thisEntry, "-")
        (indexSet's addIndexesInRange:({a as integer, z - a + 1}))
    else
        (indexSet's addIndex:(thisEntry as integer))
    end if
end repeat
set theDescription to indexSet's |description|()
set substringRange to theDescription's rangeOfString:("(?<=\\()[^\\)]++(?=\\)\\])") options:(current application's NSRegularExpressionSearch) range:({0, theDescription's |length|()})
set newPages to (theDescription's substringWithRange:(substringRange)) as text
on splitText(txt, delim)
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to delim
    set output to txt's text items
    set AppleScript's text item delimiters to astid
    return output
end splitText
Thanks Nigel. I worked through your script suggestion, and everything made sense. NSMutableIndexSet has several characteristics that make it ideal for my purpose. Its use would never have occurred to me.
Just for learning purposes, I rewrote Nigel’s excellent suggestion almost entirely in ASObjC. I changed the string input to demonstrate a few useful characteristics of the NSMutableIndexSet class.
use framework "Foundation"
use scripting additions
set oldPages to "4 6-8 10 12 13 14 17 18-20 21-22 2 3 4 1" --consider page order and page repeat
set oldPages to current application's NSString's stringWithString:oldPages
set oldPagesArray to (oldPages's componentsSeparatedByString:space)
set indexSet to current application's class "NSMutableIndexSet"'s indexSet()
repeat with thisEntry in oldPagesArray
    if ((thisEntry's containsString:"-") is true) then --a range of page numbers
        set theRange to (thisEntry's componentsSeparatedByString:"-")
        set rangeStart to (theRange's objectAtIndex:0)'s integerValue()
        set rangeEnd to (theRange's objectAtIndex:1)'s integerValue()
        (indexSet's addIndexesInRange:({rangeStart, (rangeEnd - rangeStart + 1)}))
    else --an individual page number
        (indexSet's addIndex:(thisEntry's integerValue()))
    end if
end repeat
set theDescription to indexSet's |description|() --an NSString that contains the desired data
set newPages to (theDescription's stringByReplacingOccurrencesOfString:"^.*\\((.*)\\).*$" withString:"$1" options:1024 range:{0, theDescription's |length|()}) as text --option 1024 is regex -->"1-4 6-8 10 12-14 17-22"
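For anyone who wants the same index-set behaviour outside ASObjC (in a shortcut’s shell action, say), here is a minimal bash sketch. It is not how NSMutableIndexSet works internally; it only reproduces the visible result: every entry is expanded, sort -n -u supplies the ordering and de-duplication, and consecutive numbers are then merged into ranges. The function name indexset_ranges is hypothetical, and the sketch assumes well-formed input (no leading zeros, ascending ranges).

```shell
#!/bin/bash
# Hypothetical helper: imitate NSMutableIndexSet by expanding every entry,
# sorting numerically, dropping duplicates, then merging runs into ranges.
# Assumes well-formed input: plain page numbers and ascending "a-b" ranges.
indexset_ranges() {
    local -a items
    local item n start="" prev="" out=""
    read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        # For a single page such as "10", both expansions below give 10.
        for ((n = ${item%%-*}; n <= ${item#*-}; n++)); do
            echo "$n"
        done
    done | sort -n -u | {
        # sort -n -u has already ordered the pages and removed repeats.
        while read -r n; do
            if [[ -n $start ]] && ((n != prev + 1)); then
                # Gap found: record the finished run and start a new one.
                if ((start == prev)); then out+="$start "; else out+="$start-$prev "; fi
                start=""
            fi
            [[ -z $start ]] && start=$n
            prev=$n
        done
        # Record the final run, if there was any input at all.
        if [[ -n $start ]]; then
            if ((start == prev)); then out+="$start "; else out+="$start-$prev "; fi
        fi
        printf '%s\n' "${out% }"
    }
}

indexset_ranges "4 6-8 10 12 13 14 17 18-20 21-22 2 3 4 1"
# prints: 1-4 6-8 10 12-14 17-22
```

As with the index set, page order and repeats in the input are lost, which is exactly what the later order-preserving requirement rules out.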
And because I enjoy trying out different coding ideas, here’s another take on the almost entirely vanilla version:
use AppleScript version "2.4" -- OS X 10.10 (Yosemite) or later
use framework "Foundation"
use scripting additions
set oldPages to "1 2 3 4 6-8 10 12 13 14 17 18-20 21-22"
set indexSet to current application's class "NSMutableIndexSet"'s indexSet()
repeat with thisEntry in splitText(oldPages, space)
    set {a, z} to splitText(thisEntry, "-") & 1
    if (z ≠ 1) then set z to z - a + 1 -- If z is integer 1, it was appended to a single-item list above. If it's text, the list had two items.
    (indexSet's addIndexesInRange:({a as integer, z}))
end repeat
set theDescription to indexSet's |description|() as text
set newPages to splitText(theDescription, {"(", ")]"})'s item -2
on splitText(txt, delim)
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to delim
    set output to txt's text items
    set AppleScript's text item delimiters to astid
    return output
end splitText
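The appended 1 is a neat way to give single pages and ranges the same shape. In bash, which the later shortcut uses, parameter expansion gets the same effect for free: ${entry%%-*} keeps everything before the first hyphen and ${entry#*-} everything after it, and when there is no hyphen neither pattern matches, so both return the page itself. A minimal sketch, with a hypothetical helper name, that turns each entry into the start and length that addIndexesInRange: expects:

```shell
#!/bin/bash
# Hypothetical helper: turn one entry ("6-8" or "10") into the
# "start length" pair that addIndexesInRange: expects.
entry_to_range() {
    local entry=$1
    # No hyphen means no match, so both expansions return the whole entry.
    local first=${entry%%-*} last=${entry#*-}
    # 10# forces base 10, so an entry such as "08" is not read as octal.
    echo "$first $((10#$last - 10#$first + 1))"
}

for entry in 6-8 10 18-20; do
    entry_to_range "$entry"
done
# prints:
# 6 3
# 10 1
# 18 3
```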
It turns out that Nigel’s excellent solution won’t work for one of my scripts (actually a shortcut), which extracts pages and page ranges from a PDF. The problem is that page order and duplicates have to be retained. For example:
--NOTES
--A page item in the format “14-12” will be tested for and will cause an error.
--Sequential pages and page ranges are combined from left to right.
--Consecutive individual pages or page ranges are allowed and should be returned.
--There’s no reason anyone would enter the above input, but I want to account for all possibilities.
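Here is a minimal bash sketch of the “14-12” check from the notes, assuming entries are separated by spaces. It also rejects anything that is neither a page number nor a start-end range; that second rule is an assumption, not one of the notes. The function name validate_pages is hypothetical.

```shell
#!/bin/bash
# Hypothetical validator: every entry must be a page number or an ascending
# (or equal) "start-end" range; a descending range such as "14-12" fails.
validate_pages() {
    local -a items
    local item
    local page_re='^[0-9]+$'
    local range_re='^([0-9]+)-([0-9]+)$'
    read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        if [[ $item =~ $page_re ]]; then
            continue
        elif [[ $item =~ $range_re ]]; then
            # 10# forces base 10 so entries like "08" are not read as octal.
            if ((10#${BASH_REMATCH[1]} > 10#${BASH_REMATCH[2]})); then
                echo "Descending range: $item" >&2
                return 1
            fi
        else
            echo "Not a page or page range: $item" >&2
            return 1
        fi
    done
    return 0
}

validate_pages "12 2 3 4 6 8 8 9-10 11 12 14" && echo "valid"
validate_pages "2 14-12" || echo "rejected"
# prints "valid", then "Descending range: 14-12" (on stderr) and "rejected"
```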
Just by chance, my AppleScript in post 1 does what I want. I thought I’d post it in case anyone has any ideas (either AppleScript or shell script). The edited AppleScript included below works fine and is plenty fast (0.4 milliseconds), but it is very long and involved.
BTW, I checked with Google AI for a bash script that will do what I want. The suggestion basically took the same approach as the AppleScript and was perhaps even more convoluted (plus it didn’t seem to work). It would run faster in a shortcut, though. Thanks for any suggestions.
set oldPages to "12 2 3 4 6 8 8 9-10 11 12 14" --user input
set individualPages to getIndividualPages(oldPages) --a list with expanded page ranges
set newPages to getNewPages(individualPages) --a list with sequential pages and page ranges combined
set text item delimiters to " "
set newPages to newPages as text
set text item delimiters to ""
return newPages -->"12 2-4 6 8 8-12 14"
on getIndividualPages(oldPages)
    set text item delimiters to " "
    set oldPageList to text items of oldPages
    set text item delimiters to "-"
    set individualPageList to {}
    repeat with aPageItem in oldPageList
        if aPageItem does not contain "-" then
            set end of individualPageList to aPageItem as integer
        else
            repeat with i from (text item 1 of aPageItem) to (text item 2 of aPageItem)
                set end of individualPageList to i
            end repeat
        end if
    end repeat
    return individualPageList
end getIndividualPages
on getNewPages(individualPages)
    set newPages to {}
    set startPage to item 1 of individualPages
    set endPage to item 1 of individualPages
    repeat with i from 2 to (count individualPages)
        set aPriorPage to item (i - 1) of individualPages
        set aCurrentPage to item i of individualPages
        if (aPriorPage + 1) is equal to aCurrentPage then --part of a range (do not add)
            set endPage to aCurrentPage
        else --not part of a range (add individual page or page range)
            if startPage is equal to endPage then --an individual page
                set end of newPages to (startPage as text)
            else --a page range
                set end of newPages to (startPage as text) & "-" & (endPage as text)
            end if
            set startPage to aCurrentPage
            set endPage to aCurrentPage
        end if
    end repeat
    if startPage is equal to endPage then --add last page
        set end of newPages to (startPage as text) --an individual page
    else
        set end of newPages to (startPage as text) & "-" & (endPage as text) --a page range
    end if
    return newPages
end getNewPages
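The same order-preserving logic as getNewPages() above (compare each page with the one before it) can be sketched in bash for a shortcut’s shell action. This assumes the ranges have already been expanded into a space-separated list of integers; combine_pages is a hypothetical name, not part of the original script.

```shell
#!/bin/bash
# Hypothetical port of getNewPages(): combine consecutive ascending pages
# into ranges while keeping the original order and any repeats.
# Input: already-expanded page numbers separated by spaces.
combine_pages() {
    local -a pages
    local page start end out=""
    read -r -a pages <<< "$1"
    ((${#pages[@]} == 0)) && return 0
    start=${pages[0]}
    end=$start
    for page in "${pages[@]:1}"; do
        if ((page == end + 1)); then
            # Still part of the current run: extend it.
            end=$page
        else
            # Run broken: record it and start a new one at this page.
            if ((start == end)); then out+="$start "; else out+="$start-$end "; fi
            start=$page
            end=$page
        fi
    done
    # Record the last run.
    if ((start == end)); then out+="$start"; else out+="$start-$end"; fi
    printf '%s\n' "$out"
}

combine_pages "12 2 3 4 6 8 8 9 10 11 12 14"
# prints: 12 2-4 6 8 8-12 14
```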
This version of your script has a slightly different take on the getNewPages() handler:
set oldPages to "12 2 3 4 6 8 8 9-10 11 12 14" --user input
set individualPages to getIndividualPages(oldPages) --a list with expanded page ranges
set newPages to getNewPages(individualPages) --a list with sequential pages and page ranges combined
set text item delimiters to " "
set newPages to newPages as text
set text item delimiters to ""
return newPages -->"12 2-4 6 8 8-12 14"
on getIndividualPages(oldPages)
    set text item delimiters to " "
    set oldPageList to text items of oldPages
    set text item delimiters to "-"
    set individualPageList to {}
    repeat with aPageItem in oldPageList
        if aPageItem does not contain "-" then
            set end of individualPageList to aPageItem as integer
        else
            --set tempRangeList to {}
            repeat with i from (text item 1 of aPageItem) to (text item 2 of aPageItem)
                set end of individualPageList to i
            end repeat
        end if
    end repeat
    return individualPageList
end getIndividualPages
on getNewPages(individualPages)
    set newPages to {}
    set pageCount to (count individualPages)
    set i to 1
    repeat until (i > pageCount)
        set startPage to item i of individualPages
        set k to i
        repeat with j from (i + 1) to pageCount
            if ((item j of individualPages) - startPage ≠ j - i) then exit repeat
            set k to j
        end repeat
        if (k = i) then
            set end of newPages to startPage as text
        else
            set end of newPages to (startPage as text) & "-" & item k of individualPages
        end if
        set i to k + 1
    end repeat
    return newPages
end getNewPages
I implemented Nigel’s AppleScript from post 9 and my Bash script in post 10 in the mentioned shortcut. I then ran timing tests that extracted 34 individual pages and page ranges from a PDF that contained 159 pages. I expected the Bash solution to be faster, because shortcuts generally do a poor job of running AppleScripts. Surprisingly, the AppleScript solution was 20 milliseconds faster. I guess that’s why testing important scripts and shortcuts is a good idea.
Call me obsessive. This solution just parses the given figures rather than expanding the ranges first:
set oldPageString to "1 2 3 4 6-8 8 10 12 13 2 14 17 18-20 21-22 23"
set newPageString to consolidateRanges(oldPageString) -->"1-4 6-8 8 10 12-13 2 14 17-23"
on consolidateRanges(oldPageString)
    set inputList to splitText(oldPageString, space)
    set outputList to {}
    -- Initialise range start/end variables to the page number(s) from the input's first entry.
    set {rangeStart, rangeEnd} to splitText(inputList's beginning, "-")'s {beginning, end}
    repeat with i from 2 to (count inputList)
        -- Get the page number(s) from each subsequent entry.
        set {pn1, pn2} to splitText(inputList's item i, "-")'s {beginning, end}
        -- If the (first) number follows on from the current range end, update the range end to the (other) number.
        -- Otherwise output the current range and start a new one with this entry's figures.
        if (pn1 as integer = rangeEnd + 1) then
            set rangeEnd to pn2
        else
            if (rangeEnd ≠ rangeStart) then set rangeEnd to rangeStart & "-" & rangeEnd
            set outputList's end to rangeEnd
            set {rangeStart, rangeEnd} to {pn1, pn2}
        end if
    end repeat
    -- Output the range in progress when the repeat ended.
    if (rangeEnd ≠ rangeStart) then set rangeEnd to rangeStart & "-" & rangeEnd
    set outputList's end to rangeEnd
    return joinText(outputList, space)
end consolidateRanges
on splitText(txt, delim)
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to delim
    set output to txt's text items
    set AppleScript's text item delimiters to astid
    return output
end splitText
on joinText(lst, delim)
    set astid to AppleScript's text item delimiters
    set AppleScript's text item delimiters to delim
    set output to lst as text
    set AppleScript's text item delimiters to astid
    return output
end joinText
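The parse-without-expanding idea carries over to bash as well. A minimal sketch, assuming well-formed, ascending entries (no leading zeros) separated by spaces; consolidate_ranges here is a hypothetical shell function modelled on the AppleScript handler above, not the shell script referred to later in the thread. Because ranges are never expanded, an input such as “1-50 100-150” takes two loop iterations rather than 101.

```shell
#!/bin/bash
# Hypothetical shell sketch of consolidateRanges(): work on the entries as
# given instead of expanding ranges first. Order and repeats are preserved.
consolidate_ranges() {
    local -a items
    local item first last range_start="" range_end="" out=""
    read -r -a items <<< "$1"
    for item in "${items[@]}"; do
        first=${item%%-*}    # "18-20" -> 18; "10" -> 10
        last=${item#*-}      # "18-20" -> 20; "10" -> 10
        if [[ -n $range_start ]] && ((first == range_end + 1)); then
            # Entry follows on from the current range: just move its end.
            range_end=$last
        else
            # Otherwise output the current range (if any) and start a new one.
            if [[ -n $range_start ]]; then
                if ((range_start == range_end)); then out+="$range_start "; else out+="$range_start-$range_end "; fi
            fi
            range_start=$first
            range_end=$last
        fi
    done
    # Output the range in progress when the loop ended.
    if [[ -n $range_start ]]; then
        if ((range_start == range_end)); then out+="$range_start "; else out+="$range_start-$range_end "; fi
    fi
    printf '%s\n' "${out% }"
}

consolidate_ranges "1 2 3 4 6-8 8 10 12 13 2 14 17 18-20 21-22 23"
# prints: 1-4 6-8 8 10 12-13 2 14 17-23
```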
Nigel, thanks for the new suggestion, which works great.
My original interest in this topic was primarily academic, but I ended up incorporating one solution in a much-used shortcut (here). Because the solution runs in a shortcut, a shell script is preferred over an AppleScript; execution speed is important but not critical; and code brevity is more desirable than might normally be the case. For these reasons, the solution I used is:
The actual shell script (which incorporates your most-recent shell script suggestion) is:
#!/bin/bash
# Expand page ranges.
IFS=' ' read -r -a page_items <<< "12 1 2 3 5 7 8-9 11"
pages=()
for item in "${page_items[@]}"; do
    if [[ $item == *-* ]]; then
        for ((i=${item%%-*}; i<=${item#*-}; i++)); do
            pages+=($i)
        done
    else
        pages+=($item)
    fi
done
# Combine consecutive sequential pages and page ranges.
page_count=${#pages[@]}
output=""
i=0
until [[ $i == $page_count ]]; do
    start_page=${pages[i]}
    k=$i
    for (( j=$((i + 1)); $j < $page_count && \
            $((${pages[j]} - start_page)) == \
            $((j - i)); j++ )); do
        k=$j
    done
    if [[ $k == $i ]]; then
        output+="$start_page "
    else
        output+="$start_page-${pages[k]} "
    fi
    i=$((k + 1))
done
# Print the result (minus the trailing space) as data, not as a format string.
printf '%s' "${output% }"    # -> 12 1-3 5 7-9 11
All of the AppleScript solutions took less than a millisecond to run in Script Geek. The timing result of the above shell script was 2.8 milliseconds, and the timing result of my original shell script (which incorporated a Google AI suggestion) was 4.2 milliseconds.
I tested your new script against the script in the prior post and both took about 2.7 milliseconds using the input string in your new script. I also tested these same scripts with user input of “1-50 100-150”. Your script took 2.3 milliseconds and the script in the prior post took 3.3 milliseconds. The difference is obviously not important, but it does demonstrate the penalty associated with separately expanding the page ranges.
BTW, I was curious about the purpose of the curly braces in the following line. The documentation states that this format is normally used to group commands, but that doesn’t appear to be the case here.
Similarly with the various repeats. In some languages, the actions in an ‘if’ or repeat statement would be enclosed in braces:
if [[ $range_end != $range_start ]] ; then {
    new_item+="-$range_end" ;
} fi
… so I guessed (whether correctly or not I don’t know) that this was what the manual meant by list. In Bash, these braces appear to be optional in both multi-line statements and one-liners; what a one-liner does need is the semicolons that end each command list.
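A small bash check of the point about braces (the variable values are made up for the demo): { list; } is a group command, so it is optional in a one-liner too; what a one-liner cannot drop is the semicolon (or newline) that must end the command list before } or fi.

```shell
#!/bin/bash
range_start=8
range_end=12

# One-liner with a brace group: the ";" before "}" is required.
new_item=$range_start
if [[ $range_end != "$range_start" ]]; then { new_item+="-$range_end"; } fi
echo "$new_item"    # prints: 8-12

# Same one-liner without braces: still valid; only the ";" before "fi" is needed.
new_item=$range_start
if [[ $range_end != "$range_start" ]]; then new_item+="-$range_end"; fi
echo "$new_item"    # prints: 8-12
```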