Merge page numbers from index

Hi all,

I wrote some scripts to create an index in Quark (by character style, not with index entries, because those get lost when copying). This index is exported to FileMaker to create a summary. From there it is exported again to a text document, which looks like this:

Arnica essence
	7
	8
	9
	10
	13
	19
	25
	31
Contra indicaties
	7
	13
	20
	26 and so on...

As you can see, the first entry appears on pages 7, 8, 9 and 10.
What I want is for runs of consecutive numbers like 7, 8, 9, 10 (or 34, 35, 36, 37) to be written as 7-10 (or 34-37).
So, the end result will look like this:

Arnica essence
	7-10
	13
	19
	25
	31
Contra indicaties
	7
	13
	20
	26

Can this be done with AppleScript (preferably in Tex-Edit Plus)? Or maybe it is possible to do it directly in FileMaker?
Any help would be appreciated.

Furthermore:
All indents are tabs.
In the end another small script for Tex-Edit converts it to XPress Tags, and the numbers will be comma-separated. That part is already done.

Thanks in advance for thinking about this case.
Kjeld

I suppose you're getting the text of all instances of a style sheet from Quark, then creating lists, and then generating the output? When you’re generating the output, why not just check whether each page number is the previous page number + 1, and loop until you find a gap? Then you could write out firstpage-lastpage instead of each page number. Or, if you must process the finished index, loop through the paragraphs, testing whether the value of each is the previous one + 1. If it is, delete the paragraph and append it to the previous one.

No?
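The first suggestion above (check whether each page is the previous page + 1 and loop until a gap appears) could be sketched roughly as follows. This is only an illustration: the handler names collapsePages and formatRun are made up here, and it assumes the page numbers are already available as a sorted list of integers.

```applescript
-- Hypothetical helper: collapse a sorted list of page numbers into range strings.
on collapsePages(pageList)
	set rangeList to {}
	if pageList is {} then return rangeList
	set firstPage to item 1 of pageList
	set lastPage to firstPage
	repeat with i from 2 to (count of pageList)
		set p to item i of pageList
		if p = lastPage + 1 then
			-- still inside a run: just extend it
			set lastPage to p
		else
			-- gap found: write out the finished run and start a new one
			set end of rangeList to formatRun(firstPage, lastPage)
			set firstPage to p
			set lastPage to p
		end if
	end repeat
	-- write out the run that was still open when the list ended
	set end of rangeList to formatRun(firstPage, lastPage)
	return rangeList
end collapsePages

-- Hypothetical helper: "13" for a single page, "7-10" for a run.
on formatRun(firstPage, lastPage)
	if firstPage = lastPage then return firstPage as text
	return (firstPage as text) & "-" & (lastPage as text)
end formatRun

collapsePages({7, 8, 9, 10, 13, 19, 25, 31})
--> {"7-10", "13", "19", "25", "31"}
```

Working on a list of integers keeps the range logic separate from whatever text format the index arrives in.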

I am creating the list with Quark's Make List, not with a script. I don't think I can refer to a page number within this function.

Eeh. Yes. But I think this sounds simpler than it will be to work out in a script.
Do you have some examples? I mean, I know how to refer to the last character of a paragraph in Tex-Edit, but page numbers can be 1, 23 or 145 (one or more characters). I see some difficulties in this…

Kjeld
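On the worry about page numbers of different lengths: one way around counting characters is to take the first word of the line and coerce it to an integer. The sketch below is an assumption-laden illustration (the handler name pageNumberOf is invented, and it assumes each page line holds a tab followed by digits only).

```applescript
-- Hypothetical helper: return the page number on a tab-indented index line.
on pageNumberOf(aLine)
	-- "word 1" skips the leading tab and any trailing space,
	-- so 7, 23 and 145 are all handled the same way
	return (word 1 of aLine) as integer
end pageNumberOf

pageNumberOf(tab & "145")
--> 145
```

Once the value is an integer, the "previous page + 1" comparison works no matter how many digits the number has.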

A somewhat clumsy attempt, but it’s a start:

-- s0 -> index list to start with, s1 -> compressed index
-- Explicit setup here - you'll get this from Quark's Make List
set s0 to "Arnica essence 
	7 
	8 
	9 
	10 
	13 
	19 
	25 
	31 
Contra indicaties 
	7 
	13 
	20 
	26"

set s1 to ""
set previousPage to -1
set inSequence to false
set pending to false
repeat with pp in paragraphs of s0
	if character 1 of pp != tab then -- is title
		if pending then set s1 to s1 & "-" & thisPage
		set s1 to s1 & return & pp & return
		set previousPage to -1
		set inSequence to false
		set pending to false
	else
		set thisPage to pp as integer
		if thisPage != previousPage + 1 then -- not in sequence
			if inSequence then -- close up sequence in progress
				set s1 to s1 & "-" & previousPage & return
			else
				if previousPage != -1 then set s1 to s1 & return
			end if
			set s1 to s1 & tab & thisPage
			set previousPage to thisPage
			set inSequence to false
			set pending to false
		else
			set previousPage to thisPage
			set inSequence to true
			set pending to true
		end if
	end if
end repeat
if pending then set s1 to s1 & "-" & thisPage & return
set s1 to text 2 through -1 of s1 -- remove leading return at top
s1

This assumes tabs preceding page numbers. Hope it helps…

  • Dan

Thanks for your reaction Dan.
I will give it a try one of these days when I find some time.
I will let you know if it works.

Regards,
Kjeld

Just a reminder that a simple copy-paste into Script Editor won’t work here unless (A) the spaces preceding the page numbers are replaced with a tab and (B) the 3 instances of “!=” in the posted version are replaced with option-“=” (≠). It works then…

Mmh. It doesn’t work here…
I did what you wrote above but this line is giving an error:

if character 1 of pp != tab then

Error: Character 1 of “” can not be asked for / couldn’t be found (don’t know the exact English translation.)

Do you have any suggestions?

Kjeld - It sounds like maybe there is a blank line in s0 - Try placing a try block in the repeat:

repeat with pp in paragraphs of s0
try
...
-- body of repeat loop
...
on error errmsg
--do nothing or issue warning if you like
end try
end repeat

and see if you get proper results then. If not, please let me know…

  • Dan
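As an alternative to wrapping the loop body in a try block, blank lines can be skipped explicitly. The sketch below assumes a small sample text with one empty paragraph; the variable names thisLine and keptCount are invented for the illustration, and the branches are left empty where the real title and page handling would go.

```applescript
-- Sample text with a blank line in the middle (assumed input, for illustration only)
set s0 to "Arnica essence" & return & tab & "7" & return & "" & return & tab & "8"

set keptCount to 0
repeat with pp in paragraphs of s0
	set thisLine to contents of pp
	if thisLine is not "" then
		-- safe: a non-empty line always has a character 1
		if character 1 of thisLine is not tab then
			-- title line: handle as in the script above
		else
			-- page number line: handle as in the script above
		end if
		set keptCount to keptCount + 1
	end if
end repeat
keptCount
--> 3
```

The try block is quicker to add, but it also hides any other error in the loop; the explicit test only skips empty lines.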

Hey Dan,

Thanks for your response. The try-block worked very well.
And many thanks for this great script. I knew it was possible to edit text without a text editor, but I must admit: I hardly understand this code…
Most of my scripting time is spent on Quark or Tex-Edit. If you have a link where I could find some more info (a dictionary?) about this way of editing text, I would appreciate it. I tried to look in Code Exchange, but that part of the site doesn’t work at the moment.

I would like to improve this script on a couple of things, do you have another minute?

  1. I would like to choose the text file so I don’t need to paste it into Script Editor.
    I have tried things like
set s0 to contents of (choose file) 

but that didn’t work.

  2. I made a tiny mistake when I posted the question.
    The entry line also contains a page number.
    So the first line might look like:
Arnica essence[tab]6 
   7 

I can’t find out how to change your script to fix this.

  3. How do I add a search and replace step to this script?
    Because this script runs so fast, I want to replace every return & tab with ", ",
    then every tab with a style sheet code,
    and finally add a style sheet code at the beginning of every paragraph.

Thanks again for your time and help.
kjeld

Hey Dan,

I was too quick with posting the previous questions, but some of them still remain.

Question 2 is still open and I would like some more background info about question 3, but here’s what I have got so far. (Replace each ? in the code with option+“=”, i.e. ≠.)
After all, I was not that bad at scripting 8-).

It asks for the file exported from FileMaker (which looks as shown above), asks for a name for the new file, builds the new index and writes it to the new file on your desktop.


set indexFile to (choose file with prompt "Select the index file") as string
set oldIndex to read file indexFile

set fileName to (text returned of (display dialog "Set the name for the new index:" default answer ""))
set newIndex to (((path to desktop) as string) & fileName & ".txt") as file specification

set s0 to oldIndex
set s1 to ""
set previousPage to -1
set inSequence to false
set pending to false
repeat with pp in paragraphs of s0
	try
		if character 1 of pp ? tab then -- is title 
			if pending then set s1 to s1 & "–" & thisPage
			set s1 to s1 & return & "@IndexEntry:" & pp & ", "
			set previousPage to -1
			set inSequence to false
			set pending to false
		else
			set thisPage to pp as integer
			if thisPage ? previousPage + 1 then -- not in sequence 
				if inSequence then -- close up sequence in progress 
					set s1 to s1 & "–" & previousPage & ", "
				else
					if previousPage ? -1 then set s1 to s1 & ", "
				end if
				set s1 to s1 & thisPage
				set previousPage to thisPage
				set inSequence to false
				set pending to false
			else
				set previousPage to thisPage
				set inSequence to true
				set pending to true
			end if
		end if
	end try
end repeat
if pending then set s1 to s1 & "–" & thisPage & return
set s1 to text 2 through -1 of s1 -- remove leading return at top 
set s3 to searchReplace1(s1, {tab}, {"  <@IndexPageNumber>"})

--Replace tab with double space & style sheet code
to searchReplace1(thisText, searchTerm, replacement)
	set AppleScript's text item delimiters to searchTerm
	set thisText to thisText's text items
	set AppleScript's text item delimiters to replacement
	set thisText to "" & thisText
	set AppleScript's text item delimiters to {""}
	return thisText
end searchReplace1

--Write to new textfile
try
	open for access newIndex with write permission
	write (s3) to newIndex
	close access newIndex
on error
	try
		close access newIndex
	end try
end try

Kjeld - This ought to take care of your points 1 and 2 and the first part of 3:

try
	set s0 to read (choose file with prompt "Where is the marks file?") -- prompt for file and read contents
on error
	return -- user hit cancel
end try

set s1 to ""
set previousPage to -1
set inSequence to false
set pending to false
repeat with pp in paragraphs of s0
	try
		if character 1 of pp ≠ tab then -- is title 
			if pending then set s1 to s1 & "-" & thisPage
			
			-- separate title and page number
			set ofst to offset of tab in pp
			set title to text 1 through (ofst - 1) of pp
			set pg to text (ofst + 1) through -1 of pp
			
			set s1 to s1 & return & title & return & tab & pg
			set previousPage to pg as integer
			set inSequence to false
			set pending to false
		else
			set thisPage to pp as integer
			if thisPage ≠ previousPage + 1 then -- not in sequence 
				if inSequence then -- close up sequence in progress 
					set s1 to s1 & "-" & previousPage & return
				else
					if previousPage ≠ -1 then set s1 to s1 & return
				end if
				set s1 to s1 & tab & thisPage
				set inSequence to false
				set pending to false
			else
				set inSequence to true
				set pending to true
			end if
			set previousPage to thisPage
		end if
	end try
end repeat
if pending then set s1 to s1 & "-" & thisPage & return
set s1 to text 2 through -1 of s1 -- remove leading return at top 

-- substitute commas for return & tab
set {saveDelims, text item delimiters} to {text item delimiters, return & tab}
set lst to text items of s1
set text item delimiters to ","
set s2 to lst as text
set text item delimiters to saveDelims
s2

I’m not sure about substituting tabs with style sheet codes since as I see it there are no tabs left after the “commas for return&tab” substitution takes place. If you’d care to elaborate I’d be glad to address it.

As far as a link to info on scripting text editing, I wish I knew of one, but it’s basically just a matter of sometimes making judicious use of AppleScript’s commands and classes such as “offset” and “text item delimiters” above, and sometimes (more often for me) throwing stuff against the wall until something sticks.

Please let me know if this doesn’t work…

  • Dan
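For anyone new to the two tools mentioned above, here is a small isolated sketch of “offset” and “text item delimiters” applied to a single entry line. The sample line and the variable names (entryLine, entryTitle, entryPage, parts) are made up for the illustration.

```applescript
-- Assumed sample: a title, a tab, then the first page number
set entryLine to "Arnica essence" & tab & "6"

-- offset: find the position of the first tab, then cut on either side of it
set tabPos to offset of tab in entryLine
set entryTitle to text 1 thru (tabPos - 1) of entryLine
set entryPage to text (tabPos + 1) thru -1 of entryLine

-- text item delimiters: split the same line on tab in one step
set savedDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to tab
set parts to text items of entryLine
set AppleScript's text item delimiters to savedDelims -- always restore

{entryTitle, entryPage, parts}
--> {"Arnica essence", "6", {"Arnica essence", "6"}}
```

Both approaches work on plain strings, so no text editor application needs to be scripted for this step.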

Hi Kjeld - I think our posts passed each other somewhere over the Atlantic. I was wrong in my last comment: there is of course a remaining tab immediately after the title which I realized after running your script and seeing the results. Obviously you know all about text item delimiters. Yep! Looks like you got it!

  • Dan

It seems a standard is not always implemented the same way. I knew about the offset thing, but only for Quark and Tex-Edit. It looks like these apps need some more textual context than the text editor built into OS X.

Anyway, I am pretty content with the script. It saves me and the editor a lot of work.
I will see if I can link some stuff together to make it one script and then post it in the ScriptBuilders section.

Thanks man,
kjeld