Get PDF slug information

krish · April 14, 2008, 9:31am

Hi,

I have been given a task:

I have a batch of PDF files, each containing a single page. Each page has a slug line at the bottom giving the document name, the page number, and the date and time modified. The PDFs may have been created on any date. I want to extract the slug details and copy them into a text file.

Example slug lines:
RXENL08ATE601_CH01_Aloud.indd T294 01/12.08 10:00:12 AM --> (in one PDF)
RXENL08ATE601_AS01_Aloud.indd T43 01/12.08 10:00:12 AM --> (in another PDF)

I can successfully get the document name and the date and time modified, but I am having difficulty getting the page number (T294, T43) of the page.

My script is below:

choose folder "Get information for PDFs in this folder:"
tell application "Finder" to set thesePDFs to (files of (result) whose name extension is "pdf") as alias list

choose file name with prompt "Save information in this file:" default name "PDF Info.txt"
set outputFile to result

-- The output will start off as a list
set theOutput to {}
set theOutput's end to ("Title Name" & tab & "Total pages" & tab & "Date" & tab & "Time")
set theOutput's end to ("")

repeat with thisItem in thesePDFs
set thisName to name of (info for thisItem)
set cdate to modification date of (info for thisItem)
set Ctime to time string of cdate
set cday to day of cdate
set cmonth to month of cdate as integer
set cyear to year of cdate as string
set cyear to text 3 thru 4 of cyear -- two-digit year as text rather than a list of characters

do shell script "/usr/bin/mdls -name kMDItemNumberOfPages -name kMDItemTitle " & (quoted form of POSIX path of thisItem)
set thisMeta to paragraphs of result
try
	set thisTitle to text 24 thru -1 of (item 3 of thisMeta)
on error
	-- This file has no title
	set thisTitle to ""
end try
-- To get page count of all pdfs	

tell application "Finder" to open thisItem
tell application "Adobe Acrobat Professional"
	tell document 1
		--	-----------------------------------------------------------------------------------------	
		--	      Here I want to get  the page number of current open PDF file. (Ex:  T45)
		--	-----------------------------------------------------------------------------------------	
		
		set theOutput's end to (thisTitle & tab & totpage & tab & cmonth & "/" & cday & "/" & cyear & tab & Ctime)
	end tell
	close document 1
end tell

end repeat

set ASTID to AppleScript's text item delimiters
set AppleScript's text item delimiters to {ASCII character 10}
set theOutput to theOutput as Unicode text
set AppleScript's text item delimiters to ASTID
try
open for access outputFile with write permission
set fileRef to result
write theOutput to fileRef
close access fileRef
end try
display dialog "Script finished!" buttons {"View Output", "OK"} default button 2

if (button returned of result) is "View Output" then
tell application "Finder" to open outputFile
end if

Can anyone help me complete the script?

Thanks,
Krishnan

Matt-Boy · April 14, 2008, 1:04pm

I found this JavaScript a little while ago that extracts all the text from the current page of a PDF file. Maybe that could get you started. Once you have all the text, you can work out a way to find the slug information in it.

tell application "Adobe Acrobat Professional"
	set theJava to "var p = this.pageNum;
var n = this.getPageNumWords(p);
var str = \"\";
for(var i=0;i<n;i++) {
var wd = this.getPageNthWord(p, i, false);   
if(wd != \"\") str = str + wd;}"
	set theText to (do script theJava)
end tell

Mark67 · April 14, 2008, 4:14pm

Krish, I suspect your slug is in fact a “Footer” added by Acrobat after the PDF was made. You can add one via the menu bar, a batch sequence, or JavaScript. If you know how the “Footer” text is formatted, you should be able to get it as a string using Matt-Boy’s JavaScript.

tell application "Adobe Acrobat 7.0 Professional"
	set The_Text to do script "var p = this.pageNum; var n = this.getPageNumWords(p); var str = ''; for(var i=0;i<n;i++) {var wd = this.getPageNthWord(p, i, false); if(wd != '') str = str + wd;}"
	set The_Footer to paragraph ((count of paragraphs of The_Text) - 1) of The_Text
end tell
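
If the footer does not reliably land in the same paragraph position, a pattern match on the slug's known layout may be another option. The sketch below is plain JavaScript with a hypothetical helper, parseSlug, whose pattern is only an assumption based on the two sample slugs posted above (file name ending in .indd, page label, date, time). It also assumes the extracted text keeps the spaces between the slug's fields. I have not tested it inside Acrobat's do script.

```javascript
// Hypothetical helper: pull the slug fields out of a page's extracted text.
// The pattern is an assumption drawn from the sample slugs in this thread:
//   <name>.indd <page label> <date> <time AM/PM>
function parseSlug(pageText) {
  var pattern = /(\S+\.indd)\s+(\S+)\s+(\d{1,2}\/\d{1,2}[.\/]\d{2,4})\s+(\d{1,2}:\d{2}:\d{2}\s*[AP]M)/;
  var m = pattern.exec(pageText);
  if (m === null) {
    return null; // nothing slug-shaped was found in this text
  }
  return { name: m[1], page: m[2], date: m[3], time: m[4] };
}

// Sample input: some body text followed by one of the slugs from the first post.
var sample = "Some body text. RXENL08ATE601_CH01_Aloud.indd T294 01/12.08 10:00:12 AM";
var slug = parseSlug(sample);

// Build a tab-delimited line similar to what the original script writes out.
var line = slug.name + "\t" + slug.page + "\t" + slug.date + "\t" + slug.time;
```

Here slug.page holds the page label (T294 for the sample), and parseSlug returns null when no match is found, so the calling script can skip or flag that PDF.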

krish · April 17, 2008, 6:33am

Hi,

Thanks for your answer.

Krishnan