Skim now lets you grab part of a PDF page as TIFF or PDF data

Back when I worked for an electroplating company, I needed a script that extracted part of a PDF document (a patent) and saved it to a new file to be managed in a special database. I wrote a complicated Python script to do this, but now Skim, an excellent, free and highly scriptable PDF viewer, supports this functionality out of the box:


tell application "Skim"
	tell document 1
		set tiffdata to grab page 1 for {0, 300, 300, 0} with TIFF format
		set pdfdata to grab page 1 for {0, 300, 300, 0} without TIFF format
		-- add your handler to save the data to a file
	end tell
end tell
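
For the save handler mentioned in the comment above, a minimal sketch might look like this. It assumes that the value returned by grab can be passed directly to the Standard Additions write command; the handler name savedata and the example path are made up for illustration.

```applescript
-- Hypothetical helper: writes raw data to a file, replacing any existing content.
-- Assumes thedata (e.g. the result of Skim's grab command) can be written as-is.
on savedata(thedata, hfspath)
	set fileref to open for access file hfspath with write permission
	try
		set eof of fileref to 0 -- truncate an existing file
		write thedata to fileref
		close access fileref
	on error errmsg number errnum
		close access fileref -- don't leave the file open
		error errmsg number errnum
	end try
end savedata

-- Example call (the target path is only an illustration):
-- my savedata(tiffdata, ((path to desktop) as text) & "grabbed.tiff")
```

When calling it from inside the tell blocks, prefix the call with "my" so that AppleScript looks for the handler in the script rather than in Skim.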

I found this to be extremely helpful.

I found your post really interesting. I’m trying to extract a page from a multipage PDF file and save it to a new (single-page) PDF file.
I want to do this through AppleScript or Automator, but I really can’t figure out how. After searching the forum (and Google) a bit, I found out that the Preview application isn’t really scriptable, so I installed Skim, thinking it could do the job easily, but I can’t find out how to “paste” the selected page into a new document (and save it).

Thanks in advance

Giovanni

Hi Giovanni,

I have written a small Foundation tool for you, which extracts a given page number from a PDF input file and writes it to a new PDF output file. I named it getpage, and you can have a look at the short source code here.

To give you an idea of how to use it in combination with AppleScript, I built a sample droplet named pagegetter for you, which can be downloaded for free here. It also contains the ready-to-use command line tool.

If you drop a bunch of PDF files onto the sample script, it will ask you for the page number to extract (e.g. 15) and then process all dropped files, extracting this page number and writing it to a new file.

The new files are saved in the same location as the original files, but their naming scheme is a bit different:

Original file name: test.pdf
Page number to extract: 15
New file name: test_15.pdf
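
The naming scheme can be sketched as a small handler. The handler name outputnamefor is hypothetical, and the sketch assumes the file name ends with a four-character ".pdf" extension:

```applescript
-- Hypothetical helper: builds the output file name for a given page number.
-- Assumes pdffilename ends with a four-character ".pdf" extension.
on outputnamefor(pdffilename, pagenumber)
	set basename to text 1 thru -5 of pdffilename -- strip ".pdf"
	return basename & "_" & pagenumber & ".pdf"
end outputnamefor

outputnamefor("test.pdf", 15) -- returns "test_15.pdf"
```

Using "text 1 thru -5" returns a string directly, so the result does not depend on the current text item delimiters.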

Currently, an existing file path will never be overwritten by the script.

The command line tool getpage will run on Mac OS X 10.5 or higher; I tested the sample script on 10.6.

I hope you can actually make good use of this.

Best regards from snowy Berlin,

Martin


property mytitle : "pagegetter"

-- I am called when the user opens the script with a double click
on run
	tell me
		activate
		display dialog "Please drop a bunch of PDF files onto my icon to extract a certain page number from each of them into a new file." buttons {"OK"} default button 1 with icon note with title mytitle
	end tell
end run

-- I am called when the user drops Finder items onto the script's icon
on open finderitems
	try
		-- searching for PDF files
		set pdffiles to {}
		repeat with finderitem in finderitems
			set finderiteminfo to info for finderitem
			if (not folder of finderiteminfo) and (name of finderiteminfo ends with ".pdf") then
				set end of pdffiles to contents of finderitem
			end if
		end repeat
		
		-- no PDF files found :(
		if pdffiles is {} then
			set errmsg to "Could not find any PDF documents in the dropped Finder items."
			my dsperrmsg(errmsg, "--")
			return
		end if
		
		-- which page number should be extracted?
		set pagenumber to my askforpagenumber()
		-- the user canceled the dialog: nothing to do
		if pagenumber is missing value then return
		
		-- locating the command line tool inside my bundle...
		set toolpath to ((path to me) as text) & "Contents:Resources:getpage"
		set qtdtoolpath to quoted form of POSIX path of toolpath
		
		-- processing the found PDF files
		repeat with pdffile in pdffiles
			set pdffileinfo to info for pdffile
			set pdffilename to (name of pdffileinfo) as text
			set outputpdffilename to (text 1 thru -5 of pdffilename) & "_" & pagenumber & ".pdf"
			set outputpdffilepath to (my getparentfolderpath((pdffile as text)) & outputpdffilename) as text
			
			if not my itempathexists(outputpdffilepath) then
				set command to qtdtoolpath & " -i " & quoted form of POSIX path of pdffile & " -p " & pagenumber & " -o " & quoted form of POSIX path of outputpdffilepath
				try
					do shell script command
				on error errmsg number errnum
					my dsperrmsg(errmsg, errnum)
				end try
			end if
		end repeat
	on error errmsg number errnum
		if errnum is not equal to -128 then
			my dsperrmsg(errmsg, errnum)
		end if
	end try
end open

-- I am asking the user to choose the page number to be extracted from the PDF files
on askforpagenumber()
	try
		tell me
			activate
			display dialog "Which page number should be extracted from the PDF files?" default answer "" buttons {"Cancel", "Enter"} default button 2 with icon note with title mytitle
			set dlgresult to result
		end tell
		set answer to text returned of dlgresult
		if answer is "" then
			-- empty input: ask again
			return my askforpagenumber()
		else
			try
				set pagenumber to answer as integer
				if pagenumber < 1 then
					-- page numbers start at 1: ask again
					return my askforpagenumber()
				else
					-- no more calls: we have a winner!
					return pagenumber
				end if
			on error
				-- not a number: ask again
				return my askforpagenumber()
			end try
		end if
	on error
		return missing value
	end try
end askforpagenumber

-- I am indicating if a given item path already exists
on itempathexists(itempath)
	try
		set itemalias to itempath as alias
		return true
	on error
		return false
	end try
end itempathexists

-- I am returning the parent folder path of a given item path
on getparentfolderpath(itempath)
	set olddelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ":"
	set itemcount to (count text items of itempath)
	set lastitem to the last text item of itempath
	if lastitem = "" then
		set itemcount to itemcount - 2 -- folder path
	else
		set itemcount to itemcount - 1 -- file path
	end if
	set parentfolderpath to text 1 thru text item itemcount of itempath & ":"
	set AppleScript's text item delimiters to olddelims
	return parentfolderpath
end getparentfolderpath

-- I am displaying error messages to the user
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"OK"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg

Sorry for the late answer! Wow, that looks great! Thank you so much. I’ve been working on something similar and came up with a script that can be used to send each page of a PDF to Word in PDF format.
It lets the user choose the PDF’s size on the Word page and add a custom caption. Then I modified it to be used as a custom Service (through Automator).

Take a look at http://giovannimedici.altervista.org/