Using Skim to Crop PDF into Defined Segments

Dear MacScripters,

Full disclosure: I am a noob.

Every day I receive a rail timetable, and it’s my job to transpose the data into an Excel spreadsheet. The timetable is uniform in layout, so if I could crop set areas into separate PDF files, I could extract the text from each area and infer the data.

As Preview isn’t scriptable, I’m trying to use Skim, but I haven’t got very far (basically nowhere).

Could anyone kick me off? Let’s suppose I have an A4 PDF open and I want to crop out a box 100px by 100px, inset 50px from the top and 50px from the left of the page, and open that cropped section as a new PDF.

Any help gratefully received,

Ben

Hey Ben,

I’m not acquainted with scripting Skim, but it appears to have possibilities:


set AppleScript's text item delimiters to linefeed & linefeed
tell application "Skim"
	tell front document
		set _text to text of pages 6 thru 8 as text
	end tell
end tell

It looks like the command-line tool pdftotext does a better job of formatting the output though.

http://www.bluem.net/files/pdftotext.dmg

If your pdf is not locked and the data you need is regular enough, it should be easy enough to parse it out.
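
For what it's worth, some builds of pdftotext can limit extraction to a rectangle, which comes close to the crop Ben describes. Below is a minimal sketch, not a tested recipe. It assumes the Poppler build of pdftotext, which has the -x, -y, -W and -H crop options (the build linked above may differ), and it assumes the install path shown. The handler name cropCommand is made up for illustration.

```applescript
-- Hypothetical helper: builds a pdftotext command that extracts one rectangle of page 1.
-- x and y are measured from the top-left corner of the page.
on cropCommand(toolPath, pdfPath, x, y, w, h)
	return (quoted form of toolPath) & " -f 1 -l 1 -layout" & ¬
		" -x " & x & " -y " & y & " -W " & w & " -H " & h & ¬
		" " & (quoted form of pdfPath) & " -"
end cropCommand

-- ASSUMPTION: adjust this path to wherever pdftotext is installed on your Mac.
set toolPath to "/usr/local/bin/pdftotext"
set pdfPath to POSIX path of (choose file with prompt "Choose the timetable PDF:")
-- The trailing "-" sends the text to standard output, so do shell script returns it.
set regionText to do shell script cropCommand(toolPath, pdfPath, 50, 50, 100, 100)
```

With the values above, the command asks for a 100 by 100 box whose top-left corner sits 50 units in from the top and left of page 1. The text of that region ends up in regionText, ready for parsing.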

You could try this code:


tell application "Skim"
	tell document 1
		# set {yTop, yBottom, xRight, xLeft} to get bounds for page 1
		set {rectWidth, rectHeight} to {100, 100}
		# set cropBounds to {yTop, rectHeight, rectWidth, xLeft}
		set cropBounds to {0, rectHeight, rectWidth, 0}
		grab page 1 for cropBounds as PDF
	end tell
	set the clipboard to result
end tell
my activateguiscripting()
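# Click the first item of Skim's third menu (File) to open the clipboard contents as a new document
my selectmenu("Skim", 3, 1)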

#=====

on activateguiscripting()
	(* to be sure that GUI scripting will be active *)
	tell application "System Events"
		if not (UI elements enabled) then set (UI elements enabled) to true
	end tell
end activateguiscripting

#=====
(*
my selectMenu("Pages",5, 12)
==== Uses GUIscripting ====
*)
on selectmenu(theApp, mt, mi)
	activate application theApp
	tell application "System Events" to tell process theApp to tell menu bar 1 to ¬
		tell menu bar item mt to tell menu 1 to click menu item mi
end selectmenu

#=====

The only obscure part was working out how Skim defines its bounds properties, which is a bit odd.
I left in (commented out) the instructions I used to get that information and trimmed the rest down to what is really needed.

KOENIG Yvan (VALLAURIS, France) Wednesday 15 May 2013 16:54:45
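
To place the rectangle where Ben asked (50 in from the top and left edges), the offsets have to be converted into Skim's bounds. Here is a minimal sketch under one loud assumption: that Skim returns bounds as {left, top, right, bottom} in points, measured from the bottom-left corner of the page. The thread does not confirm that ordering, so check it against what "get bounds for page 1" returns on your own file. The handler name boundsFromTopLeft is made up for illustration.

```applescript
-- Hypothetical helper (not part of Skim): converts a box measured from the
-- page's top-left corner into a bounds list.
-- ASSUMPTION: bounds are {left, top, right, bottom} with the origin at bottom-left.
on boundsFromTopLeft(pageBounds, insetX, insetY, boxWidth, boxHeight)
	set {pageLeft, pageTop, pageRight, pageBottom} to pageBounds
	set boxLeft to pageLeft + insetX
	set boxTop to pageTop - insetY
	return {boxLeft, boxTop, boxLeft + boxWidth, boxTop - boxHeight}
end boundsFromTopLeft

tell application "Skim"
	tell document 1
		-- Same commands as in the script above, with the rectangle computed
		set pageBounds to get bounds for page 1
		set cropBounds to my boundsFromTopLeft(pageBounds, 50, 50, 100, 100)
		grab page 1 for cropBounds as PDF
	end tell
	set the clipboard to result
end tell
```

For a page whose bounds come back as {0, 842, 595, 0}, the handler returns {50, 792, 150, 692}. If the ordering assumption turns out to be wrong, only the handler needs changing.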

That’s pretty spiffy, Yvan.

Even so, when you pull the text out, it doesn’t seem to follow much of a logical sequence.

I’m wondering if the OP might have better luck with an OCR program that would preserve the table.

Yvan, Chris,

Many thanks to you both for all your help.

Chris and I have been having an email conversation in which I’ve shared an example timetable. The conclusion is that finding any discernible pattern in the text, as exported from the PDF as a whole, is pretty difficult.

I’m very happily playing around with Yvan’s script and am able to place the cropping rectangle wherever desired. Once I’ve got the various sections defined I’ll be onto the next section of the challenge.

Thanks again to you both for helping me move this forward. Scripting is great fun when people give their time to help you learn.

Hello.

I hope you make it with Yvan’s smart script! I’ll present an alternative anyway: converting the PDF to HTML with the utility pdftohtml, which can be installed from MacPorts or Homebrew, and maybe other package managers as well. (I prefer MacPorts, but Homebrew is simpler.)

Converting PDF to HTML is a tricky business, I have read, so if it works with Yvan’s script, I’d look no further!

PS. I was curious whether there were any scripts for converting PDF to text with Skim, and I found an AppleScript bundle that contains a command-line utility called pdf2text. here.
D.s

The posted script did its duty flawlessly here.

Moreover, as it doesn’t use the Cut feature, I guess it would also work when the area to grab is on the page but not visible on the screen (mainly the bottom of the page).

I deliberately left the extended instructions in place (commented out) to give an easy way to define another area to extract.

KOENIG Yvan (VALLAURIS, France) Wednesday 15 May 2013 21:08:02