Crop PDFs while keeping the ability to extract text from the result

alexlaske · April 8, 2010, 8:45am

Hi everyone,

I am looking for a way to resize PDF files into multiple parts using AppleScript. That alone is not hard with sips or “Image Events”. The problem is that both methods seem to convert the original file into image files, so I cannot extract the text after the conversion. Does anyone have an idea how to resize PDFs without losing the ability to extract the text afterwards?

Any help is highly appreciated!

Thanks, Alex

Martin_Michel · April 11, 2010, 12:22pm

Hi Alex,

One of our customers needed a second version of a PDF file used for printing, so that he could publish it on his website. This second version featured different crop marks, so I wrote a small command line utility in ObjC to create it. You can find the script code here; maybe it can help as a starting point.

Of course you can also crop your PDF files with Adobe Acrobat and AppleScript or JavaScript, or with a utility like PDFClerk Pro, which is scriptable.

Best regards,

Martin

Hans-Gerd_Classen · September 30, 2010, 6:36pm

Hi Martin,

Silly question: how do I run such an ObjC script?
In Terminal, the gcc command to compile it runs into errors …

Many thanks

Hans

clemhoff · October 2, 2010, 6:12am

works a treat with:
gcc -o croppdf croppdf.m -Wall -g -framework Foundation -framework Quartz

Hans-Gerd_Classen · October 2, 2010, 8:22am

I updated Xcode and now it runs.
Thanks for the help.
Hans

Hi,

perhaps I should mention that I am trying this on a Tiger machine (10.4.11) …

gcc -o /Resources/croppdf /Resources/croppdf.m -Wall -g -framework Foundation -framework Quartz

I got this error related to the Quartz framework:

/Resources/croppdf.m:2: header file 'Quartz/Quartz.h' not found
/Resources/croppdf.m:15: undefined type, found `PDFDocument'
/Resources/croppdf.m:28: undefined type, found `PDFPage'
cpp-precomp: warning: errors during smart preprocessing, retrying in basic mode

Thanks

Hans-Gerd_Classen · October 3, 2010, 4:35pm

Me again

Will such a PowerPC Unix executable, compiled on 10.4.11 with Xcode 2.4.1, run on Intel machines (Leopard or Snow Leopard)?

Hans-Gerd_Classen · October 13, 2010, 8:47am

Hello,

thanks to Martin Michel for sharing his ObjC script.
I’ve made a few changes so that the crop marks can be set from the shell command.

Perhaps someone else will find this useful too …

The AppleScript command would look like this:

		do shell script (quoted form of PathtoObjCScript) & " -x 0.0 -y 0.0 -w 400.0 -h 150.0 -pdf " & (quoted form of FilePath)

The ObjC-Script:

#import <Foundation/Foundation.h>
#import <Quartz/Quartz.h>

int main (int argc, const char * argv[]) {
	NSAutoreleasePool * pool = [[NSAutoreleasePool alloc] init];

	// Read the command line arguments (-pdf, -x, -y, -w, -h) via NSUserDefaults.
	// "Ursprung" is German for "origin".
	NSUserDefaults *userDefaults = [NSUserDefaults standardUserDefaults];
	NSString *pdfFilePath = [userDefaults stringForKey:@"pdf"];
	NSString *XursprungString = [userDefaults stringForKey:@"x"];
	NSString *YursprungString = [userDefaults stringForKey:@"y"];
	NSString *thewidthString = [userDefaults stringForKey:@"w"];
	NSString *theheightString = [userDefaults stringForKey:@"h"];

	if ((pdfFilePath == nil) || ([pdfFilePath isEqualToString:@""])) {
		printf("ERROR: Missing PDF file path (-pdf)\n");
		return 1;
	}

	// Convert the crop rectangle arguments to numbers.
	float Xursprung = [XursprungString floatValue];
	float Yursprung = [YursprungString floatValue];
	float thewidth = [thewidthString floatValue];
	float theheight = [theheightString floatValue];

	// Load the PDF file.
	NSURL *pdfFileURL = [NSURL fileURLWithPath:pdfFilePath];
	PDFDocument *pdfDoc = [[PDFDocument alloc] initWithURL:pdfFileURL];
	if (pdfDoc == nil) {
		printf("ERROR: Could not create a PDF object from the given path:\n\t%s\n", [pdfFilePath UTF8String]);
		return 1;
	}

	// Apply the same crop box to every page.
	NSRect cropBounds = NSMakeRect(Xursprung, Yursprung, thewidth, theheight);
	int pagecount = [pdfDoc pageCount];
	int i;
	for (i = 0; i < pagecount; i++) {
		PDFPage *pdfPage = [pdfDoc pageAtIndex:i];
		[pdfPage setBounds:cropBounds forBox:kPDFDisplayBoxCropBox];
		// Kept from the original script: re-insert the page and drop the extra entry.
		[pdfDoc insertPage:pdfPage atIndex:i];
		[pdfDoc removePageAtIndex:i];
	}

	// Overwrite the input file with the cropped version.
	BOOL writesuccess = [pdfDoc writeToFile:pdfFilePath];
	if (writesuccess == NO) {
		printf("ERROR: Could not save the PDF to the given path:\n%s\n", [pdfFilePath UTF8String]);
		return 1;
	}
	[pdfDoc release];
	[pool release];
	return 0;
}
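
For the original question (cutting a page into several parts), the -x, -y, -w and -h values for each part can be computed rather than typed by hand. The following is only a sketch in plain C, which also compiles inside a .m file. The names crop_rect and tile_rect are made up for this example and are not part of PDFKit. It assumes the page's own origin is (0, 0) and that the page size in points is already known.

```c
#include <assert.h>

/* One crop rectangle in PDF points; the origin is the lower-left corner. */
typedef struct {
    float x, y, w, h;
} crop_rect;

/* Rectangle of the tile at (col, row) when a page of page_w x page_h points
   is cut into cols x rows equal parts. Row 0 is the bottom row.
   Hypothetical helper; assumes the page's own origin is (0, 0). */
crop_rect tile_rect(float page_w, float page_h,
                    int cols, int rows, int col, int row) {
    assert(cols > 0 && rows > 0);
    assert(col >= 0 && col < cols && row >= 0 && row < rows);
    crop_rect r;
    r.w = page_w / (float)cols;
    r.h = page_h / (float)rows;
    r.x = (float)col * r.w;
    r.y = (float)row * r.h;
    return r;
}
```

For an 800 x 300 pt page cut into a left and a right half, this gives "-x 0 -y 0 -w 400 -h 300" and "-x 400 -y 0 -w 400 -h 300". Since the script above writes back to the file it was given, each part would need its own copy of the PDF.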

Thanks

Hans

StefanK · October 13, 2010, 9:49am

Hi,

You can get a float value directly from user defaults:

CGFloat Xursprung = (CGFloat)[userDefaults floatForKey:@"x"];

For 64-bit compatibility, the value is cast to CGFloat instead of being stored as a plain float.
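
One caveat: floatForKey: returns 0 when the key is missing, so a forgotten -w cannot be told apart from -w 0. If that matters, the argument text can be checked before it is used. A minimal sketch in plain C (it also compiles inside a .m file); parse_crop_value is a made-up helper name, not an existing API:

```c
#include <errno.h>
#include <math.h>
#include <stdlib.h>

/* Parse one crop value, e.g. the text passed after -x or -w.
   Returns the value, or NAN if the text is missing, empty, not a number,
   followed by extra characters, or out of range for a float.
   Hypothetical helper for illustration only. */
float parse_crop_value(const char *text) {
    if (text == NULL || *text == '\0') {
        return NAN;
    }
    char *end = NULL;
    errno = 0;
    float value = strtof(text, &end);
    if (end == text || *end != '\0') {
        return NAN;   /* nothing parsed, or trailing garbage */
    }
    if (errno == ERANGE || !isfinite(value)) {
        return NAN;   /* overflow, underflow, inf or nan */
    }
    return value;
}
```

In the script it could be called as parse_crop_value([thewidthString UTF8String]) and the result tested with isnan() before the crop box is built.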