Rasterize PDF

Hi, guys!
I’m a newbie in AppleScript and don’t have any coding experience. I’m trying to make a script to rasterize PDF files. Basically, I first convert the selected PDF file to TIFF format, and then convert this TIFF image back to PDF with the same name in the same folder, i.e. overwrite the file. I tried to use the Preview app to convert PDF to TIFF, but no luck. Second, I don’t see how to make it work when I choose several PDFs at once (I use an Automator Quick Action for Finder to select the PDFs and then send them to AppleScript).
I will be very grateful for any help.

ayden27. The Preview app does have the ability to save a PDF as TIFF–at least with Catalina. Just open the PDF in Preview, hold down the option key and select “File” > “Save As”, and select TIFF as the file format. Then, repeat this to save the TIFF as a PDF. You could automate this process with GUI scripting. It’s important to note that Catalina’s Preview will save a multipage PDF as a multipage TIFF.

The following will do what you want with an AppleScript but it has a fatal flaw, which is that the sips utility appears unable to create a multipage TIFF. The same appears to be the case with Image Events.

set sourceFiles to (choose file with multiple selections allowed)

repeat with aFile in sourceFiles
	set aFile to quoted form of POSIX path of aFile
	repeat with aFormat in {"tiff", "pdf"}
		do shell script "sips --setProperty format " & (aFormat as text) & " " & aFile & " --out " & aFile
		delay 0.5 -- test different values
	end repeat
end repeat

My guess is that ASObjC could be used to do what you want but I don’t have the knowledge level to accomplish this. Perhaps another forum member will let you know if this is possible.
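
On the question of handling several PDFs at once from an Automator Quick Action: the Run AppleScript action passes the Finder selection as a list in `input`, so a repeat loop covers any number of files. The following is only a minimal sketch. The handler name `destinationPathFor` and the "-raster" suffix are hypothetical, and writing to a sibling file instead of overwriting the original is an assumption made for safety; the actual conversion step is left as a comment.

```applescript
on run {input, parameters}
	set outputPaths to {}
	repeat with anItem in input
		set srcPath to POSIX path of anItem
		-- string comparison ignores case by default, so ".PDF" also matches
		if srcPath ends with ".pdf" then
			set destPath to my destinationPathFor(srcPath)
			-- call the conversion of your choice here, reading srcPath and writing destPath
			set end of outputPaths to destPath
		end if
	end repeat
	return outputPaths
end run

-- Hypothetical helper: "/path/Doc.pdf" becomes "/path/Doc-raster.pdf"
on destinationPathFor(srcPath)
	return (text 1 thru -5 of srcPath) & "-raster.pdf"
end destinationPathFor
```

Once the output has been checked, the destination path could be replaced by the source path to overwrite in place, as the original question asks.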

I found this thread interesting.

As I see it, scripting the Preview interface is a good way to convert a PDF to a multipage TIFF, and there is an ASObjC script for the reverse conversion. But I would like to understand the point of this round trip. What is the advantage?

Another point: as I see it, sips and Image Events can export single TIFFs. I think ASObjC can also convert each page of a PDF to a set of single TIFFs and, as far as I know, an ASObjC script for merging TIFFs exists. So Preview GUI scripting may be avoidable. But first I want to know what the purpose is.

NOTE: on Catalina, “Save As…” corresponds to the “Export…” menu item in Preview’s interface.

If you convert a PDF to a TIFF and back again, then it is no longer searchable and can’t be indexed. Possibly that is the reason, but certainly ayden27 can say more about it.
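
One way to confirm that a round-tripped PDF has lost its text layer is to ask PDFKit for the document's text. This is a rough sketch, assuming that a missing or whitespace-only result from PDFDocument's `string` property means the file is no longer searchable; the dialog wording is illustrative only.

```applescript
use AppleScript version "2.4"
use scripting additions
use framework "Foundation"
use framework "Quartz"

set aPDF to choose file of type {"com.adobe.pdf"}
set aURL to current application's |NSURL|'s fileURLWithPath:(POSIX path of aPDF)
set aPDFdoc to current application's PDFDocument's alloc()'s initWithURL:aURL
if aPDFdoc is missing value then error "Could not open the chosen PDF."

-- |string|() returns the document's text layer; it may be missing or empty
set pdfText to aPDFdoc's |string|()
if pdfText is missing value then
	set charCount to 0
else
	set whiteSpace to current application's NSCharacterSet's whitespaceAndNewlineCharacterSet()
	set charCount to ((pdfText's stringByTrimmingCharactersInSet:whiteSpace)'s |length|()) as integer
end if

if charCount = 0 then
	display dialog "No extractable text: this PDF looks rasterized."
else
	display dialog "Found " & charCount & " characters of searchable text."
end if
```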

I wrote an ASObjC solution here, for myself and other users, that does not involve Preview GUI scripting. I also improved the script’s speed by creating and using a RAM disk instead of the user domain’s Temporary Items folder:


use scripting additions
use framework "Foundation"
use framework "AppKit"
use framework "Quartz"
use framework "QuartzCore"

property NSString : a reference to NSString of current application
property |NSURL| : a reference to |NSURL| of current application
property PDFDocument : a reference to PDFDocument of current application
property NSImage : a reference to NSImage of current application
property NSImageView : a reference to NSImageView of current application
property NSBitmapImageRep : a reference to NSBitmapImageRep of current application
property NSTIFFFileType : a reference to NSTIFFFileType of current application
property desktopFolder : path to desktop folder

-- Create RAM disk
my makeRAMdisk()
-- Choose some pdf file
set aPDF to choose file of type "pdf"
-- Make temporary job folder on the RAM disk
tell application "System Events" to try
	make new folder at folder "Ram disk:" with properties {name:"TIFFs"}
end try
-- Save PDF as multiple TIFFs in the temporary job folder
set TIFFs to my splitPDFasTIFFs(aPDF)
-- Merge TIFFs back to single PDF file, saved on the desktop
my combineFiles:TIFFs savingToPDF:(POSIX path of desktopFolder & "Combined.pdf")
-- Now, we can delete unneeded temporary folder
tell application "System Events" to delete folder "Ram disk:TIFFs:"
-- eject RAM disk (if need)
tell application "Finder" to eject "Ram disk:"


--=================================== HANDLERS =======================================

on makeRAMdisk()
	set dName to "RAM Disk"
	set dCapacity to 512 * 2 * 2000 --1GB
	set aCmd to "diskutil erasevolume HFS+ '" & dName & "' `hdiutil attach -nomount ram://" & (dCapacity as string) & "`"
	do shell script aCmd
end makeRAMdisk


on splitPDFasTIFFs(aPDF)
	set aURL to (|NSURL|'s fileURLWithPath:(POSIX path of aPDF))
	set aPDFdoc to PDFDocument's alloc()'s initWithURL:aURL
	set pCount to aPDFdoc's pageCount()
	set TIFFs to {}
	-- Split the PDF into pages exported as Tiff files
	repeat with i from 0 to (pCount - 1)
		set thisPage to (aPDFdoc's pageAtIndex:i)
		set thisDoc to (NSImage's alloc()'s initWithData:(thisPage's dataRepresentation()))
		if thisDoc = missing value then error "Error in getting imagerep from PDF in page:" & (i as string)
		set theData to thisDoc's TIFFRepresentation()
		set newRep to (NSBitmapImageRep's imageRepWithData:theData)
		set targData to (newRep's representationUsingType:NSTIFFFileType |properties|:{NSTIFFCompressionNone:1})
		set nextPath to "/Volumes/RAM Disk/TIFFs/" & i & ".tiff"
		set end of TIFFs to nextPath
		set outPath to (NSString's stringWithString:nextPath)
		(targData's writeToFile:outPath atomically:true) -- Export
	end repeat
	return TIFFs
end splitPDFasTIFFs


on combineFiles:TIFFs savingToPDF:destPosixPath
	-- make new empty PDF document
	set theDoc to PDFDocument's alloc()'s init()
	repeat with i from 0 to (count TIFFs) - 1
		-- make URL of the next TIFF
		set inNSURL to (|NSURL|'s fileURLWithPath:(item (i + 1) of TIFFs))
		-- make PDF document from the URL
		set newDoc to (my pdfDocFromImageURL:inNSURL)
		-- get page of PDF
		set thePDFPage to (newDoc's pageAtIndex:0) -- zero-based indexes
		-- insert the page into main PDF
		(theDoc's insertPage:thePDFPage atIndex:i)
	end repeat
	set outNSURL to |NSURL|'s fileURLWithPath:destPosixPath
	-- save the main PDF
	(theDoc's writeToURL:outNSURL)
end combineFiles:savingToPDF:


on pdfDocFromImageURL:inNSURL
	set theImage to NSImage's alloc()'s initWithContentsOfURL:inNSURL
	set theSize to theImage's |size|()
	set theRect to {{0, 0}, theSize}
	set theImageView to NSImageView's alloc()'s initWithFrame:theRect
	theImageView's setImage:theImage
	set theData to theImageView's dataWithPDFInsideRect:theRect
	return PDFDocument's alloc()'s initWithData:theData
end pdfDocFromImageURL:

I wouldn’t call it a fatal flaw.

sips is a tool for working with raster images and colour profiles. Image Events provides access to its functionality.

PDF is a vector format, and thus sips is not built for such a purpose.
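
As an illustration of what sips is built for, it can report the pixel dimensions and resolution of a raster image such as one of the intermediate TIFFs. A small sketch, assuming the standard sips property keys `pixelWidth`, `pixelHeight`, `dpiWidth` and `dpiHeight`:

```applescript
set anImage to choose file of type {"public.image"}
set imagePath to quoted form of POSIX path of anImage
-- -g (getProperty) prints one "key: value" line per requested property
set theInfo to do shell script "sips -g pixelWidth -g pixelHeight -g dpiWidth -g dpiHeight " & imagePath
display dialog theInfo
```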

You can probably skip the intermediate files and RAM disk altogether, like this:

on rasterPDF:aPDF savingTo:destPosixPath
	set aURL to (|NSURL|'s fileURLWithPath:(POSIX path of aPDF))
	set aPDFdoc to PDFDocument's alloc()'s initWithURL:aURL
	set pCount to aPDFdoc's pageCount()
	repeat with i from 0 to (pCount - 1)
		set thisPage to (aPDFdoc's pageAtIndex:i)
		set thisDoc to (NSImage's alloc()'s initWithData:(thisPage's dataRepresentation()))
		if thisDoc = missing value then error "Error in getting image from PDF in page:" & (i as string)
		set theData to thisDoc's TIFFRepresentation()
		set theImage to (NSImage's alloc()'s initWithData:theData)
		set newPage to (current application's PDFPage's alloc()'s initWithImage:theImage)
		(aPDFdoc's removePageAtIndex:i)
		(aPDFdoc's insertPage:newPage atIndex:i)
	end repeat
	set outNSURL to |NSURL|'s fileURLWithPath:destPosixPath
	aPDFdoc's writeToURL:outNSURL
end rasterPDF:savingTo:

There are no words. Great. I will keep both scripts for myself, as a keepsake. Your script, Shane, is what I call optimal. By the way, I hadn’t found the slightest information on PDF rasterization using ASObjC before.

I tested the two scripts with a 168-page PDF. The speed is almost the same (mine is slightly faster), but the PDF produced by Shane’s script is 42 MB and the one produced by my script is 9 MB. I don’t understand why there is such a big difference between them.

Is it possible to increase the resolution within the script? Say… to 150 dpi? The output seems like it’s about 72 dpi.

@Shane. I agree with KniazidisR–your script is outstanding. Very useful and beautifully compact.

BTW, the script did not work until I inserted “current application’s” in several spots. Is there some reason these are not needed?

@Mockman. I meant the words, fatal flaw, to refer to my script and its inability to fulfill the OP’s needs. Perhaps I should have been more clear.

FWIW, I tested Shane’s and KniazidisR’s scripts, using Shane’s ASObjC book (a PDF) as the test document. I also tested with Preview (save as TIFF at 72 dpi and then as PDF). The file sizes were:

Original - 2.4 MB

With Shane’s script - 104.5 MB

With KniazidisR’s script - 21.5 MB

With Preview - 18.1 MB

I looked at the new PDFs. Shane’s was as expected, but the pages of the PDF created by KniazidisR’s script were out of order. This appears to be fixed by padding the counter used in naming the TIFF files.

set j to text -3 thru -1 of ("000" & i as text)
set outPath to (NSString's stringWithString:("/Volumes/RAM Disk/TIFFs/" & j & ".tiff"))

The PDFs created by KniazidisR’s and Shane’s scripts were both 72 dpi.

This version lets you specify the resolution:

on rasterPDF:aPDF savingTo:destPosixPath resolution:theDpi
	set aURL to (|NSURL|'s fileURLWithPath:(POSIX path of aPDF))
	set aPDFdoc to PDFDocument's alloc()'s initWithURL:aURL
	set pCount to aPDFdoc's pageCount()
	repeat with i from 0 to (pCount - 1)
		set thisPage to (aPDFdoc's pageAtIndex:i)
		-- do size calculations
		set pageSize to (thisPage's boundsForBox:(current application's kPDFDisplayBoxMediaBox))
		set pageWidth to current application's NSWidth(pageSize)
		set pageHeight to current application's NSHeight(pageSize)
		set pixelWidth to (pageWidth * theDpi / 72) div 1
		set pixelHeight to (pageHeight * theDpi / 72) div 1
		-- make bitmaps
		set theImageRep to (current application's NSPDFImageRep's imageRepWithData:(thisPage's dataRepresentation()))
		set newRep to (current application's NSBitmapImageRep's alloc()'s initWithBitmapDataPlanes:(missing value) pixelsWide:pixelWidth pixelsHigh:pixelHeight bitsPerSample:8 samplesPerPixel:4 hasAlpha:true isPlanar:false colorSpaceName:(current application's NSDeviceRGBColorSpace) bytesPerRow:0 bitsPerPixel:32)
		-- store the existing graphics context
		current application's NSGraphicsContext's saveGraphicsState()
		-- set graphics context to new context based on the new bitmapImageRep
		(current application's NSGraphicsContext's setCurrentContext:(current application's NSGraphicsContext's graphicsContextWithBitmapImageRep:newRep))
		(theImageRep's drawInRect:{origin:{x:0, y:0}, |size|:{width:pixelWidth, height:pixelHeight}} fromRect:(current application's NSZeroRect) operation:(current application's NSCompositeSourceOver) fraction:1.0 respectFlipped:false hints:(missing value))
		-- restore state
		current application's NSGraphicsContext's restoreGraphicsState()
		-- make new image and page
		(newRep's setSize:{pageWidth, pageHeight})
		set theData to newRep's TIFFRepresentation()
		set theImage to (NSImage's alloc()'s initWithData:theData)
		set newPage to (current application's PDFPage's alloc()'s initWithImage:theImage)
		(aPDFdoc's removePageAtIndex:i)
		(aPDFdoc's insertPage:newPage atIndex:i)
	end repeat
	set outNSURL to |NSURL|'s fileURLWithPath:destPosixPath
	aPDFdoc's writeToURL:outNSURL
end rasterPDF:savingTo:resolution:

Thank you, Peavine, for your consideration. You are right: my script did not keep the pages in order. I made the fix in post #5; it is slightly more efficient than padding the filenames.

Also, I removed the unnecessary repeat loop (in the combineFiles handler), and now the script is 1.5 times faster. (It also creates a PDF whose size is close to that of the PDF created by the Preview method.)

Shane, thanks for your last script. I ran 2 tests.

Your script worked successfully with a 5-page PDF at DPI = 600. With a 168-page PDF at DPI = 600, the script hangs, and I get a message that Script Debugger is not responding and is using 62 GB of memory!

It looks like a memory leak somewhere in the script. What do you say?

I tried your handler as follows. Maybe the alloc() statements need some additional parentheses?


use scripting additions
use framework "Foundation"
use framework "AppKit"
use framework "Quartz"
use framework "QuartzCore"

set aPDF to choose file of type "pdf"
-- POSIX path of a folder already ends with "/", so no extra slash is needed
set destPosixPath to (POSIX path of (path to desktop folder)) & "Rasterized.pdf"
my rasterPDF:aPDF savingTo:destPosixPath resolution:600

on rasterPDF:aPDF savingTo:destPosixPath resolution:theDpi
	set aURL to (current application's |NSURL|'s fileURLWithPath:(POSIX path of aPDF))
	set aPDFdoc to current application's PDFDocument's alloc()'s initWithURL:aURL
	set pCount to aPDFdoc's pageCount()
	-- do size calculations once (assumes every page has the same size as the first page)
	set thisPage to (aPDFdoc's pageAtIndex:0)
	set pageSize to (thisPage's boundsForBox:(current application's kPDFDisplayBoxMediaBox))
	set pageWidth to current application's NSWidth(pageSize)
	set pageHeight to current application's NSHeight(pageSize)
	set pixelWidth to (pageWidth * theDpi / 72) div 1
	set pixelHeight to (pageHeight * theDpi / 72) div 1
	repeat with i from 0 to (pCount - 1)
		set thisPage to (aPDFdoc's pageAtIndex:i)
		-- make bitmaps
		set theImageRep to (current application's NSPDFImageRep's imageRepWithData:(thisPage's dataRepresentation()))
		set newRep to (current application's NSBitmapImageRep's alloc()'s initWithBitmapDataPlanes:(missing value) pixelsWide:pixelWidth pixelsHigh:pixelHeight bitsPerSample:8 samplesPerPixel:4 hasAlpha:true isPlanar:false colorSpaceName:(current application's NSDeviceRGBColorSpace) bytesPerRow:0 bitsPerPixel:32)
		-- store the existing graphics context
		current application's NSGraphicsContext's saveGraphicsState()
		-- set graphics context to new context based on the new bitmapImageRep
		(current application's NSGraphicsContext's setCurrentContext:(current application's NSGraphicsContext's graphicsContextWithBitmapImageRep:newRep))
		(theImageRep's drawInRect:{origin:{x:0, y:0}, |size|:{width:pixelWidth, height:pixelHeight}} fromRect:(current application's NSZeroRect) operation:(current application's NSCompositeSourceOver) fraction:1.0 respectFlipped:false hints:(missing value))
		-- restore state
		current application's NSGraphicsContext's restoreGraphicsState()
		-- make new image and page
		(newRep's setSize:{pageWidth, pageHeight})
		set theData to newRep's TIFFRepresentation()
		set theImage to (current application's NSImage's alloc()'s initWithData:theData)
		set newPage to (current application's PDFPage's alloc()'s initWithImage:theImage)
		(aPDFdoc's removePageAtIndex:i)
		(aPDFdoc's insertPage:newPage atIndex:i)
	end repeat
	set outNSURL to current application's |NSURL|'s fileURLWithPath:destPosixPath
	aPDFdoc's writeToURL:outNSURL
end rasterPDF:savingTo:resolution:

The issue, sadly, is that ASObjC leaks memory badly, period. Initially it relied on automatic garbage collection, but when that was abandoned memory management was presumably just tacked on to AppleScript’s own, periodic, garbage collection. But even that doesn’t seem to clear everything out.

In most cases, the OS’s efficient overall memory management means it doesn’t matter much. But when you push it hard – which you’re doing in that test – it tends to bog down more or less completely.

(The leaking is such that if you run a batch of tests, the memory use is cumulative. I suspect that’s a straight bug.)
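
To put rough numbers on that test, the uncompressed bitmap size can be estimated from the same pixel arithmetic the handler uses. This is a back-of-the-envelope sketch, not a measurement: it assumes US Letter pages (612 x 792 points), which the thread does not state, and 4 bytes per pixel as in the handler's 8-bit, 4-sample bitmap. The handler name `bitmapMegabytes` is hypothetical.

```applescript
-- Estimate the size of one page's uncompressed RGBA bitmap, in MB (1024 ^ 2 bytes)
on bitmapMegabytes(pageWidthPts, pageHeightPts, theDpi)
	set pixelWidth to (pageWidthPts * theDpi / 72) div 1
	set pixelHeight to (pageHeightPts * theDpi / 72) div 1
	-- multiply by a real (4.0) so large byte counts cannot overflow an integer
	return (pixelWidth * 4.0 * pixelHeight) / (1024 ^ 2)
end bitmapMegabytes

set perPage to bitmapMegabytes(612, 792, 600) -- roughly 128 MB per page
set wholeDoc to perPage * 168 / 1024 -- roughly 21 GB for 168 pages
return {perPage, wholeDoc}
```

Under these assumptions the raw bitmaps alone come to about 21 GB if none are released, before counting the TIFF data and NSImage built from each one, so a figure in the tens of gigabytes is plausible when memory is not reclaimed between pages.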

Further to that: the poor memory management is one of the reasons I withdrew my book on how to write ASObjC-based apps in Xcode. It’s just too easy to write apps that then crash intermittently because of memory problems (sometimes because a clean-up appears to have happened).

But it’s generally fine in applets, which mostly just run and quit.

Yes, this is a very unpleasant problem. Thanks for the clarification. I was also encoding a movie (about 97% CPU) while testing your script.

I wish for another title: “ASObjC-based apps in PyObjC” :), though maybe it would meet the same fate.