Acrobat Metadata?

Hi,

I’m new to AppleScript and was hoping that I could use it to help with a task I have.

I have a folder full of Quark PDFs (they’re named numerically, e.g. 3345678.pdf). I also have an Excel file with the page sizes and colours of these PDFs.

What is the best way to compare the page size and colour of the PDFs in the folder to the page size and colour report in the Excel file? Can I use AppleScript to automate this? Perhaps by extracting metadata from the PDFs?

Thanks

J.
OS X 10.4

Depending on what colors you are trying to compare, you may be better off using Illustrator instead of Acrobat, or try reading the raw PDF code.

-N

I was mainly thinking of comparing CMYK to CMYK or Grayscale to Grayscale, not specific colours (green versus yellow, nothing like that). So more like a colour scheme or profile. Thanks for your reply; it helped me clarify.

John,

If I remember correctly, in theory a PDF can have objects of different color spaces embedded, so I don’t think there is a color space for the document as a whole. You may want to look into this.

-N

Good point. However, I do run these PDFs through Kodak’s Prinergy Compression program. Among other things, it standardizes the colour to either CMYK or Grayscale. I’m just trying to find out whether I can check if a file is colour or b&w (CMYK or Grayscale) by using the metadata in the PDF.

John,

Again, I don’t think there is such a thing as a CMYK document…there are CMYK objects within the document, but the document itself does not have a color. (Not sure about the PDF/X workflows and what info gets included.)

Lastly, when you open a PDF now, where in your metadata does it tell you the color mode? I’m still on Acrobat 6, so I may be missing something in newer versions.

-N

Good point, I’m beginning to see what you’re saying. Is there a program that is intelligently able to differentiate between colour and black-white/grayscale documents?

Thanks so much for your input.

John

There are probably other tools out there, but without researching, the only one I’d personally use in this situation is FlightCheck.

I’ve got a serious love/hate relationship with it, but it’ll do what you need.

-N

I have no idea how you do this. I tried following a thread by Calvin Ford which I think went in this direction, but it got too complex for my understanding. A grayscale PDF has no Separation object in the file head, whereas a CMYK file has “Separation/All/DeviceCMYK”, and a multi-ink spot colour file will have a list of objects for each “DeviceN” Separation. I have a few instances where I would like to check for info held here. Is there an easier way, maybe using Satimage? (I’ve yet to look at this.)
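A minimal sketch of that kind of check, done with `grep` on the raw PDF bytes. It assumes the colour-space names are stored uncompressed; anything inside a compressed stream will not be seen, so treat the counts as a hint rather than a verdict. The function name `colourspace_hits` is made up for illustration.

```shell
#!/bin/sh
# Count the lines of a PDF that mention each colour-space name.
# Assumption: the names are stored as plain text, not inside compressed streams.
colourspace_hits() {
    pdf="$1"
    for name in /DeviceGray /DeviceCMYK /DeviceRGB /Separation /DeviceN; do
        # -a: treat the binary file as text, -c: count matching lines,
        # -F: fixed string. grep exits 1 when the count is 0, hence "|| true".
        count=$(LC_ALL=C grep -a -c -F -- "$name" "$pdf" || true)
        printf '%s %s\n' "$name" "$count"
    done
}

# Usage: colourspace_hits /path/to/3345678.pdf
```

A file that reports hits only for `/DeviceGray` is a candidate for black and white, while hits for `/DeviceCMYK` or `/Separation` suggest colour or spot work. The counts are per line, not per object.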

I wonder if I can setup a check list for this. For example, if the doc. contains objects that have a colour profile - then I can map it - as a colour or grayscale doc. or spot document by matching the objects in the PDF to a list of different object colour profiles (e.g. DOT GAIN 22 or Adobe RGB), and then accordingly labelling the doc. Do you think this will work?

Is there an easier way than FlightCheck? I’m unfamiliar with the software. Can it check PDFs against data in Excel files by scripting?

The thread by Calvin Ford, any chance it’s still up? I’m up for the complexity. Thanks Mark and Nedloh.

To follow along with what Mark has suggested, you’d really need to look at the PDF output from your specific device.

I work with PDFs coming from (many) multiple sources (RIPs and desktops), and it seems that each device or software version writes its own particular code regardless of its stated PDF version compliance.

FlightCheck will identify the colorspace of each object within the PDF. We use AppleScript/FC for multiple tasks. It’s a bit slow for some things, but we’re only processing tens of files a day. If it was hundreds, I’d probably look into something else.

-N

Nedloh

The batches vary in size: some have around 70 files, others over 200, and the biggest have over 500. I guess I will have to consider other options then. Suggestions?

Thanks

I’ve found something interesting. The PDFs I’m referring to are Quark PDFs. When I check the metadata (Document Properties in Acrobat 8 Prof.) I find the colour in two separate places. First, it’s available in Description > Additional Metadata > Advanced > http://ns.adobe.com/pdfx/1.3/ > XPressPrivate: %%DocumentProcessColors: Cyan Magenta Yellow Black … Second, I can find it in Custom (List: XPressPrivate %%DocumentProcessColors: Cyan Magenta Yellow Black …)

I’ve noticed that usually colour contains all four separations (CMYK), whereas my spot (MAGENTA) is represented by (MK) and black and white is represented by either true black (K) or a harsher stronger black (MYK).

Now the question is, how do I obtain this data and use it to check against an Excel file?
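One way to pull that value out without opening Acrobat is to search the raw file for the `%%DocumentProcessColors:` string. This is only a sketch: it assumes the XPressPrivate entry is stored as plain, uncompressed text, and that the plate names appear in the order shown above. The function names `process_colours` and `classify_plates`, and the labels they print, are invented for illustration.

```shell
#!/bin/sh
# Print the plate names that follow "%%DocumentProcessColors:" in a PDF.
# Assumption: the entry is stored as plain text, not in a compressed stream.
process_colours() {
    LC_ALL=C grep -a -o '%%DocumentProcessColors:[ A-Za-z]*' "$1" |
        head -n 1 |
        sed -e 's/^%%DocumentProcessColors://' -e 's/^ *//' -e 's/ *$//'
}

# Map a plate list to a label, following the pattern described in the post:
# CMYK = colour, MK = spot, K or MYK = black and white.
classify_plates() {
    case "$1" in
        "Cyan Magenta Yellow Black")    echo "colour" ;;
        "Magenta Black")                echo "spot" ;;
        "Black"|"Magenta Yellow Black") echo "bw" ;;
        *)                              echo "unknown" ;;
    esac
}

# Usage: classify_plates "$(process_colours /path/to/3345678.pdf)"
```

The label for each file could then be compared with the matching row of the Excel report, for example after saving the sheet as CSV.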

Thanks


If that info is available “on the surface” in Acrobat 8, you may want to see if you can get the metadata via AppleScript.

Check Acrobat’s AS dictionary or see if Adobe has a scripting reference available for Version 8. Version 6 has a pretty comprehensive guide that goes with it.

-N

Oh dear. So I’ve realized that when running through Prinergy’s Evo, the program doesn’t hold any colour profiles - in metadata. Any further suggestions on how to separate colour from black and white pdfs? Thanks!

How about having Acrobat’s Preflight sort them for you?

Thanks for that suggestion.

I’m just wondering about preflight? Does it mean I have to open the 240 files that I have, in acrobat so that preflight can separate the colour from the black and white pdfs? It looks time consuming! But I’m really new to preflight, so I could be completely mistaken.


I’m only vaguely aware of what you are trying to do, so bear with me…I saw some similarities to my own need to discern things about files without opening them…a “detective” script, as it were.

I saw folks mentioning “%%” strings inside these PDFs. To the original poster:

Have you ever taken a peek inside your PDFs to see if you can discern the differences manually first?

You can use the code below to make a drag-n-drop to do a dump to screen of the PDF contents via Hexdump. If you can find your differences manually, then it’s a simple matter of writing a routine to check for it (also below).

Hexdump “viewer” for manual scanning:
Just drag-n-drop the file or files and the Hexdump will be opened in TextEdit for viewing.

--
-- Get Hexdump Info v4
-- by Kevin Quosig, 3/28/07
--
-- Used to drag-n-drop files to examine their contents/headers.
--
-- Most code segments courtesy of James Nierodzik of MacScripter
-- http://bbs.applescript.net/profile.php?id=8727
--


--
-- UTILITY HANDLER
--

-- Search and Replace routine using AppleScript Text Item Delimiters "trick"
--
on searchNreplace(parse_me, find_me, replace_with_me)
	
	--save incoming TID state, set new TIDs
	set {ATID, AppleScript's text item delimiters} to {AppleScript's text item delimiters, find_me}
	
	--using the specified character as a break point to strip the delimiter out and break the string into items
	set being_parsed to text items of parse_me
	
	--switch the TIDs again (replace string)
	set AppleScript's text item delimiters to {replace_with_me}
	
	--coerce it back to a string with new delimiters
	set parse_me to being_parsed as string
	
	--restore incoming TID state
	set AppleScript's text item delimiters to ATID
	
	--return results
	return parse_me
	
end searchNreplace


--
-- MAIN HANDLER
--
on open fileList
	
	-- parse through files dropped onto droplet
	repeat with i from 1 to number of items in fileList
		
		set AppleScript's text item delimiters to {""} --reset delimiters
		set this_item to item i of fileList as string ---pick item to work with
		set this_item_posix to quoted form of POSIX path of this_item --need POSIX path for shell scripts
		set doc_name to name of (info for alias this_item) --used for renaming the TextEdit window
		
		--Improved hexdump script line by TheMouthofSauron at MacScripter
		--http://bbs.applescript.net/viewtopic.php?pid=77811#p77811
		--
		--hexdump with the -C parameter formats the hexdump as columns of hex pairs
		--and then a column with a human-readable "ASCII translation" delimited by a pipe
		--character at the beginning and end of the ASCII column
		--
		--"awk" takes the entire -C formatted hexdump line ($0 = the whole input line)
		--and filters out the hex pairs and the delimiting pipe characters
		--(return only 16 characters starting at position 62)
		--
		set hex_dump to (do shell script "hexdump -C " & this_item_posix & " | awk '{print(substr($0,62,16))}'")
		
		--remove carriage returns so output is one giant paragraph
		--(allows for TextEdit searching for strings and manual scanning)
		set hex_dump to searchNreplace(hex_dump, return, "")
		
		--write to TextEdit window and rename window to file name to keep things straight
		tell application "TextEdit"
			make new document
			set text of front document to hex_dump
			set name of front window to doc_name
		end tell
	end repeat
end open

GREP Checker:
Takes a file path and a list of items to search for.

-- revised GREP routine courtesy of
-- Bruce Phillips of MacScripter
-- http://bbs.applescript.net/viewtopic.php?pid=83871#p83871

on grepForString(path_to_grep, search_list)
	
	repeat with current_grep_item in search_list
		try --grep exits with status 1 when it finds nothing, which "do shell script" reports as an error
			do shell script "/usr/bin/grep --count " & quoted form of current_grep_item & " " & quoted form of POSIX path of path_to_grep
			set grep_result to result
			exit repeat
		on error error_message number error_number
			if error_message is "0" then
				-- grep didn't find anything
				set grep_result to 0
			else
				-- pass on the error
				error error_message number error_number
			end if
		end try
	end repeat
	
	if grep_result is 0 then set current_grep_item to "<nothing found>"

	return {grep_result, contents of current_grep_item}

end grepForString

Calvin,

I’m really unaware of hexdump, so please pardon the ignorance. Is it an app that comes with OS X or …

I myself am quite anti-re-frying, but the powers that be above me have decided to do that. A Quark PDF is re-placed in a Quark Document with folios. It’s then post-scripted. This postscript file (ps) is then pushed out to Prinergy - it becomes a PDF.

As you know, a PS file holds no colour profiles. So at the very end, when I’m trying to separate the colour PDFs from the black and white PDFs, it’s a hard task that has to be done manually. Of course, once this is done, I want to give the colour PDFs one prefix and the black and white PDFs another. I think I’ve found a usable script for the renaming. But the colour differentiation is the head breaker.

I was just trying to figure a way to automate it.

As one of the posts suggested previously, running 240 files through Acrobat Preflight is time consuming…it sort of defeats the purpose, unless I’m doing something wrong.

Thanks

John


Yes, hexdump is a command-level UNIX utility.

Basically my “solution” was to examine a hexdump for things that might give clues to the nature of a file (have one CMYK and one Greyscale test file that you make yourself). If you can manually discern a difference, then you’d use the second routine I provided to give the unique string and do a search against it. If it came back true for a certain search, you could then do your name changes.
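The search-then-rename step described above could look roughly like this as a shell function, which an AppleScript could call through `do shell script`. It is a sketch under assumptions: the marker string has to be one you found by inspecting your own test files, and the `col_` and `bw_` prefixes and the function name `label_pdf` are placeholders.

```shell
#!/bin/sh
# Rename a PDF with a "col_" or "bw_" prefix depending on whether a marker
# string is found in its raw bytes, then print the new path.
# Assumption: the marker reliably distinguishes your colour files.
label_pdf() {
    pdf="$1"
    marker="$2"
    dir=$(dirname "$pdf")
    base=$(basename "$pdf")
    # -q: no output, only the exit status says whether the marker was found
    if LC_ALL=C grep -a -q -F -- "$marker" "$pdf"; then
        prefix="col_"
    else
        prefix="bw_"
    fi
    mv -- "$pdf" "$dir/$prefix$base"
    printf '%s\n' "$dir/$prefix$base"
}

# Usage: label_pdf /path/to/3345678.pdf "/DeviceCMYK"
```

Try it on copies first, since the rename happens in place.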

That make any sense? I wasn’t sure how verbose to be.

–Kevin