Sorting PDFs by metadata

Hello all. I hope someone can help me. I want to know if it’s possible to sort PDFs by their metadata?

Hi Chuck,

I love everything PDF workflow, so I just couldn’t resist writing a small script that shows you how to do it.

My solution is based on a Python helper script that requires Mac OS X 10.5 Leopard, which ships with PyObjC. If you are using an earlier incarnation of our beloved operating system, you need to install PyObjC yourself.

To test this script, please download the helper script to your desktop and then execute the AppleScript code below:


on run
	set pdffiles to choose file with prompt "Please choose PDF files only:" with multiple selections allowed
	set stringlist to ""
	set countpdffiles to length of pdffiles
	repeat with i from 1 to countpdffiles
		set pdffile to item i of pdffiles
		set pdffilepath to quoted form of (POSIX path of (pdffile as Unicode text))
		if i is equal to countpdffiles then
			set stringlist to stringlist & pdffilepath
		else
			set stringlist to stringlist & pdffilepath & space
		end if
	end repeat
	set pyscriptpath to quoted form of (POSIX path of (((path to desktop) as Unicode text) & "sortpdfs.py"))
	set command to "/usr/bin/python " & pyscriptpath & space & stringlist
	set sortedpdfpaths to paragraphs of (do shell script command)
	return sortedpdfpaths
end run

It should return the paths to the chosen PDF documents, sorted by the author’s name given in the metadata. Of course, by modifying the Python script you can also sort by creator, title, etc.
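The helper script itself is not reproduced in this thread. As a rough idea of what such a script could look like, here is a minimal sketch. It is an assumption on my part, not the actual sortpdfs.py: the function names `read_author` and `sort_by_author` are made up for illustration, and the metadata lookup assumes PyObjC's Quartz bindings (PDFKit) are available.

```python
#!/usr/bin/env python
"""Hypothetical stand-in for sortpdfs.py: print PDF paths sorted by author."""
import sys


def read_author(path):
    """Return the Author metadata of a PDF, or "" if it has none.

    Assumes PyObjC with the Quartz (PDFKit) bindings; macOS only.
    """
    from Foundation import NSURL
    from Quartz import PDFDocument

    doc = PDFDocument.alloc().initWithURL_(NSURL.fileURLWithPath_(path))
    if doc is None:  # not a readable PDF
        return ""
    attrs = doc.documentAttributes()
    if attrs is None:
        return ""
    # "Author" is the PDFKit attribute key for the document author.
    return attrs.objectForKey_("Author") or ""


def sort_by_author(paths, lookup=read_author):
    """Sort paths case-insensitively by author; PDFs without an author go last."""
    def key(path):
        author = str(lookup(path)).strip()
        return (author == "", author.lower(), path)
    return sorted(paths, key=key)


if __name__ == "__main__":
    # One path per line, so AppleScript's "paragraphs of" can split the output.
    print("\n".join(sort_by_author(sys.argv[1:])))
```

To sort by a different field, only the key passed to `objectForKey_` would change, for example "Title" or "Creator".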

HTH!

Thanks for the help, Martin. I have another question for you. Is it possible to sort the PDFs if they contain keywords?

Of course it is. Please download and study the modified Python helper script, save it to your desktop, and then run this AppleScript code:


on run
	set pdffiles to choose file with prompt "Please choose PDF files only:" with multiple selections allowed
	set stringlist to ""
	set countpdffiles to length of pdffiles
	repeat with i from 1 to countpdffiles
		set pdffile to item i of pdffiles
		set pdffilepath to quoted form of (POSIX path of (pdffile as Unicode text))
		if i is equal to countpdffiles then
			set stringlist to stringlist & pdffilepath
		else
			set stringlist to stringlist & pdffilepath & space
		end if
	end repeat
	set pyscriptpath to quoted form of (POSIX path of (((path to desktop) as Unicode text) & "sortpdfskey.py"))
	set command to "/usr/bin/python " & pyscriptpath & space & stringlist
	set sortedpdfpaths to paragraphs of (do shell script command)
	return sortedpdfpaths
end run

Do you have any suggestions that use only native AppleScript, without any secondary scripts?

You can also easily access the keywords of a PDF document by scripting the excellent and free PDF viewer Skim:


tell application "Skim"
	set pdfinfo to info of document 1
	-- not every PDF document has keywords...
	try
		set pdfkeywords to keywords of pdfinfo
		-- {"Yooooo!", "Sal Soghoian is my role model!"}
	end try
end tell

Moreover you can use the «mdls» command to get the keywords of a PDF document:


set pdfpath to quoted form of "/Users/martin/Desktop/test.pdf"
set command to "mdls -name kMDItemKeywords -raw " & pdfpath
set output to do shell script command
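The raw text that mdls returns still has to be split into individual keywords. Here is a minimal sketch of that step in Python. It assumes the layout I usually see from `mdls -raw`: "(null)" when the attribute is missing, otherwise one item per line between parentheses, with quotes around items that contain spaces. The names `parse_mdls_list` and `pdf_keywords` are made up for illustration.

```python
import subprocess


def parse_mdls_list(raw):
    """Turn the raw text of an mdls array attribute into a list of strings.

    Assumed layout: "(null)" if the attribute is missing, otherwise
    one item per line between "(" and ")", separated by commas.
    """
    raw = raw.strip()
    if raw in ("", "(null)"):
        return []
    items = []
    for line in raw.splitlines():
        line = line.strip().rstrip(",")
        if line in ("(", ")", ""):
            continue
        # Items with spaces or special characters come wrapped in quotes.
        if len(line) >= 2 and line[0] == line[-1] == '"':
            line = line[1:-1]
        items.append(line)
    return items


def pdf_keywords(path):
    """Ask Spotlight for the keywords of one file (macOS only).

    Raises subprocess.CalledProcessError if mdls fails, e.g. for a missing file.
    """
    raw = subprocess.check_output(
        ["mdls", "-name", "kMDItemKeywords", "-raw", path]
    )
    return parse_mdls_list(raw.decode("utf-8"))
```

Escaped characters inside quoted items are left as they are; handling them would need a proper parser.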

But the problem is the sorting: AppleScript does not provide a convenient built-in sort command, and it does not have key/value dictionaries like Python and many other programming languages do. So you will end up with several (possibly nested) lists that you have to sort and compare yourself. That’s why I do not like this approach ;-) But it’s possible.
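For comparison, this is roughly how little the dictionary-based approach needs on the Python side. It is only a sketch, not the actual sortpdfskey.py: the function name `sort_by_keywords` and the ordering rule (compare the alphabetized keyword lists, PDFs without keywords last) are my own assumptions.

```python
def sort_by_keywords(keywords_by_path):
    """Order paths by their alphabetized keyword lists.

    keywords_by_path maps each PDF path to a list of keyword strings.
    PDFs without any keywords are placed at the end.
    """
    def key(path):
        words = sorted(word.lower() for word in keywords_by_path[path])
        return (not words, words, path)
    return sorted(keywords_by_path, key=key)


if __name__ == "__main__":
    # Made-up example data; real keywords would come from the PDF metadata.
    example = {
        "/tmp/c.pdf": [],
        "/tmp/a.pdf": ["Workflow", "automation"],
        "/tmp/b.pdf": ["AppleScript"],
    }
    for path in sort_by_keywords(example):
        print(path)
```

In pure AppleScript, the dictionary would have to be replaced by two parallel lists (paths and keywords) plus a hand-written sort routine that keeps them in step.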