Converting HTML to PDF

I have web pages named A001.html, A002.html, and so on through A100.html.
I use cups-pdf to print web pages as PDF in Safari.
How can I write an AppleScript that converts these HTML files to PDF?
(I can later combine the PDF pages into a book with an Automator workflow.)

Sorry for the post.
It looks like this may solve the problem: http://bbs.macscripter.net/viewtopic.php?id=14765
I will try it.

The last script in the above link converts HTML to RTF. Any idea how to convert the files to PDF instead?

Hi Lance,

I just rewrote this script to batch convert HTML files to PDF using cups-pdf. Save the AppleScript code below as an application bundle. Then you can drop HTML files onto it and it will print them to PDF with cups-pdf (must be the current default printer):


-- created: 26.10.2008
-- tested on/with:
-- • Mac OS X 10.5.5
-- • Safari 3.1.1
-- • Intel & PowerPC based Macs
-- requires:
-- • cups-pdf package
-- you can get cups-pdf for free right here:
-- >> http://www.codepoetry.net/projects/cups-pdf-for-mosx

-- This AppleScript droplet batch converts dropped HTML files to
-- PDF using the Safari browser and cups-pdf. The script assumes that
-- cups-pdf is your current default printer. If you need to use specific 
-- print settings, then please create a printer preset for cups-pdf.

property mytitle : "html2pdf"

-- I am called when the user opens the script with a double click
on run
	tell me
		activate
		display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of HTML files onto my icon to batch convert them to PDF using cups-pdf." buttons {"OK"} default button 1 with title mytitle with icon note
	end tell
end run

-- I am called when the user drops Finder items onto the script icon
on open droppeditems
	try
		set htmlpaths to {}
		-- did the user drop any *.html files onto the script?
		repeat with droppeditem in droppeditems
			if (droppeditem as Unicode text) ends with ".html" then
				set htmlpaths to htmlpaths & (droppeditem as Unicode text)
			end if
		end repeat
		-- no, he or she didn't!
		if htmlpaths is {} then
			set errmsg to "You did not drop any *.html files onto me."
			my dsperrmsg(errmsg, "--")
		else
			-- using Safari, cups-pdf and the «print without print dialog» command to batch
			-- convert the HTML files to PDF
			tell application "Safari"
				repeat with htmlpath in htmlpaths
					open (htmlpath as alias)
					delay 3
					set docloaded to false
					repeat 10 times
						delay 1
						set docstate to (do JavaScript "document.readyState" in document 1)
						if docstate is "complete" then
							set docloaded to true
							exit repeat
						end if
					end repeat
					if docloaded is true then
						print document 1 without print dialog
					end if
					close document 1
				end repeat
			end tell
		end if
		-- catching unexpected errors
	on error errmsg number errnum
		my dsperrmsg(errmsg, errnum)
	end try
end open

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg
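
The load check in the script above could also be pulled out into its own handler, so that the timeout is easy to change. This is only a sketch: the handler name «waitforpage» and its «maxseconds» parameter are made up for illustration, and the try block (not in the script above) simply treats any Safari error as "not loaded yet".

```applescript
-- hypothetical handler: polls Safari's front document once per second
-- and returns true as soon as the page reports that it has finished loading
on waitforpage(maxseconds)
	tell application "Safari"
		repeat maxseconds times
			delay 1
			try
				set docstate to (do JavaScript "document.readyState" in document 1)
				if docstate is "complete" then return true
			end try
		end repeat
	end tell
	-- the page did not report "complete" within maxseconds tries
	return false
end waitforpage

-- possible usage inside the droplet's loop, right after the file was opened:
-- if my waitforpage(10) then tell application "Safari" to print document 1 without print dialog
```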

Maybe it can be of some use for you.

Thanks, Martin. I appreciate your help.
I can use Automator to combine PDF files, but I have not been able to figure out what order it uses to combine them. Hopefully it goes in ascending order of file name. (That would be very difficult to verify, since there are more than 100 pages.) Do you know? Or can AppleScript do it?
(Since I printed the HTML with cups-pdf, the files are now named job_170-xxx, job_171-yyy, etc.)

If you want to keep the original file name, then you can use the following script. It does not rely on «cups-pdf», but uses the internal «/System/Library/Printers/Libraries/convert» program instead. It will place the produced PDF documents in the same directory as the corresponding HTML files.


-- created: 23.04.2008
-- tested on/with:
-- • Mac OS X 10.5.2
-- • Safari 3.1.1
-- • Intel & PowerPC based Macs

-- This AppleScript droplet batch converts dropped HTML files to PDF.

property mytitle : "html2pdf"

-- I am called when the user opens the script with a double click
on run
	tell me
		activate
		display dialog "I am an AppleScript droplet." & return & return & "Please drop a bunch of HTML files onto my icon to batch convert them to PDF." buttons {"OK"} default button 1 with title mytitle with icon note
	end tell
end run

-- I am called when the user drops Finder items onto the script icon
on open droppeditems
	try
		set htmlpaths to {}
		-- did the user drop any *.html files onto the script?
		repeat with droppeditem in droppeditems
			if (droppeditem as Unicode text) ends with ".html" then
				set htmlpaths to htmlpaths & (droppeditem as Unicode text)
			end if
		end repeat
		-- no, he or she didn't!
		if htmlpaths is {} then
			set errmsg to "You did not drop any *.html files onto me."
			my dsperrmsg(errmsg, "--")
		else
			repeat with htmlpath in htmlpaths
				set htmlfileinfo to info for (htmlpath as alias)
				set htmlfilename to name of htmlfileinfo
				set rhtmlfilename to (reverse of (characters 1 through -1 of htmlfilename)) as Unicode text
				set offsetdot to offset of "." in rhtmlfilename
				set pdffilename to ((characters 1 through -(offsetdot + 1) of htmlfilename) as Unicode text) & ".pdf"
				set htmldirpath to my getpardirpath(htmlpath as Unicode text)
				set pdfpath to htmldirpath & pdffilename
				try
					set pdfpath to pdfpath as alias
				on error
					set htmlpath to quoted form of (POSIX path of htmlpath)
					set pdfpath to quoted form of (POSIX path of pdfpath)
					set command to "/System/Library/Printers/Libraries/convert -f " & htmlpath & " -o " & pdfpath
					do shell script command
				end try
			end repeat
		end if
		-- catching unexpected errors
	on error errmsg number errnum
		my dsperrmsg(errmsg, errnum)
	end try
end open

-- returns the parent folder of an item as a Mac path! ':'
-- expects «itempath» to be a Mac path! ':'
-- origin: http://www.fischer-bayern.de/applescript/html/parent_f.html
on getpardirpath(itempath)
	set olddelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to {":"}
	set counttxtitems to (count text items of itempath)
	set lasttxtitem to the last text item of itempath
	if lasttxtitem = "" then
		set counttxtitems to counttxtitems - 2 -- for a path to a folder
	else
		set counttxtitems to counttxtitems - 1 -- for a path to a file
	end if
	set pardirpath to text 1 thru text item counttxtitems of itempath & ":"
	set AppleScript's text item delimiters to olddelims
	return pardirpath
end getpardirpath

-- I am displaying error messages
on dsperrmsg(errmsg, errnum)
	tell me
		activate
		display dialog "Sorry, an error occurred:" & return & return & errmsg & " (" & errnum & ")" buttons {"Never mind"} default button 1 with icon stop with title mytitle
	end tell
end dsperrmsg
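
Stripped of the droplet plumbing, the core of the script is just "derive the PDF path, then build the convert command". The following is a rough sketch of that step using POSIX paths instead of Mac paths. The handler names «pdfpathfor» and «convertcommand» and the example path are invented for illustration, and the sketch assumes the «convert» program exists at the location used in the script above, which may not be true on other Mac OS X versions.

```applescript
-- hypothetical helper: swaps a trailing ".html" for ".pdf"
on pdfpathfor(htmlposixpath)
	if (htmlposixpath ends with ".html") and ((count htmlposixpath) > 5) then
		-- drop the last five characters (".html") and append the new suffix
		return (text 1 thru -6 of htmlposixpath) & ".pdf"
	else
		return htmlposixpath & ".pdf"
	end if
end pdfpathfor

-- hypothetical helper: builds the shell command without running it
on convertcommand(htmlposixpath)
	set pdfposixpath to pdfpathfor(htmlposixpath)
	return "/System/Library/Printers/Libraries/convert -f " & (quoted form of htmlposixpath) & " -o " & (quoted form of pdfposixpath)
end convertcommand

-- example path is an assumption; pass the result to «do shell script» to convert
set command to convertcommand("/Users/lance/book/A001.html")
```

Building the command separately from running it makes it easy to inspect the string in Script Editor before any file is written.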

The above script is a better way of converting HTML to PDF.
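
Since the PDF files now keep the A001 ... A100 names, the order for the combining step can at least be checked in AppleScript. Below is a minimal sketch of sorting a list of file names in ascending order with a plain insertion sort. The handler name «sortnames» and the sample names are made up, and whether Automator accepts the files in this order is not something the sketch verifies.

```applescript
-- hypothetical helper: returns a new list with the names in ascending order
on sortnames(namelist)
	set sortedlist to {}
	repeat with currentname in namelist
		set currentname to contents of currentname
		set inserted to false
		repeat with i from 1 to (count sortedlist)
			if currentname < item i of sortedlist then
				-- insert the name in front of the first larger entry
				if i = 1 then
					set sortedlist to {currentname} & sortedlist
				else
					set sortedlist to (items 1 thru (i - 1) of sortedlist) & {currentname} & (items i thru -1 of sortedlist)
				end if
				set inserted to true
				exit repeat
			end if
		end repeat
		-- no larger entry found, so the name goes to the end
		if not inserted then set end of sortedlist to currentname
	end repeat
	return sortedlist
end sortnames

-- sample names are assumptions
sortnames({"A010.pdf", "A002.pdf", "A001.pdf"})
-- result: {"A001.pdf", "A002.pdf", "A010.pdf"}
```

The zero-padded numbers matter here: with names like A1, A2, A10 a plain text comparison would put A10 before A2.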

say "Thanks a lot, Martin"

Very cool! I never knew about /System/Library/Printers/Libraries/convert.

How hard would it be to make this work with files not supported by MIME? I would love to give my users a handy way to batch-convert Word or Open Office files to PDF.