Extract Preview.pdf from a corrupted Pages or Numbers flat file

I was asked to write a script that extracts the Preview.pdf file embedded in Numbers and Pages flat files when such a file is corrupted.

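I built a script that does the trick, but I'm not really satisfied with the scheme it uses (a chunked scan of the file, then a deliberately failed coercion to obtain a hex dump of the data).
So I'm posting it here in the hope that one of you will have a better idea.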


--{code}
--[SCRIPT recup_PDF]
(*
Save the script as a Script: recup_PDF.scpt

Move the newly created file into the folder:
<startup Volume>:Users:<yourAccount>:Library:Scripts:

Go to the Script menu, then choose "recup_PDF".
The script asks you to select a Numbers or Pages flat file
and tries to extract the embedded Preview.pdf file.

You may also save the script as an application bundle ("Application" under 10.6 and later),
then drag and drop the document's icon onto the script application's icon.

--=====

The Finder's Help explains:
To make the Script menu appear:
Open AppleScript Utility, located in Applications/AppleScript.
Select the "Show Script Menu in menu bar" checkbox.
Under 10.6.x,
go to the General panel of AppleScript Editor's Preferences dialog box
and check the "Show Script menu in menu bar" option.

--=====

Yvan KOENIG (VALLAURIS, France)
2010/08/24
*)
--=====

on run
	my commun(choose file of type {"com.apple.iwork.numbers.numbers", "com.apple.iwork.numbers.sffnumbers", "com.apple.iwork.pages.pages", "com.apple.iwork.pages.sffpages"} without invisibles)
end run

--=====

on open sel
	set ma_selection to (item 1 of sel) as alias
	tell application "System Events" to tell disk item ("" & ma_selection)
		set type_id to type identifier
	end tell
	if type_id is not in {"com.apple.iwork.numbers.numbers", "com.apple.iwork.numbers.sffnumbers", "com.apple.iwork.pages.pages", "com.apple.iwork.pages.sffpages"} then
		if my parle_anglais() then
			error "\"" & ma_selection & "\" isn't a Numbers or Pages document!"
		else
			error "« " & ma_selection & " » n'est pas un document Numbers ou Pages !"
		end if
	end if
	my commun(ma_selection as alias)
end open

on commun(le_corrompu)
	tell application "System Events" to tell disk item ("" & le_corrompu)
		set c_un_paquet to package folder
	end tell
	
	if c_un_paquet then
		if my parle_anglais() then
			error "\"" & le_corrompu & "\" isn't a Numbers or Pages flat file!"
		else
			error "« " & le_corrompu & " » est un paquet Numbers ou Pages !"
		end if
	end if
	
	set balise1 to "QuickLook/Preview.pdf"
	if (get system attribute "sysv") < 4176 then
		set balise2 to "%%EOF" & (ASCII character 10)
	else
		set balise2 to "%%EOF" & linefeed
	end if
	set octets_a_lire to 10 * 512
	
	set i to 0
	set offset1 to missing value
	set pdf_disponible to false
	repeat
		try
			(*
We read 50 bytes more than the step so we will not miss an occurrence crossing the step boundary *)
			set contenu_pdf to read le_corrompu from (i * octets_a_lire) for (octets_a_lire + 50) (* don't ask to 'read file' as le_corrompu is an alias *)
		on error
			exit repeat
		end try
		(*
Look for %%EOF only once the Preview.pdf marker has been found, including in the same block *)
		if offset1 is missing value and contenu_pdf contains balise1 then set offset1 to i * octets_a_lire
		if offset1 is not missing value and contenu_pdf contains balise2 then
			(*
Include the 50-byte overlap, where %%EOF may lie: the range read below ends on the same byte as this block *)
			set offset2 to (i + 1) * octets_a_lire + 49
			set pdf_disponible to true
			exit repeat
		end if
		set i to i + 1
	end repeat
	
	if pdf_disponible is false then
		if my parle_anglais() then
			error "\"" & le_corrompu & "\" doesn't contain a Preview.pdf file!"
		else
			error "« " & le_corrompu & " » ne contient pas de fichier Preview.pdf !"
		end if
	end if
	(*
Read the data from the block holding the Preview.pdf marker to the end of the block holding %%EOF *)
	set contenu_pdf to read le_corrompu from offset1 to offset2 as data (* don't ask to 'read file' as le_corrompu is an alias *)
	
	set balise1_hex to "517569636B4C6F6F6B2F507265766965772E706466" --"QuickLook/Preview.pdf"
	set balise2_hex to "2525454F460A" --"%%EOF" & linefeed
	try
		(*
Coercing raw data to text fails on purpose: the error message contains the data as a string of hex digits, which we parse below *)
		contenu_pdf as text
	on error contenu_pdf number errNbr
		(*
The error message contains "«data rdat" followed by the hex digits; drop everything up to and including that header *)
		set contenu_pdf to item 2 of my decoupe(contenu_pdf, "«data rdat")
		(*
Drop what precedes balise1_hex, and balise1_hex itself, so the useful range begins with "%PDF" *)
		set contenu_pdf to my decoupe(contenu_pdf, balise1_hex)
		set offset_local to ((count of item 1 of contenu_pdf) + (count of balise1_hex)) div 2
		set contenu_pdf to item 2 of contenu_pdf
		(*
Calculate the offset of the %PDF block in the file *)
		set offset_beg to offset1 + offset_local
		(*
Drop what follows balise2_hex, keeping balise2_hex itself *)
		set contenu_pdf to (item 1 of my decoupe(contenu_pdf, balise2_hex)) & balise2_hex
		(*
Count the useful bytes (two hex digits per byte) *)
		set longueur_du_contenu to (count of contenu_pdf) div 2
	end try
	(*
Extract the PDF block *)
	set contenu_pdf to read le_corrompu from offset_beg for longueur_du_contenu as data (* don't ask to 'read file' as le_corrompu is an alias *)
	(*
Create a PDF file with a unique name on the Desktop *)
	set p2d to path to desktop (* p2d must be defined here *)
	set nom_pdf to "Preview" & (do shell script "date +_%Y%m%d-%H%M%S.pdf")
	tell application "System Events" to make new file at end of p2d with properties {name:nom_pdf}
	(*
Write the extracted data to the new file *)
	set chemin_du_pdf to "" & p2d & nom_pdf
	write contenu_pdf to file chemin_du_pdf
	(*
Wait for completion of the save process *)
	set maybe to 0
	repeat
		try
			tell application "System Events" to set maybe2 to physical size of file chemin_du_pdf
			if maybe2 = maybe then exit repeat (* write task is completed *)
			set maybe to maybe2
		end try
	end repeat
	tell application "System Events" to open file chemin_du_pdf
end commun

--=====

on hex2num(t)
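	(* use characters rather than text items so the result does not depend on the current text item delimiters *)
	return ((offset of (character 1 of t) in "0123456789ABCDEF") - 1) * 16 + (offset of (character 2 of t) in "0123456789ABCDEF") - 1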
end hex2num

--=====

on decoupe(t, d)
	local oTIDs, l
	set oTIDs to AppleScript's text item delimiters
	set AppleScript's text item delimiters to d
	set l to text items of t
	set AppleScript's text item delimiters to oTIDs
	return l
end decoupe

--=====

on parle_anglais()
	return (do shell script "defaults read 'Apple Global Domain' AppleLocale") does not start with "fr_"
end parle_anglais

--=====
--[/SCRIPT]
--{code}
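One fragile point of the hex scheme, offered here only as a sketch of a possible fix: `decoupe` splits the hex dump on `balise1_hex` wherever the digits happen to match, including at an odd nibble position, where the match straddles two bytes and does not correspond to the marker at all. A handler that only accepts byte-aligned matches could look like the following. Its name, `position_octet_hex`, is hypothetical, and it assumes the hex dump holds exactly two hex digits per byte, as in the «data rdat text parsed by the script.

```applescript
(* Hypothetical helper, not part of recup_PDF.
Returns the 0-based byte offset of the first byte-aligned occurrence
of marqueur_hex in hex_txt, or -1 if there is none.
Assumes two hex digits per byte in hex_txt. *)
on position_octet_hex(hex_txt, marqueur_hex)
	set depart to 1
	repeat
		if depart > (count hex_txt) then return -1
		set reste to text depart thru -1 of hex_txt
		set p to offset of marqueur_hex in reste
		if p = 0 then return -1
		(* 1-based character index of the match in hex_txt *)
		set absolu to depart + p - 1
		(* a byte starts at an odd character index: 1, 3, 5... *)
		if absolu mod 2 = 1 then return (absolu - 1) div 2
		(* misaligned match straddling two bytes: skip it and keep searching *)
		set depart to absolu + 1
	end repeat
end position_octet_hex
```

Inside `commun`, the returned value plus `(count balise1_hex) div 2` would give the same `offset_local` as the current split whenever the first match is aligned, and would skip a misaligned one instead of cutting the dump in the wrong place.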

Yvan KOENIG (VALLAURIS, France) Tuesday, August 24, 2010 22:16:40