Convert Pages files to MS Word recursively

Task: start from a user-specified folder, recurse into all folders and for each Pages file found, export an MS Word version of the Pages file into the same folder.

First post, so please don’t flame me! I have borrowed heavily from this post: https://macscripter.net/viewtopic.php?id=47006. Pages seems to want HFS-format (colon-separated) paths rather than POSIX (slash-separated) ones, so there are a couple of extra calls to get the paths into the right format. (There must be a way of avoiding this extra step, but I couldn’t find it…)
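For anyone else puzzled by the two path formats: AppleScript can convert between them directly with the standard 'POSIX file' and 'POSIX path of' terms. This is a minimal sketch. The example path is an assumption and does not need to exist on disk. The volume name at the start of the HFS result depends on the machine's startup disk.

```applescript
use scripting additions

-- Example path (assumption): the file does not have to exist for the conversion.
set posixPath to "/Users/Shared/Example.pages"

-- POSIX (slash-separated) -> HFS (colon-separated), the form Pages' 'file' specifier takes.
set hfsPath to (POSIX file posixPath) as text

-- HFS -> POSIX, converting back again.
set backToPosix to POSIX path of hfsPath

log hfsPath -- begins with the startup disk's name, e.g. "Macintosh HD:Users:Shared:Example.pages"
log backToPosix
```

Handing Pages a file object or alias instead of a path string, as later versions in this thread do, avoids most of this conversion.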

Tested on High Sierra.

Posted in the hope that someone will find it useful.

use scripting additions
use script "FileManagerLib" version "2.3.0"

tell application "Finder"
	set source_folder to choose folder with prompt "Please select top-level directory."
end tell

set theFiles to objects of source_folder result type paths list with searching subfolders without include folders

repeat with myFile in theFiles
	
	if (name_extension of (parse object myFile with HFS results)) contains "pages" then
		set InputFile to parent_folder_path of (parse object myFile with HFS results) & full_name of (parse object myFile with HFS results)
		set WordFile to parent_folder_path of (parse object myFile with HFS results) & name_stub of (parse object myFile with HFS results) & ".docx"
		
		log "Converting " & InputFile & " to: " & WordFile
		tell application "Pages"
			set mydoc to open file InputFile -- open input file in Pages
			export mydoc to file WordFile as Microsoft Word
			close mydoc saving no -- close the original file without saving
		end tell
		
	else
		log "No conversion: " & myFile
	end if
end repeat

Edit: this requires Shane Stanley’s FileManagerLib library (https://www.macosxautomation.com/applescript/apps/Script_Libs.html#FileManagerLib).

Hi. Welcome to MacScripter.

This site’s Posting Guidelines (link at the top of this page) explicitly ban flaming under any circumstances, so you should be OK. :slight_smile:

Thanks for posting your script. Since it uses a third-party library, it would be a good idea to include a comment pointing this out and indicating where the library can be obtained.

Hi, fidgety. Welcome to MacScripter. Nice script!

I will add my own version. It is also very fast (1.5 times faster than your version) and does not use third-party libraries.


use framework "Foundation"
use scripting additions

set sourceFolder to (POSIX path of (choose folder with prompt "Please select top-level directory."))

-- Get contents of folder, including contents of subfolders; skip package contents and hidden files
set fileManager to current application's NSFileManager's |defaultManager|()
set sourceURL to current application's NSURL's fileURLWithPath:sourceFolder -- proper file URL from a POSIX path; URLWithString: can return missing value for paths containing spaces
set fileKey to current application's NSURLIsRegularFileKey
set searchOptions to 6 -- skip package contents (2) + hidden files (4)
set entireContents to (fileManager's enumeratorAtURL:(sourceURL) ¬
	includingPropertiesForKeys:({fileKey}) options:(searchOptions) errorHandler:(missing value))'s allObjects()

-- Filter case-insensitively for items with ".pages" extensions.
set thePredicate to current application's NSPredicate's predicateWithFormat:("pathExtension ==[c] 'pages'")
set urlArray to entireContents's filteredArrayUsingPredicate:(thePredicate)
set pagesFiles to urlArray as list

-- If no Pages files were found, stop here
if pagesFiles is {} then
	display notification "NO FILES FOUND FOR CONVERSION." with title "CONVERTING PAGES FILES TO MICROSOFT WORD FILES"
	return
end if

-- Conversion process
repeat with pagesFile in pagesFiles
	tell application "Finder" to set parentFolder to (container of (pagesFile as alias)) as alias
	set aBaseName to my getNameWithoutExtension(POSIX path of pagesFile)
	display notification "Converting  " & return & aBaseName & ".pages" & ¬
		"  >>>  " & aBaseName & ".docx"
	tell application "Pages"
		set mydoc to open file (pagesFile as alias as string) -- open input file in Pages
		export mydoc as Microsoft Word to file ((parentFolder as string) & aBaseName & ".docx")
		close mydoc saving no -- close the original file without saving
	end tell
end repeat

-- Get base name of file without extension (fast method)
on getNameWithoutExtension(aFilePosixP)
	set TID to text item delimiters
	set text item delimiters to {"/"}
	set baseName to last text item of aFilePosixP
	if baseName contains "." then
		set text item delimiters to {"."}
		set nameWithoutExtension to text 1 thru text item -2 of baseName
	else
		set nameWithoutExtension to baseName
	end if
	set text item delimiters to TID
	return nameWithoutExtension
end getNameWithoutExtension

Most of the scripts’ running time is taken with Pages opening documents and exporting them in the new format, for which the code is essentially the same in both scripts. So 1.5 times faster, or even 1.5 times as fast, seems a bit optimistic. :confused:

The only drag in the rest of fidgety’s code (apart from the log commands) is the use of FileManagerLib’s ‘parse object’ command five times per input path. It only needs to be called once per path and could in fact be replaced altogether with some simple AS text manipulation:

-- Requires Shane Stanley's FileManagerLib library (<https://www.macosxautomation.com/applescript/apps/Script_Libs.html#FileManagerLib>).

use scripting additions
use script "FileManagerLib" version "2.2.2"

tell application "Pages"
	activate
	set source_folder to (choose folder with prompt "Please select top-level directory.")
end tell

set theFiles to (objects of source_folder result type files list with searching subfolders without include folders) -- FileManagerLib command. NB. 'result type' now 'files list'.

repeat with myFile in theFiles
	set InputPath to myFile as text -- HFS path.
	
	if (InputPath ends with ".pages:") then set InputPath to text 1 thru -2 of InputPath
	if (InputPath ends with ".pages") then
		set WordPath to text 1 thru -7 of InputPath & ".docx"
		
		--log "Converting " & InputPath & " to: " & WordPath
		tell application "Pages"
			set mydoc to open myFile -- open input file in Pages
			export mydoc to file WordPath as Microsoft Word
			close mydoc saving no -- close the original file without saving
		end tell
		
	else
		--log "No conversion: " & myFile
	end if
end repeat

Hi, Nigel.
I have no doubt that Stanley’s library can be used more efficiently, and then it could be faster than my script. For example, (parse object myFile with HFS results) is evaluated 5 times in each pass of the repeat loop. It is enough to evaluate it once, assign the result to a variable, and use that variable in the other 4 places.
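Applied to fidgety's loop, that suggestion might look like the handler below. It is a sketch, assuming FileManagerLib is installed and that 'parse object' returns the same record fields (parent_folder_path, full_name, name_stub, name_extension) used in the original script. The handler name wordPathsFor is hypothetical.

```applescript
use scripting additions
use script "FileManagerLib" version "2.3.0"

-- Hypothetical helper: parse the path once and reuse the resulting record,
-- instead of calling 'parse object' five times per file as in the original loop.
-- Returns {InputFile, WordFile} as HFS paths, or missing value for non-Pages files.
on wordPathsFor(myFile)
	set parsed to (parse object myFile with HFS results) -- the only call per path
	if (name_extension of parsed) does not contain "pages" then return missing value
	set parentPath to parent_folder_path of parsed
	set InputFile to parentPath & (full_name of parsed)
	set WordFile to parentPath & (name_stub of parsed) & ".docx"
	return {InputFile, WordFile}
end wordPathsFor
```

Inside the original repeat loop, one would then write set thePaths to wordPathsFor(contents of myFile) and, when the result is not missing value, pass item 1 and item 2 of thePaths to Pages' open and export commands.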

For testing, I replaced the choose folder command with a hard-coded path to my Documents folder and commented out my display notifications and your log commands. With those changes, the difference in speed between my script and the other two became larger.
Here are the results:

  1. Nigel Garvey’s script - 5.16 seconds
  2. KniazidisR’s script - 3.84 seconds
  3. fidgety’s script - 8.71 seconds

Here is the fastest script (2.82 seconds):
-- Requires Shane Stanley’s FileManagerLib library (https://www.macosxautomation.com/applescript/apps/Script_Libs.html#FileManagerLib).


use scripting additions
use script "FileManagerLib" version "2.2.2"
use framework "Foundation"

tell application "Pages"
	activate
	set source_folder to (choose folder with prompt "Please select top-level directory.")
end tell

set theFiles to (objects of source_folder result type urls array with searching subfolders) -- FileManagerLib command. NB. 'result type' now 'urls array'.
set thePredicate to current application's NSPredicate's predicateWithFormat:("pathExtension ==[c] 'pages'")
set theFiles to (theFiles's filteredArrayUsingPredicate:(thePredicate)) as list -- coerced to a list of file objects

-- if theFiles = {} then log "No files found to convert"

repeat with myFile in theFiles
	set InputPath to myFile as text -- HFS path.
	
	--log "Converting " & InputPath & " to: " & (text 1 thru -7 of InputPath & ".docx")
	tell application "Pages"
		set mydoc to open myFile -- open input file in Pages
		export mydoc to file (text 1 thru -7 of InputPath & ".docx") as Microsoft Word
		close mydoc saving no -- close the original file without saving
	end tell
end repeat

And here is the fastest script without third-party libraries (about the same speed, roughly 2.82 seconds):


use framework "Foundation"
use scripting additions

tell application "Pages"
	activate
	set sourceFolder to (POSIX path of (choose folder with prompt "Please select top-level directory."))
end tell

-- Get contents of folder, including contents of subfolders; skip package contents and hidden files
set fileManager to current application's NSFileManager's |defaultManager|()
set sourceURL to current application's NSURL's fileURLWithPath:sourceFolder -- proper file URL from a POSIX path; URLWithString: can return missing value for paths containing spaces
set fileKey to current application's NSURLIsRegularFileKey
set searchOptions to 6 -- skip package contents (2) + hidden files (4)
set entireContents to (fileManager's enumeratorAtURL:(sourceURL) ¬
	includingPropertiesForKeys:({fileKey}) options:(searchOptions) errorHandler:(missing value))'s allObjects()

-- Filter case-insensitively for items with ".pages" extensions.
set thePredicate to current application's NSPredicate's predicateWithFormat:("pathExtension ==[c] 'pages'")
set pagesFiles to (entireContents's filteredArrayUsingPredicate:(thePredicate)) as list

-- If no Pages files were found, stop here
if pagesFiles is {} then
	display notification ¬
		"NO FILES FOUND FOR CONVERSION." with title "CONVERTING PAGES FILES TO MS WORD FILES"
	return
end if

-- Conversion process
repeat with pagesFile in pagesFiles
	set inputPath to pagesFile as text
	display notification inputPath & "  >>>>" & return & (text 1 thru -7 of inputPath & ".docx") ¬
		with title "CONVERTING PAGES FILES TO MS WORD FILES"
	tell application "Pages"
		set mydoc to open file inputPath -- Pages needs a file specifier, not a bare path string
		export mydoc to file (text 1 thru -7 of inputPath & ".docx") as Microsoft Word
		close mydoc saving no
	end tell
end repeat

If we are talking about speed, a Spotlight search with mdfind is 5 times faster than the script library (no offense, Shane).

set source_folder to (choose folder with prompt "Please select top-level directory.")
set theFiles to paragraphs of (do shell script "mdfind -onlyin " & quoted form of POSIX path of source_folder & space & "'kMDItemFSName = *pages'")

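repeat with myFile in theFiles
	set inputFile to (POSIX file (contents of myFile)) as alias
	set inputPath to inputFile as text -- HFS path; 'text 1 thru -7' needs text, not an alias
	if inputPath ends with ":" then set inputPath to text 1 thru -2 of inputPath -- old bundle-format documents
	tell application "Pages"
		set mydoc to open inputFile -- open input file in Pages
		export mydoc to file (text 1 thru -7 of inputPath & ".docx") as Microsoft Word
		close mydoc saving no -- close the original file without saving
	end tell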
end repeat

Well I don’t know what you’re timing. :wink:

With the Pages stuff, logs, and notifications commented out, and timing from after the ‘choose folder’ line (ie. just the acquisition of the file paths), my timings with a folder containing 153 Pages files, a PDF, a subfolder, and four alias files are:

fidgety: 0.897 - 0.972 seconds
KniazidisR: 0.491 - 0.505 seconds
NG: 0.05 - 0.054 seconds
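To reproduce a timing of just the path-acquisition step, one option is to bracket it with Foundation's NSDate timeIntervalSinceReferenceDate. This is a sketch. It assumes a hard-coded Documents folder in place of 'choose folder' and uses the NSFileManager enumeration from KniazidisR's script as the code under test. It does not necessarily reflect how the figures above were measured.

```applescript
use framework "Foundation"
use scripting additions

-- Hard-coded test folder (assumption), replacing 'choose folder' so only path acquisition is timed.
set sourceFolder to POSIX path of (path to documents folder)

set t0 to current application's NSDate's timeIntervalSinceReferenceDate()

-- Code under test: enumerate the folder tree and keep only .pages items.
set fileManager to current application's NSFileManager's |defaultManager|()
set sourceURL to current application's NSURL's fileURLWithPath:sourceFolder
set searchOptions to 6 -- skip package contents (2) + hidden files (4)
set entireContents to (fileManager's enumeratorAtURL:(sourceURL) includingPropertiesForKeys:({}) options:(searchOptions) errorHandler:(missing value))'s allObjects()
set thePredicate to current application's NSPredicate's predicateWithFormat:("pathExtension ==[c] 'pages'")
set pagesFiles to (entireContents's filteredArrayUsingPredicate:(thePredicate)) as list

-- Elapsed wall-clock time in seconds, as an AppleScript real.
set elapsed to (current application's NSDate's timeIntervalSinceReferenceDate()) - t0
log "Found " & (count pagesFiles) & " Pages files in " & elapsed & " seconds"
```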

However, when I include the opening and exporting of the documents (with Pages 8.1 already open and frontmost when the timings begin, the folder’s contents not showing in the Finder during execution, and the created docx files being removed and deleted between each run) a problem is revealed with both our scripts!:

fidgety: 162.132 - 165.811 seconds
KniazidisR: errors trying to export an old Pages document whose file is a bundle and whose path therefore ends with a colon.
NG: 159.223 seconds, but doesn’t catch three documents whose files are bundles.

With bundles allowed for in the KniazidisR and NG scripts (fidgety’s script already allows for them):

fidgety: 171.809 seconds - 174.283 seconds
KniazidisR: 166.665 - 167.962 seconds
NG: 165.13 seconds - 169.911 seconds

… in other words, all essentially the same speed except for the noise you’d expect when performing application tasks on a large number of files.

I’ll add the colon fix to the script I posted earlier.
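The colon fix amounts to stripping the trailing colon from an old bundle-format document's HFS path before building the .docx path. Below is a sketch of it as a standalone handler. The name docxPathFor and the example paths are hypothetical.

```applescript
use scripting additions

-- Hypothetical helper: derive the HFS path of the .docx export from a Pages file's HFS path.
-- Old-format Pages documents are bundles (folders), so their HFS paths end with ":".
on docxPathFor(pagesHFSPath)
	set p to pagesHFSPath as text
	if p ends with ":" then set p to text 1 thru -2 of p -- drop the bundle's trailing colon
	if p does not end with ".pages" then return missing value -- not a Pages document
	return (text 1 thru -7 of p) & ".docx" -- replace the 6-character ".pages" extension
end docxPathFor

log docxPathFor("Macintosh HD:Users:Shared:Old.pages:") -- "Macintosh HD:Users:Shared:Old.docx"
```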

Hi Nigel, thanks. I have now edited the post to add this.

Hi KniazidisR, thanks very much! I wasn’t expecting anyone to put this much effort into it.

As a beginner with AppleScript, I don’t understand why it is necessary to coerce “myFile” to text. Shouldn’t it be a string (text) already? Or is the language so strongly typed that it requires this?

I already knew, in my case, it was going to take an hour or so to run the script, so at that point I wasn’t really bothered about making it any faster. However thank you for taking the time to respond.

Without doing any profiling, it looked as though a significant chunk of the overall time was taken up by the fancy window-opening animation in Pages. Is there a way to run Pages (or, more generally, any app) in “silent” mode, i.e. without the window opening at all? I tried a Google search, but it didn’t come up with anything.

Thanks! I never would have discovered this one on my own.

Hi fidgety.

In my version of your script, I altered the ‘objects of’ command to return a list of file specifiers instead of a list of POSIX paths. These can be used directly with the ‘open’ command in Pages and can be coerced to text (ie. to HFS paths) for editing into the form needed for the file specifier in the ‘export’ command.
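To illustrate the answer to the coercion question: a file object is not a string, so text operations such as 'text 1 thru -7' can't be applied to it until it has been coerced 'as text'. This is a minimal sketch. It uses a hypothetical POSIX path in place of an item from FileManagerLib's files list.

```applescript
use scripting additions

-- Stand-in for one item of a 'files list' result (assumption: hypothetical path).
set myFile to POSIX file "/Users/Shared/Report.pages"

-- A file object can go straight to Pages' 'open' command, but it is not text.
set isText to (class of myFile is text) -- false

-- Coercing it gives an HFS path string, which can then be edited.
set InputPath to myFile as text -- e.g. "Macintosh HD:Users:Shared:Report.pages"
set WordPath to (text 1 thru -7 of InputPath) & ".docx"
log WordPath
```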

I completely agree with this proposal. Nothing is faster than Spotlight’s metadata. There is only one downside: Spotlight’s own indexing process consumes system resources, which works against the speed gain.

But there is a solution, and I think it is the best one: enable Spotlight indexing only while you are asleep or at work, that is, write a script that starts Spotlight indexing at certain hours of the day.

By the way, Shane has a very efficient Spotlight metadata library, MetadataLib, which I use quite often.

That’s an absolutely fair remark. I only got to know Pages last year and found out just yesterday that old Pages files were bundles. I did have some suspicions, though: why else would the original script use ‘contains’ in its ‘if’ test?

Now it is clear. I will fix this bug in my scripts when there is free time.

None taken — I have a lib for those, too ;). But Spotlight queries don’t always work, for various reasons, so in my mind they’re best for cases where either the alternative is just too slow to contemplate, or where near enough is good enough.