Editing RTFD documents in Pages

My goal is to write a script that removes specific items from RTFD files opened in Pages 6.2.
(Context: the files contain articles copied from Safari. I want to remove the ad inserts before and after the actual article.)

My problem is that I cannot locate anything but plain text in the document, whereas the document contains plenty of images, links, boxes, etc.

The Pages dictionary mentions classes such as ‘Text items’ and ‘Images’, which I assume should contain the objects that I want to remove. Unfortunately, I cannot access them. My ‘Document’ seems to contain only ‘Pages’ objects. It also has a ‘Body text’ property, but nothing else of interest.

I know that I could extract the plain text, but that is not my goal as I want to keep the pictures and formatting. Also, please note that I am not bound to Pages and could use ‘TextEdit’ if it’s up to the task.
(in fact ‘DevonThink Pro’ would be ideal, but I don’t even know where to start)

Any guidance pointing me in the right direction would be greatly appreciated.

Thanks in advance! W.

You are facing a shortcoming of the current version of Pages.
What you want to achieve was easy with Pages '09.

Here is a quick and dirty workaround :

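set rtfdFile to choose file of type {"com.apple.rtfd"}

tell application "Finder"
	set theContainer to container of rtfdFile as alias
	set origName to name of rtfdFile
	# Strips the ".rtfd" extension from the name
	set bareName to text 1 thru -6 of origName
	# Renames the package so that the Finder handles it as an ordinary folder
	set name of rtfdFile to bareName
	# Moves the embedded TXT.rtf out of that folder, next to it
	set rtfFile to move file ((theContainer as text) & bareName & ":" & "TXT.rtf") to theContainer
	# Renames the moved file to the original name with ".rtf" instead of ".rtfd"
	set name of rtfFile to text 1 thru -2 of origName
end tell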

It extracts the RTF file embedded in the rtfd package and places it next to the (renamed) package.

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) Wednesday, 6 September 2017 14:05:27

Here is an alternate script relying upon the fact that the shell makes no distinction between a package and a folder.

set rtfdFile to choose file of type {"com.apple.rtfd"}

tell application "Finder"
	set theContainer to container of rtfdFile as text
	set origName to name of rtfdFile
end tell

# Builds the quoted form of the posix path of the rtf file embedded in the rtfd package
set quotedSource to quoted form of POSIX path of ((rtfdFile as text) & "TXT.rtf")
# Builds the quoted form of the posix path of the rtf file extracted from the rtfd package
set quotedExtracted to quoted form of POSIX path of (theContainer & text 1 thru -2 of origName)

# Triggers the shell to move the wanted file
# CAUTION: if an rtf file with the same name as the new one already exists, it will be replaced silently.
do shell script "mv " & quotedSource & space & quotedExtracted

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) Wednesday, 6 September 2017 20:43:04
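
Both scripts above move TXT.rtf out of the package, so the .rtfd no longer contains its text afterwards. If you would rather keep the package intact, a copy may be enough. Here is a minimal shell sketch under that assumption. extract_rtf is a hypothetical helper name; the only thing taken from the scripts above is that the embedded file is called TXT.rtf.

```shell
#!/bin/sh
# extract_rtf (hypothetical helper): copy the TXT.rtf embedded in an rtfd
# package to a sibling file named like the package, with ".rtf" as extension.
# The package itself is left untouched.
extract_rtf() {
    pkg=${1%/}                 # drop a trailing slash, if any
    out=${pkg%.rtfd}.rtf       # /path/article.rtfd -> /path/article.rtf
    if [ ! -f "$pkg/TXT.rtf" ]; then
        echo "no TXT.rtf in $pkg" >&2
        return 1
    fi
    if [ -e "$out" ]; then
        # Refuse to replace an existing file instead of overwriting it
        echo "already exists: $out" >&2
        return 1
    fi
    cp "$pkg/TXT.rtf" "$out"
}
```

Called with the path of a package, it should leave the extracted .rtf next to that package and return a non-zero status rather than replace a file that is already there.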

Only when you use the option ‘-f’

Hello DJ

During my tests I never used the option -f and the existing file was replaced silently by the new one.
As I wanted to re-check I ran this edited version :

set rtfdFile to choose file of type {"com.apple.rtfd"}

tell application "Finder"
	set theContainer to container of rtfdFile as text
	set origName to name of rtfdFile
	set extractedPath to (theContainer & text 1 thru -2 of origName)
	try
		set oldSize to size of file extractedPath --> 4601
	end try
end tell

# Builds the quoted form of the posix path of the rtf file embedded in the rtfd package
set quotedSource to quoted form of POSIX path of ((rtfdFile as text) & "TXT.rtf")
# Builds the quoted form of the posix path of the rtf file extracted from the rtfd package
set quotedExtracted to quoted form of POSIX path of extractedPath

# Triggers the shell to move the wanted file
# CAUTION: if an rtf file with the same name as the new one already exists, it will be replaced silently.
do shell script "mv " & quotedSource & space & quotedExtracted

tell application "Finder"
	set newSize to size of file extractedPath --> 11740
end tell

As you can see, the “old” file was 4601 bytes long and the newly created one is 11740 bytes long.

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) Thursday, 7 September 2017 18:09:18

That is interesting, at least…

So somehow mv will overwrite without a prompt under certain conditions.

In fact, when I ran the script for the first time there was no “old” rtf file.
When I ran it a second time I was quite sure that I would get an error message, as I did with the original code doing everything with the Finder.
As I didn’t get an error, I thought that the script had failed to do the job, which is why I decided to delete a large part of the existing file.
After running it again I was able to check that the old file really had been replaced by the new one.

As I am curious (and pig-headed) I decided to test the behavior of an alternate version.
This time the script moves the file without renaming it :

set rtfdFile to choose file of type {"com.apple.rtfd"}

tell application "Finder"
	set theContainer to container of rtfdFile as text
end tell

# Builds the quoted form of the posix path of the rtf file embedded in the rtfd package
set quotedSource to quoted form of POSIX path of ((rtfdFile as text) & "TXT.rtf")
# Builds the quoted form of the posix path of the rtf file extracted from the rtfd package
set quotedExtracted to quoted form of POSIX path of theContainer

# Triggers the shell to move the wanted file
# CAUTION: if an rtf file with the same name as the new one already exists, it will be replaced silently.
do shell script "mv " & quotedSource & space & quotedExtracted

Like the original one, this alternate version doesn’t issue an error message.

It seems that the mv command, as implemented by Apple, doesn’t match its man page.

Yvan KOENIG running Sierra 10.12.6 in French (VALLAURIS, France) Friday, 8 September 2017 14:05:10
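
For what it's worth, my reading of the POSIX description of mv is that it only asks for confirmation when -i is given, or when the destination is not writable and standard input is a terminal. do shell script has no terminal attached, so a silent replacement may simply be the expected behavior. I have not checked this on every macOS release, so treat it as an assumption. If the goal is to never replace an existing file, a guard like the one below may help. safe_move is a hypothetical name; -n (do not overwrite) is available in both the BSD and GNU versions of mv.

```shell
#!/bin/sh
# safe_move (hypothetical helper): move SRC to DST, but refuse to replace
# a DST that already exists.
safe_move() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        # Explicit check, so the caller gets a non-zero status to react to
        echo "refusing to overwrite: $dst" >&2
        return 1
    fi
    # -n is a second line of defence in case DST appears after the check
    mv -n "$src" "$dst"
}
```

The explicit test is there because the exit status of mv -n when it skips a file is not something I would rely on across systems.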

I have 409 rtfd files scattered over 33 folders. I simply want to automate opening each one in Pages and saving as a Pages doc (.pages). I can use Pages '09, Pages 2013 (both on El Cap) or Pages 7.1 on Mojave if needed. Is this possible?

It is straightforward to remove everything except the styled text, but how would you determine which images/files to delete from the rtfd?

You mentioned that these are pages saved from Safari. Can you provide an example page with a couple of files to be deleted? By the way, the simple way to get at the images of an rtfd is to right-click it and choose ‘Show Package Contents’.

  1. First, your script should get a list of these rtfd files in 33 folders. That is, find them. Given the list of files, the script must then loop through each item in the list individually.

  2. Keep in mind, however, that Apple’s new policy is that an empty destination file must be created before content is exported to it. That is, it must already exist before you export to it. You can create empty destination files using, for example, the Finder.

  3. You did not specify, however, whether you want to overwrite the original rtfd files, export result to the native folder of each rtfd, or to some other single folder. An inaccurate question is the main reason that you cannot be fully helped.
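
Step 1 of the list above (finding the packages) could be sketched in the shell like this. list_rtfd is a hypothetical helper; it relies on the fact, noted earlier in the thread, that the shell sees an rtfd package as a plain folder. Note that -name is case-sensitive, so a package whose extension is written in capitals would be missed.

```shell
#!/bin/sh
# list_rtfd (hypothetical helper): print every *.rtfd package found under
# the folder given as first argument, one path per line.
# -prune stops find from descending into the packages themselves.
list_rtfd() {
    find "$1" -type d -name '*.rtfd' -prune -print
}
```

From AppleScript, the output of such a command run through do shell script can be split into a list with paragraphs of, and the script can then loop over that list.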

“You did not specify, however, whether you want to overwrite the original rtfd files, export result to the native folder of each rtfd, or to some other single folder. An inaccurate question is the main reason that you cannot be fully helped.”

Good points. I’d like to export the result to the native folder, that is the folder where the rtfd package is. So, where file.rtfd exists now, there would also be file.pages.
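
Given that answer, the destination path can be derived from the package path by swapping the extension. A small sketch, with pages_target as a hypothetical helper name; it only computes the path and does not perform the conversion itself.

```shell
#!/bin/sh
# pages_target (hypothetical helper): print the path of the .pages document
# that should sit next to the given rtfd package.
pages_target() {
    pkg=${1%/}                          # drop a trailing slash, if any
    printf '%s\n' "${pkg%.rtfd}.pages"  # swap ".rtfd" for ".pages"
}
```

Because only the extension changes, file.pages ends up in the same folder as file.rtfd.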