Searching for strings in a Pages Document

All,
I have a Pages document which has image files listed in a table. I have a list of file names, and I would like to loop through this list and report which files aren’t found in the Pages document. I exported the Pages file as plain text and used the “read” command to read the contents of the text file into a variable. Here is the code:


set filetoFind to "image1.png"
set theFile to (choose file with prompt "Select a file to read:" of type {"TXT"})
open for access theFile
set fileContents to (read theFile)
close access theFile

if fileContents contains filetoFind then
	display dialog "found"
else
	display dialog "not found"
end if

I always get the “not found” dialog, even though I can see “image1.png” in the plain text file. Is this an encoding issue? Is there a better way to find strings in a Pages doc, or in the exported plain text file? What am I missing?

Your script worked perfectly when I first tried it. But it seems that the kind of “Plain Text” Pages exports depends on what’s in the document. If the text can be wholly represented as Mac Roman, that’s the form in which it’s saved and your script works. But if the text contains any Unicode-only characters, it’s saved as little-endian UTF-16 (on my Intel machine, at least) with an appropriate BOM at the beginning. In that case, the file has to be read ‘as Unicode text’ for the script to work.


set filetoFind to "image1.png"
set theFile to (choose file with prompt "Select a file to read:" of type {"TXT"})
set fRef to (open for access theFile)
try
	if ((read fRef for 2 as data) is in {«data rdatFFFE», «data rdatFEFF»}) then
		-- The file starts with a UTF-16 BOM.
		set fileContents to (read fRef from 1 as Unicode text)
	else
		set fileContents to (read fRef from 1 as string)
	end if
end try
close access fRef

if fileContents contains filetoFind then
	display dialog "found"
else
	display dialog "not found"
end if

Edit: Just to note that on my G5, the UTF-16’s saved in big-endian form, which is native to PPC processors. The script in this post works with that too (except for the ‘of type {“TXT”}’ bit, for some reason).
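
For the original aim of checking a whole list of names rather than one, something along these lines might do. It’s only a sketch: the handler name missingNames, the list of names and the sample text are all made up for illustration, and in practice you’d pass in the fileContents read by the script above. Bear in mind that ‘contains’ ignores case by default in AppleScript.

```applescript
-- Hypothetical helper: returns the items of nameList which don't occur in theText.
on missingNames(nameList, theText)
	set notFound to {}
	repeat with aName in nameList
		-- Dereference the loop variable to get the actual text.
		set aName to contents of aName
		if theText does not contain aName then set end of notFound to aName
	end repeat
	return notFound
end missingNames

-- Made-up sample data standing in for the text read from the exported file.
set sampleText to "image1.png" & return & "image3.png"
set namesToFind to {"image1.png", "image2.png", "image3.png"}

set notFound to missingNames(namesToFind, sampleText)
if notFound is {} then
	display dialog "All the names were found."
else
	-- Join the missing names with returns for display.
	set astid to AppleScript's text item delimiters
	set AppleScript's text item delimiters to return
	set report to notFound as text
	set AppleScript's text item delimiters to astid
	display dialog "Not found:" & return & report
end if
```

Collecting the misses in a list and reporting once at the end saves having to dismiss a dialog for every name.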

You may download:

https://www.box.com/s/6kcz3nthwdezch90ppnu

and/or

https://www.box.com/s/52t3hlupkfmn0virhonx

Yvan KOENIG (VALLAURIS, France) Friday 8 March 2013 14:04:14

Thanks

Nigel and Yvan