Hi.
I’m a big fan of Safari’s webarchive format, and I often find it a better choice than converting a page to PDF.
I use webarchives as a way to read documents… but also to archive them.
I know that Quick Look, Safari and (in a limited way) TextEdit can read this binary plist file.
I also know that Spotlight, mdfind and maybe other tools can search inside this format.
I do not like having to open the webarchive in Safari just to extract or copy text…
So I was thinking about using Apple’s textutil command to convert (or cat) it to txt format and then do string matching on the result.
I also found that converting to doc, docx and wordml gives very good output in TextEdit. That is
very interesting if I need to edit the document and later print it. These formats are closer to the RTF format.
So my question to all…
If I choose to do it with textutil, everything is done in the background, and that is great.
Here is a quick AppleScript…
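set thePath to POSIX path of (path to desktop) & "myArchive.webarchive"
-- Assumption: do shell script runs without a UTF-8 locale, so LANG is set for pbcopy to keep non-ASCII text intact
do shell script "textutil -cat txt -stdout " & quoted form of thePath & " | LANG=en_US.UTF-8 pbcopy"
-- The shell command itself returns nothing because its stdout went to pbcopy; the text is on the clipboard
set clip to the clipboard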
The result in Script Editor is not the same as when I run the command directly on the command line…
I understand that I have to clean up the code somehow… hmmm
What would be the best approach to search inside a webarchive, find matches and extract text from it?
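One direction I could imagine, sketched here only to frame the question: since a webarchive is a binary plist, it could be read without Safari or textutil, for example with Python's plistlib. As far as I understand the format, the main page sits under the WebMainResource key, with the raw bytes in WebResourceData and an optional WebResourceTextEncodingName. Treat those key names as my assumption about the layout; the function names webarchive_text and find_matches are made up for illustration.

```python
import plistlib
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside script/style.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def webarchive_text(path):
    """Return the main page of a .webarchive as plain text, one chunk per line."""
    with open(path, "rb") as fh:
        archive = plistlib.load(fh)  # handles binary and XML plists

    # Assumed layout: main HTML bytes live under WebMainResource.
    main = archive["WebMainResource"]
    encoding = main.get("WebResourceTextEncodingName") or "utf-8"
    try:
        html = main["WebResourceData"].decode(encoding, errors="replace")
    except LookupError:
        # Encoding name unknown to Python: fall back to UTF-8.
        html = main["WebResourceData"].decode("utf-8", errors="replace")

    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def find_matches(text, needle):
    """Return the lines of text that contain needle, ignoring case."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return [line for line in text.splitlines() if pattern.search(line)]
```

Calling find_matches(webarchive_text("myArchive.webarchive"), "some phrase") would then list the matching lines. This only looks at the main resource, not subframes, and it puts every run of text between tags on its own line, so a phrase that spans inline markup would be missed.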
Thanks.