Extract PDF text so that it's paste-able into Word

Hi folks, newbie here. Here’s the plan:

Extract text from a PDF file (easy enough, using the Automator action).

Use the resulting text of the document and paste it into:

  : a third-party application (a word counter), to count the most commonly used words in a document

  : Word or TextWrangler, to reformat the text

The fundamental problem seems to be that, regardless of the action I use, the only thing that ever makes it to the clipboard is the NAME of the file, NOT the text.

Thanks in advance for your help.

Ugh, the dreaded pasting from PDFs to anything else.
The biggest issue will be the formatting of the PDF. If it doesn’t have very simple formatting, then you have to overcome that as well. I’m too much of a n00b with Automator to help with that side of things, but I have a lot of experience working with PDF files, and the first thing you may have to do is send it through an OCR app like ABBYY in order to make it behave like a normal document. The truth is, you can probably use ABBYY, Readiris or a similar program to just convert the whole file to Word and then get rid of what you don’t need.

I’ll step aside and let the experienced scripter peeps handle it now.
Cheers
Steve

Hi, monster40lbs.

Steve is right. The biggest problem is the quality of the PDF file. And even if the file is OK, the amount of text can be a problem too.

I’ve managed to put together a workflow that seems to work. Sorry that it’s in Spanish, but that’s what I have. These are the actions, loosely translated:

  1. Get the specified Finder items (I used the “About the Stacks” PDF from the Documents folder)
  2. Extract PDF Text (my settings: RTF format, save to Desktop, same name as input, and replace existing files)
  3. Open Word documents (MS Word action)
  4. Copy contents of Word file to the Clipboard (MS Word action, “All the content” selected)
  5. Get Clipboard contents
  6. One line of AppleScript to count the words
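
The AppleScript line from step 6 isn't shown in the thread. Since the original goal is also to find the most commonly used words, here is a minimal Python 3 sketch (not AppleScript, and not Antonio's code) that counts the words in the extracted text and lists the most frequent ones. The function names are hypothetical, and it assumes a "word" is a run of letters or digits, optionally joined by straight apostrophes; hyphens and curly apostrophes split words.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of letters/digits (Unicode-aware \w), optionally
# joined by straight apostrophes, e.g. "don't". Hyphens and curly apostrophes
# split words. Note that \w also matches underscores.
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def words(text: str) -> list[str]:
    """Return the words in text, lowercased so 'The' and 'the' count together."""
    return WORD_RE.findall(text.lower())


def word_count(text: str) -> int:
    """Total number of words in text."""
    return len(words(text))


def most_common_words(text: str, n: int = 10) -> list[tuple[str, int]]:
    """The n most frequent words in text, as (word, count) pairs."""
    return Counter(words(text)).most_common(n)
```

For example, `most_common_words(text, 20)` on the pasted text returns the 20 most frequent words with their counts. On a Mac, the clipboard contents can be read from the command line with `pbpaste`.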

I tested it with a 500-page PDF and Automator choked on it, so the amount of text is definitely an issue.
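
One possible workaround for very large documents (not something tested in this thread) is to skip Word and the clipboard entirely and count the words straight from the file the extract step saves. The Python 3 sketch below assumes the text was saved as plain text rather than RTF, since RTF control words would otherwise be counted too; the function names are hypothetical. It reads the file one line at a time, so the whole document is never held in memory at once.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Simple word definition (an assumption): letters/digits, optionally joined
# by straight apostrophes, e.g. "don't".
WORD_RE = re.compile(r"\w+(?:'\w+)*")


def count_words_in_lines(lines: Iterable[str]) -> Counter:
    """Count lowercased word frequencies, one line at a time."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(WORD_RE.findall(line.lower()))
    return counts


def count_words_in_file(path: str, encoding: str = "utf-8") -> Counter:
    """Count word frequencies in a plain-text file without reading it all at once."""
    # Iterating over the open file yields one line at a time;
    # errors="replace" keeps undecodable bytes from stopping the count.
    with open(path, encoding=encoding, errors="replace") as f:
        return count_words_in_lines(f)
```

You would pass `count_words_in_file` the path of the saved text file and call `.most_common(20)` on the result. One caveat: a word hyphenated across a line break is counted as two fragments.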

I hope it helps.

Regards,
Antonio

Model: Macbook Pro 13"
AppleScript: 2.1.2
Browser: Opera/9.80 (Macintosh; Intel Mac OS X; U; es-ES) Presto/2.6.30 Version/10.61
Operating System: Mac OS X (10.6)

You could also check out http://www.macworld.com/article/142601/2009/09/makeaservice.html