Extracting words from a text file

Hi, can someone suggest how I could do this in AppleScript? What I want to do is go through a text file and copy each word into a new record in FileMaker. I know how to write the result into a FileMaker database, but I’m not sure how to get it to loop through the file and get each word.

Could someone point me in the right direction?
ta!!

This is a quick weekend reply:

It’s really two separate tasks. If you have the memory for it, just read the entire file into a variable:


set str to read (choose file)

Then process the string:


repeat with i from 1 to count words in str
  set w to word i of str
  -- do something with the word
end repeat
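A variation worth trying, sketched here with a hard-coded string standing in for the file contents: ask for `words of str` once to get a list, then loop over that list. Asking for `word i of str` on every pass means AppleScript has to find word i in the string again each time, which I’d expect to be noticeably slower on big files. The names `wordList` and `wordCount` are just illustrative.

```applescript
-- Sketch: fetch every word once as a list, then walk the list.
-- "str" stands in for the text read from the file.
set str to "the quick brown fox jumps over the lazy dog"
set wordList to words of str
set wordCount to count wordList

repeat with i from 1 to wordCount
	set w to item i of wordList
	-- do something with the word (and its position i)
end repeat
```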

AppleScript sometimes has some very funny ideas about what constitutes a “word.” Check out the AppleScript Language Guide (I don’t have a link handy at the moment).
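If you’d rather decide for yourself what separates one word from the next, one option is to split on spaces with AppleScript’s text item delimiters. This is only a sketch: punctuation stays attached to the word (so “fox,” stays “fox,”), and a run of spaces produces empty items, which the loop below skips. `rawWords` and `wordList` are made-up names.

```applescript
-- Sketch: split on single spaces instead of relying on AppleScript's
-- own idea of a "word". Punctuation stays attached to the words.
set str to "the quick  brown fox, jumps" -- note the double space

set savedDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to space
set rawWords to text items of str
set AppleScript's text item delimiters to savedDelims -- always restore

set wordList to {}
repeat with i from 1 to count rawWords
	set w to item i of rawWords
	-- a double space leaves an empty item between the two spaces; skip it
	if w is not "" then set end of wordList to w
end repeat
```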

Thanks for that, it works perfectly for what I need (as long as the file isn’t larger than 32K, but that shouldn’t be a problem).
cheers!!
howard

When I’m adding each word, is it possible to also add the sentence that the word occurs in? For example, when evaluating “jumps” in the sentence “the quick brown fox jumps over the lazy dog”, it would also return “the quick brown fox jumps over the lazy dog” as a separate value. This is what I’ve got so far:

on adding folder items to this_folder after receiving added_items
	repeat with thisfile in added_items
		set str to read thisfile
		set thisfilename to name of (info for (thisfile))
		repeat with i from 1 to count words in str
			set w to word i of str
			tell application "FileMaker Pro"
				create new record at end of database "lawreport.fp5"
				set cell "text" of last record of database "lawreport.fp5" to w
				set cell "file" of last record of database "lawreport.fp5" to thisfilename
			end tell
		end repeat
		-- "move" is a Finder command, so send it to the Finder
		tell application "Finder" to move thisfile to alias "path:to:destination:folder"
	end repeat
end adding folder items to

Whoops, I believe this 32K limitation was corrected under Panther. On earlier systems, what you’ll need to do is work with ‘chunks’ of the string, something along the lines of:


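set str to read (choose file)
set chunkSize to 30000
set strLength to length of str
set x to 1
repeat while x ≤ strLength
	-- last character of this chunk (the final chunk may be shorter)
	set y to x + chunkSize - 1
	if y > strLength then set y to strLength
	set chunk to text x thru y of str
	-- do something with chunk of text
	-- (note: a word can be cut in two at a chunk boundary)
	set x to y + 1
end repeat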
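If you process the text in fixed-size chunks, a cut can land in the middle of a word. One way around that, sketched below, is to end each chunk at the last space before the size limit. `nextChunkEnd` is a hypothetical helper, and the chunk size of 5 is only there to keep the example small; you’d use something like 30000 on a real file.

```applescript
-- Hypothetical helper: returns where the chunk starting at x should end,
-- backing up to the last space so no word is split across chunks.
on nextChunkEnd(str, x, chunkSize)
	set strLength to length of str
	set y to x + chunkSize - 1
	if y ≥ strLength then return strLength
	repeat with i from y to x by -1
		if character i of str is space then return i
	end repeat
	return y -- no space in this chunk: fall back to a hard cut
end nextChunkEnd

set str to "aaa bbb ccc"
set chunkList to {}
set x to 1
repeat while x ≤ (length of str)
	set y to nextChunkEnd(str, x, 5)
	set end of chunkList to text x thru y of str
	set x to y + 1
end repeat
-- chunkList now holds three chunks, each ending on a word boundary
```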


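As for getting the sentence: AppleScript text doesn’t have a “sentence” element, but it does have paragraphs, so you can loop through each paragraph and then through the words in it:


repeat with p from 1 to count paragraphs in str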

	set oneLine to paragraph p of str

	repeat with w from 1 to count words in oneLine

		set oneWord to word w of oneLine

		-- do something with the word oneWord and the
		-- line it belongs to (oneLine).

	end repeat
end repeat
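If you really want sentences rather than paragraphs, a rough approximation is to split each paragraph on “. ” with the text item delimiters. This is a sketch under a big assumption: it ignores “?” and “!”, it will wrongly split after abbreviations like “Mr. ”, and it drops the period from every sentence except the last one in the paragraph. `roughSentences` is a made-up name.

```applescript
-- Rough sketch: treat ". " as the end of a sentence.
on roughSentences(txt)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to ". "
	set pieces to text items of txt
	set AppleScript's text item delimiters to savedDelims -- restore
	return pieces
end roughSentences

set oneLine to "The quick brown fox jumps. The lazy dog sleeps."
set sentenceList to roughSentences(oneLine)

repeat with s from 1 to count sentenceList
	set oneSentence to item s of sentenceList
	repeat with w from 1 to count words in oneSentence
		set oneWord to word w of oneSentence
		-- store oneWord, its number w, oneSentence and its number s
	end repeat
end repeat
```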

That’s brilliant, thanks Arthur, it works perfectly. Plus now I can get it to insert the word number and sentence number into the database as well, which is a big help!!

One last question though: is it possible to get the script to ignore paragraph breaks in the text file? It counts paragraph breaks as a sentence when numbering sentences.

thanks for your help
howard

Hmm, are we talking about “soft returns,” or sequences of two or more “hard returns”?

In the first case, it’s a question of dealing explicitly with a specific line-ending character:


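-- Which is which depends on what applications you're
-- using, which version of the operating system, etc.
--
set hardReturn to ASCII character 13
set softReturn to ASCII character 10

-- split only on hard returns, so soft returns stay inside a line
set savedDelims to AppleScript's text item delimiters
set AppleScript's text item delimiters to hardReturn
set realLines to text items of str
set AppleScript's text item delimiters to savedDelims -- restore

repeat with p from 1 to count realLines
	
	set realLine to item p of realLines
	
	repeat with w from 1 to count words in realLine
		
		set oneWord to word w of realLine
		
		-- etc...
		
	end repeat
end repeat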
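Another way to handle soft returns, sketched here on the assumption that soft returns are ASCII 10 and hard returns ASCII 13 in your files, is to clean up the text once before looping: collapse CR+LF pairs to a single hard return, then turn any remaining soft returns into spaces. After that, `paragraphs` only breaks at hard returns. `replaceAll` is a hypothetical helper built on the text item delimiters.

```applescript
-- Hypothetical helper: replace every occurrence of findText with newText.
on replaceAll(s, findText, newText)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to findText
	set pieces to text items of s
	set AppleScript's text item delimiters to newText
	set s to pieces as text
	set AppleScript's text item delimiters to savedDelims -- restore
	return s
end replaceAll

set hardReturn to ASCII character 13
set softReturn to ASCII character 10

set str to "The quick brown" & softReturn & "fox jumps." & hardReturn & "The lazy dog."
-- CR+LF pairs first, so they don't end up as a return plus a space
set str to replaceAll(str, hardReturn & softReturn, hardReturn)
-- remaining soft returns become spaces, so they no longer break lines
set str to replaceAll(str, softReturn, space)
```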

In the case of sequences of two or more returns, where you end up with “empty” paragraphs, you can just repeatedly replace every pair of returns with a single return:


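on SingleCharOccurances(str, oneChar)
	
	set twoChars to oneChar & oneChar
	set savedDelims to AppleScript's text item delimiters
	
	repeat while str contains twoChars
		
		-- split on each pair and rejoin with a single character;
		-- repeat until no pairs are left
		set AppleScript's text item delimiters to twoChars
		set str to str's text items
		
		set AppleScript's text item delimiters to oneChar
		set str to str as string
		
	end repeat
	
	set AppleScript's text item delimiters to savedDelims -- restore
	
	return str
	
end SingleCharOccurances

-- the handler returns the cleaned-up text, so keep the result
set str to SingleCharOccurances(str, return)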

There’s a faster way to do this, actually:


property kcAsc0 : ASCII character 0

on SingleCharOccurances(s, c)
	
	set k to kcAsc0 -- sentinel, must not exist in s or c
	
	set astids to AppleScript's text item delimiters
	try
		-- 1. put the sentinel in front of every c: each "c" becomes "kc"
		set AppleScript's text item delimiters to c
		set s to s's text items
		
		set AppleScript's text item delimiters to k & c
		set s to s as string
		
		-- 2. delete every "ck" pair, so a run "kckc...kc" shrinks to "kc"
		set AppleScript's text item delimiters to c & k
		set s to s's text items
		
		set AppleScript's text item delimiters to ""
		set s to s as string
		
		-- 3. strip the leftover sentinels, leaving one c per run
		set AppleScript's text item delimiters to k
		set s to s's text items
		
		set AppleScript's text item delimiters to ""
		set s to s as string
		
	on error e number n from f to t partial result p
		set AppleScript's text item delimiters to astids
		error e number n from f to t partial result p
	end try
	set AppleScript's text item delimiters to astids
	
	return s
	
end SingleCharOccurances

set str to SingleCharOccurances(str, return)

You’ll have to excuse my ignorance, but where in my script would this go?

on adding folder items to this_folder after receiving added_items
	repeat with thisfile in added_items
		set str to read thisfile
		set thisfilename to name of (info for (thisfile))
		repeat with p from 1 to count paragraphs in str
			set oneLine to paragraph p of str
			repeat with w from 1 to count words in oneLine
				set oneWord to word w of oneLine
				tell application "FileMaker Pro"
					create new record at end of database "lawreport.fp5"
					set cell "text" of last record of database "lawreport.fp5" to oneWord
					set cell "wordnumber" of last record of database "lawreport.fp5" to w
					set cell "file" of last record of database "lawreport.fp5" to thisfilename
					set cell "line" of last record of database "lawreport.fp5" to oneLine
					set cell "paranumber" of last record of database "lawreport.fp5" to p
				end tell
			end repeat
		end repeat
		-- move the file once, after all of its paragraphs are processed
		tell application "Finder" to move thisfile to alias "path:to:destination:"
	end repeat
end adding folder items to

The handler definition can go at the very top or bottom of your script, outside the folder action handler. The call to remove “empty” paragraphs should be made before you start to loop through the paragraphs:


on adding folder items to this_folder after receiving added_items
	repeat with thisfile in added_items
		set str to read thisfile

		set str to SingleCharOccurances( str, return )

		set thisfilename to name of (info for (thisfile))
		repeat with p from 1 to count paragraphs in str
	etc...
end adding folder items to

on SingleCharOccurances(s,c)
	etc...
end SingleCharOccurances

Actually, a simpler and more intuitive solution is to skip the empty paragraphs inside the loop:


on adding folder items to this_folder after receiving added_items 
	...
		repeat with p from 1 to count paragraphs in str 

			set oneLine to paragraph p of str 

			if (oneLine is not "") then -- skip empty paragraphs

				repeat with w from 1 to count words in oneLine
					etc...

			end if

You can also test the length instead:

if ( length of oneLine is not 0 ) then
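One thing to watch with this approach: `p` still counts the empty paragraphs you skip, so the paragraph numbers stored in the database will have gaps. If you want consecutive numbering, one option is to keep your own counter and only increase it for non-empty lines. A sketch, where `lineNumber` and `numbered` are illustrative names and the list just records what you would otherwise write to FileMaker:

```applescript
-- Sketch: number only the non-empty paragraphs.
set str to "first line" & return & return & "second line"
set lineNumber to 0
set numbered to {}

repeat with p from 1 to count paragraphs in str
	set oneLine to paragraph p of str
	if (length of oneLine is not 0) then -- skip empty paragraphs
		set lineNumber to lineNumber + 1
		set end of numbered to {lineNumber, oneLine}
		repeat with w from 1 to count words in oneLine
			set oneWord to word w of oneLine
			-- store oneWord, w, oneLine and lineNumber (instead of p)
		end repeat
	end if
end repeat
```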