Processing part of a file at a time

I have a file that contains special words (for example, medical dictionary terms or words like Macscripter), with each word on its own line.

Like this:

apple
script
epitopes
at
Macscripter

and so forth...

I intend to use this list of words as a dictionary for my word processor.
Needless to say, I expect this to grow quite big in a short period of time.

If at some point (hypothetically speaking) my file (fileA) grows to, say, more than 1 GB, and I try to use an AppleScript to process it (say, a process that weeds out words that are already in another file, fileB, of similar size),
would the entire file be loaded into memory, or can I somehow make AppleScript load, say, 500 kB (maybe more) per loop of the process?

Is it advisable to do this kind of processing in AppleScript, or should I be looking into learning another language like Perl?

Hi dns278,

AppleScript can read blocks, but did you know that you already have a dictionary of words on your computer in UNIX?

e.g.

do shell script "look cat"

This will find all words that begin with cat.

gl,

Hi Kel,

It’s really cool. (I didn’t know there was a dictionary in UNIX.) Thanks for that…

But if I do “look demyelinating” it returns nothing.
This and many such words that I use are found in expensive dictionaries only.
Of course I can right-click in OpenOffice and add it to my custom dictionary, but even that is too tedious.

How do I make AppleScript read a file in blocks?

What I would do is just add to the list of words that ‘look’ uses unless you already have your list. When you open the Terminal.app and do ‘man look’, you find:

If file is not specified, the file /usr/share/dict/words is used

What I would do is back up the words file. Then all you gotta do is add your words to that file.
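If you’d rather not touch the system file (it belongs to root, so editing it needs administrator rights), the same idea can be tried on a word list of your own, since ‘look’ accepts a file argument. This is only a rough sketch: the file name mywords.txt and its place on the Desktop are made up for the example, and it assumes the file is kept in plain byte order so that look’s binary search can work on it.

```applescript
-- Hypothetical personal word list (name and location are assumptions)
set customWords to (POSIX path of (path to desktop)) & "mywords.txt"
set newWord to "demyelinating"

-- Append the word on its own line
do shell script "printf '%s\\n' " & quoted form of newWord & " >> " & quoted form of customWords

-- Re-sort in byte order and drop duplicates; look expects a sorted file
do shell script "LC_ALL=C sort -u -o " & quoted form of customWords & " " & quoted form of customWords

-- look exits with a non-zero status when nothing matches,
-- and do shell script reports that as an error
try
	set hits to do shell script "look " & quoted form of newWord & " " & quoted form of customWords
on error
	set hits to ""
end try
```

Note that look matches prefixes, so hits holds every line of the file that starts with the word.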

If you want to make your own words file, then there are many ways you can do it for AppleScript. You probably want to keep the file sorted, so you’ll need an algorithm for that. But anyway, back to your question.

What you would do is use the read/write scripting additions commands. I’m trying to think how I can say this without going through a whole lesson. :slight_smile: You want to read blocks of words (or paragraphs) right?

gl,

Here’s the plan of action. We duplicate the words file for use in our example, in the Terminal with:

cp /usr/share/dict/words ~/Desktop/

The first thing I want to see is how large the file is. Getting information on it in the Finder reveals that the file is 2.3 MB. FYI you don’t need to read this in blocks if you’re reading as text. A simple test in AppleScript:


set f to choose file -- the words file
set the_words to read f

No problem. AppleScript can handle text. It’s when you coerce the text to a list that the problems arise. For learning, let’s say we want to read the file in 1 MB blocks. On the side, reading the file in blocks rules out various search algorithms that need the whole list in memory at once. I’ll pause here, because that’s a big thing. Are you sure you want to read in blocks?

Anyway, I’m going into the abyss. Here’s an example of reading byte blocks:


set f to choose file -- the words file
-- byte positions for read start at 1; 1024 * 1024 bytes is 1 MB
set mboftext to read f from 1 to (1024 * 1024)
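To go through the whole file one block at a time, one possible approach is to open it once and keep reading from the file mark. Take this as a sketch under a few assumptions rather than the one right way: the variable names are made up, the file is plain single-byte text with one word per line, and the loop only counts lines where the real processing would go. Since a block can end in the middle of a word, the last piece of each block is carried over to the next one.

```applescript
set f to choose file -- the words file
set blockSize to 1024 * 1024 -- bytes per pass (1 MB)
set totalBytes to get eof f
set bytesDone to 0
set leftover to "" -- tail of the previous block that did not end in a line break
set lineCount to 0

set fileRef to open for access f
try
	repeat while bytesDone < totalBytes
		-- never ask for more bytes than remain in the file
		set n to blockSize
		if (totalBytes - bytesDone) < n then set n to (totalBytes - bytesDone) as integer
		set chunk to leftover & (read fileRef for n)
		set bytesDone to bytesDone + n
		
		-- split into lines; the last item may be a word cut in two by the block boundary
		set theLines to paragraphs of chunk
		set leftover to item -1 of theLines
		set lineCount to lineCount + (count of theLines) - 1
		-- all items of theLines except the last are complete lines; process them here
	end repeat
	close access fileRef
on error errMsg number errNum
	-- make sure the file is closed before passing the error along
	close access fileRef
	error errMsg number errNum
end try

-- a file that does not end with a line break leaves one last word behind
if leftover is not "" then set lineCount to lineCount + 1
return lineCount
```

Only one block plus the carried-over tail is held as text at any time, so memory use stays near blockSize no matter how large the file gets.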

gl, man

Thanks Kel.

Dive in…
Am listening eagerly…

I’m sure you already know this but your word is in the Dictionary.app.

How do you plan to implement an AppleScript that’s easier than right-clicking?

Are you suggesting there is a medical dictionary built in by Apple?
If not, that’s what I would like to have.

It’s not implementing the AppleScript that’s tedious. It’s adding properly spell-checked new words that are not in OpenOffice’s default dictionary that’s tedious.

(right click - correct the word - after referring to a free online source - add to dictionary) x each word the spell checker thinks is incorrect

This is what I am hoping for…

I would generate a list of words that are not present in the default dictionary.
Then make AppleScript do a curl like the one regulus6633 mentioned a couple of posts ago.
Get the result and look for the words “No entries for such a word”.
If that string is not present, the word exists and AppleScript adds it to my list.
Otherwise, it adds the word to a review list for manual review.

Are you using OS X 10.4? It’s probably the same in 10.3 too. If so then you already have a dictionary that contains “demyelinating”. This dictionary is actually the Oxford dictionary and thesaurus. Just go to the Applications folder and open “Dictionary”. Or there’s an easier way. Just double-click on the word right here in Safari to highlight it. Then right-click on the highlighted word and choose “look up in dictionary” and you will be shown the dictionary. My dictionary (supplied by Apple with 10.4) also has the word “epitopes”. I’m not sure if it’s a “medical” dictionary but it has those words.

Of course this probably won’t help you do it easily in OpenOffice, because I don’t think OpenOffice is tied into the core technologies of OS X, but any other application should work just by highlighting a word and right-clicking to get the definition.

I think NeoOffice (which is a Mac port of OpenOffice) might be tied into core technologies, so you might give it a try.

Great. Good to see that dictionary has many words like epitope, demyelinating and so on…
But it does not have many other words like nephrosclerosis, sulfosalicylic, Ezetimibe and so on…

Thanks for the suggestion on NeoOffice.
Will try that… or maybe BBEdit?

Hey now, watch what you’re saying here. I’m not sure those words are appropriate for this forum! :smiley:

Have you thought about a simple AppleScript to look up those words in http://medical-dictionary.thefreedictionary.com/, eg: http://medical-dictionary.thefreedictionary.com/sulfosalicylic – not hard to construct and contains all the words mentioned here.

How do you create a simple script like that?

You saw exactly what I was thinking. Reminds me of the movie NEXT… :cool:

I don’t remember exactly what you wanted to do, but here’s a very simple form that looks up the word on the clipboard. Select the word, copy it to the clipboard, and run this:


set MedDict to "http://medical-dictionary.thefreedictionary.com/"
set searchWord to the clipboard -- assumes you've copied it
open location MedDict & searchWord
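
For the batch idea from earlier in the thread (fetch each word with curl and sort it into an "accepted" or a "review" list), something along these lines might be a starting point. It is a sketch with assumptions: the notFoundMarker text is a placeholder that has to be replaced with whatever this site actually prints for an unknown word, and the list names and sample words are invented for the example.

```applescript
set MedDict to "http://medical-dictionary.thefreedictionary.com/"
-- Assumption: placeholder text; check what the site really shows for a missing word
set notFoundMarker to "No entries for such a word"
set candidateWords to {"sulfosalicylic", "nephrosclerosis"}
set acceptedWords to {}
set reviewWords to {}

repeat with w in candidateWords
	set thisWord to contents of w
	try
		-- -s: no progress meter, -L: follow redirects, --max-time: give up after 20 seconds
		set pageText to do shell script "curl -sL --max-time 20 " & quoted form of (MedDict & thisWord)
		if pageText contains notFoundMarker then
			set end of reviewWords to thisWord
		else
			set end of acceptedWords to thisWord
		end if
	on error
		-- network trouble or a curl failure: leave the word for manual review
		set end of reviewWords to thisWord
	end try
end repeat

return {acceptedWords, reviewWords}
```

Anything that cannot be confirmed lands in reviewWords, so a failed download never adds a misspelled word to the accepted list.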