copy text WHILE?

I’ve just started learning AppleScript and I’m a little stumped by what I’m trying to do:

I have a very long .rtf document. I want to split the .rtf document into separate files by copying the text in each section and saving it as a separate file. I’ve been trying to write a script to split it up. I’ve been using TextEdit to open the document.

Each section of the text file begins with “Sec.” and then a number. So the whole document is divided into “Sec. 1.”, “Sec. 2.”, “Sec. 3.”, etc.

I want to copy all the text after “Sec. 1.” and before “Sec. 2.” and save it as a file called “Sec. 1”

I want the script to do that for each of the 243 sections of the document to make 243 separate files called “Sec. 1” through “Sec. 243”

Any help would be much appreciated. Thanks! I think when I get this script I’ll have the familiarity to do a lot of the stuff I’m trying to learn with AppleScript.

Model: G5
AppleScript: 1.10
Browser: Safari 412
Operating System: Mac OS X (10.4)

Hi bridgerbell,

Did you want the separate sections written to rtf or text?

Later,

it doesn’t matter to me. either way, rtf or text. any ideas?

Hi bridgerbell,

It’s easier for me to save to a plain text file using the Standard Additions read/write commands. Firstly, I have a TextEdit rtf file on my desktop and I’ll choose it with the ‘choose file’ command, open it in TextEdit, and get the text.

set rtf_file to choose file
tell application "TextEdit"
    launch (activate)
    open rtf_file
    set the_text to text of front document
end tell

If you run this script in the Script Editor and look at the result, you’ll see the text of the rtf doc because the last result was the text of the front document. Actually, there’s an implied ‘get’ command there:

set the_text to get text of front document

Then you need to split up the text that you got. There are various ways to do this, so I’m thinking about which way is the most understandable.
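One of those ways, shown here only as a minimal sketch and not as the approach the rest of this thread settles on, is AppleScript's text item delimiters. The sample text and the variable names are made up for illustration.

```applescript
-- Made-up sample text standing in for the document's contents
set sample_text to "Sec. 1 hello Sec. 2 bye Sec. 3 hello again"

-- Split on the heading prefix, then restore the previous delimiters
set old_delims to AppleScript's text item delimiters
set AppleScript's text item delimiters to "Sec. "
set pieces to text items of sample_text
set AppleScript's text item delimiters to old_delims

-- Item 1 is whatever comes before the first "Sec. " (empty here), so skip it
-- and put the prefix back on the front of each remaining piece
set sec_list to {}
repeat with i from 2 to (count pieces)
    set end of sec_list to "Sec. " & (item i of pieces)
end repeat
return sec_list
```

This splits on every "Sec. " it finds, so it has the same weakness as any plain search if that string also turns up inside a section's body.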

Later,

Oops, it should look like this:

set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document
end tell

Notice that when you use markers like “Sec. 1”, “Sec. 2”, and so on, the last bit of text won’t have a “Sec. 244” after it if there are only 243 sections.

I’m thinking that you should add a section 244 at the end.

set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document
end tell
set the_text to the_text & "Sec. 244"

What I’m thinking now is that the text between the section labels might contain a string such as “Sec. 77”, as in “… refer to Sec. 77 …”. Then you have problems. What do you think?

Later,

The reason why I say this is because you can use the ‘offset’ command. If the text in TextEdit is like this:

Sec. 1

hello

Sec. 2

bye

Sec. 3

hello again

Sec. 77

here’s more text

This script opens the file in TextEdit:

set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document
end tell
set the_text to the_text & "Sec. 244"
set this_offset to (offset of "Sec. 77" in the_text)

I get a result of:

50

But if section 2 referred to section 77 with “Sec. 77” I’d get a different offset.
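If that ever did become a problem, one possible workaround is to search for a line break followed by the label, so that only a heading at the start of a line matches. This is a rough sketch with made-up sample text. It assumes the lines are separated by linefeeds (ASCII character 10), and it would not match a heading on the very first line, since nothing precedes it.

```applescript
-- Made-up sample: section 2 mentions "Sec. 77" in its body text
set lf to ASCII character 10
set sample_text to "Sec. 1" & lf & "hello" & lf & "Sec. 2" & lf & "refer to Sec. 77" & lf & "Sec. 77" & lf & "more text"

-- A plain search stops at the mention inside section 2
set plain_offset to offset of "Sec. 77" in sample_text

-- Searching for the line break plus the label only matches a heading that
-- starts a line; add 1 to step past the line break itself
set found_at to offset of (lf & "Sec. 77") in sample_text
if found_at > 0 then
    set heading_offset to found_at + 1
else
    set heading_offset to 0 -- keep 0 as the "not found" signal
end if

return {plain_offset, heading_offset}
```

The two numbers in the result differ, and the second one is the real heading.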

The text definitely does not have “Sec. xx” anywhere but in the headings, so I won’t have problems with that. There are no “refer to Sec. xx” parts of the text. So that should be fine. I ran the script as you suggested and I see the full text in my script window’s result. Then I have:
set temp to display dialog "Specify cleave word" default answer ""
set textEntered to text returned of temp

So that I can enter “Sec.” as the word to delineate sections. And then I don’t know where to go next? Thanks for your help thus far.

From there, there are a lot of ways to do this. I think I would just get the offsets first. It’s probably faster to work with the text in AppleScript first and then do your writing to files.

My TextEdit document looks like this:

Sec. 1

hello

Sec. 2

bye

Sec. 3

hello again

Sec. 4

here’s more text

Now this should give you a list of each section’s text in a fairly generic way.

set n to 4 -- number of sections
set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document
end tell
set the_text to the_text & "Sec. " & (n + 1)
set sec_list to {}
repeat with i from 1 to n
    set this_sec to "Sec. " & i
    set next_sec to "Sec. " & (i + 1)
    set begin_offset to offset of this_sec in the_text
    set end_offset to (offset of next_sec in the_text) - 1
    set this_text to text begin_offset thru end_offset of the_text
    set end of sec_list to this_text
end repeat
return sec_list

Now we want to write to file.

Later,

okay i am trying to run that last script and I select the file and after a few seconds get an AppleScript Error:

Can’t get text 0 thru 274662 of “Sec. 1. It having been…”

and it displays the text

if i take out the “- 1” from
set end_offset to (offset of next_sec in the_text) - 1

then it runs without an error message but it seems to be an infinite loop or something because it won’t stop running. i see the end of the text in the result window and a number at the very end that keeps getting bigger as the script runs

A 0 offset means that it wasn’t found.

So, “Sec. 1” wasn’t found.
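To see this in isolation, here is a small sketch with made-up sample text that has no “Sec. 2” heading. The variable names are just for illustration.

```applescript
-- Made-up sample text with no "Sec. 2" heading in it
set sample_text to "Sec. 1" & return & "hello" & return & "Sec. 3" & return & "bye"

set found_offset to offset of "Sec. 3" in sample_text -- position of the "S" in "Sec. 3"
set missing_offset to offset of "Sec. 2" in sample_text -- 0, since "Sec. 2" is not there

-- Using that 0 as a starting index is what produces a "Can't get text 0 thru ..." error
try
    set this_text to text missing_offset thru -1 of sample_text
on error error_message
    set this_text to "Error: " & error_message
end try
return {found_offset, missing_offset, this_text}
```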

On this line:

set the_text to text of front document

change to:

set the_text to (text of front document) as string

Wait, try running this in Script Editor:

set n to 4 -- number of sections
set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document
end tell
set the_text to the_text & "Sec. " & (n + 1)
set sec_list to {}
repeat with i from 1 to n
    set this_sec to "Sec. " & i
    set next_sec to "Sec. " & (i + 1)
    set begin_offset to offset of this_sec in the_text
    set end_offset to (offset of next_sec in the_text) - 1
    try
        set this_text to text begin_offset thru end_offset of the_text
        set end of sec_list to this_text
    on error
        log this_sec
    end try
end repeat
return sec_list

The result log should show what section is missing. I think you might have a section missing.

Later,

okay, here’s what i have:

set n to 243 -- number of sections
set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document as string
end tell
set the_text to the_text & "Sec. " & (n) + 1
set sec_list to {}
repeat with i from 1 to n
    set this_sec to "Sec. " & i
    set next_sec to "Sec. " & (i + 1)
    set begin_offset to offset of this_sec in the_text
    set end_offset to (offset of next_sec in the_text) - 1
    set this_text to text begin_offset thru end_offset of the_text
    set end of sec_list to this_text
end repeat
return sec_list

and i get the same error. ?

Hi bridgerbell,

Did you try the script with the log yet?

Later,

I may be missing a section. I went through it before and had to add in Sec. 165 which was missing. There may be another one missing, but I’m at work now so I’ll have to try it when I get home. I am most grateful for your help. This is educational for me. I’ll let you know what happens when I run the script with the log.

okay…the script is working with the error log. i don’t know what the error was. now i’m trying to figure out how to make the new documents with the offset sections and save them with the filenames “Sec. 1”, “Sec. 2”, etc

Here’s what I have:

set n to 243 -- number of sections
set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document
end tell
set the_text to the_text & "Sec. " & (n + 1)
set sec_list to {}
repeat with i from 1 to n
    set this_sec to "Sec. " & i
    set next_sec to "Sec. " & (i + 1)
    set begin_offset to offset of this_sec in the_text
    set end_offset to (offset of next_sec in the_text) - 1
    try
        set this_text to text begin_offset thru end_offset of the_text
        set end of sec_list to this_text
    on error
        log this_sec
    end try
    tell application "TextEdit"
        make new document
        set text of document to ???
    end tell
end repeat
return sec_list

okay i made a breakthrough for me:

i have gotten it to make the new files with the correct text in the body of the documents. now i need a little help with saving the files as “Sec. 1”, “Sec. 2”, etc. I have made a variable for the destination folder but I am having trouble getting it to save the files and to name them. Any help would be much appreciated.

set n to 243 -- number of sections
set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to text of front document
end tell
set the_text to the_text & "Sec. " & (n + 1)
set sec_list to {}
set exportFolder to choose folder with prompt "Select destination folder:"
repeat with i from 1 to n
    set this_sec to "Sec. " & i
    set next_sec to "Sec. " & (i + 1)
    set begin_offset to offset of this_sec in the_text
    set end_offset to (offset of next_sec in the_text) - 1
    try
        set this_text to text begin_offset thru end_offset of the_text
        set end of sec_list to this_text
    on error
        log this_sec
    end try
    tell application "TextEdit"
        make new document
        set the text of the front document to this_text
    end tell
end repeat
return sec_list

Hi bridgerbell,

Sorry, I fell asleep.

You don’t need to use TextEdit to make new documents and save them. Your computer has built-in scripting additions that read and write plain text files. We don’t need the sec_list anymore. The script will just write each section’s text to a new file.

set n to 3 -- number of sections
set rtf_file to choose file
tell application "TextEdit"
    launch
    activate
    open rtf_file
    set the_text to (text of front document) as string -- coerce Unicode text to plain text
end tell
set the_text to the_text & "Sec. " & (n + 1)
set exportFolder to choose folder with prompt "Select destination folder:"
repeat with i from 1 to n
    set this_sec to "Sec. " & i
    set next_sec to "Sec. " & (i + 1)
    set begin_offset to offset of this_sec in the_text
    set end_offset to (offset of next_sec in the_text) - 1
    try
        set this_text to text begin_offset thru end_offset of the_text
        set file_spec to ("" & exportFolder & this_sec) as file specification
        set reference_number to (open for access file_spec with write permission)
        try
            write this_text to reference_number
            close access reference_number
        on error
            close access reference_number
            error "Error writing to file."
        end try
    on error error_message
        display dialog error_message
    end try
end repeat

There are several things you can do if a section is not found. For example, you can change the script so that it checks the offset right after getting it and, if the offset is 0, handles the missing section there. What to do in that case is up to you.
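As one possible take on that, here is a minimal sketch that checks the offsets before slicing and skips any section whose start or end can’t be found. It uses made-up sample text and collects the results in lists instead of writing files. The names found_list and skipped_list are just for illustration.

```applescript
-- Made-up sample: "Sec. 2" is missing, and "Sec. 4" is the appended end marker
set the_text to "Sec. 1 hello Sec. 3 bye Sec. 4"
set n to 3 -- number of sections

set found_list to {}
set skipped_list to {}
repeat with i from 1 to n
    set begin_offset to offset of ("Sec. " & i) in the_text
    set end_offset to (offset of ("Sec. " & (i + 1)) in the_text) - 1
    if begin_offset is 0 or end_offset < begin_offset then
        -- either this heading or the next one was not found
        set end of skipped_list to i
    else
        set end of found_list to text begin_offset thru end_offset of the_text
    end if
end repeat
return {found_list, skipped_list}
```

Note that a missing heading also costs you the section before it, because that section’s end is located by the missing label. One thing to watch with any of these searches: “Sec. 1” is also the start of “Sec. 10”, so a missing low-numbered heading can be masked by a later one.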

gl,