Breaking an RTF file into named files

Hi,
I have a huge RTF file that I want to split into separate files, based on an identifier at the very top of each portion.
The new files must still be RTF, because they all contain HTML links I need to keep.
This is easy with plain text, but I cannot reuse the settings that let me do it there.

The "break" word is
MOVIE, then a carriage return, and then the actual movie title.
Ideally I’d like to use the movie title as the name of the new file.
Thanks a lot for any help.

Dan
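
For reference, the splitting rule described in the question can be sketched on plain text. This is only an illustration of the logic, written in Python 3: it assumes the layout given above (a line holding only MOVIE, immediately followed by the title line), the function name split_by_marker is hypothetical, and it does not preserve RTF formatting or links.

```python
def split_by_marker(text, marker="MOVIE"):
    """Return {title: section_text} for every section introduced by a marker line.

    Assumption: the title is the line right after the marker line.
    """
    lines = text.splitlines()  # handles CR, LF and CRLF line endings
    # Indexes of the lines that consist only of the marker (case-sensitive).
    starts = [i for i, line in enumerate(lines) if line == marker]
    sections = {}
    for n, start in enumerate(starts):
        # A section runs up to the next marker line, or to the end of the text.
        end = starts[n + 1] if n + 1 < len(starts) else len(lines)
        if start + 1 >= end:
            continue  # marker with no title line after it
        title = lines[start + 1].strip()
        sections[title] = "\n".join(lines[start:end])
    return sections


sample = "MOVIE\rFirst Title\rsome text\rMOVIE\rSecond Title\rmore text"
result = split_by_marker(sample)
```

Each key of result would then serve as the file name and each value as the file content.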

This script was suggested to me:

set folderPath to path to desktop folder as string --  the destination folder where to save new RTF files
 
--  choose the end of line of the RTF text (the carriage return or the linefeed)
set rc to return --<-- carriage return
--set rc to linefeed --<-- linefeed
 
set searchString to "MOVIE" & rc
tell application "TextEdit"
      tell (get front document)
            set tc to count paragraphs
            set start_Line to 1 -- the starting line of the first portion
            considering case
                  repeat with i from 1 to tc
                        if (get paragraph i) = searchString then --   "MOVIE" and carriage return
                              if (get paragraph (i - 1)) is rc and (get paragraph (i - 2)) is rc then --  two empty lines before "MOVIE" (carriage return and carriage return)
                                    set movTitle to paragraph (i + 1) -- get the title
                                    tell movTitle to if it ends with rc then set movTitle to text 1 thru -2 --  remove the carriage return in the title of the movie
                                    my newRtfDocAndSave(start_Line, i + 1, folderPath & movTitle & ".rtf", it)
                                    set start_Line to i + 2 --  the starting line of the next portion
                              end if
                        end if
                  end repeat
            end considering
      end tell
end tell
 
on newRtfDocAndSave(n1, n2, f, tDoc)
      tell application "TextEdit"
            set newDoc to make new document
            duplicate (paragraphs n1 thru n2 of tDoc) to first paragraph of newDoc -- copy portion of the rtf text from the current document  to the new document
            close newDoc saving yes saving in (file f) -- save and close the new document
      end tell
end newRtfDocAndSave

However, when I opened one file (iAsia.rtf) so that there was a front document, the script gave me this error:
“TextEdit got an error: Can’t get document 1. Invalid index.”
So I added this line at the very top:
set thefile to (choose file of type "rtf" with prompt "Please select a file")
and changed
tell (get front document)
to
tell thefile
but then I got another error:
alias “Start:Users:Danwan:Desktop:Blogs:iAsia.rtf” doesn’t understand the “count” message.
How can I fix all this?
I am on the latest El Capitan, and the file is a TextEdit RTF file.

Regards, and I hope someone can help.

I’m puzzled, because here the posted script behaves flawlessly.
I just had to comment out the instruction
set rc to return --<-- carriage return
and activate the instruction:
set rc to linefeed --<-- linefeed

Could you try running this subset after opening an RTF file?

set folderPath to path to desktop folder as string --  the destination folder where to save new RTF files

tell application "TextEdit"
	tell (get front document)
		# determine which end of line character is used
		if its text contains return then
			set rc to return
		else
			set rc to linefeed
		end if
		set searchString to "MOVIE" & rc
		set tc to count paragraphs
	end tell
end tell

When I ran it with an open document (in fact an rtfd one), I got:
tell current application
path to desktop as string
end tell
tell application “TextEdit”
get document 1
get every text of document “cheval - copie.rtfd”
count every paragraph of document “cheval - copie.rtfd”
end tell
Result:
1129
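
The line-ending test in the snippet above (use a carriage return if the text contains one, otherwise a linefeed) can be written out as a small Python 3 sketch. The function names are hypothetical and only mirror the AppleScript logic; note that, like the AppleScript test, this picks the carriage return for text with CRLF endings.

```python
def detect_eol(text):
    """Return the carriage return if the text contains one, else the linefeed."""
    return "\r" if "\r" in text else "\n"


def marker_paragraph(text, marker="MOVIE"):
    """Build the paragraph to search for: the marker plus the detected line ending."""
    return marker + detect_eol(text)
```

With this, a document saved with linefeeds yields the search string "MOVIE" followed by a linefeed, which matches the change that was needed by hand in the first script.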

If you wish to be able to choose the file, you can test:

set folderPath to path to desktop folder as string --  the destination folder where to save new RTF files
set theSourceFile to choose file of class {"public.rtf"}
tell application "TextEdit"
	open theSourceFile # REQUIRED
	tell (get front document)
		# determine which end of line character is used
		if its text contains return then
			set rc to return
		else
			set rc to linefeed
		end if
		set searchString to "MOVIE" & rc
		set tc to count paragraphs
	end tell
end tell

When I ran it, I got:
tell current application
path to desktop as string
end tell
tell application “Script Editor”
choose file
end tell
tell application “TextEdit”
open alias “SSD 500:Users:username:Desktop:cheval - copie 2.rtf”
get document 1
get every text of document “cheval - copie 2.rtf”
count every paragraph of document “cheval - copie 2.rtf”
end tell
Result:
1129

My understanding is that when you used choose file, you left out the instruction open theSourceFile, which is flagged as # REQUIRED.

Yvan KOENIG running El Capitan 10.11.3 in French (VALLAURIS, France) Thursday 10 March 2016 15:37:35

Thanks, Yvan. The script now gives me the paragraph count.
However, it stops after counting them and never reaches the last part, so I don’t get my new documents.
What should I do to enable this part?

on newRtfDocAndSave(n1, n2, f, tDoc)
	tell application "TextEdit"
		set newDoc to make new document
		duplicate (paragraphs n1 thru n2 of tDoc) to first paragraph of newDoc -- copy portion of the rtf text from the current document to the new document
		close newDoc saving yes saving in (file f) -- save and close the new document
	end tell
end newRtfDocAndSave

PS
I also tried to write in your beautiful language, which I now never write and no longer speak.
Diphthongs, accents: all forgotten. But I am an old man now, and this community has always helped me.
Thank you very much, from an Italian in Taiwan.

Model: Mac Pro 4.1, 32 GB RAM and SSD disk
Browser: Safari 537.36
Operating System: Mac OS X (10.10)

The short script was posted to test access to the file, given the problems described in your second message.

If the short scripts work flawlessly, your original script should work too, as it does here.

Here is the complete version using choose file.

set folderPath to path to desktop folder as string --  the destination folder where to save new RTF files

set theSourceFile to choose file of class {"public.rtf"}

tell application "TextEdit"
	open theSourceFile # REQUIRED
	tell (get front document)
		# determine which end of line character is used
		if its text contains return then
			set rc to return
		else
			set rc to linefeed
		end if
		set searchString to "MOVIE" & rc
		set tc to count paragraphs
		set start_Line to 1 -- the starting line of the first portion
		considering case
			repeat with i from 1 to tc
				if (get paragraph i) = searchString then --   "MOVIE" and carriage return
					if (get paragraph (i - 1)) is rc and (get paragraph (i - 2)) is rc then --  two empty lines before "MOVIE" (carriage return and carriage return)
						set movTitle to paragraph (i + 1) -- get the title
						tell movTitle to if it ends with rc then set movTitle to text 1 thru -2 --  remove the carriage return in the title of the movie
						if (movTitle contains "/") or movTitle contains ":" then set movTitle to my remplace(movTitle, {"/", ":"}, "_")
						my newRtfDocAndSave(start_Line, i + 1, folderPath & movTitle & ".rtf", it)
						set start_Line to i + 2 --  the starting line of the next portion
					end if
				end if
			end repeat
		end considering
	end tell
end tell

on newRtfDocAndSave(n1, n2, f, tDoc)
	tell application "TextEdit"
		set newDoc to make new document
		duplicate (paragraphs n1 thru n2 of tDoc) to first paragraph of newDoc -- copy portion of the rtf text from the current document  to the new document
		close newDoc saving yes saving in (file f) -- save and close the new document
	end tell
end newRtfDocAndSave

#=====
(*
replaces every occurrence of d1 with d2 in the text t
*)
on remplace(t, d1, d2)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d1}
	set l to text items of t
	set AppleScript's text item delimiters to d2
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end remplace

#=====

It did the job flawlessly too.

Yvan KOENIG running El Capitan 10.11.3 in French (VALLAURIS, France) Thursday 10 March 2016 16:55:02

Thank you again for your time and interest in my problem.
However, the script doesn’t save any files.
It duplicates each portion correctly into a new untitled document and then closes it.
On my screen I see the original file in the background and an untitled document that the script populates correctly and then closes, opening a second untitled document moments after closing the previous one.
It all happens in a second or so.
I added a delay before the statement
close newDoc saving yes saving in (file f) -- save and close the new document
three lines before the end of the script.
I can see that the duplication is perfect,
but perhaps the saving in (file f) is the problem, as (file f) doesn’t seem to be defined.

With a longer delay (20) I see a standard TextEdit “Save” window asking to overwrite the “untitled-rtd”, but it closes because the new name is not available from (file f).
If the delay is long enough, I am able to quickly name the new document by hand and also reopen a new document.
This is because the script doesn’t work unless both the original file and an untitled document are open.
My original file is extremely long, with more than 2002 fragments, so entering the new file names manually is practically impossible.
In each duplicated fragment there is a second “delimiter”, always the same throughout the file, which could be used to derive a name for the saved file.

The structure of the duplicated fragment is like this:
Linefeed
Possible title + the words “Last Seen:”
as pasted here:

Searching for the Elephant Last Seen: 04 Nov 2014

Director: S.K. Jhung | South Korea | 2009 | 144’¨Producer: S.K. Jhung, Neung-yeon Joh | Studio: Asian Crush | Genre: Drama
Summary: ¨The film depicts the hollow lives of affluent thirty-something young urban professionals in Seoul. The protagonists are three childhood friends, each struggling with a compulsion: schizophrenia, addiction, and infidelity. The revelation of their secrets exacerbates their sense of deprivation, and the three friends are inevitably led to a shocking finale when they learn that growing pains are not just distant memories of their youth.
Comment
MY Films
Trailer search

MOVIE

It looks like this because the first “fragment” only contains the word “MOVIE” after a number of linefeeds.

Is there a way to use whatever comes before the words “Last Seen:” as the title for (file f)?

Thanks a lot again

Dan
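
The request above (take whatever precedes “Last Seen:” as the title) can be illustrated with a short Python 3 sketch. It assumes the fragment layout pasted in the post, matches the marker case-sensitively, and uses a hypothetical function name; it is not taken from any script in this thread.

```python
import re

# Capture everything on a line that precedes the marker, without trailing blanks.
LAST_SEEN = re.compile(r"^(.*?)[ \t]*Last Seen:", re.MULTILINE)


def title_from_fragment(fragment):
    """Return the text before 'Last Seen:' on the first line that has it, else None."""
    match = LAST_SEEN.search(fragment)
    return match.group(1) if match else None


fragment = "\nSearching for the Elephant Last Seen: 04 Nov 2014\n\nDirector: S.K. Jhung\n"
title = title_from_fragment(fragment)
```

A fragment without the marker returns None, so the caller can skip it instead of saving a file with an empty name.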

I was able to replicate the described behavior when the string movTitle contains a colon (:).
Look at this example, where movTitle is “Tartempion: 1er”.
The save instruction becomes:

close document “Sans titre” saving yes saving in file “SSD 500:Users:username:Desktop:Tartempion: 1er.rtf”
which means: save the file “ 1er.rtf” in the subfolder Tartempion of the desktop folder. Alas, there is no subfolder Tartempion, so the code fails.

I edited the script in message #5 to get rid of that.
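
To make the colon problem concrete, here is a small Python 3 sketch. It treats the colon as the separator of an HFS-style path, as in the save instruction quoted above, and shows the same replacement that the remplace handler performs on the title. The function names are hypothetical.

```python
def hfs_components(path):
    """Split an HFS-style path into its colon-separated components."""
    return path.split(":")


def sanitize_title(title, bad=("/", ":"), replacement="_"):
    """Replace characters that act as path separators in a file name."""
    for ch in bad:
        title = title.replace(ch, replacement)
    return title


folder = "SSD 500:Users:username:Desktop:"
title = "Tartempion: 1er"

broken = hfs_components(folder + title + ".rtf")
fixed = hfs_components(folder + sanitize_title(title) + ".rtf")
```

In broken, the last two components are a folder named Tartempion and a file named " 1er.rtf"; in fixed, the title stays a single file name directly inside Desktop.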

For your additional request, I’m not sure I understand what you wrote (English is not my main language).

Could you send a sample source file to my mailbox: koenig yvan sfr fr

Yvan KOENIG running El Capitan 10.11.3 in French (VALLAURIS, France) Saturday 12 March 2016 11:48:28

Yvan,
thank you again for your time and your interest in helping me solve my problem.
I wonder whether you meant “line 5” of your script, since you wrote: “I edited the script in message #5 to get rid of that.”
Luckily, I received help from Japan. It is not an AppleScript, so the little I know about AppleScript doesn’t help me understand its structure. But it works perfectly, as it gives me what I was asking for (including the second question) in my previous reply.
I am sorry for the mistakes in my “terrible French”.
Here is the Japanese script that did the job. I believe it is written in Python, plus something else.

The script:

set |NBSP| to character id 160 -- U+00A0 NO-BREAK SPACE
set |NL| to linefeed -- U+000A LINE FEED
--set dlm to |NBSP| & |NL| & "MOVIE" & |NL| & |NBSP| & |NL| -- # delimiter pattern 1
set dlm to |NL| & "MOVIE" & |NL| -- # delimiter pattern 2

set d to (choose folder with prompt "Choose output directory")
set ff to (choose file of type {"rtf"} with prompt "Choose source rtf file(s)" with multiple selections allowed)
set args to dlm's quoted form & space
repeat with a in {d} & ff
	set args to args & a's POSIX path's quoted form & space
end repeat

do shell script "/usr/bin/python <<'EOF' - " & args & "
# coding: utf-8
#
#   file:
#       split_rtf.py
#
#   usage:
#       split_rtf.py delimiter directory file.rtf [file.rtf ...]
#
#   argv[1]  : delimiter text
#   argv[2]  : output directory
#   argv[3:] : source rtf file(s)
#
#   version:
#       0.11
#
import sys, os, re
from Foundation import *
from AppKit import *

argv = [ a.decode('utf-8') for a in sys.argv ]
DLM = argv[1]
OUTDIR = argv[2]
DLML = len(DLM)

for f in argv[3:]:
    mas, docattr, err = NSAttributedString.alloc().initWithURL_options_documentAttributes_error_(
        NSURL.fileURLWithPath_(f),
        {NSDocumentTypeDocumentOption : NSRTFTextDocumentType},
        None,
        None)
    if not mas:
        sys.stdout.write('%s: failed to read rtf: %s\\n' % (f.encode('utf-8'), err.description().encode('utf-8')))
        continue

    s = mas.string()
    aa = s.componentsSeparatedByString_(DLM)
    k0 = aa[0].length() + DLML    # offset of the first fragment after the first delimiter

    for a in aa[1:]:
        k = k0
        k0 += a.length() + DLML
        m = re.search(r'^(.*) Last Seen:', a, re.M)
        if not m: continue
        n = m.group(1)
        n = re.sub(r':', ';', n)    # replace : with ; (: in file name is shown as / in OS X Finder)
        n = re.sub(r'/', ':', n)    # replace / with : (/ is reserved as node separator in POSIX path)
        outfile = os.path.join(OUTDIR, '%s.rtf' % n)
        data = mas.RTFFromRange_documentAttributes_(
            (k, a.length()),
            docattr)
        b = data.writeToFile_atomically_(
            outfile,
            False)
        if not b:
            # err only describes a read failure, so it is not reported here
            sys.stdout.write('%s: failed to write file\\n' % outfile.encode('utf-8'))

EOF"

Kind regards
Danwan
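
The range bookkeeping in the script above (k, k0 and the delimiter length) can be followed more easily on a plain string. The Python 3 sketch below reproduces only that arithmetic. The function name is hypothetical, and it counts Python characters, whereas the original works on NSString lengths, so the two can differ for characters outside the Basic Multilingual Plane.

```python
def fragment_ranges(text, delimiter):
    """Return (start, length) for every fragment that follows a delimiter."""
    parts = text.split(delimiter)
    ranges = []
    # Skip whatever precedes the first delimiter, plus the delimiter itself.
    offset = len(parts[0]) + len(delimiter)
    for part in parts[1:]:
        ranges.append((offset, len(part)))
        # The next fragment starts after this one and the delimiter that follows it.
        offset += len(part) + len(delimiter)
    return ranges


delim = "\nMOVIE\n"
text = "intro" + delim + "A Last Seen: x" + delim + "B Last Seen: y"
ranges = fragment_ranges(text, delim)
fragments = [text[start:start + length] for start, length in ranges]
```

Slicing the original text with each range gives back the fragment, which is what lets the script export the matching range of the attributed string with its formatting intact.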

Here is the final version of the script which I was able to rebuild after receiving a sample file.

set p2d to path to desktop folder
set folderName to "4dawnam"
tell application "System Events"
	if not (exists folder folderName of p2d) then
		make new folder at end of p2d with properties {name:folderName}
	end if
end tell
set folderPath to (p2d as text) & folderName & ":" # Don't remove the trailing colon --  the destination folder where to save new RTF files
set theSourceFile to choose file of class {"public.rtf"}

tell application "TextEdit"
	open theSourceFile # REQUIRED
	tell (get front document)
		# determine which end of line character is used
		if its text contains return then
			set rc to return
		else
			set rc to linefeed
		end if
		set searchString to "MOVIE" & rc
		set tc to count paragraphs
		set start_Line to 1 -- the starting line of the first portion
		considering case
			repeat with i from 1 to tc
				if ((get paragraph i) = searchString) or i = tc then --   "MOVIE" and carriage return, or end of document
					my newRtfDocAndSave(start_Line, i, folderPath, it, rc)
					set start_Line to i + 2 --  the starting line of the next portion
				end if
			end repeat
		end considering
	end tell
end tell

#=====

on newRtfDocAndSave(n1, n2, f, tDoc, rc)
	set marker to "Last Seen:"
	considering case
		tell application "TextEdit"
			set newDoc to make new document
			duplicate (paragraphs n1 thru n2 of tDoc) to first paragraph of newDoc -- copy portion of the rtf text from the current document  to the new document
			tell document 1
				# a bit of cleaning at beg
				repeat
					if paragraph 1 = rc or paragraph 1 ends with "MOVIE" & rc then
						delete paragraph 1
					else
						exit repeat
					end if
				end repeat
				# a bit of cleaning at end
				repeat while last paragraph is in {rc, "MOVIE" & rc}
					delete last paragraph
				end repeat
				
				set foundMarker to false
				repeat with j from 1 to count paragraphs
					if (get paragraph j) contains marker then
						set foundMarker to true
						set theParagraphWithTitle to paragraph j
						set movTitle to text 1 thru ((offset of marker in theParagraphWithTitle) - 1) of theParagraphWithTitle
						repeat while movTitle ends with space
							set movTitle to text 1 thru -2 of movTitle
						end repeat
						if (movTitle contains "/") or movTitle contains ":" then set movTitle to my remplace(movTitle, {"/", ":"}, "_")
						exit repeat
					end if
				end repeat
				if foundMarker then
					close newDoc saving yes saving in (file (f & movTitle & ".rtf")) -- save and close the new document
				end if
			end tell # document 1
		end tell # TextEdit
	end considering
end newRtfDocAndSave

#=====
(*
replaces every occurrence of d1 with d2 in the text t
*)
on remplace(t, d1, d2)
	local oTIDs, l
	set {oTIDs, AppleScript's text item delimiters} to {AppleScript's text item delimiters, d1}
	set l to text items of t
	set AppleScript's text item delimiters to d2
	set t to l as text
	set AppleScript's text item delimiters to oTIDs
	return t
end remplace

#=====

Yvan KOENIG running El Capitan 10.11.3 in French (VALLAURIS, France) Sunday 13 March 2016 20:33:29
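
The two "cleaning" loops in the final handler strip padding paragraphs from both ends of a fragment before the title is looked up. A rough Python 3 equivalent is sketched below. The function name is hypothetical, and the sketch is slightly stricter than the AppleScript: it drops a leading paragraph only when it is empty or holds exactly the marker, whereas the handler also drops a first paragraph that merely ends with MOVIE.

```python
def trim_fragment(paragraphs, marker="MOVIE"):
    """Drop leading and trailing paragraphs that are empty or hold only the marker."""

    def is_padding(paragraph):
        return paragraph.strip("\r\n") in ("", marker)

    start, end = 0, len(paragraphs)
    while start < end and is_padding(paragraphs[start]):
        start += 1  # skip padding at the beginning
    while end > start and is_padding(paragraphs[end - 1]):
        end -= 1  # skip padding at the end
    return paragraphs[start:end]


raw = ["\n", "MOVIE\n", "Some Title Last Seen: 04 Nov 2014\n", "Summary\n", "\n", "MOVIE\n"]
cleaned = trim_fragment(raw)
```

Working on index bounds instead of deleting paragraphs one at a time also means a fragment made only of padding simply comes back empty.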

Thanks Yvan … I will try it again tomorrow …
Dan