Chopping up a document in Word

I have a long document in rtf format (saved from Word 2004, which I don’t own; it’s not my document) that looks like this:

– snip –
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin eleifend ipsum eget nunc. Morbi mattis velit vel justo scelerisque ullamcorper. Nam tincidunt enim vel nisl. Aliquam nisl.
Suspendisse blandit, sapien at consectetuer faucibus, sem leo dignissim eros, at rhoncus lorem turpis in massa. Proin urna ligula, convallis et, imperdiet nec, tristique ut, dolor. Ut neque lectus, varius eu, imperdiet et, dignissim et, elit. Nam at wisi vitae dui auctor porttitor.
July 10, 2005 ← 16-point Font#1, color1
10:09am ← 12-point Font#2, color2 (all the rest of the text is 12-point Font#3, black)
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin eleifend ipsum eget nunc. Morbi mattis velit vel justo scelerisque ullamcorper. Nam tincidunt enim vel nisl. Aliquam nisl. Suspendisse blandit, sapien at.
consectetuer faucibus, sem leo dignissim eros, at rhoncus lorem turpis in massa. Proin urna ligula, convallis et, imperdiet nec, tristique ut, dolor. Ut neque lectus, varius eu, imperdiet et, dignissim et, elit. Nam at wisi vitae dui auctor porttitor.
July 11, 2006
10:07am
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Proin eleifend ipsum eget nunc. Morbi mattis velit vel justo scelerisque ullamcorper. Nam tincidunt enim vel nisl. Aliquam nisl. Suspendisse blandit, sapien at consectetuer faucibus, sem leo dignissim eros, at rhoncus lorem turpis in massa. Proin urna ligula, convallis et, imperdiet nec, tristique ut, dolor. Ut neque lectus, varius eu, imperdiet et, dignissim et, elit. Nam at wisi vitae dui auctor porttitor.
July 13, 2007
– snip –

I want to divvy it up into individual documents, one per dated entry. One way to do that is to add a marker, say “~~~~”, between the black text of each entry and the larger colored date that starts the next one. Then, without losing the formatting, I can use text item delimiters to chop it up and get the date and time from the first two paragraphs of each piece and the text from the rest.
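A minimal sketch of that split in plain AppleScript, assuming the text has already been read into a variable and the “~~~~” markers sit in front of each date. The sketch works on plain AppleScript strings, so fonts and colors are not carried along; the handler name `splitEntries` and the sample text are made up for illustration.

```applescript
-- Hypothetical handler: split the text on a marker, then take the date and time
-- from the first two non-empty paragraphs of each piece and the body from the rest.
on splitEntries(theText, theMarker)
	set savedDelims to AppleScript's text item delimiters
	set AppleScript's text item delimiters to theMarker
	set thePieces to text items of theText
	set theEntries to {}
	repeat with aPiece in thePieces
		-- collect the non-empty paragraphs of this piece
		set pieceParas to paragraphs of (contents of aPiece)
		set theParas to {}
		repeat with aPara in pieceParas
			set p to contents of aPara
			if (length of p) > 0 then set end of theParas to p
		end repeat
		-- a piece needs at least two paragraphs to hold a date and a time
		if (count theParas) > 1 then
			set theBody to ""
			if (count theParas) > 2 then
				-- join the remaining paragraphs back together with returns
				set AppleScript's text item delimiters to return
				set theBody to (items 3 thru -1 of theParas) as text
			end if
			set end of theEntries to {item 1 of theParas, item 2 of theParas, theBody}
		end if
	end repeat
	-- restore the delimiters so later code is not affected
	set AppleScript's text item delimiters to savedDelims
	return theEntries
end splitEntries

-- Usage on a small hypothetical sample: two entries, marker in front of the second date
set sampleText to "July 10, 2005" & return & "10:09am" & return & "First entry." & return & "~~~~July 11, 2006" & return & "10:07am" & return & "Second entry, line one." & return & "Second entry, line two."
set theEntries to splitEntries(sampleText, "~~~~")
```

Each item of `theEntries` is a `{date, time, body}` list that could then be handed to whatever creates the new documents.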

Suggestions?

TextEdit can read rtf documents, can detect paragraphs of a document, and should retain the formatting during copy/paste operations.

So you could open it with TextEdit, read one paragraph at a time into AppleScript, and check each paragraph’s length. The length tells you whether you have an actual paragraph of text or a date or time stamp. Alternatively, you could try coercing the paragraph to a date inside a try/on error block.
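Both tests can be sketched as AppleScript handlers. This is a sketch under assumptions, not a recipe checked against this file: the 30-character cut-off is an assumed value, the handler names are hypothetical, and in a real script the paragraphs would come from the TextEdit document rather than from a literal list.

```applescript
-- Assumed cut-off: stamps such as "July 10, 2005" or "10:09am" are short,
-- while real text paragraphs in this document are much longer.
property maxStampLength : 30

-- Length test: true for a non-empty paragraph shorter than the cut-off
on isStampByLength(aParagraph)
	set n to length of aParagraph
	return n > 0 and n < maxStampLength
end isStampByLength

-- Date test: true if AppleScript can coerce the paragraph to a date.
-- Which strings coerce depends on the Mac's date and time format settings.
on isStampByDate(aParagraph)
	try
		date aParagraph
		return true
	on error
		return false
	end try
end isStampByDate

-- Usage on hypothetical paragraphs, using the length test
set kinds to {}
repeat with aPara in {"July 10, 2005", "10:09am", "Lorem ipsum dolor sit amet, consectetuer adipiscing elit."}
	if isStampByLength(contents of aPara) then
		set end of kinds to "stamp"
	else
		set end of kinds to "text"
	end if
end repeat
```

The length test is predictable but would misfile a very short text paragraph; the date test depends on system settings, so it is worth trying on your own Mac before relying on it.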

Using a counter variable, you could cut and paste the appropriate paragraphs into new TextEdit documents and save them.
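A sketch of the grouping step, assuming the paragraphs are already in an AppleScript list and using an assumed length cut-off to spot stamps. The handler `groupEntries` and the `entryN.rtf` names are hypothetical; the counter `i` only numbers the output files, and the actual cut/paste/save steps in TextEdit are left out.

```applescript
-- Hypothetical handler: group paragraphs into entries. A short "stamp" paragraph that
-- follows a text paragraph (or starts the list) opens a new entry, so each date and the
-- time right after it stay together with the text that follows them.
on groupEntries(theParas, maxStampLength)
	set theEntries to {}
	set currentEntry to {}
	set previousWasStamp to false
	repeat with aPara in theParas
		set p to contents of aPara
		if (length of p) > 0 then
			set thisIsStamp to (length of p) < maxStampLength
			if thisIsStamp and not previousWasStamp then
				-- close the entry collected so far (text before the first date forms its own group)
				if (count currentEntry) > 0 then set end of theEntries to currentEntry
				set currentEntry to {}
			end if
			set end of currentEntry to p
			set previousWasStamp to thisIsStamp
		end if
	end repeat
	-- close the last entry
	if (count currentEntry) > 0 then set end of theEntries to currentEntry
	return theEntries
end groupEntries

-- Usage on hypothetical paragraphs; the counter i numbers the output documents
set theParas to {"Lorem ipsum dolor sit amet, consectetuer adipiscing elit.", "July 10, 2005", "10:09am", "Nam at wisi vitae dui auctor porttitor.", "July 11, 2006", "10:07am", "Proin urna ligula, convallis et, imperdiet nec.", "Nam at wisi vitae dui auctor porttitor."}
set theEntries to groupEntries(theParas, 30)
set fileNames to {}
repeat with i from 1 to count theEntries
	set end of fileNames to "entry" & i & ".rtf"
end repeat
```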

Thanks. I’m trying to move them to Journler, which also understands rtf, so that might be the way to go.

Hi Adam,

I played a bit with MS Word, and this is the result.
Hope it helps.


-- Formatting of the date paragraph that starts each entry (adjust to the document)
property font1 : "Times"
property color1 : {255, 0, 0}
property size1 : 16

tell application "Microsoft Word"
	set aDoc to active document
	set highPara to {}
	tell aDoc
		-- collect the numbers of all paragraphs formatted like a date
		repeat with i from 1 to count paragraphs
			tell font object of text object of paragraph i
				if color is color1 and name is font1 and font size is size1 then set end of highPara to i
			end tell
		end repeat
		-- sentinel one past the last paragraph, so the entry after the last date is saved too
		set end of highPara to (count paragraphs) + 1
	end tell
	repeat with i from 1 to ((count highPara) - 1)
		-- entry i runs from its date paragraph to the paragraph before the next date
		set {_from, _to} to {(item i of highPara), (item (i + 1) of highPara) - 1}
		set myRange to create range active document start (start of content of ¬
			text object of paragraph _from of active document) end (end of content ¬
			of text object of paragraph _to of active document)
		-- copy the range into a new document and save it on the desktop as docN.doc
		select myRange
		copy object selection
		set newDoc to make new document
		paste object text object of newDoc
		save newDoc in ((path to desktop as Unicode text) & "doc" & i & ".doc")
		close front document
	end repeat
end tell
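The loop above turns the list of date-paragraph numbers (`highPara`) into paragraph ranges. That arithmetic can be tried outside Word on plain numbers. Here is a sketch with a hypothetical handler `entryRanges`, which adds a sentinel one past the last paragraph so the entry after the final date is covered too; a loop over consecutive pairs of dates alone would stop at the last date.

```applescript
-- Hypothetical handler: turn the paragraph numbers of the date paragraphs into
-- {first, last} paragraph ranges, one per entry. The sentinel paragraphCount + 1
-- closes the final entry, so the text after the last date gets a range as well.
on entryRanges(headingIndexes, paragraphCount)
	set boundaries to headingIndexes & {paragraphCount + 1}
	set theRanges to {}
	repeat with i from 1 to ((count boundaries) - 1)
		set end of theRanges to {item i of boundaries, (item (i + 1) of boundaries) - 1}
	end repeat
	return theRanges
end entryRanges

-- Usage: dates found in paragraphs 3, 6 and 9 of a 12-paragraph document
set theRanges to entryRanges({3, 6, 9}, 12)
-- → {{3, 5}, {6, 8}, {9, 12}}
```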

Looks good, Stephan, but I can’t test it: I’m still using Word X and didn’t buy Word 2004 (there are too many less expensive, less complex alternatives; I wrote two complete sets of book-length course notes in Word 5, and that was the last version I liked).