Need Word AppleScript syntax

Thanks to syntax in a script by Hans Hafner, I am beginning to set up a clean-up script for Word documents sent to me for web posting. I was able to strip double returns and tabs, swap em dashes for spaced en dashes, strip multiple word spaces, set space after paragraphs, and fix indents. However, I am running into what must be a very obvious wall when I try to change the type size or font of Word text.

I have tried

    set text font size to 12   

addressing the text range, the paragraph, the active document, document 1, etc. and have not yet hit on the magic combination that actually makes a change. I have also tried variations on the somewhat arcane

 execute find find object of _range find text font size 14 replace with text font size 12 replace replace all
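For readers following along, here is a hedged sketch of a format-only replace. In Word's dictionary, the formatting to match and the formatting to apply appear to be properties of the find object and of its replacement, rather than parameters on the execute find line. The `clear formatting` command, the `find continue` wrap value, and the `find format` parameter are written as I understand the Word dictionary; treat all three as assumptions and check them against your version.

```applescript
tell application "Microsoft Word"
	set findObj to find object of selection
	-- Reset criteria left over from earlier searches (assumed command)
	clear formatting findObj
	clear formatting (replacement of findObj)
	-- Search the whole document, not just from the cursor onward (assumed enum value)
	set wrap of findObj to find continue
	-- Match text set in 14 pt and restyle it as 12 pt
	set font size of font object of findObj to 14
	set font size of font object of replacement of findObj to 12
	-- Empty find and replace strings: match on formatting only (find format is assumed)
	execute find findObj find text "" replace with "" replace replace all with find format
end tell
```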

The other problem I’m having is conditional formatting based on content. I can’t seem to apply a set of paragraph parameters depending on whether a paragraph contains a newline character. I have tried AppleScript’s notation (\n) and Word’s notation (^l), but neither seems to be detected. I hate to say it, but this was a piece of cake in Quark, and seems impossible in Word. (I suspect I have no clear picture of Word AppleScript syntax or scope … that was often the case when I hit a Quark wall.)

Has anyone had any success with the bloated Word dictionary, or can offer up some script snips of syntax?

I appreciate any help anyone can offer. I’ve done a fair amount of Quark scripting, and this brings back the hours of head-banging I went through until I discovered or was handed the proper syntax. The whole style of dictionary usage seems very unusual.

Thanks

George Mack

George:

Please do not become apoplectic with the enclosed script. As I noted in my comments, Word simply is not conducive to tight scripting. It seems like if it is not sloppy looking, it just does not work.

Anyway, this is a script I put together last spring to process a bunch of .txt files of a huge lot of chapters of Isaiah that I wanted formatted a particular way for study purposes. Essentially, it goes through and deletes any blank paragraphs, does some re-formatting of the title, and then changes the whole document’s font size and spacing as well.

I have not touched this thing in months, but it did the job at the time, and I tried to keep good notes as I was working through it in the comments, so you may find something useful there. I think this took about 6 hours to generate with the crummy examples in Word’s AppleScript PDF document.

If you are interested, I think I still have some of the original .txt files I can send you for examples on running this baby.

--Script to format a document containing the KJV version of a chapter in Isaiah.  These versions were created in Gospel Library 2001 on a PC and saved in Word.
--Amazingly enough, it works very well.  Word is just not designed for tight coding, however.

set {paraMark, para2Kill} to {"¶", {}} --Paragraph mark, and list of paragraphs to be deleted.  Essentially, all blank paragraphs that do NOT come before a ¶ marked paragraph.
tell application "Microsoft Word"
	set paraCount to count every paragraph of text object of active document --Count them up initially
	my KillParagraph((paraCount - 2), paraCount) --Kills the last 3, which are unnecessary
	my KillParagraph(1, 2) --Kills the first 2, also unneeded.
	set selFind to find object of selection --Mind boggling code, but it does seem to work.
	set forward of selFind to true
	set wrap of selFind to find stop
	set content of selFind to "TestamentCHAPTER"
	execute find selFind
	set content of text object of selection to "Testament CHAPTER" -- need to center the header
	set paraCount to count every paragraph of text object of active document --Recount
	repeat with ppx from 3 to (paraCount - 1) --This convoluted method works, don't mess with it.
		set c1 to content of text object of paragraph ppx of text object of active document --We are cycling through all the paragraphs, this gets the text of that paragraph into a variable.
		if (get ASCII number of character 1 of c1) = 13 then --Test paragraph to see if is blank (ascii 13 is return)
			set c2 to content of text object of paragraph (ppx + 1) of text object of active document --If a blank paragraph is encountered, check the next one to see if it contains a paragraph mark.
			if c2 does not contain paraMark then
				set end of para2Kill to ppx --If not, set the number of the BLANK paragraph to the kill list.
			end if
		end if
	end repeat
	set kill2Para to reverse of para2Kill --Reverse the order of the list, so that the paragraph numbers don't get mixed up.
	repeat with ppk in kill2Para
		my KillParagraph(ppk, ppk) --Kill off the blanks.
	end repeat
	--Now, set the formatting 
	set paraCount to count every paragraph of text object of active document --Recount, again.
	set alignment of paragraph 1 of active document to align paragraph center --Heading
	set myRange to create range active document start (start of content of ¬
		text object of paragraph 1 of active document) end (end of content ¬
		of text object of paragraph (paraCount - 1) of active document) --Select all paragraphs for both font size and line spacing changes.
	select myRange
	set line spacing rule of paragraph format of selection to line space multiple
	set line spacing of paragraph format of selection to (lines to points lines 1.5)
	set fSel to font object of selection
	set font size of fSel to 16
	collapse range text object of selection direction collapse end --Deselects selection, and places cursor at the end.  (Try start?)
end tell
-----------------------
on KillParagraph(kp1, kp2)
	tell application "Microsoft Word"
		set myRange to create range active document start (start of content of ¬
			text object of paragraph kp1 of active document) end (end of content ¬
			of text object of paragraph kp2 of active document) --Works OK, but does not like negative numbers.  Does just fine if the numbers are the same.
		select myRange
		delete myRange
	end tell
end KillParagraph

Hi Craig

Thanks for the fast reply. I’ll try out this code and see if it works the way I need it. The “font object” entity may be just what I was looking for.
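A minimal sketch of the font object idea, using the same pattern that appears in the scripts in this thread: font attributes are set on the font object of a range, not on the range, paragraph, or document directly. The variable name `docRange` is only illustrative.

```applescript
tell application "Microsoft Word"
	set docRange to text object of active document
	-- Font attributes hang off the range's font object
	set name of font object of docRange to "Times"
	set font size of font object of docRange to 12
end tell
```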

The script I was using, derived from one of Hans Hafner’s, uses a different technique to strip blank paragraphs. (I’m cleaning my Minister’s sermons for PDFs for our church website). Since I wanted to strip out double blanks, and swap newlines, mostly seen in quote blocks, for paragraphs elsewhere, I used the following:

	tell application "Microsoft Word"
	activate
	set _range to text object of active document -- from Hans Hafner's script
	-- 1) save true paragraph breaks
	-- MUST use MS Word codes, not AppleScript's
	execute find find object of _range find text "^p^p" replace with "€" replace replace all
	
	-- 2) put in newlines for block quotes instead of hard returns
	-- Word format is "^l" (caret, lowercase L)
	execute find find object of _range find text "^p" replace with "^l" replace replace all
	
	-- 3) use spaced n dashes instead of em dashes
	execute find find object of _range find text "^+" replace with " ^= " replace replace all
	
	-- 4) remove all double spaces
	-- I had trouble getting this to work until I copied a double space from a Word document and pasted it between the quotes on the next line
	repeat with i from 1 to 5 -- should find most with 5 passes... still need visual check
		execute find find object of _range find text "  " replace with " " replace replace all -- this was 2 spaces until I pasted it in here...
	end repeat
	
	--  5) restore paragraph returns
	execute find find object of _range find text "€" replace with "^p" replace replace all
	

After that, I applied a space after of 6 points

	set paraCount to (count paragraphs of _range)
	repeat with i from 1 to paraCount
		set space after of paragraph i of _range to 6
		set hanging punctuation of paragraph i of _range to true
		-- set first line indent to 24 pts
		set first line indent of paragraph i of _range to 24
	end repeat

I’m still looking for a way to detect the newline character, to set the indent of quoteblocks differently. But there are few enough that I can set the specs manually.

Hope this was of interest to you.

George:

Pretty dang interesting. I hope you get it all figured out. I don’t have the energy to dive into another session of AppleScripting Word, but don’t be afraid to use ASCII codes like I did to find that newline character. Once you figure out what it is, you should be able to finish up quickly.

Now, THAT is some serious irony. (Any sermons on Isaiah?)

Sorry, no Isaiah sermons, to date.

For what it’s worth, and for others who are struggling with Word scripting, I googled Applescript Examples for MS Word and found this site:

http://www.microsoft.com/mac/resources/resources.aspx?pid=asforoffice

There’s a 500+ page PDF document on using Word with AppleScript. Why this isn’t packed on the install disk, I don’t know. :(

I tend to cobble together something that works, then refine, bullet-proof, extend and finally generalize it. As I fooled around with this script, the loop-by-paragraph structure I posted above was replaced with

set space after of every paragraph of _range to 6

which is MUCH faster.
When I get this optimized, with all my desired features in place, I’ll upload it as a sample/model.

Still haven’t quite managed to structure a select paragraph by content thingy, but your suggestion of using ASCII numbers is intriguing (assuming that Word still uses them consistently?) But this is good progress for scripting this app for 2 days…

Best wishes,
George

Very nice. I’ll be fooling with this technique tomorrow…

Thank you.

George

I tried out Jacques’ code


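set n to (ASCII number 10) --meant as the newline character, but ASCII number returns a code (here 49, for the "1" of "10"), not the character itself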

tell application "Microsoft Word"
	set _range to text object of active document
	my dosomething(paragraphs of _range whose last character is not n, 24, 6)
	my dosomething(paragraphs of _range whose last character is n, 10, 4)
end tell

on dosomething(L, t_indent, T_space)
	tell application "Microsoft Word" to repeat with paragraph_ in L
		tell paragraph_
			set first line indent to t_indent
			set space after to T_space
			set hanging punctuation to true
		end tell
	end repeat
end dosomething

and it won’t work. I’m not sure why.

I tried using “\n”, “^l” and “ASCII number 10” for the n search term, and in each case Word did not detect it. Word interpreted ASCII number 10 as 49.

Well, perhaps a newline character cannot be the last character in a paragraph. I tried other constructs, such as

my dosomething(paragraphs of _range whose content of text object contains n, 10, 4)

and

my dosomething(paragraphs of _range whose text object contains n, 10, 4)

and

my dosomething(paragraphs of _range which contains n, 10, 4)

the log shows that Word found no paragraphs with newline characters.

Is there some way of going from a Find Text to a Select Paragraph? It seems as though Word has Text Objects, Text Ranges, and Font Objects, overlapping but mutually opaque, as opposed to Quark, where objects are hierarchically organized.

I appreciate any enlightenment on this that you can offer.

Best wishes, and Happy Thanksgiving,

George

Jacques

I did also try

ASCII character 10

without results. I think the insistence on it being the last character kicked it out (a newline has to fall between other returns, by definition?)

I went back to Craig’s script and, with a little more attention to his technique of extracting the text of the paragraph into a variable to test, I was able to get my script to work properly. BUT ONLY after first substituting in a dummy character wherever I wanted a newline, then running the test/format sequence, then substituting the newline character back in for the dummy character. I still can’t reliably find the newline in Word using AppleScript.
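One possible explanation, offered as an untested assumption rather than a confirmed fix: Word stores a manual line break (Shift-Return, "^l" in Find) as character 11, the vertical tab, not as character 10. If the `content` property reports it that way, a test like the sketch below might find quote blocks without the dummy-character detour. The indent values are the ones used elsewhere in this thread; the variable names are illustrative.

```applescript
-- Assumption: Word's manual line break comes through as ASCII 11, not ASCII 10
set lineBreak to ASCII character 11

tell application "Microsoft Word"
	set docRange to text object of active document
	set paraCount to count paragraphs of docRange
	repeat with i from 1 to paraCount
		-- Pull the paragraph text into a variable and test it in AppleScript
		set paraText to content of text object of paragraph i of docRange
		if paraText contains lineBreak then
			-- Treat the paragraph as a quote block
			set paragraph left indent of paragraph i of docRange to 72
			set first line indent of paragraph i of docRange to 0
		end if
	end repeat
end tell
```

If this finds nothing, logging `ASCII number` of each character in a paragraph known to contain a line break should show which code Word actually uses.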

I’ll be posting the script soon, after I try a bit of optimization.

George

Here is a final script (until I think of something else…). I wrote this to facilitate getting my minister’s sermons on the web, and cleaning the copy to what I, as a graphic designer, would expect. The fixed styles/measurements in the script, and the expectations of the script handlers, are based on this person’s customary style. Using FastScript to launch this, with an open 4 page sermon, the cleanup took just under a minute.

The script expects an open Microsoft Word file, where the title is followed by a double return, then a series of description lines each followed by a single return. The body text is composed of paragraphs each followed by an extra return. Poetry and similar quotes use single returns after each line. Endnotes follow a paragraph whose content consists of “Sources” and a return. I have tried to comment everything. If you want to modify this, possibly stripping out the timer function or typographic changes you don’t agree with, please feel free.


--	SCRIPT SERMONSCRUB
--	George Mack, Nov 2006, with contributions from Hans Hafner, Craig Smith and Jacques of the MacScripter BBS

tell application "Microsoft Word"
	if not (exists active document) then
		open (choose file with prompt "Which file should SermonScrub clean?")
	end if
	
	set StartTime to current date -- for timer
	
	activate
	set myRange to text object of active document
	
	--	STILL TO DO		Set size of header numbers to 12
	
	--	** GLOBAL CHANGES TO ALL TEXT **
	set name of font object of myRange to "Times"
	set font size of font object of myRange to 12
	set paragraph left indent of every paragraph of myRange to 0
	set first line indent of every paragraph of myRange to 24
	set space after of every paragraph of myRange to 6
	
	--	 ** CLEAN COPY STYLES, double returns, spaces, etc.  Could be pair of arrays plus handler with loop **
	-- 	MUST use MS Word codes, not Applescript's 
	--	Use holder character to save true paragraph breaks (double returns)
	execute find find object of myRange find text "^p^p" replace with "€" replace replace all
	--	Use holder character for newlines for block quotes
	-- 	this only necessary because I can't figure how to reliably locate newlines using AppleScript in Word.
	execute find find object of myRange find text "^p" replace with "£" replace replace all
	--	Replace spaced n dashes for em dashes, my preference
	execute find find object of myRange find text "^+" replace with " ^= " replace replace all
	--	remove all double spaces
	repeat with i from 1 to 5 -- should find most with 5 passes... still need visual confirmation later
		execute find find object of myRange find text "  " replace with " " replace replace all
	end repeat
	--	restore true paragraph returns
	execute find find object of myRange find text "€" replace with "^p" replace replace all
	--	change 3 periods to true ellipsis, spaced -- this still isn't working properly, spaces aren't inserted.
	execute find find object of myRange find text "..." replace with " … " replace replace all
	--	return with tab or return with space to return, so we don't have oversize first indents
	execute find find object of myRange find text "^p^t" replace with "^p" replace replace all
	execute find find object of myRange find text "^p " replace with "^p" replace replace all
	
	set paracount to (count paragraphs of active document) -- fresh count
	-- 	Ignores 2 header paragraphs  
	set myRange to create range active document start (start of content of ¬
		text object of paragraph 3 of active document) end (end of content ¬
		of text object of paragraph (paracount) of active document)
	--	find newline character, which marks quote blocks, fix paragraph indents accordingly.
	select myRange
	set paracount to (count paragraphs in myRange)
	repeat with ParX from 1 to paracount -- set paragraph att. of block quotes.
		copy content of text object of paragraph ParX of myRange to ContentPpx
		if ContentPpx contains "Sources" & return then exit repeat
		--	end of block quotes, ready to format sources. Adding return in search string guards against the word showing up in other text paragraphs... but not some other situations
		if ContentPpx contains "£" then
			set paragraph left indent of paragraph ParX of myRange to 72
			set first line indent of paragraph ParX of myRange to 0
			if character 1 of ContentPpx is in {"\"", "“"} then --Hang quotes (straight or curly opening quote); 5 pts this font, by trial and error. 
				set first line indent of paragraph ParX of myRange to -5
			end if
		end if
	end repeat
	
	--	** Fix Source materials indents **
	copy ParX to SourceStart --save paragraph number where we exited
	repeat with ParX from SourceStart to paracount
		set paragraph left indent of paragraph ParX of myRange to 0
		set first line indent of paragraph ParX of myRange to 0
	end repeat
	
	--	Restyle first line, title
	set myRange to create range active document start (start of content of ¬
		text object of paragraph 1 of active document) end (start of content of ¬
		text object of paragraph 2 of active document)
	set font size of font object of myRange to 16
	set bold of font object of myRange to true
	
	-- 	Apply some paragraph attributes for Title
	set first line indent of paragraph 1 of active document to 0
	set first line indent of paragraph 2 of active document to 0
	set alignment of paragraph 1 of active document to align paragraph center
	set alignment of paragraph 2 of active document to align paragraph center
	
	--  Restore newlines in block quotes
	set paracount to count paragraphs of active document
	set myRange to create range active document start (start of content of ¬
		text object of paragraph 1 of active document) end (end of content ¬
		of text object of paragraph paracount of active document)
	execute find find object of myRange find text "£" replace with "^l" replace replace all
	
	-- remove manual page breaks  
	execute find find object of myRange find text "^m" replace with "^p" replace replace all
	
	collapse range text object of selection direction collapse end
	
	beep 2
	display dialog "All Done! " & return & "Please check for unforeseen situations." giving up after 4
	
	--	** Rest of TIMER FUNCTION **
	set StopTime to current date
	set Elapsed to (StopTime - StartTime) -- in seconds
	set _minutes to Elapsed div 60 as text
	set _seconds to Elapsed mod 60
	if _seconds < 10 then set _seconds to "0" & _seconds
	set Elapsed to "" & _minutes & ":" & (_seconds as text)
	display dialog "Elapsed time was " & Elapsed giving up after 4
	
end tell
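The comment in the script about a "pair of arrays plus handler with loop" could be sketched roughly as follows. This is a hypothetical refactoring, not part of the script above: `replaceAllPairs` is an invented handler name, and the pairs are copied from the existing execute find lines. Order matters, because the holder characters must be inserted before they are restored.

```applescript
-- Hypothetical handler: run a list of {findText, replaceText} pairs in order
on replaceAllPairs(pairList)
	tell application "Microsoft Word"
		set docRange to text object of active document
		repeat with onePair in pairList
			set {findText, replaceText} to contents of onePair
			execute find (find object of docRange) find text findText replace with replaceText replace replace all
		end repeat
	end tell
end replaceAllPairs

-- Same substitutions as the script above, in the same order (double-space loop omitted)
replaceAllPairs({{"^p^p", "€"}, {"^p", "£"}, {"^+", " ^= "}, {"€", "^p"}, {"^p^t", "^p"}, {"^p ", "^p"}})
```

Keeping the pairs in one list makes it easier to add or drop a substitution without touching the loop.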

I hope this is helpful as a starting file to some of you.

George