Delimit text Help (MS Word)

Hi Guys,

I am using Mac OS X (Mountain Lion), MS Office 2011 and AppleScript version 2.5.
I have a Word file with a bunch of lists like the sample text below.

Name:XXXXXXX
Age:YYYYY
D.O.B:00/00/0000
Gender:ZZZZ

I need a script that loops through all the paragraphs, takes the text from the “:” to the end of each paragraph, and keeps it in a variable, so I can pass that variable to another script.

For example:
P1 as “XXXXXXX”
P2 as “YYYYY”
P3 as “00/00/0000”
P4 as “ZZZZ”

And so on, to the end of the document.

Thanks in advance,
John

If I am understanding you correctly, here is a start toward a solution that keeps all the programming in AppleScript. The following AppleScript reads the whole text of the selected Word file into a single variable, which is then processed to separate out each of the values in your example data. Granted, this is rather a brute-force approach, but it does work. Another option would be to put each piece of data for an individual into a list.

Hope this helps 🙂


set theFile to choose file with prompt "Select word file to process:" of type {"com.microsoft.word.doc", "org.openxmlformats.wordprocessingml.document"}

tell application "Microsoft Word"
	open theFile
	set theContent to content of text object of active document
	my processContent(theContent)
end tell

on processContent(theContent as text)
	set lineCount to the number of paragraphs in theContent
	repeat with i from 1 to lineCount
		set theLine to paragraph i of theContent as text
		if theLine contains "Name:" then
			set theName to text 6 thru -1 of theLine
		else if theLine contains "Age:" then
			set theAge to text 5 thru -1 of theLine
		else if theLine contains "D.O.B:" then
			set theDOB to text 7 thru -1 of theLine
		else if theLine contains "Gender:" then
			set theGender to text 8 thru -1 of theLine
		end if
		end if
		-- put call to process(es) to handle variables here
	end repeat
end processContent


The test Word file contained the following lines:

Name:XXXXXXX
Age:YYYYY
D.O.B:00/00/0000
Gender:ZZZZ
Name:AAAAAAA
Age:YYYYY
D.O.B:00/00/0000
Gender:ZZZZ
Name:BBBBBBB
Age:YYYYY
D.O.B:00/00/0000
Gender:ZZZZ

Hi, haolesurferdude. Your method requires hard-coded numbers, but since we don’t know whether the OP’s sample covers every possible scenario (they almost never do :) ), it would be advisable to use the offset command to find where the colon is.

Because there is also no way to know in advance how many paragraphs the text will contain, a record should be built to pair the p variables (keys) with their values.


set theContent to "Name:XXXXXXX
Age:YYYYY
D.O.B:00/00/0000
Gender:ZZZZ"

set {textList, recordList} to {{}, {}}

# Get text after offset of colons
repeat with index from 1 to count theContent's paragraphs
	set textList's end to theContent's paragraph index's text ((offset of ":" in theContent's paragraph index) + 1) thru -1
end repeat

# Make records
repeat with index from 1 to count textList
	set recordList to my recordList & (run script "{|" & ("p" & index) & "|: " & (quote & textList's item index & quote) & "}")
end repeat

# Get value with key
recordList's p2 --or p3, p4, etc.

Edited for clarity.
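A possible variation, sketched here under the same assumptions as the record version (every paragraph contains a colon, no blank lines): since p1, p2, … are just positions, a plain AppleScript list gives the same lookup by number without run script. The name valueList is only illustrative. A side benefit is that a value containing a double quote would not break anything, whereas inside the run script string it would produce a syntax error.

```applescript
-- Sketch: store the values in a plain list instead of records built with run script.
-- "item n of valueList" plays the role of pn in the record version.
-- Assumes every paragraph contains a colon (no blank lines), like the sample.
set theContent to "Name:XXXXXXX
Age:YYYYY
D.O.B:00/00/0000
Gender:ZZZZ"

set valueList to {}
repeat with n from 1 to count theContent's paragraphs
	set thePara to theContent's paragraph n
	-- text after the first colon; no quoting or run script needed
	set end of valueList to text ((offset of ":" in thePara) + 1) thru -1 of thePara
end repeat

item 2 of valueList -- same value as recordList's p2
```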

Hi Marc Anthony,

Your code works perfectly and splits out the content.

Then I tried it with a small modification, i.e., setting thecontent to the whole document text, like below.


set myFile to choose file with prompt "Please select the Word File:" of type {"doc", "docx"} default location (path to desktop)

tell application "Microsoft Word"
	activate
	open myFile
	set thecontent to content of text object of active document
end tell

set {textList, recordList} to {{}, {}}

# Get text after offset of colons
repeat with index from 1 to count thecontent's paragraphs
	set textList's end to thecontent's paragraph index's text ((offset of ":" in thecontent's paragraph index) + 1) thru -1
end repeat

# Make records
repeat with index from 1 to count textList
	set recordList to my recordList & (run script "{|" & ("p" & index) & "|: " & (quote & textList's item index & quote) & "}")
end repeat

# Get value with key
recordList's p1 --or p3, p4, etc.

But it throws an error (Can’t get text 1 thru -1 of “”) at the following line:

set textList's end to thecontent's paragraph index's text ((offset of ":" in thecontent's paragraph index) + 1) thru -1

How can I fix it?

Thanks
John

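The error message is telling you the script hit an empty paragraph. offset returns 0 when there is no colon, so the script asks for text 1 thru -1 of an empty string, which doesn't exist. Change the line in the first loop so it skips empty paragraphs: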

if not theContent's paragraph index's text = "" then set textList's end to theContent's paragraph index's text ((offset of ":" in theContent's paragraph index) + 1) thru -1
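For a file like the multi-person test file earlier in the thread, the idea can be taken a step further. The following is only a rough sketch: a hypothetical handler, peopleFromContent, skips empty paragraphs and lines without a label:value pair, and groups each individual's values into one record (the "put each piece of data for an individual into a list" option mentioned earlier). The record labels reuse haolesurferdude's variable names, and the grouping assumes every person's block starts with a Name: line.

```applescript
-- Sketch: one record per person, built from "Label:Value" paragraphs.
-- Assumes each person's block starts with a "Name:" line, as in the test file.
-- Empty paragraphs and paragraphs without a label and a value are skipped.
on peopleFromContent(theContent)
	set people to {}
	set currentPerson to missing value
	repeat with aPara in paragraphs of theContent
		set paraText to contents of aPara
		set colonPos to offset of ":" in paraText
		-- need at least one character before and after the colon
		if colonPos > 1 and colonPos < (count paraText) then
			set theLabel to text 1 thru (colonPos - 1) of paraText
			set theValue to text (colonPos + 1) thru -1 of paraText
			if theLabel is "Name" then
				-- a new "Name:" line closes the previous person's record
				if currentPerson is not missing value then set end of people to currentPerson
				set currentPerson to {theName:theValue, theAge:"", theDOB:"", theGender:""}
			else if currentPerson is not missing value then
				if theLabel is "Age" then
					set theAge of currentPerson to theValue
				else if theLabel is "D.O.B" then
					set theDOB of currentPerson to theValue
				else if theLabel is "Gender" then
					set theGender of currentPerson to theValue
				end if
			end if
		end if
	end repeat
	-- keep the last person, who is not followed by another "Name:" line
	if currentPerson is not missing value then set end of people to currentPerson
	return people
end peopleFromContent

-- Usage with sample text that includes a blank line
set sampleText to "Name:XXXXXXX" & return & "Age:YYYYY" & return & return & ¬
	"D.O.B:00/00/0000" & return & "Gender:ZZZZ" & return & ¬
	"Name:AAAAAAA" & return & "Age:YYYYY"
set thePeople to peopleFromContent(sampleText)
theName of item 2 of thePeople
```

From inside a tell application "Microsoft Word" block, call it as my peopleFromContent(thecontent), the same way haolesurferdude's script calls processContent, so the handler call is not sent to Word.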