Build fixed length paragraphs of whole words from a page of words

I am sure I was set this question in my computer studies A-level back in 1979. I was lucky to have been taught by Miss Judith Jolly, who bought an Apple II for the Tavistock Comprehensive School. She was a great inspiration, but I never did grasp the fundamentals of sorting data!

I need to extract a fixed amount of text, for example 1000 characters, and paste each extract into a new document. The source is a page of text without any commas or full stops (although I might later try to write a script that puts a comma as a delimiter between the names).

Any assistance would be gratefully received. I have really got into this AppleScript. Some of my dreams are coming true, whilst other times I lie there in a ‘for next loop’!

Thanks Pete

Pro FCP VFX editor Pinewood UK

-- A script to copy text into a new document with a maximum of 50 characters and no truncated words, extracted from a simple text file.

-- list from textEDIT doc
set LongPlayersList to "Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise Harry Kewell Steven Gerrard Robbie Fowler Luis Garcia Mark Gonzalez " --Fabio Aurelio Xabi Alonso Peter Crouch Jermaine Pennant Craig Bellamy Dirk Kuyt Momo Sissoko Jamie Carragher Pepe Reina Paul Anderson Stephen Warnock Gabriel Paletta Boudewijn Zenden Danny Guthrie Adam Hammill Lee Peltier Craig Lindfield"

-- to indicate
set characterCount to count of characters of LongPlayersList
display dialog ("Number of characters in this selection is " & characterCount)

if characterCount is greater than 200 then
	set PlayersList1 to (characters 1 thru 50 of LongPlayersList) as string
	set PlayersList2 to (characters 51 thru 100 of LongPlayersList) as string
	set PlayersList3 to (characters 101 thru 150 of LongPlayersList) as string
	set PlayersList4 to (characters 151 thru 200 of LongPlayersList) as string
	display dialog ("PlayersList1" & PlayersList1)
	display dialog ("PlayersList2" & PlayersList2)
	display dialog ("PlayersList3" & PlayersList3)
	display dialog ("PlayersList4" & PlayersList4)
	-- create new text document and paste text of PlayersList1
	-- create new text document and paste text of PlayersList2
	-- create new text document and paste text of PlayersList3
	-- create new text document and paste text of PlayersList4
else
	if characterCount is greater than 150 then
		set PlayersList1 to (characters 1 thru 50 of LongPlayersList) as string
		set PlayersList2 to (characters 51 thru 100 of LongPlayersList) as string
		set PlayersList3 to (characters 101 thru 150 of LongPlayersList) as string
		set PlayersList4 to (characters 151 thru characterCount of LongPlayersList) as string
		display dialog ("PlayersList1" & PlayersList1)
		display dialog ("PlayersList2" & PlayersList2)
		display dialog ("PlayersList3" & PlayersList3)
		display dialog ("PlayersList4" & PlayersList4)
		-- create new text document and paste text of PlayersList1
		-- create new text document and paste text of PlayersList2
		-- create new text document and paste text of PlayersList3
		-- create new text document and paste text of PlayersList4		
	else
		if characterCount is greater than 100 then
			set PlayersList1 to (characters 1 thru 50 of LongPlayersList) as string
			set PlayersList2 to (characters 51 thru 100 of LongPlayersList) as string
			set PlayersList3 to (characters 101 thru characterCount of LongPlayersList) as string
			display dialog ("PlayersList1" & PlayersList1)
			display dialog ("PlayersList2" & PlayersList2)
			display dialog ("PlayersList3" & PlayersList3)
			-- create new text document and paste text of PlayersList1
			-- create new text document and paste text of PlayersList2
			-- create new text document and paste text of PlayersList3			
		else
			if characterCount is greater than 50 then
				set PlayersList1 to (characters 1 thru 50 of LongPlayersList) as string
				set PlayersList2 to (characters 51 thru characterCount of LongPlayersList) as string
				display dialog ("PlayersList1" & PlayersList1)
				display dialog ("PlayersList2" & PlayersList2)
				-- create new text document and paste text of PlayersList1
				-- create new text document and paste text of PlayersList2										
				set the numberDocs to 2
			else
				if characterCount is greater than 0 then
					set PlayersList1 to (characters 1 thru characterCount of LongPlayersList) as string
					display dialog ("PlayersList1" & PlayersList1)
					-- create new text document and paste text of PlayersList1				
					set the numberDocs to 1
				end if
			end if
		end if
	end if
end if

Interesting problem. This isn’t optimized for speed, but it works for the list of words you provided. allDocs is a list of strings of 50 characters or fewer that don’t start or end with a space and don’t split a word.


set LPL to "Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise Harry Kewell Steven Gerrard Robbie Fowler Luis Garcia Mark Gonzalez Fabio Aurelio Xabi Alonso Peter Crouch Jermaine Pennant Craig Bellamy Dirk Kuyt Momo Sissoko Jamie Carragher Pepe Reina Paul Anderson Stephen Warnock Gabriel Paletta Boudewijn Zenden Danny Guthrie Adam Hammill Lee Peltier Craig Lindfield"

set allDocs to {}
set W to words of LPL
set k to 1
repeat
	try
		set P to characters k thru (k + 49) of LPL as string
		if first character of P is space then set P to (characters 2 thru -1 of P) as text
		if last character of P is space then set P to (characters 1 thru -2 of P) as text
		if last word of P is not in W then -- we've split a word
			set P to words 1 thru -2 of P
			set AppleScript's text item delimiters to space
			set P to P as text
			set AppleScript's text item delimiters to ""
		end if -- now find out where we left off
		set k to (offset of (last word of P) in LPL) + (length of last word of P)
		set end of allDocs to P -- stick it in our list of strings
	on error -- we've hit the end of the original string
		set end of allDocs to characters (k + 1) thru -1 of LPL as text
		exit repeat
	end try
end repeat
--> {"Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger", "John Arne Riise Harry Kewell Steven Gerrard", "Robbie Fowler Luis Garcia Mark Gonzalez Fabio", "Aurelio Xabi Alonso Peter Crouch Jermaine Pennant", "Craig Bellamy Dirk Kuyt Momo Sissoko Jamie", "Carragher Pepe Reina Paul Anderson Stephen", "Warnock Gabriel Paletta Boudewijn Zenden Danny", "Guthrie Adam Hammill Lee Peltier Craig Lindfield"}
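
The same conditions (no split words, no leading or trailing spaces, a maximum length) can also be met by building each string one word at a time. The following is only a sketch under stated assumptions, not the script above: the handler name wrapWords and its parameters are hypothetical, and it assumes the source text contains no punctuation, as in the question, because "words of" discards punctuation.

```applescript
-- Hypothetical handler (not from the scripts above): greedy word-by-word wrap.
-- Returns a list of strings, each at most maxLen characters long,
-- unless a single word is itself longer than maxLen.
on wrapWords(sourceText, maxLen)
	set chunkList to {}
	set currentChunk to ""
	repeat with aWord in (words of sourceText)
		set aWord to contents of aWord -- dereference the loop variable
		if currentChunk is "" then
			set currentChunk to aWord
		else if ((count currentChunk) + 1 + (count aWord)) <= maxLen then
			-- the word still fits, including the space before it
			set currentChunk to currentChunk & space & aWord
		else
			-- the word would overflow: close this chunk and start a new one
			set end of chunkList to currentChunk
			set currentChunk to aWord
		end if
	end repeat
	if currentChunk is not "" then set end of chunkList to currentChunk
	return chunkList
end wrapWords

wrapWords("Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise", 50)
--> {"Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger", "John Arne Riise"}
```

Because this version never searches the source for a word with "offset", a word that appears more than once (such as "Craig" in the full list) should not affect where the next string starts.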

Thanks Adam,

It is truly amazing how you have turned this on its head to find the solution!

But how do I get the four separate resultant documents in this example?


set LPL to "Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise Harry Kewell Steven Gerrard Robbie Fowler Luis Garcia Mark Gonzalez Fabio Aurelio Xabi Alonso Peter Crouch Jermaine Pennant Craig Bellamy Dirk Kuyt Momo Sissoko Jamie Carragher Pepe Reina Paul Anderson Stephen Warnock Gabriel Paletta Boudewijn Zenden Danny Guthrie Adam Hammill Lee Peltier Craig Lindfield"

set allDocs to {}
set W to words of LPL
set k to 1
repeat
	try
		set P to characters k thru (k + 99) of LPL as string
		if first character of P is space then set P to (characters 2 thru -1 of P) as text
		if last character of P is space then set P to (characters 1 thru -2 of P) as text
		if last word of P is not in W then -- we've split a word
			set P to words 1 thru -2 of P
			set AppleScript's text item delimiters to space
			set P to P as text
			set AppleScript's text item delimiters to ""

			display dialog allDocs as text

		end if -- now find out where we left off
		set k to (offset of (last word of P) in LPL) + (length of last word of P)
		set end of allDocs to P -- stick it in our list of strings
	on error -- we've hit the end of the original string
		set end of allDocs to characters (k + 1) thru -1 of LPL as text
		exit repeat
	end try
end repeat
-- eg
--> doc1{Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise Harry Kewell Steven Gerrard Robbie}
--> doc2 {Fowler Luis Garcia Mark Gonzalez Fabio Aurelio Xabi Alonso Peter Crouch Jermaine Pennant....}
--> doc3 {Craig Bellamy Dirk Kuyt Momo Sissoko Jamie Carragher Pepe Reina Paul Anderson Stephen...}
--> doc4 {Warnock Gabriel Paletta Boudewijn Zenden Danny Guthrie Adam Hammill Lee Peltier Craig Lindfield}

Like so… (and I didn’t “turn it on its head”, I just pursued what you stated as the conditions on each doc)


set LPL to "Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise Harry Kewell Steven Gerrard Robbie Fowler Luis Garcia Mark Gonzalez Fabio Aurelio Xabi Alonso Peter Crouch Jermaine Pennant Craig Bellamy Dirk Kuyt Momo Sissoko Jamie Carragher Pepe Reina Paul Anderson Stephen Warnock Gabriel Paletta Boudewijn Zenden Danny Guthrie Adam Hammill Lee Peltier Craig Lindfield"

set allDocs to {}
set W to words of LPL
set k to 1
repeat
	try
		set P to characters k thru (k + 99) of LPL as string
		if first character of P is space then set P to (characters 2 thru -1 of P) as text
		if last character of P is space then set P to (characters 1 thru -2 of P) as text
		if last word of P is not in W then -- we've split a word
			set P to words 1 thru -2 of P
			set AppleScript's text item delimiters to space
			set P to P as text
			set AppleScript's text item delimiters to ""
		end if -- now find out where we left off
		set k to (offset of (last word of P) in LPL) + (length of last word of P)
		set end of allDocs to P -- stick it in our list of strings
	on error -- we've hit the end of the original string
		set end of allDocs to characters (k + 1) thru -1 of LPL as text
		exit repeat
	end try
end repeat
-- publish the docs
repeat with k from 1 to count allDocs
	writeDoc(item k of allDocs, k)
end repeat
-- handler to write them to text files (could have been incorporated in the scheme above but didn't know if you wanted allDocs)
to writeDoc(someText, idx)
	set f to open for access ((path to desktop as text) & "doc[" & idx & "].txt") with write permission
	try
		write someText to f
		close access f
	on error -- make sure it's closed or you can't trash it!
		close access f
	end try
end writeDoc

Many Thanks Adam,

I was so focused on the characters that I failed to consider the words. Maybe one day I’ll be able to write such an elegant script. At least I am trying. I will implement this tomorrow

Regards Pete

Welcome, Pete. I spent about 2 hours figuring out the heart of that first script, with lots of false starts that always came back to the fact that you didn’t want to break words. I should also say that if your list runs to tens of thousands of words, the script can be speeded up very significantly in several ways. I was afraid those would confuse the main issue for you, since I presumed you would want to modify the script, and I didn’t want to spend too much longer getting it to work. The speeder-upper is to hold large, oft-visited lists in properties of a script object within your script, so that AppleScript accesses them by reference instead of as lists to be broken down and rebuilt. Here’s one example in the last script in this article from our tutorial archives.
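
As a rough illustration of that speed-up, here is a minimal sketch. The script object name wordStore and the variable totalLetters are made up for this example, and the short string stands in for a much larger text. Any gain only becomes noticeable with lists of many thousands of items.

```applescript
set LPL to "Jerzy Dudek Steve Finnan Sami Hyypia"

-- Hypothetical script object: its property holds the big list,
-- so items are reached through the property rather than a plain variable.
script wordStore
	property wordList : {}
end script

set wordStore's wordList to words of LPL

-- Visit every item through the script object's property.
set totalLetters to 0
repeat with i from 1 to (count wordStore's wordList)
	set totalLetters to totalLetters + (count (item i of wordStore's wordList))
end repeat
totalLetters
--> 31
```

The loop body is only a placeholder. The point is that "item i of wordStore's wordList" replaces "item i of W" wherever a large list is visited repeatedly.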

Hi

4 years later I need to modify this script slightly…

I need it to save each fixed length line of text as a paragraph in one file.

Like this:

I need it to save each
fixed length line of text
as a paragraph in one
file.

I have tried but can’t work it out:

set LPL to "I need it to save each fixed length line of text as a paragraph in one file"

set allDocs to {}
set W to words of LPL
set k to 1
repeat
   try
       set P to characters k thru (k + 49) of LPL as string
       if first character of P is space then set P to (characters 2 thru -1 of P) as text
       if last character of P is space then set P to (characters 1 thru -2 of P) as text
       if last word of P is not in W then -- we've split a word
           set P to words 1 thru -2 of P
           set AppleScript's text item delimiters to space
           set P to P as text
           set AppleScript's text item delimiters to ""
       end if -- now find out where we left off
       set k to (offset of (last word of P) in LPL) + (length of last word of P)
       set end of allDocs to P -- stick it in our list of strings
   on error -- we've hit the end of the original string
       set end of allDocs to characters (k + 1) thru -1 of LPL as text
       exit repeat
   end try
end repeat
--> {"I need it to save each fixed length line of text as a paragraph in one file"}

Hi,

Using Adam’s original script, I think this is what you want (tested once, and it worked):

set LPL to "Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise Harry Kewell Steven Gerrard Robbie Fowler Luis Garcia Mark Gonzalez Fabio Aurelio Xabi Alonso Peter Crouch Jermaine Pennant Craig Bellamy Dirk Kuyt Momo Sissoko Jamie Carragher Pepe Reina Paul Anderson Stephen Warnock Gabriel Paletta Boudewijn Zenden Danny Guthrie Adam Hammill Lee Peltier Craig Lindfield"

set allDocs to ""
set W to words of LPL
set k to 1
repeat
	try
		set P to characters k thru (k + 99) of LPL as string
		if first character of P is space then set P to (characters 2 thru -1 of P) as text
		if last character of P is space then set P to (characters 1 thru -2 of P) as text
		if last word of P is not in W then -- we've split a word
			set P to words 1 thru -2 of P
			set AppleScript's text item delimiters to space
			set P to P as text
			set AppleScript's text item delimiters to ""
		end if -- now find out where we left off
		set k to (offset of (last word of P) in LPL) + (length of last word of P)
		set allDocs to allDocs & P & return -- append it to the text as a new paragraph
	on error -- we've hit the end of the original string
		set allDocs to allDocs & characters (k + 1) thru -1 of LPL as text
		exit repeat
	end try
end repeat
-- publish the docs

writeDoc(allDocs)

-- handler to write the combined text to a single text file
to writeDoc(someText)
	set f to open for access ((path to desktop as text) & "doc_.txt") with write permission
	try
		write someText to f
		close access f
	on error -- make sure it's closed or you can't trash it!
		close access f
	end try
end writeDoc

I’m sure that, with three and a half years’ more experience, Adam would now use more robust techniques. :wink:

set LPL to "Jerzy Dudek Steve Finnan Sami Hyypia Daniel Agger John Arne Riise Harry Kewell Steven Gerrard Robbie Fowler Luis Garcia Mark Gonzalez Fabio Aurelio Xabi Alonso Peter Crouch Jermaine Pennant Craig Bellamy Dirk Kuyt Momo Sissoko Jamie Carragher Pepe Reina Paul Anderson Stephen Warnock Gabriel Paletta Boudewijn Zenden Danny Guthrie Adam Hammill Lee Peltier Craig Lindfield"
set maxLen to 100 -- Maximum paragraph length.

set allParas to {}
set k to 1
set textLength to (count LPL)
repeat until (k > textLength)
	set l to k + maxLen -- We'll be testing the (maxLen + 1)th character each time ...
	if (l > textLength) then set l to textLength -- ... unless the end of the text comes before then.
	
	set P to text k thru l of LPL
	ignoring white space
		if (P > space) then
			if (P ends with space) or (l is textLength) then
				set end of allParas to text 1 thru word -1 of P
			else
				set end of allParas to text 1 thru word -2 of P
			end if
			set k to k + (count result) + 1
		else
			set k to k + (count P)
		end if
	end ignoring
end repeat

set astid to AppleScript's text item delimiters
set AppleScript's text item delimiters to return
set allParas to allParas as text
set AppleScript's text item delimiters to astid

-- publish the docs
writeDoc(allParas)

-- handler to write them to text files (could have been incorporated in the scheme above but didn't know if you wanted allParas)
to writeDoc(someText)
	set f to (open for access file ((path to desktop as text) & "doc_.txt") with write permission)
	try
		set eof f to 0
		write someText to f
		close access f
	on error -- make sure it's closed or you can't trash it!
		close access f
	end try
end writeDoc

LOL…