remove abundant spaces in text

G’day all,
could you help me out? Probably an easy thing to do, but I’m a newbie to this.
I’ve got a number of text files (*.txt) which I join using one of the scripts I found in the forum. See below.
First: I would like to be able to give the resultFile a name, instead of the combined_file.txt.

property resultFile : "combined_file.txt"

set sourceFolder to (choose folder)
set theFiles to (list folder sourceFolder without invisibles)
set sourceFolder to (POSIX path of sourceFolder)
-- Empty the current results
do shell script "echo -n '' > " & quoted form of (sourceFolder & resultFile)
repeat with currentItem in theFiles
	if name extension of (info for (POSIX file (sourceFolder & currentItem))) is "txt" then
		if ((currentItem as string) is not equal to resultFile) then
			do shell script "cat " & ¬
				(quoted form of (sourceFolder & currentItem)) & ¬
				" >> " & (quoted form of (sourceFolder & resultFile))
		end if
	end if
end repeat



Second:
Now, in these files there are often a lot of spaces, where I only need one. (Think of someone typing in Word and using the spacebar very much).
Is it possible to add a part to the script above that deletes the unnecessary spaces in the resultFile?
I’m learning as I go, so help is much appreciated.

Elsa

Hi.

This will allow you to choose a name for the combined file:

set theName to text returned of (display dialog "Enter file name" default answer "") & ".txt"

set sourceFolder to (choose folder)
set theFiles to (list folder sourceFolder without invisibles)
set sourceFolder to (POSIX path of sourceFolder)
-- Empty the current results
do shell script "echo -n '' > " & quoted form of (sourceFolder & theName)
repeat with currentItem in theFiles
	if name extension of (info for (POSIX file (sourceFolder & currentItem))) is "txt" then
		if ((currentItem as string) is not equal to theName) then
			do shell script "cat " & ¬
				(quoted form of (sourceFolder & currentItem)) & ¬
				" >> " & (quoted form of (sourceFolder & theName))
		end if
	end if
end repeat


You also want to search the file for any section that has more than one space in a row and reduce that to one space?

Somebody more knowledgeable will probably help with that before I figure it out.

j

hi elsa, capJ,

i did the file naming a bit differently, you might want to check it out. also, i added sed to remove the spaces:


property resultFile : {missing value}

set resultFile to text returned of (display dialog "What would you like to call the new file?" default answer "")

set sourceFolder to (choose folder)
set theFiles to (list folder sourceFolder without invisibles)
set sourceFolder to (POSIX path of sourceFolder)
-- Empty the current results
do shell script "echo -n '' > " & quoted form of (sourceFolder & resultFile)
repeat with currentItem in theFiles
	if name extension of (info for (POSIX file (sourceFolder & currentItem))) is "txt" then
		if ((currentItem as string) is not equal to resultFile) then
			do shell script "/bin/cat " & ¬
				(quoted form of (sourceFolder & currentItem)) & ¬
				" | /usr/bin/sed 's/  */ /g'" & " >> " & (quoted form of (sourceFolder & resultFile))
		end if
	end if
end repeat

if name extension of (info for (POSIX file (sourceFolder & resultFile))) is not "txt" then
	do shell script "/bin/mv " & sourceFolder & resultFile & space & sourceFolder & resultFile & ".txt"
end if

this other one will also remove blank lines:


property resultFile : {missing value}

set resultFile to text returned of (display dialog "What would you like to call the new file?" default answer "")

set sourceFolder to (choose folder)
set theFiles to (list folder sourceFolder without invisibles)
set sourceFolder to (POSIX path of sourceFolder)
-- Empty the current results
do shell script "echo -n '' > " & quoted form of (sourceFolder & resultFile)
repeat with currentItem in theFiles
	if name extension of (info for (POSIX file (sourceFolder & currentItem))) is "txt" then
		if ((currentItem as string) is not equal to resultFile) then
			do shell script "/bin/cat " & ¬
				(quoted form of (sourceFolder & currentItem)) & ¬
				" | /usr/bin/sed 's/  */ /g'" & " | /usr/bin/sed '/^$/d'" & " >> " & (quoted form of (sourceFolder & resultFile))
		end if
	end if
end repeat

if name extension of (info for (POSIX file (sourceFolder & resultFile))) is not "txt" then
	do shell script "/bin/mv " & sourceFolder & resultFile & space & sourceFolder & resultFile & ".txt"
end if

have a great weekend!

Thanks, waltr.

I think I’ll spend a portion of the weekend reading manpages and the Event Log from your script.

j

hi J,

one thing you won’t find in the man pages for cat or sed is the pipe (|) operator; it belongs to the shell. i don’t think you know this one, but it’s essential to how i did this so here goes. most Unix command line utilities are designed to take input from “Standard Input” and send output to “Standard Output” and “Standard Error”. what this means is that it’s relatively easy to chain commands together to make a tool that can be reused again and again. for instance, when i saw in this script that the person was using ‘cat’ to stream the file, like this:

 cat /path/to/the/input/file >> /path/to/the/output/file

i could see that they were ‘redirecting’ Standard Out to a file with the ‘>>’ operator. by putting my pipe in there like this:

 cat /path/to/the/input/file | sed <regular expression/> >> /path/to/the/output/file

i just grab the stream and filter it with sed. i don’t interrupt the flow of the shell script one bit (well, i do change it).
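
The redirect-versus-pipe difference described above can be tried directly in a terminal. This is only a minimal sketch: the scratch directory and the file names input.txt and output.txt are made up for the demo, and the sed expression is the one from the scripts earlier in the thread.

```shell
#!/bin/sh
# Hypothetical scratch files, created in a temp directory just for this demo.
workdir=$(mktemp -d "${TMPDIR:-/tmp}/pipeDemo.XXXXXX")
printf 'one   two\n' > "$workdir/input.txt"
: > "$workdir/output.txt"   # start with an empty output file

# Plain redirect: cat's Standard Output is appended to the file unchanged.
cat "$workdir/input.txt" >> "$workdir/output.txt"

# Same redirect with sed spliced into the stream by a pipe:
# a space followed by any further spaces becomes a single space.
cat "$workdir/input.txt" | sed 's/  */ /g' >> "$workdir/output.txt"

# Show both appended lines: the first untouched, the second filtered.
cat "$workdir/output.txt"
```

The first line of output.txt keeps its three spaces, the second has one, which is the only thing the pipe changed.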

usually you will learn sed and awk together. check the man page, but really check the web–>you’ll find much more info on sed on the web. i see sed used a lot more than awk these days (and that may just be the circles i travel in) but you do see it. it’s a good thing to know.

EDITED: to add, btw–you’ll see in the second script there are 2 pipes. you can pipe and pipe and pipe all day long if you want.
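
As a rough illustration of chaining two pipes, the sketch below wraps the two sed filters from the second script in a shell function. The name clean_text is hypothetical and not part of any script in this thread.

```shell
#!/bin/sh
# Hypothetical helper: squeeze runs of spaces, then drop empty lines.
# Each sed reads the previous command's Standard Output through a pipe.
clean_text() {
    sed 's/  */ /g' | sed '/^$/d'
}

# Feed three lines in (the middle one is empty) and print what survives.
printf 'carry   moonbeams\n\nhome    in a jar\n' | clean_text
```

The same result should also be possible with a single sed call that takes both expressions via -e, but two separate stages make the chaining easier to see.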

Hi waltr.

Thanks for the explanation, it’s much appreciated. And since it’s well past a reasonable hour -

Good night,

j

Hi Waltr, J,
thanks very much for the swift reply!
Being able to add my own textfile name works fine, very nice, thank you!

Waltr, the second script deletes ALL spaces and I would like to have one remaining between words.
Here’s an example:
old text was: Would you    like to swing      on a   star,  carry moonbeams       home           in a jar
Text should be: Would you like to swing on a star, carry moonbeams home in a jar
After running the script I get: Wouldyouliketoswingonastar,carrymoonbeamshomeinajar

That’s a bit too many deletes… Can you help me there too?
You guys are wonderful! Thanks!

Elsa

www.elsruiters.nl

Hi, give this (now fixed) script a try:

set sourceFolder to choose folder with prompt "Please choose a folder containing text documents to be processed:"
set shellString to "cd " & quoted form of POSIX path of sourceFolder & "; /usr/bin/sed -E 's/  +/ /g' "

tell application "System Events" to set fileList to name of every file of sourceFolder whose visible is true and name extension is "txt"

set combinedFile to " >> " & quoted form of (text returned of (display dialog "Please enter a name for the combined file:" default answer "combined.txt"))

repeat with thisFile in fileList
	do shell script (shellString & quoted form of thisFile & combinedFile)
end repeat

Hello Qwerty,
thanks for helping, but this doesn’t do the job: it only finds the first word of the first file and ignores (?) deletes (?) everything after that.
Elsa

elsaskippy,

I don’t know what went wrong for you, but I just ran both of waltr’s scripts and they worked perfectly.

Qwerty’s removed extra spaces from the first file, but didn’t combine files until I removed the word “return” from

return do shell script (shellString & thisFile & resultFile)

I used files created with TextEdit.

j

hi elsa,

i’m not seeing that problem with the second script. in fact, the sed command is the same for both (regarding spaces) so i’m really surprised you are seeing one work and not the other.
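
The space-squeezing expressions can also be checked on their own, outside AppleScript. The sketch below assumes a POSIX shell and a sed that accepts -E; it runs Elsa's sample sentence through the basic expression from waltr's scripts and the extended one from Qwerty's. The variable names are invented for the demo.

```shell
#!/bin/sh
# Elsa's sample sentence, with runs of extra spaces between some words.
sample='Would you    like to swing      on a   star,  carry moonbeams       home           in a jar'

# Basic regex from waltr's scripts: one space followed by zero or more spaces.
basic=$(printf '%s\n' "$sample" | sed 's/  */ /g')

# Extended regex from Qwerty's script: two or more spaces (-E enables +).
extended=$(printf '%s\n' "$sample" | sed -E 's/  +/ /g')

echo "$basic"
echo "$extended"
```

Both forms should leave exactly one space between words on plain-text input like this.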

however, i did find a problem with the script in that if you have any spaces in the path, the mv command gives an error. this will fix that:


property resultFile : {missing value}

set resultFile to text returned of (display dialog "What would you like to call the new file?" default answer "")

set sourceFolder to (choose folder)
set theFiles to (list folder sourceFolder without invisibles)
set sourceFolder to (POSIX path of sourceFolder)
-- Empty the current results
do shell script "echo -n '' > " & quoted form of (sourceFolder & resultFile)
repeat with currentItem in theFiles
	if name extension of (info for (POSIX file (sourceFolder & currentItem))) is "txt" then
		if ((currentItem as string) is not equal to resultFile) then
			do shell script "/bin/cat " & ¬
				(quoted form of (sourceFolder & currentItem)) & ¬
				" | /usr/bin/sed 's/  */ /g'" & " | /usr/bin/sed '/^$/d'" & " >> " & (quoted form of (sourceFolder & resultFile))
		end if
	end if
end repeat

if name extension of (info for (POSIX file (sourceFolder & resultFile))) is not "txt" then
	do shell script "/bin/mv " & quoted form of (sourceFolder & resultFile) & space & quoted form of (sourceFolder & resultFile) & ".txt"
end if
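
The effect of a space in the path can be reproduced without AppleScript. In this sketch, sh -c stands in for do shell script, since both hand a finished command string to the shell. The folder name "my folder" is an assumption for the demo, as are the variable names, and the single quotes play the role of quoted form of.

```shell
#!/bin/sh
# Hypothetical folder whose name contains a space, created only for this demo.
base=$(mktemp -d "${TMPDIR:-/tmp}/quoteTest.XXXXXX")
dir="$base/my folder"
mkdir "$dir"
: > "$dir/combined"

# Unquoted command string: the shell splits each path at the space, so mv
# receives four arguments and the rename fails (error text is discarded).
if sh -c "mv $dir/combined $dir/combined.txt" 2>/dev/null; then
    unquoted_result="renamed"
else
    unquoted_result="failed"
fi

# Quoted command string: each path reaches mv as a single argument.
sh -c "mv '$dir/combined' '$dir/combined.txt'"

echo "unquoted: $unquoted_result"
ls "$dir"
```

Wrapping the paths in single quotes is a shortcut that only holds while the path itself contains no single quote; quoted form of handles that case as well.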

if you’d like for me to take a look at the files that you are having problems with, PM me & i’ll give you my email address.–i don’t check here as often on the weekend though…

From another thread:

Could this be the problem?

hi J,

yes, i think you are correct. here is a script that creates any number of text files with elsa’s example:


property myText : "Would you    like to swing      on a   star,  carry moonbeams       home           in a jar"
try
	do shell script "/bin/mkdir ~/Desktop/sedTest/"
end try

set myVar to text returned of (display dialog "How many text files do you want to test" default answer "")

set x to 1

repeat while x ≤ myVar
	do shell script "/bin/echo " & quoted form of myText & " > ~/Desktop/sedTest/file" & x & ".txt"
	set x to x + 1
end repeat

when i create the folder and documents with this the test works perfectly. i’m going to look at the original to see if i changed something.

EDITED: i fixed the script in this posting. now it makes the spaces properly.

Hi waltr.

Your script gave me files with no extra spaces - not much of a test for your combine files script - so I added spaces manually, resaved the files, and ran script #2 from post #3 and it worked properly (again.)

j

EDIT: just saw your edit, waltr.

Thank you very much all, for helping and thinking along with me. I finally managed to get Qwerty’s script to run.
It was something that J referred to: the original files came from a Word Processor other than TextEdit. Once I created 2 text files with TextEdit, and took out ‘return’ as J suggested from Qwerty’s script, I got exactly what I needed.
Now, this works at home with my test files… I’ll let you know how it goes with the files it’s intended for. (I’ll be sure to make a backup) :slight_smile:
Once again, all, thank you ever so much. I’m a happy birdy!
:smiley: Elsa

Sorry guys, bad job on my part. :frowning:
Glad to see you got it working though. :slight_smile:

I’ve changed the script to reflect the fix anyway, plus a few 'quoted form of’s that should have been there.
Thanks. :smiley:

Hello there Waltr, J, Qwerty and other helpful folks
The script Qwerty sent me yesterday works perfectly with text files (thank you! :)). I had an underscore in one of the folder names and that gave me an error, but I took that out and then I had no problems left.
My question however is as follows:
here, at home, I’m running Script Editor version 2.1.1 (81) and AppleScript 1.10.7 (on a PowerBook, Mac OS X 10.4.7, 512 MB, 1.67 GHz)

At work however I’ve got version N1-1.9 (on Mac OS X 10.2.8, PowerPC G4, 768 MB, 1 GHz)

Will this cause problems? I tried to run the script at work today and got an error. At home I noticed the underscore, but I don’t think I’ve got one in the folder name at work. Does the version of AppleScript or OS X make any difference, with older versions in particular?

Ta, Elsa :slight_smile:

Hey, I thought I fixed that! :mad:
It should work now (with the 'quoted form of’s); I have edited my original post, so scroll up. :slight_smile:

I think sed has been part of Mac OS X since 10.0 (only guessing). Your work computer should be able to handle it fine.

Hya Qwerty, no worries mate, it’s a good exercise for me to try and figure out those little things, especially since I have very little knowledge of scripting. Your job was just magnificent! :slight_smile:
But unfortunately… at work I get this error: (the first sentence in Dutch means System Events received an error)

System Events kreeg een fout:
NSCannotCreateScriptCommandError

I’ll repeat the Script I’m using below:

set resultFile to " >> " & text returned of (display dialog "Please enter a name for the combined file:" default answer "") & ".txt"
set sourceFolder to choose folder
tell application "System Events" to set fileList to name of every file of sourceFolder whose visible is true and name extension is "txt"

set shellString to "cd " & POSIX path of sourceFolder & "; /usr/bin/sed -E 's/ +/ /g' "
repeat with thisFile in fileList
	do shell script (shellString & thisFile & resultFile)
end repeat

It’s such a shame because it works fine on my Mac at home. I’ve tried several files, including the ones I used at home. Does anyone have any suggestions? The only thing that springs to mind is some code that refers to the OS and/or the script version.
Elsa

Model: 1Ghz Power PC G4 - 768MB
AppleScript: N1-1.9
Browser: Firefox 1.5.0.4
Operating System: Mac OS X (10.2.x)