Removing trailing spaces from a text string

Ok, I know this is an easy task, but I also know there are many ways to do this.

I download a 30MB text file every week. It is a comma-separated file with several fields to import into a database of your choice. The problem is that the fields are really fixed length, and each one is padded with spaces at the end to fill its fixed length.

I have a script that removes these already, and I want to know if there is a better way of doing it. My solution is pretty basic: a loop starts at the end of the string and moves backwards until it finds a character that is not a space, flags that position, and sets the string to characters 1 through that position.

Thank you for viewing.

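set txtleng to the length of txtField
set lastChar to 0 -- position of the last non-space character (0 = none found)
repeat with i from txtleng to 1 by -1
	if (character i of txtField) is not " " then
		set lastChar to i
		exit repeat
	end if
end repeat
if lastChar is 0 then
	-- empty string, or nothing but spaces
	set txtField to ""
else
	set txtField to text 1 thru lastChar of txtField
end if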

Give this a try:

set txtField to "a few words             "
set text item delimiters to {space}
repeat with i from (count of text items of txtField) to 1 by -1
	if text item i of txtField is not "" then
		tell text items 1 thru i of txtField to set txtField to beginning & ({""} & rest)
		exit repeat
	else if i is 1 then
		set txtField to ""
	end if
end repeat
set text item delimiters to {""}
txtField

In certain circumstances you can trim out spaces by just asking for the words. You have to be super careful when using this because, for example, “@” is considered a word, so if you use this on any email addresses they’ll be broken by a space before and after the “@”. So yeah, this may not really be a better way for you, but FWIW:

set txtField to "a few words             "
set {old_delim, text item delimiters} to {text item delimiters, space}
set {txtField, text item delimiters} to {words of txtField as string, old_delim}
txtField

Here’s a slight modification that gets around the “@” business. In this case I’m just looking for three consecutive spaces as evidence of the logical end-of-line.

set txtField to "foo@bar.com            "
set {old_delim, text item delimiters} to {text item delimiters, space & space & space}
set {txtField, text item delimiters} to {first text item of txtField, old_delim}
txtField

Sorry Qwerty, this was not faster. I took a text string containing ten words with ten trailing spaces and ran it through each solution in a loop 100,000 times. Yours took 140 seconds and mine took 122 seconds. I ran both scripts multiple times (on a 12" PowerBook, 1.5GHz, 1.25GB RAM). The same test on a dual 2.7GHz G5 with 4.5GB RAM took 55 and 46 seconds respectively.

:smiley: Wow silvermeteors, that was faster. Yours got done in 64 seconds (23 seconds on the G5). This one is great because the one line command eliminates my need to call a handler for each field.

Well done silvermeteors! I thought of words, didn’t bother to try because of the “@” reason. That second one was so obvious! Nice one.
Anyway, can you see any difference in speed between using space & space & space and a literal "   "?

The space & space & space version did not return the correct result as written. I had to change the second set command from “first text item of” back to “words of” to fix it.

After the change, the script took exactly the same time to execute. The original version ran in 6 seconds (2 on the G5), but it only returned the first word in the string. :slight_smile:

I’m not too concerned with the @ thing, so this is what I’m using:

set txtField to "a few words             "
set {old_delim, text item delimiters} to {text item delimiters, space}
set {txtField, text item delimiters} to {words of txtField as string, old_delim}
txtField

Ok, strange things happening here.

I am using the sample above in my script to remove the trailing spaces from text strings, and I run it once a week on a large 24MB text file. The strange thing is that it worked for weeks, and then all of a sudden it started removing ALL the spaces. The first time this happened, I added several tests within the script to try to figure out what was going wrong, and all of a sudden the script started working again. I ran the script, saved it, and quit until next time.

The next week the same thing happened. I screwed around with the script, and it started working again without my really changing anything. Saved and quit. This week, same thing…

Anyone have an idea WTH is going on here?

Hi, macman_al.

I don’t know why all the spaces are being removed. Your excerpt works OK here. One possibility is that the ‘space’ constant is being set to an empty string earlier in the script. AppleScript constants aren’t always as “constant” as one would like! If you’ve used the term as a variable name too, that would explain things.

The method you’re using takes everything that AppleScript considers to be ‘words’ in the local language and interpolates spaces (or not) between them. Another approach, for your purposes, might be:

set txtField to "a few words       "
set txtField to text 1 thru word -1 of txtField

This simply returns everything up to and including the last “word” as is: same text, same class of text. As with the method you’re using now, though, it loses any punctuation after the last “word”.

set txtField to "a few words "
set txtField to text 1 thru word -1 of txtField

Thanks Nigel. I am not using space as a variable name in the script.

I tried the script you suggested above, but it errors on a string that contains only one real character, like "- " (a lone "-" is not a “word”, so there is no word -1 to find).

I added a line at the top of the code to set space to " ", but that didn’t fix it either. Then I replaced the space constant with a literal " " and the script started working again. But that is this week; hopefully it will still work next week. Thanks for the help.

I know this is off topic, but if your file has Unix line endings, sed could chop off those trailing spaces, and I suspect it would do it really fast.

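The sed command:

sed 's/ *$//'

removes any run of spaces at the end of each line. If your downloaded file was called 'input.txt', you could run this in the Terminal:

sed 's/ *$//' < input.txt > output.txt

or, from AppleScript (using full paths to both files, since do shell script does not start out in the folder containing your file):

do shell script "sed 's/ *$//' < input.txt > output.txt"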
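The sed command above only trims spaces at the very end of each line, but in the weekly file every field is padded, not just the last one. A per-field variant is sketched below. It assumes no field contains a quoted comma or spaces before a comma that must be kept. The function name trim_fields is hypothetical and only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: trim_fields
# Strips the padding spaces at the end of every comma-separated field.
# Assumption: no quoted fields with embedded commas or meaningful " ," pairs.
trim_fields() {
  # 1st expression: delete any run of spaces immediately before a comma
  # 2nd expression: delete any run of spaces at the end of the line
  sed -e 's/ *,/,/g' -e 's/ *$//'
}

# Usage: trim_fields < input.txt > output.txt
```

Unlike the AppleScript “words” approaches, this leaves a field such as "- " as "-" instead of failing or dropping the punctuation.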

If your file does not have unix line endings, that is an easy fix as well.
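A minimal sketch of that fix, assuming the file uses classic Mac OS line endings (a bare carriage return, \r, after each record): tr turns every CR into a Unix newline before sed trims the spaces. The function name trim_cr_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: trim_cr_file
# Assumption: records end with a bare CR (classic Mac OS line endings).
trim_cr_file() {
  # tr converts each CR to LF so sed sees one record per line,
  # then sed deletes any run of spaces at the end of each line
  tr '\r' '\n' | sed -e 's/ *$//'
}

# Usage: trim_cr_file < input.txt > output.txt
```

The output has Unix line endings. If the file uses Windows CR/LF endings instead, tr -d '\r' in place of tr '\r' '\n' would be the matching fix.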

Andy

Browser: Safari 412
Operating System: Mac OS X (10.4)