Comparison of methods of removing leading and trailing blanks

I noticed a few threads in the past year presenting various methods of removing leading and trailing blanks from strings. I developed a method involving a recursive subroutine call and compared it with two other methods previously described. Each method was timed in a repeat loop that removed the leading and trailing spaces from the string "         ABCDE         " 10,000 times. The methods and results are as follows:


-- Method 1:  Repeat loops (posted by "kai")  ->  time for 10,000 reps = 14 seconds

set the_text to "         ABCDE         "

repeat while the_text starts with space
	set the_text to the_text's text 2 thru -1
end repeat
repeat while the_text ends with space
	set the_text to the_text's text 1 thru -2
end repeat

get the_text -- "ABCDE"

------------------------------------------------------------------------------------

-- Method 2:  Recursive subroutine (my method)  ->  time for 10,000 reps = 22 seconds 

set the_text to "         ABCDE         "

set the_text to remove_spaces(the_text) -- keep the handler's result; otherwise the_text stays untrimmed

on remove_spaces(the_text)
	if the_text starts with space then set the_text to remove_spaces(the_text's text 2 thru -1)
	if the_text ends with space then set the_text to remove_spaces(the_text's text 1 thru -2)
	return the_text
end remove_spaces

get the_text -- "ABCDE"

------------------------------------------------------------------------------------

-- Method 3:  sed + do shell script  ->  time for 10,000 reps = 710 seconds !!

set the_text to "         ABCDE         "

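-- keep the command's output, and let 'quoted form' handle any quotes inside the_text
set the_text to do shell script "echo " & quoted form of the_text & " | sed -E 's/^ +| +$//g'"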

get the_text -- "ABCDE"


The “repeat loop” method was the fastest and the “recursive subroutine” method was not far behind, but the “sed + do shell script” method was far slower. Much of that is presumably the overhead of “do shell script” itself, since the same “sed” command run directly in the Terminal (in a “for” loop with 10,000 reps) took only 109 seconds. Still, “sed” was substantially and surprisingly slower than the two AppleScript methods. It seems that not everything “Unix” is always the best. Hear, hear for AppleScript! I hope this information is helpful to anyone looking to remove leading and trailing spaces.
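For anyone who wants to repeat the Terminal side of the comparison, here is a rough sketch of what such a loop might look like. It is not the exact loop that was timed above: the function names trim_spaces and run_reps are made up for illustration, and a counting "while" loop stands in for the "for" loop so that it runs in any POSIX shell. Note that every repetition starts a fresh sed process, which is probably where most of the time goes.

```shell
# Trim leading and trailing spaces with the same sed expression as Method 3.
# (trim_spaces is a hypothetical helper name, not part of the original test.)
trim_spaces() {
    printf '%s\n' "$1" | sed -E 's/^ +| +$//g'
}

# Run the trim a given number of times, discarding the output.
# (run_reps is also a hypothetical helper; the post used 10,000 reps.)
run_reps() {
    reps=$1
    i=0
    while [ "$i" -lt "$reps" ]; do
        trim_spaces '         ABCDE         ' > /dev/null
        i=$((i + 1))
    done
}
```

In bash, something like "time run_reps 10000" should give a figure to set against the AppleScript timings, though the number will depend heavily on the machine.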

See also: trim() [Remove spaces]


-- Method 0.5:  Repeat loops with 'considering case' and trap for empty and space-only strings.

set the_text to "     ABCDE     "

try
	considering case
		repeat while the_text starts with space
			set the_text to the_text's text 2 thru -1
		end repeat
		repeat while the_text ends with space
			set the_text to the_text's text 1 thru -2
		end repeat
	end considering
	
	get the_text -- "ABCDE"
on error
	get ""
end try

------------------------------------------------------------------------------------
 
-- Method 1.5:  Recursive subroutine with 'considering case', fewer recursions, and trap for space-only strings.

set the_text to "     ABCDE     "

set the_text to remove_spaces(the_text) -- "ABCDE"

on remove_spaces(the_text)
	try
		considering case
			if the_text starts with space then
				if the_text ends with space then
					set the_text to remove_spaces(the_text's text 2 thru -2)
				else
					set the_text to remove_spaces(the_text's text 2 thru -1)
				end if
			else if the_text ends with space then
				set the_text to remove_spaces(the_text's text 1 thru -2)
			end if
		end considering
		
		return the_text
	on error
		return ""
	end try
end remove_spaces


Hi,

I was working on this until I did a clean install of the OS for the New Year.

The idea was to use text item delimiters to replace (return & space) with (return) to strip leading spaces, repeating until no (return & space) is left, and then to replace (space & return) with (return) in the same way to strip trailing spaces. Lastly, check for leading and trailing spaces at the very beginning and end of the document, for the case where there is no return there.

Can’t remember if I finished and tested it. Come to think of it, I think I did finish and it was very quick.

gl,

Kai’s is still faster:


set t to "     hello     "
set t1 to the ticks
repeat 10000 times
	set s to return & space
	set r to return
	set temp_s to return & t & return
	set utid to AppleScript's text item delimiters
	repeat 2 times
		repeat while temp_s contains s
			set AppleScript's text item delimiters to {s}
			set temp_l to text items of temp_s
			set AppleScript's text item delimiters to {r}
			set temp_s to temp_l as string
		end repeat
		set s to space & return
	end repeat
	set AppleScript's text item delimiters to utid
	set temp_s to text 2 thru -2 of temp_s
end repeat
set t2 to the ticks
display dialog (t2 - t1)
temp_s

→ about 17 seconds

gl,

Thank you, Bruce, Nigel, and kel for your refinements and alternatives. It’s satisfying that the AppleScript environment is rich enough that one can accomplish the same task with so many different approaches, ranging from Unix to recursion to text item delimiters to simple repeat loops, etc. (with the simple repeat loops as the speed champion!) It still strikes me that “sed” is so much slower than the AppleScript-based solutions.