I wrote the following function to replace all non-alphanumeric characters with a space, but when I started using it on somewhat bigger text files (around 16K), it became obvious that it is very slow.
It just looks at each character and adds it to a new string when it is OK; otherwise it replaces it with a space.
So, am I doing something completely wrong? It's a straightforward operation, but processing times seem to grow exponentially with longer texts: a 2K text takes 15 seconds, and with a 16K text, let's just say I stopped waiting after 20 minutes (not exaggerating). Go figure. When I do this in PHP, for instance, it is done in under 1 second.
-- replace non-alphanumeric characters in source_text with a space and remove duplicate spaces
-- note: returns an empty string when the entire input is reduced to 1 space
on fn_normalize(source_text)
    -- check input
    try
        set source_text to source_text as string
    on error
        return false
    end try
    if length of source_text < 1 then return ""

    -- replace all characters by a space that are not plain alphanumeric
    set allowed_characters to "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890"

    -- build result text with allowed characters
    set result_string to ""
    set last_insert_is_space to true

    -- step through input characters
    repeat with single_character in the characters of source_text
        if ((offset of single_character in allowed_characters) > 0) then
            set result_string to result_string & single_character
            set last_insert_is_space to false
        else
            if last_insert_is_space is false then
                set result_string to result_string & space
                set last_insert_is_space to true
            end if
        end if
    end repeat

    -- trim end of string
    -- "ends with" is false for an empty result (all characters were illegal), so this cannot error
    if result_string ends with space then
        set result_string to text 1 thru -2 of result_string
    end if
    return result_string
end fn_normalize
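For reference, this is roughly how I call it. It is only a minimal sketch: it assumes the fn_normalize handler above is defined in the same script, and the sample input and the variable name normalized_text are made up for illustration.

```applescript
-- hypothetical sample input, not taken from my real files
set normalized_text to fn_normalize("Hello, world!")

-- the comma and the following space collapse into one space,
-- and the trailing space produced by "!" is trimmed
-- normalized_text should now be "Hello world"
return normalized_text
```

With short inputs like this the result comes back instantly; the slowdown only shows up with the larger texts described above.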