Removing duplicate characters from a string

It’s certainly fun and interesting to try out different methods and test them for efficiency. In real life, though, if you need a string containing just one instance of each character, the result is likely to be fairly short, and the version suspected of containing duplicates is unlikely to be much longer. The least unlikely pathological situation I can think of is needing to know which characters are used in a long text. In that case, a list is likely to be a more useful result and, unless you need to know the order in which the characters first appear, the order doesn’t matter. You may also need to know how many times each character appears in the source string.

use framework "Foundation"

set sampleString to "3Xwww3✔✓°¦¦✓wWWΦ3X"
set desiredInfo to my charactersUsedIn:sampleString withCounts:false -- Change to true to include the counts.

(* Return a list containing either:
	unique instances of the characters used in txt, in the order of their first appearances, or:
	records containing these instances and the number of times they're used, in the same order.
 *)
on charactersUsedIn:txt withCounts:counting
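	-- Get the text's characters as an AppleScript list.
	set allChrs to txt's characters
	-- An ordered set drops duplicates while keeping the order of first appearance.
	set output to ((current application's NSOrderedSet's orderedSetWithArray:(allChrs))'s array()) as list
	if (counting) then
		-- A counted set records how many times each character occurs.
		set countedSet to current application's NSCountedSet's alloc()'s initWithArray:(allChrs)
		-- Replace each character in the output list with a record of the character and its count.
		repeat with thisChr in output
			set thisChr's contents to {chr:(thisChr's contents), |count|:(countedSet's countForObject:(thisChr))}
		end repeat
	end if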
	
	return output
end charactersUsedIn:withCounts:
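
If the order of first appearance really doesn't matter, a plain NSSet should be enough. The following is only a minimal sketch of that idea, not a tested alternative to the handler above. The handler name unorderedCharactersUsedIn is made up for illustration, and the order of the returned list isn't guaranteed.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"

-- Hypothetical handler name, used here for illustration only.
-- Returns one instance of each character used in txt, in no particular order.
on unorderedCharactersUsedIn(txt)
	-- An NSSet keeps a single instance of each distinct character.
	set chrSet to current application's NSSet's setWithArray:(txt's characters)
	-- allObjects() returns the set's members as an array in an unspecified order.
	return (chrSet's allObjects()) as list
end unorderedCharactersUsedIn

unorderedCharactersUsedIn("3Xwww3✔✓°¦¦✓wWWΦ3X")
```

With the sample string, this should give the same nine characters as the ordered version, just not necessarily in the same sequence. As with the ordered set, "w" and "W" count as different characters.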