Remove Duplicates from List

Oddly enough, this script will remove duplicates from a list.

OS version: Any

--the_list can contain any type of item
set the_list to {1, 2, 3, 3, "3", "3", "3", 4, 5, 4, 5, 4, 1, 1, 2}
set return_list to {}
repeat with an_item in the_list
	if return_list does not contain an_item then set end of return_list to (contents of an_item)
end repeat
return return_list

ASObjC version. About 300 times faster:


use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"

set aList to {"a", "a", "a", "b", 1, "a", "b", 5.6, "a", "e", "f", "a", "a", "a", "b", "b", "b", "b", "b", "b"}

set aSet to current application's NSSet's alloc()'s initWithArray:aList
set aList to (aSet's allObjects()) as list

Since the aim’s to remove duplicates from a list, which is an ordered collection, it would be better to use an NSOrderedSet:


use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"

set aList to {"a", "a", "a", "b", 1, "a", "b", 5.6, "a", "e", "f", "a", "a", "a", "b", "b", "b", "b", "b", "b"}

set aSet to current application's NSOrderedSet's orderedSetWithArray:aList
set aList to (aSet's array()) as list
The speed is just as fantastic. Thanks. :slight_smile:

NOTE: Users who want the result list in alphabetical order should use my script.

In my own tests, the NSOrderedSet version not only preserves the original order of items but is also slightly faster than the NSSet one.

Somewhere between them in speed (surprisingly), but producing the same quasi-random order as NSSet, are these:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set aList to {"a", "a", "a", "b", 1, "a", "b", 5.6, "a", "e", "f", "a", "a", "a", "b", "b", "b", "b", "b", "b"}

set anArray to current application's class "NSArray"'s arrayWithArray:(aList)
set aList to (anArray's valueForKeyPath:("@distinctUnionOfObjects.self")) as list

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set aList to {"a", "a", "a", "b", 1, "a", "b", 5.6, "a", "e", "f", "a", "a", "a", "b", "b", "b", "b", "b", "b"}

set anArray to current application's class "NSArray"'s arrayWithObject:(aList)
set aList to (anArray's valueForKeyPath:("@distinctUnionOfArrays.self")) as list

Were you comparing your code with KniazidisR’s? If so, his alloc() might explain the speed difference. I tried it here using setWithArray: instead, and it came in a whisker faster than using NSSet.

Oh, and if anyone following along is interested, I have a new version of Script Geek and I’m looking for people to test it. Email me if you want a copy.

Yes, but without setting aList to the end result in the timing repeats, of course. :slight_smile:

Ah yes. That would explain it. Using setWithArray: here, I’m getting slightly shorter times than with either NSOrderedSet or “@distinctUnionOfObjects”. I’m still holding out for NSOrderedSet as the way to go though. :wink:

No question. It’s just nice to have an explanation for seemingly counter-intuitive results.

Given the simple source list, I wondered why KniazidisR’s script would be 300 times faster than Trash Man’s script. So, I timed them using Nigel’s test script from the following thread and only modified it to report microseconds:

https://www.macscripter.net/viewtopic.php?id=47102

For both tests, I used the source list from KniazidisR’s script and I compiled and saved each script between test runs. The results were:

Trash Man’s script: 0.05 milliseconds

KniazidisR’s script: 0.17 milliseconds

A fraction of a millisecond is not relevant in the real world, but I wondered why KniazidisR and I would receive such different results. I suspect with a lengthy and complex source list, the ASObjC version would show its merit.

FWIW, I get results closer to yours.

I tested not with a microscopic list but with this:


set aList to {1, 2, 3, 3, "3", "3", "3", 4, 5, 4, 5, 4, 1, 1, 2}
repeat 10 times
	set aList to aList & aList
end repeat

No need to mislead the user…

Trash Man’s script iterates through the original list, checking each item against those in a growing result list. So the time taken largely depends on the number of items in the collection and the number of those which are unique — besides all the other usual suspects such as processor speed, OS version, and whatever else the machine happens to be doing in the background.

KniazidisR’s idea uses a low-level system method to collapse the collection practically instantaneously. The larger the collection, the greater the speed advantage of this over an AS repeat.
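For anyone who wants to see the difference on their own machine, here is a rough single-run sketch that times both approaches on the 15,360-item list. It is not Nigel's test script, just a minimal stand-in using NSDate; the variable names (loopResult, setResult, timingReport and so on) are my own, and the figures will vary with the machine and whatever else it is doing.

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"

-- Build the 15,360-item test list by doubling the 15-item list 10 times.
set aList to {1, 2, 3, 3, "3", "3", "3", 4, 5, 4, 5, 4, 1, 1, 2}
repeat 10 times
	set aList to aList & aList
end repeat

-- Time the plain AppleScript repeat loop.
set startDate to current application's NSDate's |date|()
set loopResult to {}
repeat with an_item in aList
	if loopResult does not contain an_item then set end of loopResult to (contents of an_item)
end repeat
set loopSeconds to -(startDate's timeIntervalSinceNow())

-- Time the NSOrderedSet version.
set startDate to current application's NSDate's |date|()
set setResult to ((current application's NSOrderedSet's orderedSetWithArray:aList)'s array()) as list
set setSeconds to -(startDate's timeIntervalSinceNow())

-- Collect both timings, plus a check that the two results match.
set timingReport to {loopSeconds:loopSeconds, setSeconds:setSeconds, sameResult:(loopResult = setResult)}
```

A single run like this only gives a ballpark figure. For anything finer, repeat each section many times and average, as the dedicated timing scripts do.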

Strictly speaking, removing anything from a list is impossible. All the scripts discussed here return different lists containing only unique values. The values in Trash Man’s results are at least instances from the original lists and the removals are done here ignoring case. Case can easily be considered if required. The ‘set’ scripts are strictly case-sensitive, so their results need to be filtered further if case needs to be “ignored”. Trash Man’s script and my NSOrderedSet one preserve the values’ order of first appearance.
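To illustrate the point about case, here is a minimal sketch of the vanilla loop wrapped in a "considering case" block, so that "a" and "A" count as different values. The handler name is hypothetical and not from any of the scripts above.

```applescript
-- Hypothetical handler name; a case-sensitive variant of the vanilla loop.
on unique_items_considering_case(the_list)
	set return_list to {}
	considering case
		repeat with an_item in the_list
			set this_item to contents of an_item
			if return_list does not contain this_item then set end of return_list to this_item
		end repeat
	end considering
	return return_list
end unique_items_considering_case

set unique_list to unique_items_considering_case({"a", "A", "b", "a", "B", "b"})
-- unique_list is now {"a", "A", "b", "B"}
```

Without the "considering case" block, the same input would collapse to {"a", "b"}, since AppleScript ignores case in string comparisons by default.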

I think “Trash Man” is a generic term for MacScripter posters whose details have been mislaid by the site’s database. Hence his ability to post a script here two years three months and fourteen days before registering! :wink:

KniazidisR. Thanks for explaining that. The source list in your original script contains 15 items while the source list you used for testing contained 15,360 items, which clearly is the reason for our different timing results.

Shane. I look forward to your revised Script Geek, which I use often. The only time I don’t use it is when I need to include some code outside the timing loop (such as the code in post 11 above).