finding duplicate items in a list

I looked in the FAQ but couldn't find anything. I need some code that will take:

abccdefgghijk

and return

{"c", "g"}

Does anybody know how? Thanks.

Well, until Nigel comes up with a better algorithm, here is what I use:

set theList to {{1, 2, 3}, {a:1}, «data utxt0032», 0, 4, 5, 6, 9, 8, 7, 4, {1, 2, 3}, {a:1}, «data utxt0032»}

findDuplicates(theList) --> {4, {1, 2, 3}, {a:1}, «data utxt0032»}

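to findDuplicates(x)
	-- Holding the lists in script object properties lets them be accessed
	-- by reference (a's theList), which speeds up access to long lists.
	script a
		property theList : x
		property originals : {}
		property duplicates : {}
	end script
	
	repeat with i from 1 to count a's theList
		set theItem to a's theList's item i
		
		-- theItem is wrapped in braces so that an item which is itself a list,
		-- such as {1, 2, 3}, is matched as one item, not as a run of items.
		if {theItem} is not in a's originals then
			-- First occurrence: remember it.
			set a's originals's end to theItem
		else
			-- Seen before: record it once as a duplicate.
			if {theItem} is not in a's duplicates then set a's duplicates's end to theItem
		end if
	end repeat
	a's duplicates
end findDuplicates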
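Applied to the string in the original question, the same "originals / duplicates" idea can be sketched for text alone. This is only a minimal sketch: the handler name findDuplicateCharacters is hypothetical, and it assumes case should matter ("C" and "c" counted as different characters). Since every item is a single character, the characters already seen are kept in a string and tested with a plain text 'contains'.

```applescript
-- Hypothetical handler: a text-only sketch for the original question.
-- Characters met so far are kept in the string "seen"; repeated ones go into "dups".
to findDuplicateCharacters(theText)
	set seen to ""
	set dups to {}
	-- AppleScript text comparisons ignore case by default; without this
	-- block, "A" and "a" would be treated as the same character.
	considering case
		repeat with i from 1 to count theText
			set ch to character i of theText
			if seen contains ch then
				-- Seen before: record it once as a duplicate.
				if dups does not contain ch then set dups's end to ch
			else
				-- First occurrence: remember it.
				set seen to seen & ch
			end if
		end repeat
	end considering
	return dups
end findDuplicateCharacters

findDuplicateCharacters("abccdefgghijk") --> {"c", "g"}
```

For mixed lists (sublists, records, data) jj's list-based handler is still the one to use; the string version just avoids building a list of characters first.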

[quote="jj"]
Well, until Nigel comes up with a better algorithm, here is what I use:
[/quote]

;-)

Thanks for the flattery (I think!), jj. Your algorithm's probably the best that's possible and your implementation of it is eminently sensible.

If you're prepared to get a little silly with it (as a Sunday afternoon diversion), it's possible to speed it up very slightly with a couple of pieces of arcane knowledge:

1. It takes slightly longer to test that something's [i]not[/i] so than to test that it [i]is[/i] so. (Presumably, the 'is so' test is done anyway and the result is then 'notted'.) This is normally (as here) hardly significant, but if the test is repeated thousands of times, say, it's worth considering making it positive:


if {theItem} is in a's originals then
	-- Do that.
else
	-- Do this.
end if

-- Rather than:
if {theItem} is not in a's originals then
	-- Do this.
else
	-- Do that.
end if



Even when there's no alternative action to perform:  :-)


if {theItem} is in a's duplicates then
else
	set a's duplicates's end to theItem
end if

-- Rather than:
if {theItem} is not in a's duplicates then set a's duplicates's end to theItem



2. References to lists are in fact [i]slightly slower[/i] with the 'is in' and 'contains' operators. For the optimum speed, you'd need to reference the lists via your script object's properties when iterating through them or setting their ends, but access them via unreferenced variables for 'is in'.


set theList to {{1, 2, 3}, {a:1}, «data utxt0032», 0, 4, 5, 6, 9, 8, 7, 4, {1, 2, 3}, {a:1}, «data utxt0032»}
findDuplicates(theList) --> {4, {1, 2, 3}, {a:1}, «data utxt0032»}

to findDuplicates(x)
	-- Ordinary local variables.
	set origs to {}
	set dups to {}
	
	script a
		-- The same lists, but assigned to this script object's properties too for referencing.
		property theList : x
		property originals : origs
		property duplicates : dups
	end script
	
	repeat with i from 1 to count x
		set theItem to a's theList's item i
		
		if {theItem} is in origs then
			if {theItem} is in dups then
			else
				set a's duplicates's end to theItem
			end if
		else
			set a's originals's end to theItem
		end if
		
	end repeat
	dups
end findDuplicates

So, because each unreferenced local variable and the script object's corresponding property point to the same list, both are updated automatically. You can then use the local list for the 'is in' test (because that's faster) and append items through the script object's properties (because that's a tiny bit faster).
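The sharing described above can be seen in isolation with a minimal sketch. The handler name aliasingDemo is hypothetical; the script only contrasts a property initialised from a local variable (same list) with a list made by 'copy' (independent duplicate).

```applescript
-- Hypothetical demo handler: a script object property initialised from a
-- local variable refers to the same list as that variable.
to aliasingDemo()
	set origs to {}
	script a
		property originals : origs
	end script
	
	-- 'copy' makes an independent duplicate of the (still empty) list.
	copy origs to independentCopy
	
	-- Append through the script object's property...
	set a's originals's end to 1
	set a's originals's end to 2
	
	-- ...and the local variable sees the change; the copy does not.
	return {origs, independentCopy}
end aliasingDemo

aliasingDemo() --> {{1, 2}, {}}
```

This is why Nigel's handler can return dups at the end even though every append went through a's duplicates.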
With this, I save 76 ticks over 10,000 iterations of the sample above. :smiley:
I know you've spent years learning to optimize the speed of AppleScript tasks, along with other AS-Lords such as Arthur Knapp and Kai Edwards (obviously, you didn't find these tips in a weekly magazine). :rolleyes:
Why don't you write a short article with some rules on the speed topic? Perhaps we could publish it in unScripted (if it's not soooooo long), keep it here as a sticky note, or create a PDF and host it here for public access. It would help others!