Remove Duplicates from a List?

Very helpful.

In case anyone comes across the same problem I was having…

I have lists, where each entry of the list is a record:

set tabGoods to {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}

And these functions to remove duplicates did NOT work on it: they spit the list back out with the duplicates still in place.

I’m not sure why those don’t work when what I ended up writing does, but in case it’s useful to anyone, this removes duplicates even when the list items are records:

on removeDuplicateRecords(inputList)
	set itemCount to count of items in inputList
	set outputList to {}
	repeat with anItem from 1 to itemCount
		set firstListItem to item anItem of inputList
		-- Count how many times this item already appears in the output list.
		set occurrenceCount to 0
		repeat with anotherItem from 1 to count of items in outputList
			set secondListItem to item anotherItem of outputList
			if firstListItem is secondListItem then set occurrenceCount to occurrenceCount + 1
		end repeat
		-- Keep the item only if it wasn't found.
		if occurrenceCount = 0 then copy firstListItem to end of outputList
	end repeat
	
	return outputList
end removeDuplicateRecords

This could be painfully slow for large sets of records, since every item is compared against every item already kept, so the work grows roughly with the square of the list length. My lists have at most 10-20 records, so it’s not significant: the longest list I ran it on took 127 milliseconds, so it’s not stressing me out :) but I’m guessing it would not scale well to thousands. At least it works for records.

  • t.spoon

Hi t.spoon.

The problem with the original script (apart from the fact that it no longer opens correctly in Script Editor!) is this line:

if x is in foo's okAddresses then

It tends to get written this way because it works with simple objects like strings and numbers. But the correct formulation when using ‘is in’ or ‘contains’ with a list of items is:

if {x} is in foo's okAddresses then

Notice the braces round ‘x’. The reason for them is that the code’s notionally looking for a section of the list, not, as we think of it, for an item in the list.

set tabGoods to {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}

tabGoods contains {|style|:"g185", colorCode:"51", colorName:"black"}
--> false

tabGoods contains {{|style|:"g185", colorCode:"51", colorName:"black"}}
--> true

tabGoods contains {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}
--> true

The same’s notionally true with text:

"Hell" is in "Hello"
--> true, not because "Hello"'s a container containing "Hell", but because "Hell"'s a subsection of it.

I think the reason you can get away without the braces when checking for a text or number in the list is that in AppleScript, a single item is automatically coercible to a list containing that item, so we don’t have to think about it. But when the item’s already a list or a record, the coercion to list takes on a different meaning. In these cases, we have to be explicit with the braces. But using braces is actually correct in any case.

Hope this makes sense. :)

Edit: Yes. I was right about items being coerced to lists. Here’s a short demo:

set tabGoods to {"g185", "51", "black", "g185", "51", "black"} -- Now a list of texts.

"g185" is in tabGoods
--> true, because "g185" is automatically coerced to {"g185"} (a text to list coercion) for the check.

{|style|:"g185", colorCode:"51", colorName:"black"} is in tabGoods
--> true, because the record is coerced to {"g185", "51", "black"} (a record to list coercion) for the check.
--> This is just to demonstrate that the coercion takes place. A record to list coercion should never be relied upon to produce a list with items in a particular order.

Hello Nigel.

That was a brilliant explanation.

It made a lot of sense to me.

Thanks

Some additions to Nigel’s explanation:


{2, 3} is in {1, 2, 3, 4, 5} --> true
{2, 4} is in {1, 2, 3, 4, 5} --> false

The first line returns true and the second false because {2, 3} is a contiguous run of items in the list and {2, 4} is not, even though both of its values occur in the list.
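To contrast the two kinds of check, here is a minimal sketch of a hypothetical helper, allItemsIn (not from the thread), that tests each value on its own, so order and adjacency stop mattering. It reuses the brace trick so that a list or record value is looked for as a single item.

```applescript
-- Hypothetical helper (not from the thread): true if every item of someItems
-- occurs somewhere in theList, regardless of order or adjacency.
on allItemsIn(someItems, theList)
	repeat with anItem in someItems
		-- Braces make "is in" look for the value as one item, even a list or record.
		if {contents of anItem} is not in theList then return false
	end repeat
	return true
end allItemsIn

{2, 4} is in {1, 2, 3, 4, 5} --> false: not a contiguous run
allItemsIn({2, 4}, {1, 2, 3, 4, 5}) --> true: each value is present
```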

I didn’t know whether records are coerced to lists before comparing, but I did know that records compare only their values. A presumable reason for this is that a scripting addition, for instance, can mess up the comparison (read: a user-defined key turns into an enumerated key). Technically there is a difference between a record containing user-defined keys and one containing enumerated keys. A record with user-defined keys is actually a record containing one key (usrf) whose value is a list of all the keys and values: the odd-numbered items are the keys, as normal AppleScript strings, each followed by its value. A record with enumerated keys is not stored that way. So when comparing, it’s better to compare only the values at their corresponding indexes, which results in the same behavior as coercing to a list before comparing.

To make it easier to understand: a list such as Nigel’s example is actually stored as:

Then, when a scripting addition is installed, or another script library is loaded into global scope, that defines colorCode and colorName as enumerated terms (with the codes ccod and cnam respectively), the list would look like:

Both lists would be presented the same way in Script Editor, apart from some syntax highlighting. If the records were compared including their keys, they would not match; but when only their values are compared, they do.

(Off-topic: this is also why it’s important to use pipes around keys in records when using AppleScriptObjC, so you don’t send an enumerated key by accident.)

I didn’t know that either. I did know, however, that I could coerce a record to a list on a one-by-one basis; what I didn’t know, or didn’t think of, was that I could wrap it in {} so that I could use an “is in” expression. ;)

Great in-depth explanation of lists! :)

The same principle applies if you’re obliged to concatenate something to a list:

set aList to {"a", "b", "c"}

aList & {|style|:"g185", colorCode:"51", colorName:"black"}
--> {"a", "b", "c", "g185", "51", "black"}

aList & {{|style|:"g185", colorCode:"51", colorName:"black"}}
--> {"a", "b", "c", {|style|:"g185", colorCode:"51", colorName:"black"}}

aList & {1, 2, 3}
--> {"a", "b", "c", 1, 2, 3}

aList & {{1, 2, 3}}
--> {"a", "b", "c", {1, 2, 3}}
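For comparison, appending with "set end of" (rather than concatenating with &) adds its value as one new item whatever its class, so no extra braces are needed. A minimal sketch reusing the variables above:

```applescript
set aList to {"a", "b", "c"}
-- "set end of" inserts the whole value as a single new item,
-- so neither the record nor the list is flattened into aList.
set end of aList to {|style|:"g185", colorCode:"51", colorName:"black"}
set end of aList to {1, 2, 3}
aList
--> {"a", "b", "c", {|style|:"g185", colorCode:"51", colorName:"black"}, {1, 2, 3}}
```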

Hello.

The concatenation examples were interesting; there we go again with the record. The list example is somewhere I have been. :)

It is the need to make things “list-compatible” in order to search for elements, and records (especially records), that has been an “aha” experience for me. But then again, looking at the difference between a list of characters and a string, it is quite natural that the object you look for must be of the same form as the object you check for containment in.

set m to {{1, 2, 3, 4}, {5, 6, 7, 8}}

log ({1, 2, 3, 4} is in m) as text
-- false
log ({{1, 2, 3, 4}} is in m) as text
-- true 
-- and this one shows it's a little bit smarter than text item delimiters after all: you can't overstep "item boundaries"
log ({{3, 4, 5, 6}} is in m) as text
-- false

I see a lot of uses for this. Thanks a lot. :)


Indeed. So consider this script:

set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}
x as list
--> {"zz", "a", "OK", "answer", "z", "aa"}

It seems that, somehow, the order that the items have been added is preserved in the order of the resulting list. Do you have any idea how? If I add:

set the clipboard to x

I see:

'Jons'\'pClp'{ '----':{ 'bhit':'utxt'("OK"), 'ttxt':'utxt'("answer"), 'usrf':[ 'utxt'("zz"), 'utxt'("zz"), 'utxt'("a"), 'utxt'("a"), 'utxt'("z"), 'utxt'("z"), 'utxt'("aa"), 'utxt'("aa") ] }, &'subj':null(), &'csig':65536 }

which is what I’d expect, but doesn’t explain the placement of the non-user items in the final list.

More relevant, though, to understanding the use of ‘is in’ and ‘contains’ with records and lists:

set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}
x contains {button returned:"OK", aa:"aa", z:"z", zz:"zz"}
--> true

Yes, and that makes sense if you assume there’s no order to record items. But the previous scripts suggest there is, at some level.

AppleScript is nothing if not entertaining…

There is some magic going on there. The reason behind it is that an Apple event record can only contain enumerated (four-character-code) keys, not user-defined keys. Another aspect is that each key may appear only once: you can’t have the same enumerated key twice in a record. The user-defined keys are therefore collected into one list and filed under a single key named ‘usrf’.

As a result, a record differs between its presentation (AppleScript) and its actual data (Apple event, aka AERecord). It must be the AppleScript layer of the record that keeps track of the order of the items, since there is no such thing at the Apple event level. I can confirm from writing scripting additions (as you have probably experienced yourself) that the order of an AppleScript record isn’t always the same as the order of an incoming Apple event record. That there is a difference was the only logical explanation I had back then, and it still applies to this weird behavior.

To confirm my point, I thought of running a mixed record through the Apple Event Manager, returning the data and seeing what happens. I found that the items were rearranged inside the record, so once a mixed record leaves the AppleScript world, its order is lost. What if we do the same with your script:

set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}

script scriptX
	on run argv
		return item 1 of argv
	end run
end script

run script scriptX with parameters {x} --> a rearranged record

As you can see, the order in which you set the items is lost and they are rearranged.

This is the closest answer I can give to your question “Do you have any idea?”. I have no idea what really happens in the code, but based on the transparency of Apple events and on testing AppleScript code, it is clear that an AppleScript record is not (entirely) the same as an Apple event record. An extra, irreversible coercion/transition is made when an AppleScript record enters the world of Apple events.

Is it a bug or a poor implementation? No: both AppleScript and Apple events make clear that the order of values in a record is not guaranteed. As with hash tables and associative arrays in other programming languages, neither the index nor the order in which the items are stored is significant.

Edit: Shane showed I’m wrong about a conclusion… maybe it’s time to go to bed.

That was my conclusion, too – something at a lower level.

But Nigel’s later example suggests that order is ignored when you use “is in” with records.

You were fast :)… I found out that ‘is in’ is still safe. The order of the items is not important when comparing records, but it is when comparing lists of values. So the comparison itself is more than just a record-to-list coercion:

set x to display dialog "display" default answer "answer"
set x to {zz:"zz", a:"a"} & x & {z:"z", aa:"aa"}

script scriptX
	on run argv
		return item 1 of argv
	end run
end script

set results to run script scriptX with parameters {x} --> a rearranged record
({text returned:"answer", z:"z", zz:"zz"} as list) is in (results as list) --> false
{text returned:"answer", z:"z", zz:"zz"} is in results --> true

Hello.

So a record in AppleScript is, at the very least, a simulated set of values, if not actually a set of values internally.

A set is an unordered collection of elements that keeps no count of equal elements. A uniqueness constraint on the elements is also possible, in which case it is called a set of unique values.

Attributes and properties of objects and records often work like this: of several attributes/properties of the same kind, the last one is the one that is used.

It is a good thing that AppleScript treats a record as a record, not as a list, and arranges the record’s attributes in some order to make comparing them easier when testing for equality or containment. :)

Wow, that opened a can of worms I wasn’t expecting.

Thanks everyone, I’ve got a better understanding now, and using {braces} properly for my subroutine results in a faster and more elegant subroutine than my nested repeats.
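The revised subroutine isn’t shown in the thread, so the following is only a sketch of what a brace-based version of removeDuplicateRecords might look like (the name is reused from the earlier post; the body is an assumption). The single "is not in" test replaces the inner repeat loop.

```applescript
-- Sketch: keep the first occurrence of each item, using the brace technique.
on removeDuplicateRecords(inputList)
	set outputList to {}
	repeat with anItem in inputList
		set thisItem to contents of anItem
		-- {thisItem} makes "is in" look for thisItem as one whole item,
		-- even when it is itself a record or a list.
		if {thisItem} is not in outputList then set end of outputList to thisItem
	end repeat
	return outputList
end removeDuplicateRecords

set tabGoods to {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}
removeDuplicateRecords(tabGoods)
--> {{|style|:"g185", colorCode:"51", colorName:"black"}}
```

It is still a linear search through outputList for each item, so it should be quicker than the nested repeats but will not make very long lists cheap.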

Oddly, I now realize that many years ago I came across this problem and ended up, by checking my variable values, eventually figuring out to put something in a (seemingly) “extra” set of braces for a comparison, but never dug in to really figure out what was going on and just ran with it. That was probably 10+ years ago and I’d forgotten entirely until I read this thread.

I was checking my variable values along the way this time and saw that the duplicates weren’t being caught because one value remained a record while the other was being coerced to a list, but I couldn’t figure out why, and it didn’t occur to me that I could fix it with a simple set of braces. I just figured that if I forced AppleScript to obtain both items in an identical manner, I would either avoid the coercion or force an identical coercion, so I wrote it that way and got the desired result.

What an odd language I write in. I completely understand the thought behind dynamic variable types to make things simpler to the user, and they’re great when they work as expected… but I can’t believe how often my “bug” is an unexpected variable type, and then it’s hidden from me… like having the user choose from a list of numbers, and it returns the resulting number as text… My scripts are full of

(((PathToFolder as text) & "/document name") as POSIX file)

and

if (userChoice as number) is someNumber

and, here’s a good one,

set the keywords of info to {tabChoice, (pathData as list as string), "true"}

Note that I’m forcing the value “true” as text, which is a long story. Sometimes making things simple makes them much more difficult.

There are probably better ways to do all these things; I usually just do the first thing I find that works. I know that can come back to bite me when I lack a deeper understanding of why it worked, because then I also don’t understand when it’s not going to work. So thanks again for the deeper understanding on this one.

  • t.spoon.

Why

  1. do you have an inner script (foo)?
  2. is there no return clause?
  3. the line “foo’s okAddresses”? What does it do? Purpose?

I don’t think @julifos is still around. So, without meaning to step on any toes, I’ll briefly answer to prevent leaving you stranded:

The inner script object (foo) is an optimisation technique that leverages a quirk in AppleScript’s handling of list objects, for reasons pertaining to:

  • Speed: Accessing lists through a script object is significantly faster than conventional methods.
  • Structure: The script object declares two properties:
    • foo2: A direct reference to the input list l
    • okAddresses: An empty list that will be populated with items from l
  • Efficiency: Both reading from foo2 and writing to okAddresses can be done directly in memory, without the need to copy the data to and from the buffer, and without the need to evaluate any other items contained within the lists.
  • Implementation: This approach is particularly useful when dealing with large lists.
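The original handler isn’t reproduced in this thread, so here is a sketch of the script-object pattern described above, assuming the handler looked roughly like this. The names foo, foo2, okAddresses and l follow the answer; removeDuplicates is a hypothetical name. It also uses the {x} form from earlier in the thread so records and lists are handled too.

```applescript
-- Sketch of the script-object technique (handler name is hypothetical).
on removeDuplicates(l)
	script foo
		property foo2 : l -- the input list, accessed through the script object
		property okAddresses : {} -- the de-duplicated result being built
	end script
	repeat with i from 1 to (count foo's foo2)
		set x to item i of foo's foo2
		-- Braces make "is in" look for x as a single item, even a record or list.
		if {x} is not in foo's okAddresses then set end of foo's okAddresses to x
	end repeat
	foo's okAddresses -- last statement executed, so this is the implicit return value
end removeDuplicates

removeDuplicates({"a@a.e", "b@b.c", "a@a.e"})
--> {"a@a.e", "b@b.c"}
```

Because the script object is created each time the handler runs, okAddresses starts out empty on every call.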

AppleScript does have a return mechanism, but it’s often implicit. Here’s how it works:

  • By default, handlers and scripts return the result of their final executed command.
  • An explicit return statement can be used to: a) Terminate execution early; and, optionally: b) Specify a specific value to return
  • Omitting return means all commands execute sequentially, with the last one’s result being returned.

The return clause isn’t always necessary because of this implicit behaviour, although it’s useful for control over script flow and can aid readability/clarity.
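A minimal illustration of both behaviours, using a hypothetical handler named classify: the explicit return ends the handler early, otherwise the result of the last statement executed is returned.

```applescript
on classify(n)
	if n < 0 then return "negative" -- explicit: the handler stops here
	"non-negative" -- implicit: the last statement's result is returned
end classify

classify(-1) --> "negative"
classify(3) --> "non-negative"
```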

okAddresses is one of the properties in the foo script object. It points to the list that is constructed by the main body of the handler by populating it with items from the original list, l, such that no two items in okAddresses will have the same value.

The line serves as the handler’s implicit return statement. It’s the final command that gets executed, so its result is returned by the handler. It is functionally identical to:

return foo's okAddresses

When you assign an array (your list of addresses) to an NSSet, all duplicates are automatically purged, eliminating any repeat-loop overhead. If any list entries were empty strings, the following code would also remove them. Then array y is produced from the NSSet and coerced back to a list.

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property ca : current application

set y to ca's NSMutableArray's array()
set x to {"a@a.e", "b@b.c", "d@a.h", "a@a.e", ""}
set uniqSet to ca's NSSet's alloc()'s initWithArray:x
y's addObjectsFromArray:(uniqSet's allObjects())
y's removeObject:""

return y as list

Hi.

The order of the first instance of each item in the original list can be preserved by using an NSOrderedSet — or an NSMutableOrderedSet if you then want to remove further items:

use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

property ca : current application

set x to {"a@a.e", "b@b.c", "d@a.h", "a@a.e", ""}
set uniqSet to ca's NSMutableOrderedSet's orderedSetWithArray:x
uniqSet's removeObject:""
set y to uniqSet's array()

return y as list

Thanks, Nigel.

An NSOrderedSet, or, as in your corrected example, an NSMutableOrderedSet, would be the right solution for efficiently removing duplicates from a mailing list while retaining its order.
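The same approach should also cover the list of records from the start of the thread. This is a sketch under two assumptions: that each AppleScript record is bridged to an NSDictionary when passed to Cocoa, and that NSOrderedSet treats two dictionaries with equal contents as duplicates (it compares elements with isEqual:).

```applescript
use AppleScript version "2.4" -- Yosemite (10.10) or later
use framework "Foundation"
use scripting additions

set tabGoods to {{|style|:"g185", colorCode:"51", colorName:"black"}, {|style|:"g185", colorCode:"51", colorName:"black"}}

-- Each record becomes an NSDictionary; the ordered set keeps only the first
-- of any elements that are equal, preserving the original order.
set uniqSet to current application's NSOrderedSet's orderedSetWithArray:tabGoods
set uniqueGoods to (uniqSet's array()) as list

count uniqueGoods --> 1
```

Key order inside the returned record isn’t guaranteed, since dictionaries are unordered, which fits the earlier discussion of record ordering.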