Sort & de-duplicate 1st array to find 2nd array of deletion duplicates

I’m hoping someone can offer some ASOC examples of how I could go about ordering an NSArray that contains multiple key/value pairs, in which one of these key/value pairs contains duplicates.

I’m visualizing this scenario as a ‘table’; 2 columns: {theID, theLocation}
I’d like to de-dupe rows based on {theLocation}, but only after ensuring all values are in sort order. The process would build 2 arrays: the first retains the first alphanumeric {theID} and its corresponding {theLocation}; the second holds the remaining {theID, theLocation} rows, which are essentially the duplicates ready for deletion.

It’s not for lack of trying with NSMutableOrderedSet, but I’m not finding many ASOC examples on the net that help me understand how to de-duplicate ‘tables’ and reference the item properties/elements within them to build the remaining deletion set.

Here’s what I have to set things up, using Music.app/iTunes.app as an example…

use framework "Foundation"

tell application "Music"
	set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track of library playlist 1)
	set theLocation to current application's NSArray's arrayWithArray:(get location of every file track of library playlist 1)
end tell

set myArrayWithDuplicateLocation to current application's NSMutableArray's alloc's init()
repeat with i from 1 to count of theID
	set myData to {theID:theID's objectAtIndex:(i - 1), theLocation:theLocation's objectAtIndex:(i - 1)}
	(myArrayWithDuplicateLocation's addObject:myData)
end repeat

Results from myArrayWithDuplicateLocation:

(NSArray) {
{
theID:"EAC86C7395B85A98",
theLocation:(NSURL) file:///Users/USER/Music/TEST/Kings%20Of%20Tomorrow%20-%20Finally%20(Danny%20Krivit,%20Steve%20Travolta%20Re-edit).aif
},
{
theID:"7863B30CD98CE67F",
theLocation:(NSURL) file:///Users/USER/Music/TEST/Kings%20Of%20Tomorrow%20-%20Finally%20(Danny%20Krivit,%20Steve%20Travolta%20Re-edit).mp3
},
{
theID:"3711FDC9DAB25514",
theLocation:(NSURL) file:///Users/USER/Music/TEST/Kings%20Of%20Tomorrow%20-%20Finally%20(Danny%20Krivit,%20Steve%20Travolta%20Re-edit).aif
},
{
theID:"6BED2D9585D0FE7E",
theLocation:(NSURL) file:///Users/USER/Music/TEST/01%20Show%20Me%20Love.mp3
},
{
theID:"E42CCAFA2A7CE4A2",
theLocation:(NSURL) file:///Users/USER/Music/TEST/01%20Show%20Me%20Love.aif
},
{
theID:"071EE671801F822F",
theLocation:(NSURL) file:///Users/USER/Music/TEST/1-02%20-%20Sho-Nuff%20-%20It's%20Alright.mp3
},
{
theID:"DCC4A1D20E443786",
theLocation:(NSURL) file:///Users/USER/Music/TEST/1-01%20-%20Sho-Nuff%20-%20Tonite.mp3
},
{
theID:"A42D90507D48062F",
theLocation:(NSURL) file:///Users/USER/Music/TEST/1-01%20-%20Sho-Nuff%20-%20Tonite.mp3
}
}

It’s not clear to me what you mean by “de-duplicating”. What are you trying to achieve?

Apologies, I would be clearer if I could explain this better with a table example…

In the result of myArrayWithDuplicateLocation you will see library items that have duplicate locations, which means we have more than one {theID} for the same {theLocation} in the following 2 scenarios…

EXAMPLE 1: theLocation:(NSURL) file:///Users/USER/Music/TEST/Kings%20Of%20Tomorrow%20-%20Finally%20(Danny%20Krivit,%20Steve%20Travolta%20Re-edit).aif
has 2 IDs: EAC86C7395B85A98 & 3711FDC9DAB25514. One of these IDs needs to be deleted.

EXAMPLE 2: theLocation:(NSURL) file:///Users/USER/Music/TEST/1-01%20-%20Sho-Nuff%20-%20Tonite.mp3
has 2 IDs: DCC4A1D20E443786 & A42D90507D48062F. One of these IDs needs to be deleted.

Although I don’t know at this stage which of the IDs sharing a location was created first, I would like to know how to order the IDs alphanumerically, so that I retain the first one in the library and delete the other IDs that reference the same file location.

If the IDs were in creation order, you could use:

set theDict to current application's NSDictionary's dictionaryWithObjects:theID forKeys:theLocation
set theID to theDict's allValues() -- the IDs that survive de-duplication
set theLocation to theDict's allKeys()

If you wanted to use the IDs based on sort order, you could do something like this:

set theID to current application's NSArray's arrayWithArray:{"A", "B", "C", "D", "E"}
set theLocation to current application's NSArray's arrayWithArray:{2, 1, 1, 3, 2}
set theDict to current application's NSDictionary's dictionaryWithObjects:theLocation forKeys:theID
set theID to (theDict's keysSortedByValueUsingSelector:"compare:")'s reverseObjectEnumerator()'s allObjects()
set theLocation to (theDict's allValues()'s sortedArrayUsingSelector:"compare:")'s reverseObjectEnumerator()'s allObjects()
set theDict to current application's NSDictionary's dictionaryWithObjects:theID forKeys:theLocation
set theID to theDict's allValues()
set theLocation to theDict's allKeys()
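
For the two-array result described in the original question (one array to keep, one to delete), here is a minimal sketch that does not depend on which duplicate a dictionary happens to retain. It sorts the rows by location and then by ID, and keeps the first row seen for each location. The sample rows are hypothetical stand-ins for the Music data, the variable names keepRows, deleteRows and seenLocations are made up for illustration, and the locations are assumed to be path strings rather than NSURLs so that they can be sorted with the default comparison. It uses a repeat loop, so it is not tuned for large libraries.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample rows; in the real script these would come from Music
set sampleRows to {}
set end of sampleRows to {theID:"EAC86C7395B85A98", theLocation:"/Users/USER/Music/TEST/Finally.aif"}
set end of sampleRows to {theID:"7863B30CD98CE67F", theLocation:"/Users/USER/Music/TEST/Finally.mp3"}
set end of sampleRows to {theID:"3711FDC9DAB25514", theLocation:"/Users/USER/Music/TEST/Finally.aif"}
set end of sampleRows to {theID:"DCC4A1D20E443786", theLocation:"/Users/USER/Music/TEST/Tonite.mp3"}
set end of sampleRows to {theID:"A42D90507D48062F", theLocation:"/Users/USER/Music/TEST/Tonite.mp3"}
set theRows to current application's NSArray's arrayWithArray:sampleRows

-- Sort by location, then by ID, so the lowest ID for each location comes first
set byLocation to current application's NSSortDescriptor's sortDescriptorWithKey:"theLocation" ascending:true
set byID to current application's NSSortDescriptor's sortDescriptorWithKey:"theID" ascending:true
set sortedRows to theRows's sortedArrayUsingDescriptors:{byLocation, byID}

set keepRows to current application's NSMutableArray's alloc()'s init()
set deleteRows to current application's NSMutableArray's alloc()'s init()
set seenLocations to current application's NSMutableSet's alloc()'s init()

repeat with i from 1 to sortedRows's |count|()
	set aRow to (sortedRows's objectAtIndex:(i - 1))
	set aLocation to (aRow's objectForKey:"theLocation")
	if (seenLocations's containsObject:aLocation) then
		-- This location already has a row to keep, so this one is a duplicate
		(deleteRows's addObject:aRow)
	else
		(seenLocations's addObject:aLocation)
		(keepRows's addObject:aRow)
	end if
end repeat

-- Plain AppleScript lists of the IDs in each group
set keepIDs to (keepRows's valueForKey:"theID") as list
set deleteIDs to (deleteRows's valueForKey:"theID") as list
```

With real NSURL locations, one option would be to convert them to paths first with valueForKey:"path", as a later script in this thread does.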

This is a great example of how to leverage NSDictionary’s unique keys while ignoring the values: a now blindingly obvious way of de-duplicating, or efficiently removing the values associated with each duplicated key.

Previously I had been using theID as the key and attempting to de-duplicate theLocation as the value in an array. Now I see that it was as simple as swapping these around in NSDictionary, with theLocation as the key and theID as the value. :rolleyes:

I’ll be coming back to ordering theID by sort order. I haven’t seen too many examples of keysSortedByValueUsingSelector, so I’ll definitely be using this later. Thank you.

For now, this is the most optimised version I know how to write at this stage. It uses NSSet/NSMutableSet to efficiently find the delta between the 2 sets, which gives a list of IDs to be deleted. If anyone has any optimization tips, please let me know.

use framework "Foundation"
tell application "Music"
	set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track of library playlist 1)
	set theLocation to current application's NSArray's arrayWithArray:(get location of every file track of library playlist 1)
end tell

set theDict to current application's NSDictionary's dictionaryWithObjects:theID forKeys:theLocation
set myUniqueLocationID to current application's NSSet's setWithArray:(theDict's allValues())
set myDuplicateLocationID to current application's NSMutableSet's setWithArray:theID
myDuplicateLocationID's minusSet:myUniqueLocationID --the delta, remaining ID's for deletion
set myDuplicateLocationID to myDuplicateLocationID's allObjects() as list

So, as I familiarize myself with NSDictionary, I do have one more question. How can I return a list of values (theID) by querying the keys (theLocation) in a way that lets me search by partial string matches using ‘CONTAINS’? I’ve been looking at keysOfEntriesPassingTest but don’t understand it, and I haven’t seen it translated to ASOC anywhere so far. I have also noted that theLocation is an NSURL, so this may have thwarted my recent attempts.

I’m sure I’m missing something simple here. In summary, I’m looking for something like this: set aTestDict to theDict's valueForKey:("self CONTAINS 'TEST' or self CONTAINS 'Kings' or self CONTAINS 'Show Me Love'")

Predicates and valueForKey: or valueForKeyPath: work on arrays. Methods using blocks like keysOfEntriesPassingTest: aren’t available to ASObjC (and wouldn’t gain you a lot in this case anyway).
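
Since predicates work on arrays, one way to get the effect asked about is to filter the dictionary’s keys as an array and then look up the values for the keys that survive. A minimal sketch, assuming the keys are plain path strings rather than NSURLs (the locations could be converted first with valueForKey:"path"). The sample paths, the variable names matchingKeys and matchingIDs, and the ID "0000000000000001" are hypothetical.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample data: POSIX paths as keys, persistent IDs as values
set samplePaths to {"/Users/USER/Music/TEST/01 Show Me Love.mp3", "/Users/USER/Music/Other/Track.mp3", "/Users/USER/Music/TEST/Kings Of Tomorrow - Finally.aif"}
set sampleIDs to {"6BED2D9585D0FE7E", "0000000000000001", "EAC86C7395B85A98"}
set theDict to current application's NSDictionary's dictionaryWithObjects:sampleIDs forKeys:samplePaths

-- Filter the keys (an array of strings) with a partial-match predicate
set thePred to current application's NSPredicate's predicateWithFormat:"SELF CONTAINS 'Kings' OR SELF CONTAINS 'Show Me Love'"
set matchingKeys to (theDict's allKeys())'s filteredArrayUsingPredicate:thePred

-- Look up the values that belong to the matching keys
set matchingIDs to (theDict's objectsForKeys:matchingKeys notFoundMarker:(current application's NSNull's |null|())) as list
```

The order of matchingIDs follows the order of allKeys(), which is not guaranteed, so sort the result if order matters.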

Thanks, Shane. Seems I got lost in the documentation. Back on track incorporating everything I’ve learned so far.
My process: create an NSDictionary for efficient de-duplication of keys to remove redundant values, use the NSDictionary values to build a ‘delta NSSet’, then use the ‘delta NSSet’ in a predicate, together with further partial-string matches, across the full NSArray of library items. The final result is a unique deletion list. Sharing the code in case anyone finds it useful on their ASOC journey…

use framework "Foundation"
tell application "Music"
	set theID to current application's NSArray's arrayWithArray:(get persistent ID of every file track of library playlist 1)
	set theLocation to current application's NSArray's arrayWithArray:(get location of every file track of library playlist 1)
end tell

set myLocationDict to current application's NSDictionary's dictionaryWithObjects:theID forKeys:theLocation
set myUniqueLocationID to current application's NSSet's setWithArray:(myLocationDict's allValues())
set myDuplicateLocationID to current application's NSMutableSet's setWithArray:theID
myDuplicateLocationID's minusSet:myUniqueLocationID

set thePosixPath to (theLocation's valueForKey:"path")
set myPosixArray to my makeArrayQuickly(theID, thePosixPath)
set thePred to current application's NSPredicate's predicateWithFormat_("thePosixPath = nil OR NOT thePosixPath CONTAINS '/Music/' OR thePosixPath CONTAINS '/.Trash/' OR theID IN %@", myDuplicateLocationID)
set myDeletionArray to (myPosixArray's filteredArrayUsingPredicate:thePred)

on makeArrayQuickly(theID, thePosixPath)
	set theID to theID as list
	set thePosixPath to thePosixPath as list
	script o
		property oID : theID's items
		property oPosixPath : thePosixPath's items
		property oResult : {}
	end script
	repeat with i from 1 to (count o's oID)
		set end of o's oResult to {theID:item i of o's oID, thePosixPath:item i of o's oPosixPath}
	end repeat
	set theResult to current application's NSMutableArray's arrayWithArray:(o's oResult)
	return theResult
end makeArrayQuickly

Although I’m content with the time it takes to complete for 15K items, I still keep coming back to makeArrayQuickly(). Using script object properties is amazingly fast, but it feels like a ‘workaround’, and I hope I’m not missing something simple when it comes to array creation. We can create an NSDictionary in a split second, but apparently not an NSArray of records in the format shown below without a repeat loop. Or is there another way?

(NSArray) {
{
theID:"3F80FAB1127481EF",
thePosixPath:"/Users/USER/.Trash/01 - Downtown Shutdown (Eva Shaw Remix).mp3"
},
{
theID:"42D67D14E4598F82",
thePosixPath:"/Users/USER/Music/TEST/04 - Downtown Shutdown (The Revenge Dubstramental).mp3"
},
{
theID:"04A543FB688D9C23",
thePosixPath:"/Users/USER/Music/TEST/03 - Downtown Shutdown (The Revenge Remix).mp3"
},
{
theID:"9D18E2D0D48445C8",
thePosixPath:"/Users/USER/Music/TEST/1-01 - Sho-Nuff - Tonite.mp3"
},
{
theID:"237E8FB89A0FBC17",
thePosixPath:"/Users/USER/Music/TEST/1-01 - Sho-Nuff - Tonite.mp3"
}
}

Below are 4 ways I have tested. I’m interested in learning the fastest way for NS values only. At the moment, the version that reads NS values only (#1 NSArray) is actually the slowest of all.

use framework "Foundation"
tell application "Music"
	
	--Setup NS Arrays for testing comparison
	set theNSID to current application's NSArray's arrayWithArray:(get persistent ID of every file track of library playlist 1)
	set theNSLocation to current application's NSArray's arrayWithArray:(get location of every file track of library playlist 1)
	set theNSPosixPath to (theNSLocation's valueForKey:"path")
	
	--Setup Applescript lists for testing comparison
	set theASID to theNSID as list
	set theASPosixPath to theNSPosixPath as list
	
	
	--Test 2 methods of creating the same NSMutableDictionary
	
	log "creating NSDictionary from NS values" ---0.02 SECONDS
	set theNSDict to current application's NSMutableDictionary's dictionaryWithObjects:theNSPosixPath forKeys:theNSID
	log "finished dictionary from NS values"
	
	log "creating NSDictionary from Applescript List values" ---0.38 SECONDS
	set theASDict to current application's NSMutableDictionary's dictionaryWithObjects:theASPosixPath forKeys:theASID
	log "finished dictionary from Applescript List values"
	
	--Test 4 methods of creating the same NSMutableArray
	
	--#1 NSArray : 21 SECONDS
	log "#1 NSArray : create NSMutableArray from NS values"
	set myArrayWithNSvalues to current application's NSMutableArray's alloc's init()
	repeat with i from 1 to theNSID's |count|()
		(myArrayWithNSvalues's addObject:{theNSID:theNSID's objectAtIndex:(i - 1), theNSPosixPath:theNSPosixPath's objectAtIndex:(i - 1)})
	end repeat
	log "finished NSMutableArray from NS values"
	
	--#2 NSArray : 17 SECONDS
	log "#2 NSArray : create vanilla list from applescript list values - then convert to NSMutableArray"
	set MyListWithASListValues to {}
	repeat with i from 1 to count of theASID
		set end of MyListWithASListValues to {theID:item i of theASID, thePosixPath:item i of theASPosixPath}
	end repeat
	set myConvertedFromListArrayWithApplescriptLists to current application's NSMutableArray's arrayWithArray:MyListWithASListValues
	log "finished NSMutableArray from applescript list values - then convert to NSMutableArray"
	
	--#3 NSArray : 11 SECONDS
	log "#3 NSArray : create NSMutableArray from applescript list values" ---
	set myArrayWithApplescriptLists to current application's NSMutableArray's alloc's init()
	repeat with i from 1 to count of theASID
		(myArrayWithApplescriptLists's addObject:{theID:item i of theASID, thePosixPath:item i of theASPosixPath})
	end repeat
	log "finished NSMutableArray from applescript list values"
	
	--#4 NSArray : 0.01 SECONDS
	log "#4 NSArray : create NSMutableArray from script object properties" ---0.01 SECONDS
	script o
		property oID : theASID's items
		property oPosixPath : theASPosixPath's items
		property oResult : {}
		repeat with i from 1 to count of o's oID
			set end of o's oResult to {item i of o's oID, item i of o's oPosixPath}
		end repeat
	end script
	set myArrayWithScriptObjectProperties to current application's NSMutableArray's arrayWithArray:(o's oResult)
	log "finished NSMutableArray from script object properties"
	
end tell
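
One caveat on test #4, offered as an observation rather than a measured result: statements written directly inside a script object, other than properties and handlers, belong to its run handler and are not executed when the script object is merely defined. If that applies here, the loop in #4 never runs and the 0.01-second figure reflects converting an empty list. Below is a sketch of the same idea with the loop outside the script object, as makeArrayQuickly() does, plus a simple NSDate-based timer. The handler name buildPairs, the sample data and the file names are hypothetical, and no timing is claimed for it.

```applescript
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions

-- Hypothetical sample data standing in for theASID and theASPosixPath
set sampleIDs to {"3F80FAB1127481EF", "42D67D14E4598F82"}
set samplePaths to {"/Users/USER/.Trash/a.mp3", "/Users/USER/Music/TEST/b.mp3"}

-- Time the build with NSDate; timeIntervalSinceNow() is negative for a past date
set startDate to current application's NSDate's |date|()
set theArray to my buildPairs(sampleIDs, samplePaths)
set elapsed to -((startDate's timeIntervalSinceNow()) as real)
set builtCount to (theArray's |count|()) as integer
log ("built " & builtCount & " records in " & elapsed & " seconds")

on buildPairs(idList, pathList)
	script o
		property oID : idList
		property oPosixPath : pathList
		property oResult : {}
	end script
	-- The loop sits outside the script object, so it actually executes
	repeat with i from 1 to (count o's oID)
		set end of o's oResult to {theID:item i of o's oID, thePosixPath:item i of o's oPosixPath}
	end repeat
	return current application's NSMutableArray's arrayWithArray:(o's oResult)
end buildPairs
```

Checking the |count|() of each resulting array before comparing timings is a cheap way to confirm that every test built the same number of records.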

Let me know if anyone has any further suggestions to try.
Thanks