Delete other files with the exact same size.

If you are sure that permanently destroying the unwanted files is not a problem, you may replace the instruction

tell application "Finder" to delete deleteList

which moves the files to the Trash, with:

tell application "System Events"
	repeat with anItem in deleteList
		delete anItem
	end repeat
end tell

which really destroys them.

As I am really lazy, I would also replace

set previousFileSizes to {}
set beginning of previousFileSizes to item 1 of fileSizes

by

set previousFileSizes to {item 1 of fileSizes}

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Sunday 1 September 2019 20:10:56

Here, there is no need to coerce the list of file references to an alias list. This is enough:

set theFiles to every file in sourceFolder

There is no need for text item delimiters either. And the default button should always be the one with the less dangerous consequences:


repeat with anItem in deleteList
	tell application "Finder" to display dialog "Delete the following file?" & ¬
		return & return &  name of anItem default button "Cancel"
end repeat

Otherwise, the script is good and simple enough to remove duplicates without searching in subfolders.


set sourceFolder to choose folder

set fileSizes to {}
tell application "Finder"
	set theFiles to every file in sourceFolder
	repeat with aFile in theFiles
		set the end of fileSizes to size of aFile
	end repeat
end tell

set previousFileSizes to {item 1 of fileSizes} # as suggested by Yvan
set deleteList to {}
repeat with i from 2 to (count fileSizes)
	set anItem to item i of fileSizes
	if (previousFileSizes contains anItem) then set end of deleteList to item i of theFiles
	set end of previousFileSizes to anItem
end repeat

repeat with anItem in deleteList
	tell application "Finder"
		try
			display dialog "Delete the following file?" & ¬
				return & return & name of anItem default button "Cancel"
			if button returned of result is "OK" then delete anItem
		end try
	end tell
end repeat

NOTE: The try block is needed to stop the “User cancelled” error from interrupting the process.
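A slightly narrower variant of that try block, as a sketch only: it traps just error number -128 (the “User cancelled” error that display dialog raises when Cancel is clicked), so any other error, such as a failed delete, still surfaces. The handler name confirmAndDelete and the explicit buttons list are my additions for illustration; the handler expects a list of Finder file references like the deleteList built above.

```applescript
-- Hypothetical handler: asks before deleting each file in deleteList.
on confirmAndDelete(deleteList)
	repeat with anItem in deleteList
		tell application "Finder"
			try
				display dialog "Delete the following file?" & ¬
					return & return & name of anItem ¬
					buttons {"Cancel", "OK"} default button "Cancel"
				-- Reached only when the user clicked OK.
				delete anItem
			on error number -128
				-- User cancelled: skip this file and carry on with the next.
			end try
		end tell
	end repeat
end confirmAndDelete
```

Because Cancel raises the error, the test on button returned is no longer needed: the delete line runs only after OK.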

This offers no warnings and may not be as efficient, but it’s a little simpler:

set thePath to "Macintosh HD:Users:shane:Desktop:Sizes"
tell application id "com.apple.finder" -- Finder
	set theFiles to sort files of folder thePath by size
	set theSize to size of item 1 of theFiles
	repeat with aFile in rest of theFiles
		set thisSize to size of aFile
		if thisSize = theSize then
			delete aFile
		else
			set theSize to thisSize
		end if
	end repeat
end tell

Yes, this is much simpler. And as far as I can see, it should be much more efficient too. As for warnings, it is easy to add a similar display dialog just before the delete aFile line. :)

Several knowledgeable forum members have reported that it takes less time for the Finder to create a list of files as an alias list than it does for the Finder to create a list of files using Finder’s own syntax. For example, see Marc Anthony’s post number 21 in the following thread.

https://macscripter.net/viewtopic.php?pid=196460

Nigel appears to have reported something similar when he wrote:

“One of the things which takes so long with the Finder is that it has to put together a long list of its own specifiers. As with System Events, if you can get it to return the results in some other form instead, it’s often quicker (that is, quicker than it otherwise would be)… The as alias list is a Finder speciality which works with the preceding specifier rather than coercing a returned list after the fact.”

https://macscripter.net/viewtopic.php?pid=195806

Finally, I often find it necessary to process files outside a Finder tell statement and an alias list makes this simpler and often quicker. So, I would disagree with your statement.

Peavine.
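As a small illustration of the last point (a sketch, not taken from any of the scripts in this thread): once the Finder has returned an alias list, the items are plain AppleScript aliases and can be handled outside the Finder tell block, for example to collect POSIX paths. The variable name posixPaths is made up for the example.

```applescript
set sourceFolder to choose folder

-- Ask the Finder for aliases rather than its own file specifiers.
tell application "Finder"
	set theFiles to every file in sourceFolder as alias list
end tell

-- Outside the Finder tell block: aliases need no application to resolve them.
set posixPaths to {}
repeat with aFile in theFiles
	set end of posixPaths to POSIX path of (contents of aFile)
end repeat
posixPaths
```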

You are right. With as alias list, the Finder gets results much faster.
I tested this with 45459 files in my home folder. Without the alias list the time was 920 seconds, and with the alias list it was 385 seconds. The recursive test script I used was this:

set logTime to {}
tell application "Finder"
	set theFolder to (path to home folder) -- the home folder is the test target
	set startTime to my (current date)'s time --my used to escape standard addition error
	set allFiles to {}
	my getAllFiles(theFolder, allFiles)
	set logTime's end to (my ((current date)'s time)) - startTime
	delay 0.5
	set startTime to my (current date)'s time
	set allFiles to {}
	my getAllFiles2(theFolder, allFiles)
	set logTime's end to (my ((current date)'s time)) - startTime
end tell
logTime

on getAllFiles(theFolder, allFiles)
	tell application "Finder"
		set fileList to files of theFolder
		repeat with i from 1 to (count fileList)
			set end of allFiles to item i of fileList
		end repeat
		set subFolders to folders of theFolder
		repeat with subFolderRef in subFolders
			my getAllFiles(subFolderRef, allFiles)
		end repeat
	end tell
end getAllFiles

on getAllFiles2(theFolder, allFiles)
	tell application "Finder"
		set fileList to files of theFolder as alias list
		repeat with i from 1 to (count fileList)
			set end of allFiles to item i of fileList
		end repeat
		set subFolders to folders of theFolder as alias list
		repeat with subFolderRef in subFolders
			my getAllFiles2(subFolderRef, allFiles)
		end repeat
	end tell
end getAllFiles2

I created a test folder of 300 text files, all of which were the same size, and then timed the scripts in this thread. I first modified each script to exclude the dialog prompt (except Shane’s, which doesn’t have one). The results were:

peavine - 1 to 2 seconds

Shane - 8 to 10 seconds

KniazidisR - 12 to 14 seconds

The difference would appear to be that my script bulk deletes the duplicate files, while the other scripts don’t. Also, Shane clearly stated that his script was written for simplicity, not speed, and my test is worst-case in that it deletes 299 of 300 files.

BTW, I modified Shane’s script to bulk delete the duplicate files by moving the delete command out of the repeat loop, and the timing was 2 seconds. This may be the best script–simple and fast.
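The modified script was not posted, so the following is only a guess at what “moving the delete command out of the repeat loop” could look like: the loop collects the duplicates in a list (deleteList is a name I have assumed) and the Finder is sent a single delete command afterwards. It keeps Shane’s path and, like his script, assumes the folder contains at least one file.

```applescript
set thePath to "Macintosh HD:Users:shane:Desktop:Sizes"
tell application id "com.apple.finder" -- Finder
	set theFiles to sort files of folder thePath by size
	set deleteList to {}
	set theSize to size of item 1 of theFiles
	repeat with aFile in rest of theFiles
		set thisSize to size of aFile
		if thisSize = theSize then
			-- Same size as the previous file: remember it instead of deleting now.
			set end of deleteList to contents of aFile
		else
			set theSize to thisSize
		end if
	end repeat
	-- One delete command for all duplicates.
	if deleteList is not {} then delete deleteList
end tell
```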

I don’t know how you edited Shane’s script.

It seems that, according to the original post, the simpler code would be:

set thePath to (path to desktop as text) & "Sizes"
tell application id "com.apple.finder" -- Finder
	set theFiles to sort files of folder thePath by size
	delete (rest of theFiles)
end tell

While testing that, I discovered something.
When it is applied to the Desktop, the instruction set theFiles to sort files of folder thePath by size returns a list of aliases.
For any other folder, it returns a list of document files.

Yvan KOENIG running High Sierra 10.13.6 in French (VALLAURIS, France) Tuesday 3 September 2019 17:31:34

I couldn’t resist finding a role for my Custom Iterative Ternary Merge Sort. :) This is a bit faster than Peavine’s posted script with my 6592-file test folder, but presumably the OP’s folder won’t have that many files!

use sorter : script "Custom Iterative Ternary Merge Sort" -- <https://macscripter.net/viewtopic.php?pid=194430#p194430>
use scripting additions

on main()
	script o
		property filePaths : missing value
		property fileSizes : missing value
	end script
	
	set thePath to (choose folder) as text
	-- Get corresponding lists of the folder's files' paths and sizes. System Events is faster than the Finder for this.
	tell application "System Events" to set {o's filePaths, o's fileSizes} to {path, size} of files of folder thePath
	
	-- Sort both lists on the files' paths (can be omitted), then (stably) on the sizes.
	tell sorter to sort(o's filePaths, 1, -1, {slave:{o's fileSizes}}) -- If desired.
	tell sorter to sort(o's fileSizes, 1, -1, {slave:{o's filePaths}})
	
	-- Initialise a "current size" variable to some figure below the lowest size.
	set currentSize to (beginning of o's fileSizes) - 1024
	-- Work through the sizes. At each increase, replace the corresponding path with missing value.
	repeat with i from 1 to (count o's fileSizes)
		set thisSize to item i of o's fileSizes
		if (thisSize > currentSize) then
			set item i of o's filePaths to missing value
			set currentSize to thisSize
		end if
	end repeat
	
	-- Get a list containing only the unreplaced paths and bulk-delete those files.
	-- The Finder accepts a list of paths for this. System Events's dictionary says it does too, but it doesn't accept lists of anything on my machine.
	set filesToDelete to o's filePaths's text
	tell application "Finder" to delete filesToDelete
	
	return
end main

main()

Hi Yvan.

I think the OP wants to keep one instance of each size rather than just the smallest file in the folder.

Here’s an option using ASObjC. The differences are (a) it ignores packages (and invisible files), which is probably reasonable in this situation, and (b) if the size of two files match it also checks that their contents match.

use AppleScript version "2.5" -- macOS 10.11 or later
use framework "Foundation"
use scripting additions

-- constants and enums used
property NSDirectoryEnumerationSkipsHiddenFiles : a reference to 4
property NSURLFileSizeKey : a reference to current application's NSURLFileSizeKey

set thePath to "/Users/shane/Desktop/Size test" --POSIX path of (choose folder with prompt "choose the folder")
set theFolder to current application's NSURL's fileURLWithPath:thePath
set fileManager to current application's NSFileManager's |defaultManager|()
set {theFiles, theError} to fileManager's contentsOfDirectoryAtURL:theFolder includingPropertiesForKeys:{NSURLFileSizeKey} options:NSDirectoryEnumerationSkipsHiddenFiles |error|:(reference)
set sizeInfo to current application's NSMutableDictionary's dictionary() -- keys will be size, objects will be array of URLs
repeat with aFile in theFiles
	set {theResult, theSize} to (aFile's getResourceValue:(reference) forKey:NSURLFileSizeKey |error|:(missing value))
	if theSize is not missing value then -- skip packages and folders
		if (sizeInfo's allKeys()'s containsObject:theSize) as boolean then -- check if same size already found
			set matchingFiles to (sizeInfo's objectForKey:theSize) -- get files that had same size
			set matchFlag to false
			repeat with aMatch in matchingFiles -- compare contents, delete if the same
				if (fileManager's contentsEqualAtPath:(aMatch's |path|()) andPath:(aFile's |path|())) as boolean then
					(fileManager's trashItemAtURL:aFile resultingItemURL:(missing value) |error|:(missing value))
					set matchFlag to true
					exit repeat
				end if
			end repeat
			if not matchFlag then
				(matchingFiles's addObject:aFile)
			end if
		else
			(sizeInfo's setObject:(current application's NSMutableArray's arrayWithObject:aFile) forKey:theSize)
		end if
	end if
end repeat

Hi, Shane.

I don’t know why, but your pure AppleScript variant runs faster: 32 milliseconds on my machine (after compiling). The AppleScriptObjC variant takes 602 milliseconds (after compiling).

Peavine’s script runs in 17 milliseconds on my machine (after compiling). I removed the warning dialog from his script for the test.

It’s because the first script just compares file sizes. The second one first compares sizes, and if they match it compares the entire contents of the files to see if they match exactly. Not what was asked for, but safer.

Oh, got it. Thanks for clarifying. By the way, is this a byte comparison or some other algorithm, such as a hash?

A byte comparison, I believe. The documentation says: “For files, this method checks to see if they’re the same file, then compares their size, and finally compares their contents.”

Thanks for the new version. I believe there is no need for matchFlag:

repeat with aMatch in matchingFiles -- compare contents, delete if the same
	if not ((fileManager's contentsEqualAtPath:(aMatch's |path|()) andPath:(aFile's |path|())) as boolean) then
		(matchingFiles's addObject:aFile)
	else
		(fileManager's trashItemAtURL:aFile resultingItemURL:(missing value) |error|:(missing value))
		exit repeat
	end if
end repeat

Also, I don’t know whether this method stops the byte comparison at the first mismatched byte. If not, perhaps it has some additional option for that faster behavior.

From the documentation: “For files, this method checks to see if they’re the same file, then compares their size, and finally compares their contents. This method does not traverse symbolic links, but compares the links themselves.”

This means that the method has its own comparison order. So checking the files for equal size before calling it does the same job twice.

No, you’re modifying the number of items in matchingFiles while you loop through it. You’re also risking adding the same file multiple times, and therefore ending up with the array containing items that have been trashed. In any event, it would be the tiniest of optimizations.

Yes, but the difference is that we’re storing the size for re-use, rather than having to read it from two files for each comparison.

Look at it this way. Suppose there are 10 files, all different. My script gets their size once each, and that’s all. If we just used contentsEqualAtPath:andPath:, you’d need to call it 9 times with the first item, 8 times with the second item, and so on.
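The arithmetic behind that example can be checked with a few lines of plain AppleScript. This is just an illustration of the counts, with made-up variable names: for n files that all differ, the sizes are read n times, whereas comparing every pair directly takes (n - 1) + (n - 2) + … + 1 calls.

```applescript
set n to 10 -- number of files, all different
set pairwiseCalls to 0
repeat with k from 1 to n - 1
	-- The k-th file is compared with each of the n - k files after it.
	set pairwiseCalls to pairwiseCalls + (n - k)
end repeat
-- For n = 10: 10 size reads versus 45 pairwise comparisons.
{sizeReads:n, pairwiseCalls:pairwiseCalls}
```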

Ah yes. I’d overlooked them in my script. System Events includes invisibles when returning disk items. While the Finder will error if specifically asked to return those invisibles, it seems it will delete them if passed their paths in a list.

My script can easily be adapted in either of two ways, depending on which seems the best idea at the time. One is to get the files’ ‘visible’ properties too and eliminate any paths where the corresponding ‘visible’ is false:

use sorter : script "Custom Iterative Ternary Merge Sort" -- <https://macscripter.net/viewtopic.php?pid=194430#p194430>
use scripting additions

on main()
	script o
		property filePaths : missing value
		property fileSizes : missing value
		property fileVisibles : missing value
	end script
	
	set thePath to (choose folder) as text
	-- Get corresponding lists of the folder's files' paths, sizes, and visibles. System Events is faster than the Finder for this.
	tell application "System Events" to set {o's filePaths, o's fileSizes, o's fileVisibles} to {path, size, visible} of files of folder thePath
	
	-- Sort all three lists on the files' paths (can be omitted), then (stably) on the sizes.
	tell sorter to sort(o's filePaths, 1, -1, {slave:{o's fileSizes, o's fileVisibles}}) -- If desired.
	tell sorter to sort(o's fileSizes, 1, -1, {slave:{o's filePaths, o's fileVisibles}})
	-- Initialise a "current size" variable to some figure below the lowest size.
	set currentSize to (beginning of o's fileSizes) - 1024
	-- Work through the visibles and sizes. At each non-visible or size increase, replace the corresponding path with missing value.
	repeat with i from 1 to (count o's fileVisibles)
		if (item i of o's fileVisibles) then -- visible = true.
			set thisSize to item i of o's fileSizes
			if (thisSize > currentSize) then
				set item i of o's filePaths to missing value
				set currentSize to thisSize
			end if
		else -- visible = false.
			set item i of o's filePaths to missing value
		end if
	end repeat
	
	-- Get a list containing only the unreplaced paths and bulk-delete those files.
	-- The Finder accepts a list of paths for this. System Events's dictionary says it does too, but it doesn't accept lists of anything on my machine.
	set filesToDelete to o's filePaths's text
	tell application "Finder" to delete filesToDelete
	
	return
end main

main()

Or, if you take invisible files simply as those whose names begin with “.”, and you’re only searching one folder, the paths to such files will be sorted to the beginning when the paths are sorted and so they can be eliminated then:

use sorter : script "Custom Iterative Ternary Merge Sort" -- <https://macscripter.net/viewtopic.php?pid=194430#p194430>
use scripting additions

on main()
	script o
		property filePaths : missing value
		property fileSizes : missing value
	end script
	
	set thePath to (choose folder) as text
	-- Get corresponding lists of the folder's files' paths and sizes. System Events is faster than the Finder for this.
	tell application "System Events" to set {o's filePaths, o's fileSizes} to {path, size} of files of folder thePath
	
	-- Sort both lists on the files' paths.
	tell sorter to sort(o's filePaths, 1, -1, {slave:{o's fileSizes}})
	set numberOfFiles to (count o's filePaths)
	-- Replace paths containing ":." with missing value. In a single-level hierarchy, these will have been sorted to the beginning of the path list.
	repeat with i from 1 to numberOfFiles
		if (item i of o's filePaths contains ":.") then
			set item i of o's filePaths to missing value
		else
			exit repeat
		end if
	end repeat
	
	-- If there are at least two visible paths left …
	if (i < numberOfFiles) then
		-- Sort the corresponding items in both lists on the files' sizes.
		tell sorter to sort(o's fileSizes, i, -1, {slave:{o's filePaths}})
		-- Initialise a "current size" variable to some figure below the lowest size.
		set currentSize to (item i of o's fileSizes) - 1024
		-- Work through the sizes. At each increase, replace the corresponding path with missing value.
		repeat with i from i to numberOfFiles
			set thisSize to item i of o's fileSizes
			if (thisSize > currentSize) then
				set item i of o's filePaths to missing value
				set currentSize to thisSize
			end if
		end repeat
		
		-- Get a list containing only the unreplaced paths and bulk-delete those files.
		-- The Finder accepts a list of paths for this. System Events's dictionary says it does too, but it doesn't accept lists of anything on my machine.
		set filesToDelete to o's filePaths's text
		tell application "Finder" to delete filesToDelete
	end if
	
	return
end main

main()

Shane, thanks again for the clarification. I will convert your script into a recursive version to remove any duplicates in my home folder.

I already have one unusual script, https://www.macscripter.net/viewtopic.php?id=46812, that I wrote using the Smart Folders feature. It is as fast as possible, and it will be interesting for me to compare it with your script.

Shane Stanley recently posted another Script Geek update, and I remembered this topic. Nigel Garvey managed to produce code that was slightly faster than the code published by Peavine. At the time I didn’t have Script Geek to test it, so I took their word for it.

Now I have Script Geek. I also remembered two things:

  1. The Peavine code contained an unnecessary repeat loop, so its speed could be improved further.
  2. I could introduce another improvement by using the contents property (of the current item) in the second (necessary) loop, which is efficient for speed.

Thus, I was able to make the code 1.5 times faster than Peavine’s original. For clarity, I removed the warning dialogs. Now it is as fast as it is simple, which is what I try to achieve for every “good” script.


set sourceFolder to choose folder

tell application "Finder"
	set theFiles to every file in sourceFolder as alias list
	set fileSizes to size of every file in sourceFolder
end tell

set previousFileSizes to {}
set deleteList to {}
set i to 0
repeat with theSize in fileSizes
	set i to i + 1
	if (theSize is in previousFileSizes) then
		set end of deleteList to item i of theFiles
	else
		set end of previousFileSizes to contents of theSize
	end if
end repeat

-- tell application "Finder" to delete deleteList