A recent thread in the Code Exchange forum includes several scripts that identify all files in a selected folder that have a duplicate file. The last script in the thread is by Nigel and is the one to use.
As a proof of concept, I wrote a shortcut that does the same thing, although the following should be noted:
The Get Contents of Folder action returns hidden files and package contents, which makes the shortcut unusable in some circumstances.
Occasionally, the shortcut appears to run without end. This may simply reflect how slow the shortcut is, or it may be the result of hidden and package files; of course, there may also be an error in the shortcut.
The Shortcuts app does not support sets or subtracting one list from another, so I had to employ a different approach to identify the duplicate files. I think this works correctly, but further testing is needed.
The shortcut uses the md5 hash algorithm, but this can be changed to one of three other options. I ran timing tests with all four hash algorithms, and there was no difference.
Nigel's script pre-filters by file size, but I don't believe this can be done in a shortcut except by employing a shell command. (A rough sketch of the overall approach, including a size pre-filter, follows below.)
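For anyone who wants to see the logic outside of Shortcuts, here is a minimal Python sketch of the general approach, not the shortcut itself and not Nigel's script: skip hidden files, don't descend into packages or other folders, pre-filter by size, and hash only files that share a size. The folder path is a placeholder, and the sketch only looks at the top level of the folder.

```python
import hashlib
import os
from collections import defaultdict

FOLDER = "/path/to/folder"  # placeholder; the shortcut prompts for the folder instead

def md5_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files need not be read into memory at once."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Collect visible, non-package files (unlike Get Contents of Folder, which returns both).
candidates = []
for entry in os.scandir(FOLDER):
    if entry.name.startswith("."):   # hidden file
        continue
    if entry.is_dir():               # folders, including package folders such as .app bundles
        continue
    candidates.append(entry.path)

# Pre-filter by size: only files that share a byte count can possibly be duplicates.
by_size = defaultdict(list)
for path in candidates:
    by_size[os.path.getsize(path)].append(path)

# Hash only within size groups, then keep any hash that occurs more than once.
duplicate_groups = []
for paths in by_size.values():
    if len(paths) < 2:
        continue
    by_hash = defaultdict(list)
    for path in paths:
        by_hash[md5_of(path)].append(path)
    duplicate_groups.extend(group for group in by_hash.values() if len(group) > 1)

for group in duplicate_groups:
    print("\n".join(group), end="\n\n")
```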
I’ll work to see if I can address a few of the above issues. If anyone tests this shortcut, it’s best done with a folder that contains a relatively small number of files. The following screenshot only shows a portion of the shortcut.
I worked a bit to optimize the above shortcut and got some unexpected results.
I moved the Generate Hash action out of the repeat loop (see screenshot below), thinking this might improve matters. The timing result after the Generate Hash action was 0.88 seconds, and after the repeat loop it was 2.72 seconds. Why the repeat loop would take 1.84 seconds to run is a mystery. I rewrote the repeat loop in every way I could, and the only change that made a difference was to put the Generate Hash action back inside the repeat loop (as in my original shortcut), which reduced the timing result to 1.63 seconds.
The thought occurred to me that the Shortcuts app does some housekeeping after the repeat loop, which distorts the timing result. However, the roughly 1-second discrepancy persists when I run the timing tests on both shortcut versions all the way to the Save File action. So, including the Generate Hash action inside the repeat loop seems to be the way to go (the two structures are sketched below).
To avoid the overhead of the Shortcuts editor, I ran the timing tests by way of the Shortcuts menu.
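For what it's worth, the two structures being timed correspond roughly to the following, expressed in Python purely as an illustration of the ordering; it says nothing about why Shortcuts runs one faster than the other, and the file list is a placeholder.

```python
import hashlib

def md5_of(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

files = ["/path/to/a.pdf", "/path/to/b.pdf"]  # placeholder list of file paths

# Version 1: Generate Hash before the repeat loop, then pair hashes with paths in the loop.
hashes = [md5_of(p) for p in files]
pairs = [f"{h}\t{p}" for h, p in zip(hashes, files)]

# Version 2: Generate Hash inside the repeat loop, one file per pass.
pairs = [f"{md5_of(p)}\t{p}" for p in files]
```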
The following shortcut is significantly faster than the one included above.
I have a small concern that the temp files listing file hashes and paths might get out of order, which would cause the shortcut to return an inaccurate result. In limited testing, that has not been an issue, but it's something to keep in mind (one way to keep the values paired is sketched below).
Before running this shortcut for the first time, the location of the folder that will contain the temp files must be set. This folder has to be on the computer's boot drive.
Note that filtering by file extension in a shortcut is not case-sensitive, so a pdf filter will also return files with a PDF extension.
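One way to sidestep the ordering worry (hypothetical here, not necessarily how the shortcut is built) is to write each hash and its path on the same line of a single temp file, so the two values cannot drift apart. The sketch below does that and also includes a case-insensitive extension check of the kind just mentioned; the folder, extension, and temp file name are all placeholders.

```python
import hashlib
import os
import tempfile

FOLDER = "/path/to/folder"  # placeholder
EXTENSION = ".pdf"          # the extension filter; matched case-insensitively below

def md5_of(path):
    with open(path, "rb") as f:
        return hashlib.md5(f.read()).hexdigest()

# Keeping the hash and the path on one line means they cannot get out of order,
# which is the risk when hashes and paths are stored as separate parallel lists.
temp_file = os.path.join(tempfile.gettempdir(), "file-hashes.txt")  # hypothetical temp file name
with open(temp_file, "w") as out:
    for entry in os.scandir(FOLDER):
        if entry.is_file() and entry.name.lower().endswith(EXTENSION):
            out.write(f"{md5_of(entry.path)}\t{entry.path}\n")
```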
This shortcut is similar to the one above, differing in that hash values are not included in the saved text file. Groups of duplicate files are separated from one another by a blank line.
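Assuming the duplicate groups are already in hand as lists of paths, the blank-line-separated report could be assembled like this (the paths are made up for illustration):

```python
# Hypothetical duplicate groups; in practice they come from the hash-grouping step.
groups = [
    ["/Users/me/Documents/report.pdf", "/Users/me/Desktop/report copy.pdf"],
    ["/Users/me/Invoices/march.pdf", "/Users/me/Backups/march.pdf"],
]

# One path per line, a blank line between groups, and no hash values in the output.
report = "\n\n".join("\n".join(group) for group in groups)
print(report)
```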
Never mind, I found it (and I had actually posted in that thread). I'll be trying out your shortcuts, as after turning Nigel's script into an AppleScript application, running it throws the error "No user interaction allowed (-1713)".
In Script Debugger, as long as you don't check "Applet is background only," the resulting AppleScript application runs just fine. I'm totally embarrassed by my error.
I tested my shortcut against Nigel’s AppleScript on several large folders, and the duplicate files returned were identical. The AppleScript is faster, although the shortcut is fast enough for normal use, and the AppleScript is more robust when thousands of files are processed. About the only other difference is that the shortcut descends into packages, although this shouldn’t be an issue in most instances.
With a little help from Google AI, I was able to eliminate the use of temporary files. The revised shortcut is only marginally faster, but eliminating the temp files is worthwhile in itself.
There has to be some limit to the allowed length of a string in a shortcut, which would restrict the number of files that can be handled. I successfully tested the shortcut on a run in which 1,227 files with the specified extension were processed, although it took 4.6 seconds to finish.
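As a back-of-the-envelope check (the 100-character average path length is an assumption), the working text for a run of that size stays fairly small, so at this scale the runtime seems more likely to be the practical limit than the string length:

```python
# Rough size estimate for the in-memory text; the average path length is assumed.
files = 1227
chars_per_entry = 32 + 1 + 100 + 1   # md5 hex digest + tab + assumed path + newline
print(files * chars_per_entry)       # 164418 characters, on the order of 160 KB of text
```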