This is my fourth attempt to write a script for this which doesn’t simply beachball Script Debugger or Script Editor! It successfully compares a test hierarchy on my iMac’s own hard disk (26 “letter” folders, 7,172 “name” folders, and 1,700,000 files) with itself in around 22-25 minutes. I don’t know how this compares with what t.spoon’s been using, or even whether it works with those other systems! Presumably it would take longer over a network, and it would need even more time to analyse any differences between two different sources. However, it only searches for files in folders common to both hierarchies, which could save time in some cases. It logs any relative paths not common to both sources. It also logs the relative paths of any matching files whose backup modification dates are too long before the primaries’. The tolerated interval is set in the toleratedBackupDelay property at the top of the script.
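The core of the path logging is plain set arithmetic on relative paths. This is a simplified, minimal sketch using hypothetical paths. NSOrderedSet’s minusOrderedSet: separates the paths found in only one hierarchy, and intersectOrderedSet: finds the paths common to both, whose modification dates would then be compared. The script’s analyseFileDifferences() and getModDateDifferences() handlers do something similar, but they work on dictionaries of paths and dates rather than on bare paths like these.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- Hypothetical relative paths from a primary and a backup hierarchy.
set primaryPaths to current application's class "NSOrderedSet"'s orderedSetWithArray:({"A/Alice/a.txt", "A/Alice/b.txt", "B/Bob/c.txt"})
set backupPaths to current application's class "NSOrderedSet"'s orderedSetWithArray:({"A/Alice/a.txt", "B/Bob/c.txt", "B/Bob/d.txt"})

-- Paths only in the primary: a mutable copy of its set minus the backup's set.
set onlyInPrimary to primaryPaths's mutableCopy()
tell onlyInPrimary to minusOrderedSet:(backupPaths)

-- Paths only in the backup: the same, the other way round.
set onlyInBackup to backupPaths's mutableCopy()
tell onlyInBackup to minusOrderedSet:(primaryPaths)

-- Paths in both (kept in the primary's order), whose files' dates would be compared next.
set inBoth to primaryPaths's mutableCopy()
tell inBoth to intersectOrderedSet:(backupPaths)

-- Convert back to AppleScript lists of text.
set onlyInPrimaryList to (onlyInPrimary's array()) as list
set onlyInBackupList to (onlyInBackup's array()) as list
set inBothList to (inBoth's array()) as list
```

Ordered sets rather than plain sets keep the Finder-style sort order. This matters in the script, where items at the same index in two filtered collections must refer to the same relative path.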
I noticed when testing the difference-reporting functions (with two much smaller folders!) that nominally equal modification-date NSDates, returned as URL resource values, weren’t recognised as equal when compared. I think that something about copying the files for testing may have added a few nanoseconds to the copies’ modification dates. There are a few ways around this. I’ve gone for extracting the relevant NSDateComponents and using dates reconstituted from these if needed.
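This minimal sketch illustrates the problem with two hypothetical NSDates. The dates are one microsecond apart rather than a few nanoseconds, so the difference is sure to survive in the double that NSDate stores. isEqualToDate: treats the dates as different, even though their whole-second values match.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- A hypothetical "original" modification date and a "copy" one microsecond later.
set originalDate to current application's NSDate's dateWithTimeIntervalSinceReferenceDate:(5.0E+8)
set copiedDate to originalDate's dateByAddingTimeInterval:(1.0E-6)

-- NSDate equality is exact, so these dates are NOT equal ...
set exactlyEqual to (originalDate's isEqualToDate:copiedDate) as boolean

-- ... although their whole-second time intervals are the same.
set sameWholeSeconds to ((originalDate's timeIntervalSinceReferenceDate()) div 1) = ((copiedDate's timeIntervalSinceReferenceDate()) div 1)
```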
(Edit: I’ve now changed the workaround to the one suggested by Shane in post #10 below: keeping the original NSDates and using NSCalendar’s isDate:equalToDate:toUnitGranularity: method to catch any dates which should be considered equal if they haven’t already been caught by preceding tests. I’ve also corrected a bug and removed a test line which somehow got left in.)
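For anyone who would rather keep NSDates than convert to AppleScript dates, here is a minimal sketch of that NSCalendar comparison. The dates are hypothetical, and the post doesn’t say which unit was used, so the choice of NSCalendarUnitSecond as the granularity is an assumption.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- Hypothetical dates one microsecond apart.
set originalDate to current application's NSDate's dateWithTimeIntervalSinceReferenceDate:(5.0E+8)
set copiedDate to originalDate's dateByAddingTimeInterval:(1.0E-6)

-- Compare the dates only down to whole seconds in the current calendar (assumed granularity: seconds).
set theCalendar to current application's NSCalendar's currentCalendar()
set equalToTheSecond to (theCalendar's isDate:originalDate equalToDate:copiedDate toUnitGranularity:(current application's NSCalendarUnitSecond)) as boolean
```

One caveat: granularity comparison asks whether the two dates fall in the same calendar second. Two dates a nanosecond apart on either side of a second boundary would still compare as unequal.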
(Further edit: The script’s now been revamped to switch judiciously between ASObjC and vanilla AppleScript for the things they do fastest. Even with the coercions involved, this has reduced the running time to about fifteen-and-a-half minutes with the test folder on my hard disk. The modification dates are now converted to AppleScript dates, which don’t see nanosecond differences. However, in case they gain this ability in the future, a zero-nanosecond reference date is subtracted from each modification date and the difference is rounded to the nearest second towards zero. It’s these differences which are compared rather than the dates themselves. The extra work only adds a couple of seconds to the running time with 3,400,000 dates.)
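The reference-date arithmetic is easy to try in vanilla AppleScript. In this minimal sketch, the two modification dates are hypothetical: the backup is nine hours older than the primary. The eight-hour tolerance matches the script’s default toleratedBackupDelay. Both dates are reduced to whole-second offsets from a midnight reference date, and it’s the offsets that are compared.

```applescript
-- Reference date with a whole-second time, set the same way as in the script's checkFiles() handler.
set referenceDate to current date
tell referenceDate to set {its day, its year, its month, its time} to {1, 1904, January, 0}

-- Hypothetical modification dates: the backup is 9 hours older than the primary.
set primaryModDate to referenceDate + 100000
set backupModDate to primaryModDate - 9 * hours

-- Whole-second offsets from the reference date ('div 1' discards any fractional part).
set primaryOffset to (primaryModDate - referenceDate) div 1
set backupOffset to (backupModDate - referenceDate) div 1

-- The script's test: is the backup more than the tolerated delay older than the primary?
set toleratedBackupDelay to 8 * hours
set backupTooOld to (primaryOffset - backupOffset > toleratedBackupDelay)
```

Because the test is one-directional, a backup that is newer than its primary gives a negative difference and is never reported.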
As I said, I don’t know if it works for the intended situation. But it’s been an interesting exercise getting it to work at all. 
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions
-- Edit these properties as required. The paths must be POSIX paths.
property primaryPath : POSIX path of (path to desktop) & "Primary"
property backupPath : POSIX path of (path to desktop) & "Primary" -- Same as primaryPath for the self-comparison test. Point this at the backup folder for real use.
property reportPath : "~/Desktop/freeNAS Primary and Backup differences.txt"
property fileLevel : 3 -- Equivalent to "-depth 3" in "find".
property toleratedBackupDelay : 8 * hours -- Report the relative paths of corresponding primary and backup files where the backup's modification date is more than this long before the primary's.
property skipHiddenFiles : true -- Ignore hidden files?
main()
on main()
script mainScript
-- Preset some potentially often needed values!
property |⌘| : current application
property primaryURL : |⌘|'s class "NSURL"'s fileURLWithPath:((|⌘|'s class "NSString"'s stringWithString:(primaryPath))'s stringByExpandingTildeInPath())
property backupURL : |⌘|'s class "NSURL"'s fileURLWithPath:((|⌘|'s class "NSString"'s stringWithString:(backupPath))'s stringByExpandingTildeInPath())
property primaryRelativePathOffset : ((primaryURL's |path|() as text)'s length) + 2
property backupRelativePathOffset : ((backupURL's |path|() as text)'s length) + 2
property fileManager : |⌘|'s class "NSFileManager"'s defaultManager()
property directoryKeys : |⌘|'s class "NSArray"'s arrayWithArray:({|⌘|'s NSURLIsDirectoryKey, |⌘|'s NSURLIsPackageKey})
property skipsHiddenFiles : |⌘|'s NSDirectoryEnumerationSkipsHiddenFiles
property directoryResult : (|⌘|'s class "NSDictionary"'s dictionaryWithObjects:({true, false}) forKeys:(directoryKeys)) as record
property fileAndModDateKeys : |⌘|'s class "NSArray"'s arrayWithArray:({|⌘|'s NSURLIsRegularFileKey, |⌘|'s NSURLIsPackageKey, |⌘|'s NSURLContentModificationDateKey})
property noHiddenFiles : (|⌘|'s NSDirectoryEnumerationSkipsHiddenFiles) * (skipHiddenFiles as integer)
property |{true}| : {true}
property referenceDate : (current date)
property FinderSort : |⌘|'s class "NSSortDescriptor"'s sortDescriptorWithKey:("path") ascending:(true) selector:("localizedStandardCompare:")
property regex : |⌘|'s NSRegularExpressionSearch
property regexEscapedPrimaryPath : (|⌘|'s class "NSRegularExpression"'s escapedPatternForString:(primaryURL's |path|())) as text
property regexEscapedBackupPath : (|⌘|'s class "NSRegularExpression"'s escapedPatternForString:(backupURL's |path|())) as text
property LF : |⌘|'s class "NSString"'s stringWithString:(linefeed)
property LFLF : |⌘|'s class "NSString"'s stringWithString:(linefeed & linefeed)
property LFLFLF : |⌘|'s class "NSString"'s stringWithString:(linefeed & linefeed & linefeed)
property emptyString : |⌘|'s class "NSString"'s new()
property |path| : |⌘|'s class "NSString"'s stringWithString:("path")
property |modDateMinusRefDate| : |⌘|'s class "NSString"'s stringWithString:("modDateMinusRefDate")
property |%@%@%@%@| : |⌘|'s class "NSString"'s stringWithString:("%@%@%@%@")
property |PRIMARY FILES NOT IN BACKUP| : |⌘|'s class "NSString"'s stringWithString:("PRIMARY FILES NOT IN BACKUP:")
property |BACKUP FILES NOT IN PRIMARY| : |⌘|'s class "NSString"'s stringWithString:("BACKUP FILES NOT IN PRIMARY:")
property |BACKUPS WITH MODIFICATION DATES TOO LONG BEFORE THE PRIMARIES'| : |⌘|'s class "NSString"'s stringWithString:("BACKUPS WITH MODIFICATION DATES TOO LONG BEFORE THE PRIMARIES':")
property |path IN %@| : |⌘|'s class "NSString"'s stringWithString:("path IN %@")
property report : |⌘|'s class "NSMutableString"'s new()
on main()
-- Get URLs for the file-containing folders common to both the primary and backup folders, logging any folders NOT common to both in the report string.
set {primaryFileContainerURLs, backupFileContainerURLs} to checkSubfolders()
-- Compare the file contents of the two sets of file-containing folders, logging any differences in the report string.
checkFiles(primaryFileContainerURLs, backupFileContainerURLs)
-- Write the report to a text file.
if (report's |length|() is 0) then set report to |⌘|'s class "NSString"'s stringWithString:("The files and modification dates in both folders are the same.")
set expandedReportPath to (|⌘|'s class "NSString"'s stringWithString:(reportPath))'s stringByExpandingTildeInPath()
tell report to writeToFile:(expandedReportPath) atomically:(true) encoding:(|⌘|'s NSUTF8StringEncoding) |error|:(missing value)
return
end main
(* Compare the folders in the primary and backup hierarchies down to the level of the file-containing folders and log any differences. Return URLs for the file-containing folders common to both hierarchies. *)
on checkSubfolders()
-- Get the names of the primary file-container URLs.
set primarySubfolderURLs to getSubfolderURLs(primaryURL) -- Mutable array
set primarySubfolderNames to primarySubfolderURLs's valueForKey:("lastPathComponent")
-- Ditto the backup file-container URLs.
set backupSubfolderURLs to getSubfolderURLs(backupURL) -- Mutable array.
set backupSubfolderNames to backupSubfolderURLs's valueForKey:("lastPathComponent")
-- If the two sets of names are not the same, analyse the differences, add them to the report, and filter the URLs to leave just those for folders whose names are common to both hierarchies.
if not (backupSubfolderNames's isEqualToArray:(primarySubfolderNames)) then
reportOnAndFilterOutSubfolderDifferences(regexEscapedPrimaryPath, primarySubfolderURLs, backupSubfolderNames, "PRIMARY SUBFOLDERS NOT IN BACKUP:")
reportOnAndFilterOutSubfolderDifferences(regexEscapedBackupPath, backupSubfolderURLs, primarySubfolderNames, "BACKUP SUBFOLDERS NOT IN PRIMARY:")
end if
-- Filter further to leave just URLs for the folders at the file-container level.
filterByPathComponentCount(regexEscapedPrimaryPath, primarySubfolderURLs, fileLevel - 1)
filterByPathComponentCount(regexEscapedBackupPath, backupSubfolderURLs, fileLevel - 1)
return {primarySubfolderURLs, backupSubfolderURLs}
end checkSubfolders
(* Recursively find the folders in this particular hierarchy and return URLs for them. *)
on getSubfolderURLs(topFolderURL)
script localScript
property subfolderURLs : {} --|⌘|'s class "NSMutableArray"'s new()
on doRecursiveStuff(folderURL, currentLevel)
set contentsURLs to (fileManager)'s contentsOfDirectoryAtURL:(folderURL) includingPropertiesForKeys:(directoryKeys) options:(skipsHiddenFiles) |error|:(missing value)
set nextLevel to currentLevel + 1
set gettingNextLevel to (nextLevel < fileLevel)
repeat with thisURL in contentsURLs
if ((thisURL's resourceValuesForKeys:(directoryKeys) |error|:(missing value)) as record is directoryResult) then
set end of my subfolderURLs to thisURL
if (gettingNextLevel) then doRecursiveStuff(thisURL, nextLevel)
end if
end repeat
end doRecursiveStuff
end script
tell localScript to doRecursiveStuff(topFolderURL, 1)
set subfolderURLs to |⌘|'s class "NSMutableArray"'s arrayWithArray:(localScript's subfolderURLs)
tell subfolderURLs to sortUsingDescriptors:({FinderSort})
return subfolderURLs
end getSubfolderURLs
(* Log the relative paths of folders which occur in one hierarchy but not the other and filter out the URLs corresponding to those paths. *)
on reportOnAndFilterOutSubfolderDifferences(regexEscapedTopFolderPath, subfolderURLs, otherSubfolderNames, heading)
set filter to |⌘|'s class "NSPredicate"'s predicateWithFormat_("NOT (lastPathComponent in %@)", otherSubfolderNames)
set unmatchedSubfolderURLs to subfolderURLs's filteredArrayUsingPredicate:(filter)
if ((count unmatchedSubfolderURLs) > 0) then
addToReport(heading, unmatchedSubfolderURLs's valueForKey:(|path|))
tell report to replaceOccurrencesOfString:("(?m)^" & regexEscapedTopFolderPath & "/") withString:(emptyString) options:(regex) range:({0, its |length|()})
set filter to |⌘|'s class "NSPredicate"'s predicateWithFormat_("NOT (self IN %@)", unmatchedSubfolderURLs)
tell subfolderURLs to filterUsingPredicate:(filter)
end if
end reportOnAndFilterOutSubfolderDifferences
(* Append the "path"(s) in a given array to the report text along with a given heading. *)
on addToReport(heading, anArray)
tell report to appendFormat_(|%@%@%@%@|, heading, LFLF, anArray's componentsJoinedByString:(LF), LFLFLF)
end addToReport
(* Filter a hierarchy's folder URLs to leave just those for folders at the file-container level. *)
on filterByPathComponentCount(regexEscapedTopFolderPath, subfolderURLs, containerLevel)
set filter to |⌘|'s class "NSPredicate"'s predicateWithFormat:("path MATCHES '^" & regexEscapedTopFolderPath & "(?:/[^/]++){" & containerLevel & "}+$'")
tell subfolderURLs to filterUsingPredicate:(filter)
end filterByPathComponentCount
(* Compare the files in each corresponding primary and backup folder and log any differences. *)
on checkFiles(primaryFileContainerURLs, backupFileContainerURLs)
-- The modification dates of files modified on APFS systems may lose nanosecond components when the files are copied to HFS disks or are backed up by processes ignorant of date nanoseconds. Since this script converts the file dates to AppleScript dates, such differences won't currently affect their comparison. However, in case AS dates ever gain nanosecond precision in the future, a nanosecondless reference date is subtracted from each date and it's these differences, rounded towards zero to the nearest second, which are compared rather than the dates themselves. The extra work involved only adds a couple of seconds to the running time with 3,400,000 dates. The reference date used doesn't matter (it can be in the future!) so long as its 'time' or 'seconds' is an integer.
tell referenceDate to set {its day, its year, its month, its time} to {1, 1904, January, 0}
repeat with i from 1 to (count primaryFileContainerURLs)
set primaryFileInfo to getFileInfo(item i of primaryFileContainerURLs, primaryRelativePathOffset)
set backupFileInfo to getFileInfo(item i of backupFileContainerURLs, backupRelativePathOffset)
if not (backupFileInfo's isEqualToArray:(primaryFileInfo)) then analyseFileDifferences(primaryFileInfo, backupFileInfo)
end repeat
end checkFiles
(* Get an array of dictionaries containing the relative paths of the files in a particular folder and the differences in whole seconds between the files' modification dates and the reference date. *)
on getFileInfo(containerURL, topFolderRelativePathOffset)
script o
property fileInfo : {}
end script
set contentURLs to fileManager's contentsOfDirectoryAtURL:(containerURL) includingPropertiesForKeys:(fileAndModDateKeys) options:(noHiddenFiles) |error|:(missing value)
repeat with thisURL in contentURLs
set fileAndModDateValues to (thisURL's resourceValuesForKeys:(fileAndModDateKeys) |error|:(missing value)) as record
if ((fileAndModDateValues as list) contains |{true}|) then -- The best option if hedging one's bets, otherwise:
-- if ((fileAndModDateValues's NSURLIsRegularFileKey) or (fileAndModDateValues's NSURLIsPackageKey)) then -- Faster if the files are known to be mostly regular files.
-- if ((fileAndModDateValues's NSURLIsPackageKey) or (fileAndModDateValues's NSURLIsRegularFileKey)) then -- Faster if the files are known to be mostly packages.
set relativePath to (thisURL's |path|() as text)'s text topFolderRelativePathOffset thru end
set modDate to fileAndModDateValues's NSURLContentModificationDateKey
set end of o's fileInfo to {|path|:relativePath, |modDateMinusRefDate|:(modDate - referenceDate) div 1}
end if
end repeat
-- Convert the list to an NSArray and sort by 'path', Finder-style, for later comparison with an array for the corresponding other folder.
-- This assumes the files are likely to match in the majority of cases. (Creating, sorting, and comparing NSMutableArrays is faster than the same with NSMutableOrderedSets.) If they're more likely NOT to match, it may be better to set an NSMutableOrderedSet here instead, use 'isEqualToOrderedSet:' instead of 'isEqualToArray:' in the checkFiles() handler above, and cut the first two instructions in the analyseFileDifferences() handler below.
set fileInfo to |⌘|'s class "NSMutableArray"'s arrayWithArray:(o's fileInfo)
tell fileInfo to sortUsingDescriptors:({FinderSort})
return fileInfo
end getFileInfo
(* Knowing that two arrays of dictionaries containing paths and modification date/reference date differences aren't equal, analyse the differences and add to the report. *)
on analyseFileDifferences(primaryFileInfo, backupFileInfo)
-- Switching to ordered sets is useful here.
set primaryFileInfo to |⌘|'s class "NSOrderedSet"'s orderedSetWithArray:(primaryFileInfo)
set backupFileInfo to |⌘|'s class "NSMutableOrderedSet"'s orderedSetWithArray:(backupFileInfo)
-- Reduce each set to its dictionaries with no counterpart in the other.
set inPrimaryButNotInBackup to primaryFileInfo's mutableCopy()
tell inPrimaryButNotInBackup to minusOrderedSet:(backupFileInfo)
set inBackupButNotInPrimary to backupFileInfo -- 's mutableCopy()
tell inBackupButNotInPrimary to minusOrderedSet:(primaryFileInfo)
-- Get the relative paths from the remaining dictionaries (also as ordered sets).
set primaryPaths to inPrimaryButNotInBackup's valueForKey:(|path|)
set backupPaths to inBackupButNotInPrimary's valueForKey:(|path|)
-- Analyse and report on any paths which don't exist in both sets.
set pathsOnlyInPrimary to getPathDifferences(primaryPaths, backupPaths)
if ((count pathsOnlyInPrimary) > 0) then addToReport(|PRIMARY FILES NOT IN BACKUP|, pathsOnlyInPrimary's array())
set pathsOnlyInBackup to getPathDifferences(backupPaths, primaryPaths)
if ((count pathsOnlyInBackup) > 0) then addToReport(|BACKUP FILES NOT IN PRIMARY|, pathsOnlyInBackup's array())
-- Analyse and report on any paths which DO exist in both sets. These belong to matching files whose whole-second modification-date offsets differ. Only those whose backup dates are more than toleratedBackupDelay before the primaries' are reported.
set pathsWithDifferentModificationDates to getModDateDifferences(primaryPaths, backupPaths, inPrimaryButNotInBackup, inBackupButNotInPrimary)
if ((count pathsWithDifferentModificationDates) > 0) then addToReport(|BACKUPS WITH MODIFICATION DATES TOO LONG BEFORE THE PRIMARIES'|, pathsWithDifferentModificationDates's array())
end analyseFileDifferences
(* Return the relative paths in one ordered set which aren't in the other. *)
on getPathDifferences(orderedSetA, orderedSetB)
set orderedSetA to orderedSetA's mutableCopy()
tell orderedSetA to minusOrderedSet:(orderedSetB)
return orderedSetA
end getPathDifferences
(* Return any relative paths common to two ordered sets if the modification dates of the files to which they point are more than the tolerated interval apart. *)
on getModDateDifferences(primaryPaths, backupPaths, inPrimaryButNotInBackup, inBackupButNotInPrimary)
set commonPaths to primaryPaths's mutableCopy()
tell commonPaths to intersectOrderedSet:(backupPaths)
set commonPathCount to (count commonPaths)
if (commonPathCount > 0) then
-- If there are relative paths in common, get the dictionaries containing those paths …
set infoFilter to |⌘|'s class "NSPredicate"'s predicateWithFormat_(|path IN %@|, commonPaths)
tell inPrimaryButNotInBackup to filterUsingPredicate:(infoFilter)
tell inBackupButNotInPrimary to filterUsingPredicate:(infoFilter)
-- … and extract their |modDateMinusRefDate| values as lists of AS integers.
script o
property primaryModDateDifferences : (inPrimaryButNotInBackup's array()'s valueForKey:(|modDateMinusRefDate|)) as list
property backupModDateDifferences : (inBackupButNotInPrimary's array()'s valueForKey:(|modDateMinusRefDate|)) as list
end script
repeat with i from commonPathCount to 1 by -1
-- Compare the |modDateMinusRefDate| differences corresponding to the ith relative path in commonPaths.
set primaryModDateDifference to o's primaryModDateDifferences's item i
set backupModDateDifference to o's backupModDateDifferences's item i
-- If the difference between the differences is within the tolerated interval, remove the corresponding relative path from consideration.
if (primaryModDateDifference - backupModDateDifference is not greater than toleratedBackupDelay) then tell commonPaths to removeObjectAtIndex:(i - 1)
end repeat
end if
return commonPaths
end getModDateDifferences
end script
mainScript's main()
end main
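If you change fileLevel, this minimal sketch shows what the 'path MATCHES' predicate in filterByPathComponentCount() keeps. The top folder "/top" and the folder paths are hypothetical. "/top" contains no regex metacharacters, so the sketch skips the escapedPatternForString: step the script uses. With a container level of 2 (fileLevel 3 minus 1), only paths exactly two components below the top folder survive.

```applescript
use AppleScript version "2.5" -- El Capitan (10.11) or later
use framework "Foundation"
use scripting additions

-- Hypothetical folder paths under a hypothetical top folder "/top".
set folderPaths to current application's NSArray's arrayWithArray:({"/top/A", "/top/A/Alice", "/top/B", "/top/B/Bob", "/top/B/Bob/Extra"})

-- Keep only the paths exactly containerLevel components below "/top", using the same pattern shape as the script.
set containerLevel to 2
set depthFilter to current application's NSPredicate's predicateWithFormat:("self MATCHES '^/top(?:/[^/]++){" & containerLevel & "}+$'")
set containerPaths to (folderPaths's filteredArrayUsingPredicate:depthFilter) as list
```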