Problem with shell script: spaces in file and folder names

Hi,
I'm trying to run this line in Terminal, but the results are not completely right.

find SOMEFOLDERPATH -type d -exec find {} -name "*" \; | xargs md5 >> ANOTHERFOLDERPATH/database.txt

It stores the MD5 signature of each file in database.txt.
The thing is that it skips files and folders that contain a space in their names…
How can I fix that without having to rename the files and folders?

Let me explain the project:
I have a volume and its backup. At some point during the backup process, everything went wrong, and now the two volumes have different sizes.
I want to write a script that scans through the whole original volume and stores every MD5 signature in a text file, then does the same with the backup volume, and then compares the two databases to find the files that are in one but not in the other, and vice versa.
Since I also can't trust the names of the files or folders (there could be two files with the same name on both volumes that are different versions…), I believe this is the best way to do it.

Thanks.

-Marto.

Are you using the quoted form of the file paths that you are sending to the shell? That will typically preserve spaces.
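The same idea applies in a plain shell. A variable expanded without quotes is split on whitespace, while a double-quoted one reaches the command as a single argument. Here is a minimal sketch; the helper count_args is hypothetical and only counts what it receives:

```shell
#!/bin/sh
# Hypothetical helper: prints how many arguments it received.
count_args() {
  echo "$#"
}

f='/Volumes/HD/Users/Joe/Desktop/Folder/Picture 1.jpg'

count_args $f    # unquoted: split on the space, prints 2
count_args "$f"  # double-quoted: passed as one argument, prints 1
```

Quoting only helps where the shell itself expands the path. It does not change how a separate program like xargs splits its input.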

Grumble, the problem is xargs… Sorry, man, I hadn't considered spaces in a file name. Say you have a file like this:

/Volumes/HD/Users/Joe/Desktop/Folder/Picture 1.jpg

Well, from that xargs feeds md5 two separate files to hash:

/Volumes/HD/Users/Joe/Desktop/Folder/Picture

and

1.jpg
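The splitting is easy to reproduce without md5. In this sketch, xargs -n 1 echo stands in for md5, so each argument xargs builds ends up on its own line:

```shell
#!/bin/sh
# The example path, one per line, the way find prints it.
path='/Volumes/HD/Users/Joe/Desktop/Folder/Picture 1.jpg'

# By default xargs splits its input on blanks and newlines.
# -n 1 runs echo once per argument, making the split visible.
printf '%s\n' "$path" | xargs -n 1 echo
# prints:
# /Volumes/HD/Users/Joe/Desktop/Folder/Picture
# 1.jpg
```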

Let me see what I can come up with.

Okay, give this a try; it should work (I hope :cool:)

find SOMEFOLDERPATH -type d -exec find {} ! -type d -name "*" -print0 \; | xargs -0 md5 >> ANOTHERFOLDERPATH/database.txt

Works like magic.
I believe "automagically" is the term. :smiley:

So… if you have the time, could you explain what you did to fix it?

Thank you very much, James.

-Marto.

In our command, the parts I added are ! -type d, -print0, and -0:

find SOMEFOLDERPATH -type d -exec find {} ! -type d -name "*" -print0 \; | xargs -0 md5 >> ANOTHERFOLDERPATH/database.txt

And now for the explanation.

! -type d
This really shouldn't be needed, but I threw it in to make the command cleaner. When I ran the command before, it gave me errors saying it couldn't generate a hash for folders. So this tells the second find command (the one that finds the files to hash) to ignore any results of type folder.
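The nested find has a related side effect worth checking before you hash a whole volume. The outer find -type d matches every folder, subfolders included, and each inner find recurses on its own. As a result, a file in a subfolder is listed once per enclosing folder, while a single find already descends into every subfolder. Here is a small sketch in a throwaway directory (the folder and file names are made up):

```shell
#!/bin/sh
# Build a throwaway tree: top/sub/a.txt
tmp=$(mktemp -d)
mkdir -p "$tmp/top/sub"
touch "$tmp/top/sub/a.txt"

# Nested form from the thread: one inner find per directory (top and top/sub).
nested=$(find "$tmp/top" -type d -exec find {} ! -type d \; | wc -l | tr -d ' ')

# A single find already recurses into every subfolder.
single=$(find "$tmp/top" ! -type d | wc -l | tr -d ' ')

echo "nested: $nested, single: $single"   # prints: nested: 2, single: 1
rm -rf "$tmp"
```

With a single find, the command from the thread would become find SOMEFOLDERPATH ! -type d -print0 | xargs -0 md5, so each file is hashed once.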

-print0 & -0
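So, as I said before, the problem was that xargs was feeding "Picture 1.jpg" to md5 as "Picture" and as "1.jpg". This was actually expected behavior that I failed to account for the first time around. According to the xargs man page, xargs reads space-, tab-, and newline-delimited strings from standard input and runs the command with those strings as arguments,

which explains why spaces were throwing things off. To get around this, we use -0 with xargs. Again from the man page, it makes xargs expect NUL characters as separators instead of spaces and newlines, and it is meant to be used together with find's -print0.

And our find option -print0 does just that. This time from find's man page: it prints the path name of each file found, followed by an ASCII NUL character (character code 0).

So now that xargs only treats NUL as a delimiter, it evaluates "Picture 1.jpg" as "Picture 1.jpg", which of course won't fail!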
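You can try the -print0 / -0 pairing on a scratch folder. This sketch uses printf instead of md5 (md5 is the macOS command; Linux has md5sum instead). It wraps each argument in brackets so you can see that the space stays inside one argument:

```shell
#!/bin/sh
# Scratch folder holding a file name with a space in it.
tmp=$(mktemp -d)
touch "$tmp/Picture 1.jpg"

# find ends each path with NUL (-print0); xargs -0 splits only on NUL,
# so "Picture 1.jpg" arrives as a single argument.
out=$(find "$tmp" ! -type d -print0 | xargs -0 printf '[%s]\n')
printf '%s\n' "$out"   # one bracketed line ending in "Picture 1.jpg]"

# On macOS the real pipeline would be:
#   find "$tmp" ! -type d -print0 | xargs -0 md5
rm -rf "$tmp"
```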

Hope that all makes sense :smiley:

Beautiful. It couldn't have been easier for me to understand.

Thanks!!!

Marto.

Always glad to help, at least when it comes to the terminal. Honestly, my AppleScript knowledge is lacking, imo; I just happen to know *ix commands well, which is why most of my scripts end up [over]utilizing “do shell script”.

Well…
now I have the two databases of the two volumes.
I want to compare them line by line and delete from the text files every line that appears in both, so that only the lines that differ (either because of the path or because of the MD5 signature) remain. That way, each file will hold a list of the files that are on one volume but not on the other, and I'll know what I have to save.
The thing is that the path will always be different unless I cut out the beginning of the path on each line… let me explain:

MD5 (/Users/marto/Desktop/Volume-Comparing/test-folders/Scans copy/Scans//.DS_Store) = 194577a7e20bdcc7afbb718f502c134c
will always be different from
MD5 (/Users/marto/Desktop/Volume-Comparing/test-folders/Scans/Scans//.DS_Store) = 194577a7e20bdcc7afbb718f502c134c

but the file is the same.
So I have to cut each line from ‘MD5’ up to ‘Scans’ (in one file) and ‘Scans copy’ (in the other).

That way, when I compare the lines, they'll differ only if the path and/or the MD5 is different. But I can't find a way around this…
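One way to cut off that beginning is a sed substitution that turns "MD5 (<prefix>" into "MD5 (". Below is a minimal sketch. It assumes each log covers one known top-level folder and that the prefix contains no sed special characters (such as . * [ ] | &). The function name strip_prefix is hypothetical:

```shell
#!/bin/sh
# strip_prefix PREFIX FILE: print FILE with a leading "MD5 (PREFIX" turned into "MD5 (".
# | is used as the sed delimiter because the prefix contains slashes.
# In a basic regex "(" is a literal character, so "MD5 (" matches as written.
strip_prefix() {
  sed "s|^MD5 ($1|MD5 (|" "$2"
}

# Example, using the folders from the log lines above:
#   strip_prefix '/Users/marto/Desktop/Volume-Comparing/test-folders/Scans copy' a.txt > a_stripped.txt
#   strip_prefix '/Users/marto/Desktop/Volume-Comparing/test-folders/Scans' b.txt > b_stripped.txt
```

With those prefixes, both .DS_Store lines become MD5 (/Scans//.DS_Store) = 194577a7e20bdcc7afbb718f502c134c, so they compare as equal.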

Another thing:
I think I could compare each line in one file with every line in the other using grep -f file1 -f file2,
but when I run this in Terminal, it just sort of gets stuck and stays there… returning nothing.
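Here is why it looks stuck. Both -f options only name pattern files, and when grep has no file to search it reads standard input, so it sits waiting for you to type. The file to search has to come after the options instead. The sketch below prints the lines of one database that do not appear in the other (only_in is a hypothetical name):

- -F treats each line as a plain string rather than a regular expression, which matters because paths contain dots and parentheses.
- -x requires the whole line to match.
- -v keeps the lines that matched nothing.

```shell
#!/bin/sh
# only_in A B: print the lines of file A that do not appear as a whole line in file B.
#   -F  patterns are plain strings, not regular expressions
#   -x  a pattern must match the entire line
#   -v  print lines that match none of the patterns
#   -f  read the patterns, one per line, from file B
only_in() {
  grep -Fxv -f "$2" "$1"
}

# Example: only_in database1.txt database2.txt > only_in_1.txt
```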

Any pointers?

Thanks!

Done with the correction of the database…
I used the subroutine that's posted on CodeXchange…
It works great.

Now… could I use grep to find the duplicate lines between those text files and delete them, leaving the ones that do not match???

Thanks!

I'm not quite following… Can you give a non-.DS_Store example and then sort of walk through what exactly you would like to happen in that scenario?

Yes, forgive me.

I have two volumes, a and b. b was supposed to be the backup of a. In the middle of the backup process something went wrong, and I've lost files from both volumes, and the remaining sizes of the volumes are different. I don't know if the bigger one has EVERY file in the smaller one and more… I think it could contain some of the files missing from the other.
So I want to compare the MD5 signature of every file in volume a with the MD5 signature of every file in volume b, and print the ones that do not match to a text file. That way, I could put all the files on one volume and then back that up.

I have one script that now writes the full path of every file, along with its MD5 signature, to a text file.
So I've created a database for each volume, and I've made another script to cut out the first part of every line in both files, so that when I compare the lines, /Volumes/a/ is not compared to /Volumes/b/.

The only thing I need now is a way to compare each line of one database to the other and write the non-matching ones to a file.

I was wondering if that could be done with grep.

PS: the database scenario was an example, but I've already solved that part. I only need to get the lines compared. But when I tried grep -f database1 -f database2, Terminal didn't do anything…

What I've just asked for is idiotic… sorry.

What I want the final script to do is look for each line of one database in the other. If there is a match, do nothing; if there is no match, print that path into another text file.

If that's done, I'll get a text file with the paths of all the files on one volume that are not on the other one.
Then I run it again, selecting the other database as the primary, and presto… I have a list of every file that is not duplicated on the two volumes.
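If the output order doesn't matter, comm can give both lists in one pass over sorted copies of the databases. It prints three columns: lines only in the first file, lines only in the second, and lines in both. The options -1, -2, and -3 suppress those columns. Here is a sketch with hypothetical function and file names:

```shell
#!/bin/sh
# unique_both A B OUT_A OUT_B: write lines only in A to OUT_A and lines only in B to OUT_B.
# comm needs sorted input; LC_ALL=C makes sort and comm agree on byte order.
unique_both() {
  _ub_dir=$(mktemp -d)
  LC_ALL=C sort "$1" > "$_ub_dir/a"
  LC_ALL=C sort "$2" > "$_ub_dir/b"
  LC_ALL=C comm -23 "$_ub_dir/a" "$_ub_dir/b" > "$3"   # drop columns 2 and 3: only in A
  LC_ALL=C comm -13 "$_ub_dir/a" "$_ub_dir/b" > "$4"   # drop columns 1 and 3: only in B
  rm -rf "$_ub_dir"
}

# Example: unique_both database1.txt database2.txt only_in_1.txt only_in_2.txt
```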

Right?

Are we still talking about spaces, or is this shifting back to your text database project?

Well, it's not about spaces anymore, that's for sure.
And no, it isn't for my database project exactly; it only uses part of it.
I just didn't want to make another post and keep messing with the same thing. I don't know if I should have. Sorry.
Do you want me to make another thread for this ‘new’ issue?

Okay, this should work… It will ask for two MD5 logs. For each line of log 1, it checks whether that line's hash appears anywhere in log 2. If there is no match (i.e. a unique file), it writes the file path and hash to NoMatch.txt on the desktop.

For now, you would have to run it a second time, choosing the OTHER log first, since a unique hash could be in either file.

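-- Ask for the two MD5 logs and read each one as text
set log_1 to read file ((choose file) as Unicode text)
set log_2 to read file ((choose file) as Unicode text)
set log_1_items to paragraphs of log_1
-- Split each line on " = " so the hash is the last text item
set saved_delims to AppleScript's text item delimiters
set AppleScript's text item delimiters to {" = "}
repeat with i from 1 to (count of log_1_items)
	set this_line to item i of log_1_items
	-- Skip blank lines (e.g. the empty one after the final newline)
	if (count of text items of this_line) > 1 then
		-- A hash that appears nowhere in log 2 means a file unique to log 1
		if log_2 does not contain (last text item of this_line) then
			do shell script "echo " & (quoted form of this_line) & " >> ~/Desktop/NoMatch.txt"
		end if
	end if
end repeat
set AppleScript's text item delimiters to saved_delims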

Yeah I’m sure there is a better way to do this as well LOL

Also, you won't need to manipulate your original log data as you mentioned before (cutting it down to just the hash), since this script pulls out the hash for you.

Great!
It works great!
I can't thank you enough, James.

Marto.

Glad to help marto!

Depending on how many entries you expect in the log, you may want to write to the log file from AppleScript itself rather than using my version, which makes a separate shell call for every line :smiley:
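Another way to avoid one do shell script call per line is to hand the whole comparison to a single awk process. This sketch does roughly what the AppleScript above does. It checks whether each hash from log 1 occurs as the hash of some line in log 2, which is slightly stricter than the AppleScript's plain text search. It assumes each line ends in " = <hash>" as md5 writes it. The function name no_match is hypothetical:

```shell
#!/bin/sh
# no_match LOG1 LOG2: print the lines of LOG1 whose hash does not occur in LOG2.
# -F ' = ' splits "MD5 (path) = hash" so that $NF is the hash.
# NR == FNR holds only while awk reads the first file given (LOG2 here).
# Caveat: if LOG2 is empty, NR == FNR also holds for LOG1 and nothing is printed.
no_match() {
  awk -F ' = ' 'NR == FNR { seen[$NF] = 1; next }
                NF > 1 && !($NF in seen)' "$2" "$1"
}

# Usage, appending to the same log the AppleScript writes:
#   no_match log1.txt log2.txt >> ~/Desktop/NoMatch.txt
```

Because one process does all the reading and writing, the run time no longer grows with a shell launch per line.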