Jan 21 2009

Find Duplicate Files with a Shell Script

Published at 11:02 pm under Linux

This shell script finds duplicate files in a given directory by comparing their MD5 checksums. Files are matched on content rather than filename or creation date, so reported duplicates are strictly identical.
This is usually useful for deleting large files. The find command's -size option can speed up the scan and restrict it to the largest duplicate files (a variant is shown after the example output below).

admin@fileserver$
find /usr/bin -type f -print0 |        # list regular files, NUL-delimited to handle odd names
xargs -0 md5sum |                      # checksum each file's contents
sort |                                 # group identical checksums together
uniq -w 32 --all-repeated=separate |   # keep groups whose first 32 chars (the MD5) repeat
sed -e 's/^[0-9a-f]*\ *//;'            # strip the checksum, leaving only the paths

/usr/bin/c2ph
/usr/bin/pstruct

/usr/bin/pgrep
/usr/bin/pkill

/usr/bin/perl
/usr/bin/perl5.8.8
/usr/bin/suidperl
...
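
As mentioned above, find's -size option narrows the scan to large files, which avoids checksumming thousands of small ones. A minimal variant of the same pipeline (the 10 MB threshold and the /srv/data path are illustrative assumptions, not from the original command):

admin@fileserver$
find /srv/data -type f -size +10M -print0 |   # only files larger than 10 MB
xargs -0 md5sum |
sort |
uniq -w 32 --all-repeated=separate |
sed -e 's/^[0-9a-f]*\ *//;'

Note that the pipeline only reports duplicate groups; it never deletes anything, so review each group before removing files.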

This could be run on Windows file systems mounted via Samba.
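
For example, a share could be mounted read-only with cifs and then scanned by pointing find at the mount point (the server name, share name, and username below are hypothetical):

admin@fileserver$
mount -t cifs //winserver/public /mnt/share -o ro,username=admin   # hypothetical share, mounted read-only
find /mnt/share -type f -print0 | xargs -0 md5sum | sort | uniq -w 32 --all-repeated=separate | sed -e 's/^[0-9a-f]*\ *//;'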

