Deleting duplicate files with shell

I had a task that required dealing with duplicate-content files on a server. The approach: use md5 as the check, since md5sum verification should be reliable enough; for large files, only the first 1MB is extracted and compared, and if the md5 hashes match, the files are judged identical. For the de-duplication step, awk does some simple filtering; for the deletion step I took a shortcut and let comm do the heavy lifting. Finally, the whole operation is recorded in a log file.
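The fingerprinting idea can be sketched as a small helper, assuming GNU coreutils. The function name `fingerprint` is my own for illustration; the actual script below uses `dd` plus a temporary copy instead of `head`:

```shell
# Hash only the first 1MB of a file instead of the whole thing.
# Two files whose first megabyte matches get the same fingerprint,
# even if their tails differ -- the speed/accuracy trade-off the
# approach deliberately accepts.
fingerprint() {
    head -c 1048576 "$1" | md5sum | awk '{print $1}'
}
```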

```shell
[root@localhost ~]# more /var/log/DeleteMuplicate-20141006201221
#### 2014-10-06 20:12:21 #### root #### Begin to Delete Duplicate File :/root/study/7/a.zip
#### 2014-10-06 20:12:21 #### root #### Begin to Delete Duplicate File :/root/study/7/b
#### 2014-10-06 20:12:21 #### root #### Begin to Delete Duplicate File :/root/study/7/b1.zip
#### 2014-10-06 20:12:21 #### root #### Begin to Delete Duplicate File :/root/study/7/c1.zip
```

Here is the code. Testing it carries real risk and I take no responsibility for data loss, so make a backup before you try it.

```shell
#!/bin/bash
# Delete duplicate files under the given directory. Two files are judged
# identical when the md5sum of their first 1MB matches.
time=$(date +%Y%m%d%H%M%S)
LogFile=/var/log/DeleteMuplicate-$time

if [ $# != 1 ]
then
        echo "============================="
        echo "Usage: sh $0 path"
        echo "Please Input Absolute Path"
        echo "============================="
        exit 1
fi

echo "==============================="
echo "Begin To Analysis File Folder..."
echo "==============================="

# Fingerprint regular files and symlinks: dd copies the first 1MB of each
# file into a temporary "$i.md5" file, which is checksummed and removed.
# Note: the backtick loop word-splits, so paths containing spaces will break.
for i in `find $1 -type f` `find $1 -type l`
do
        dd if="$i" of="$i.md5" count=1 bs=1024k >/dev/null 2>&1
        md5sum "$i.md5" >> /tmp/result.txt
        rm -f "$i.md5"    # the original "cd $1; rm -rf *.md5" missed temp copies in subdirectories
done

# Keep the first path seen for each checksum; strip the temporary .md5 suffix.
sort /tmp/result.txt | awk '!i[$1]++' | awk -F"  " '{print $2}' > /tmp/temp.txt
sed 's/\.md5$//' /tmp/temp.txt | sort > /tmp/1
find "$1" -type f | sort > /tmp/2

# Paths in the full list (/tmp/2) but not in the keep list (/tmp/1) are duplicates.
comm -13 /tmp/1 /tmp/2 > /tmp/temp-result.txt

echo "================================"
echo "Begin to Delete Duplicate File......"
echo "================================"
for i in `cat /tmp/temp-result.txt`
do
        echo "#### `date +%Y-%m-%d` `date +%H:%M:%S` #### `id -un` #### Begin to Delete Duplicate File :$i" >> $LogFile
        rm -f "$i"
done

echo "================================"
echo "Delete Duplicate File Has Finished......"
echo "Begin To Delete Temp File......"
rm -f /tmp/result.txt /tmp/temp.txt /tmp/1 /tmp/2 /tmp/temp-result.txt
echo "================================"
echo "Script has done"
```
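The heart of the script is two one-liners: `awk '!i[$1]++'` keeps only the first line it sees for each checksum, and `comm` subtracts that keep-list from the full file list, leaving the duplicates. A standalone demonstration with made-up paths (the `/tmp/demo-*` names are mine, not from the script):

```shell
# Toy checksum file mimicking md5sum output: "<hash>  <path>" (two spaces).
printf '%s\n' \
  'aaa  /data/f1' \
  'aaa  /data/f2' \
  'bbb  /data/f3' > /tmp/demo-sums.txt

# Keep the first path seen for each hash -- same awk idiom as the script.
sort /tmp/demo-sums.txt | awk '!i[$1]++' | awk -F'  ' '{print $2}' | sort > /tmp/demo-keep

# All paths, sorted (comm requires sorted input).
awk -F'  ' '{print $2}' /tmp/demo-sums.txt | sort > /tmp/demo-all

# -13 suppresses lines unique to the keep list and lines common to both,
# leaving only the duplicates.
comm -13 /tmp/demo-keep /tmp/demo-all    # prints /data/f2
```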
