Author: Tyan
Blog: noahsnail.com | CSDN | Jianshu
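1. Introduction
When processing data on Linux, you often need to remove duplicate files. For example, in an image classification task you may want to delete duplicate images from the training data. Linux provides the fdupes command, which finds and deletes duplicate files.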
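2. Introduction to fdupes
fdupes is a Linux utility written in C by Adrian Lopez. It finds duplicate files within a given set of directories and their subdirectories. It identifies duplicates by comparing the files' MD5 signatures and then confirming with a byte-by-byte comparison. The checks run from cheapest to most expensive, and a pair of files moves on to the next check only if it passes the previous one:
size comparison > partial MD5 signature comparison > full MD5 signature comparison > byte-by-byte comparison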
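The comparison order above can be sketched in shell for a single pair of files. This is only an illustration of the cascade, not fdupes' own implementation: the function name `is_duplicate` and the 4096-byte `PARTIAL_BYTES` prefix are assumptions made for the example, and it relies on the standard `wc`, `head`, `md5sum`, and `cmp` tools being available.

```shell
# Illustrative only: mimics the size > partial MD5 > full MD5 > byte order
# for two files. PARTIAL_BYTES is an assumed value, not taken from fdupes.
PARTIAL_BYTES=4096

is_duplicate() {
    # $1 and $2 are the two file paths to compare.

    # 1. Size: files of different sizes cannot be identical.
    [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] || return 1

    # 2. Partial MD5: hash only the first PARTIAL_BYTES bytes of each file.
    [ "$(head -c "$PARTIAL_BYTES" "$1" | md5sum)" = "$(head -c "$PARTIAL_BYTES" "$2" | md5sum)" ] || return 1

    # 3. Full MD5: hash the entire content of each file.
    [ "$(md5sum < "$1")" = "$(md5sum < "$2")" ] || return 1

    # 4. Byte-by-byte comparison guards against hash collisions.
    cmp -s "$1" "$2"
}
```

Calling `is_duplicate file1 file2` returns exit status 0 when the two files are identical and a non-zero status otherwise. The cheap checks come first so that most non-duplicates are rejected without reading the whole file.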
3. Installing fdupes
Taking CentOS as an example, the command to install fdupes is:
sudo yum install -y fdupes
4. Using fdupes
Delete duplicate files without prompting the user:
$ fdupes -dN [folder_name]
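Here, the -d option prompts for which files to keep in each set of duplicates and deletes the others. The -N option, used together with -d, keeps the first file in each set of duplicates and deletes the rest without prompting the user.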
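Because -dN deletes files without asking, it can help to list the duplicate sets first and delete only afterwards. The following is a minimal sketch of that workflow on throwaway data. The directory `demo_dir` and the file names in it are hypothetical and created only for the demonstration. The -r option (recurse into subdirectories) is added so that the copy inside `sub/` is found.

```shell
# Hypothetical throwaway data; replace demo_dir with your own directory.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/sub"
printf 'same content\n' > "$demo_dir/a.txt"
printf 'same content\n' > "$demo_dir/sub/b.txt"
printf 'different content\n' > "$demo_dir/c.txt"

if command -v fdupes >/dev/null 2>&1; then
    # Step 1: list the duplicate sets only; nothing is deleted.
    fdupes -r "$demo_dir"
    # Step 2: keep the first file of each set, delete the rest, no prompt.
    fdupes -rdN "$demo_dir"
else
    echo "fdupes is not installed; skipping" >&2
fi

# Count the files that are left after the run.
remaining="$(find "$demo_dir" -type f | wc -l)"
echo "files remaining: $remaining"
```

Which file counts as the "first" in a set depends on the order in which fdupes lists it, so reviewing the output of step 1 before running step 2 shows which copy will be kept.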
Usage:
$ fdupes -h
Usage: fdupes [options] DIRECTORY...

 -r --recurse       for every directory given follow subdirectories
                    encountered within
 -R --recurse:      for each directory given after this option follow
                    subdirectories encountered within (note the ':' at
                    the end of the option, manpage for more details)
 -s --symlinks      follow symlinks
 -H --hardlinks     normally, when two or more files point to the same
                    disk area they are treated as non-duplicates; this
                    option will change this behavior
 -n --noempty       exclude zero-length files from consideration
 -A --nohidden      exclude hidden files from consideration
 -f --omitfirst     omit the first file in each set of matches
 -1 --sameline      list each set of matches on a single line
 -S --size          show size of duplicate files
 -m --summarize     summarize dupe information
 -q --quiet         hide progress indicator
 -d --delete        prompt user for files to preserve and delete all
                    others; important: under particular circumstances,
                    data may be lost when using this option together
                    with -s or --symlinks, or when specifying a
                    particular directory more than once; refer to the
                    fdupes documentation for additional information
 -N --noprompt      together with --delete, preserve the first file in
                    each set of duplicates and delete the rest without
                    prompting the user
 -I --immediate     delete duplicates as they are encountered, without
                    grouping into sets; implies --noprompt
 -p --permissions   don't consider files with different owner/group or
                    permission bits as duplicates
 -o --order=BY      select sort order for output and deleting; by file
                    modification time (BY='time'; default), status
                    change time (BY='ctime'), or filename (BY='name')
 -i --reverse       reverse order while sorting
 -v --version       display fdupes version
 -h --help          display this help message
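Several of the options listed above only report on duplicates and never delete anything, which makes them useful for inspecting a directory first. The sketch below combines a few of them on sample data. The directory `report_dir` and its files are hypothetical, and the exact text that fdupes prints may differ between versions, so no output is shown.

```shell
# Hypothetical sample data so the commands have something to report.
report_dir="$(mktemp -d)"
printf 'duplicate payload\n' > "$report_dir/one.txt"
printf 'duplicate payload\n' > "$report_dir/two.txt"
: > "$report_dir/empty1"   # zero-length files, excluded below with -n
: > "$report_dir/empty2"

summary=""
if command -v fdupes >/dev/null 2>&1; then
    # -r recurse, -n skip empty files, -S show sizes, -q hide progress
    fdupes -rnSq "$report_dir"
    # -m prints a summary of the duplicates instead of listing each file
    summary="$(fdupes -rnqm "$report_dir")"
    echo "$summary"
else
    echo "fdupes is not installed; skipping" >&2
fi
```

Without -n, the two empty files would also be reported as duplicates of each other, which is rarely what you want when cleaning a dataset.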