Deleting Duplicate Files on Linux

Author: Tyan
Blog: noahsnail.com  |  CSDN  |  Jianshu

1. Introduction

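When processing data on Linux, you often need to delete duplicate files. For example, in an image classification task you may want to remove duplicate images from the training data. Linux provides the fdupes command, which finds and deletes duplicate files.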

2. About Fdupes

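Fdupes is a Linux utility written in C by Adrian Lopez. It finds duplicate files in a given set of directories (and in their subdirectories when -r is given). Fdupes identifies duplicates by comparing the files' MD5 signatures and then confirming with a byte-by-byte comparison. The comparison order is: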

size comparison > partial MD5 signature comparison > full MD5 signature comparison > byte-by-byte comparison
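The staged order above can be sketched for a pair of files with standard tools. This is only an illustration of the idea, not how fdupes is implemented: the function name is_duplicate and the PARTIAL_BYTES value are assumptions made for this example, and it relies on md5sum, head, wc, and cmp being available.

```shell
#!/usr/bin/env bash
# Staged duplicate check for two files, cheapest test first.
# PARTIAL_BYTES is an assumed value for illustration, not taken from fdupes.
PARTIAL_BYTES=4096

# is_duplicate FILE_A FILE_B: returns 0 if the files have identical content.
is_duplicate() {
    local a=$1 b=$2

    # Stage 1: files of different sizes cannot be identical.
    [ "$(wc -c < "$a")" -eq "$(wc -c < "$b")" ] || return 1

    # Stage 2: compare the MD5 of the leading bytes only.
    [ "$(head -c "$PARTIAL_BYTES" "$a" | md5sum)" = \
      "$(head -c "$PARTIAL_BYTES" "$b" | md5sum)" ] || return 1

    # Stage 3: compare the MD5 of the whole file.
    [ "$(md5sum < "$a")" = "$(md5sum < "$b")" ] || return 1

    # Stage 4: a byte-by-byte comparison rules out hash collisions.
    cmp -s "$a" "$b"
}
```

Each stage is more expensive than the previous one, so most non-duplicates are rejected before a whole file has to be read.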

3. Installing fdupes

On CentOS, for example, fdupes can be installed with:

sudo yum install -y fdupes

4. Using fdupes

To delete duplicate files without prompting the user:

$ fdupes -dN [folder_name]

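Here, -d prompts the user for which files to preserve and deletes all other duplicates. When -N is used together with -d, fdupes preserves the first file in each set of duplicates and deletes the rest without prompting.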
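Because -dN deletes without asking, it can help to list the duplicates first and delete only after reviewing them. The wrapper below is a minimal sketch of that workflow. The function name dedupe, the --yes argument, and the FDUPES variable are hypothetical names introduced for this example. The fdupes options it uses (-S, -d, -N) are the ones documented in the help output.

```shell
#!/usr/bin/env bash
# Preview duplicates before deleting anything.
# FDUPES is a hypothetical override; it defaults to the fdupes binary on PATH.
FDUPES=${FDUPES:-fdupes}

# dedupe DIR [--yes]: list duplicate sets in DIR, delete only with --yes.
dedupe() {
    local dir=$1 confirm=${2:-}

    # -S shows the size of each set of duplicates; nothing is deleted here.
    "$FDUPES" -S "$dir"

    # Delete only when the caller explicitly passes --yes.
    if [ "$confirm" = "--yes" ]; then
        # -d deletes; -N keeps the first file in each set without prompting.
        "$FDUPES" -d -N "$dir"
    fi
}
```

Running dedupe [folder_name] only prints the duplicate sets. Running dedupe [folder_name] --yes prints them and then deletes all but the first file in each set.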

Help output:

$ fdupes -h
Usage: fdupes [options] DIRECTORY...

 -r --recurse           for every directory given follow subdirectories
                        encountered within
 -R --recurse:          for each directory given after this option follow
                        subdirectories encountered within (note the ':' at
                        the end of the option, manpage for more details)
 -s --symlinks          follow symlinks
 -H --hardlinks         normally, when two or more files point to the same
                        disk area they are treated as non-duplicates; this
                        option will change this behavior
 -n --noempty           exclude zero-length files from consideration
 -A --nohidden          exclude hidden files from consideration
 -f --omitfirst         omit the first file in each set of matches
 -1 --sameline          list each set of matches on a single line
 -S --size              show size of duplicate files
 -m --summarize         summarize dupe information
 -q --quiet             hide progress indicator
 -d --delete            prompt user for files to preserve and delete all
                        others; important: under particular circumstances,
                        data may be lost when using this option together
                        with -s or --symlinks, or when specifying a
                        particular directory more than once; refer to the
                        fdupes documentation for additional information
 -N --noprompt          together with --delete, preserve the first file in
                        each set of duplicates and delete the rest without
                        prompting the user
 -I --immediate         delete duplicates as they are encountered, without
                        grouping into sets; implies --noprompt
 -p --permissions       don't consider files with different owner/group or
                        permission bits as duplicates
 -o --order=BY          select sort order for output and deleting; by file
                        modification time (BY='time'; default), status
                        change time (BY='ctime'), or filename (BY='name')
 -i --reverse           reverse order while sorting
 -v --version           display fdupes version
 -h --help              display this help message
