{"id":12443,"date":"2019-03-27T01:20:47","date_gmt":"2019-03-27T01:20:47","guid":{"rendered":"http:\/\/www.appservgrid.com\/paw92\/?p=12443"},"modified":"2019-03-27T01:20:47","modified_gmt":"2019-03-27T01:20:47","slug":"2-useful-tools-to-find-and-delete-duplicate-files-in-linux","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw92\/index.php\/2019\/03\/27\/2-useful-tools-to-find-and-delete-duplicate-files-in-linux\/","title":{"rendered":"2 Useful Tools to Find and Delete Duplicate Files in Linux"},"content":{"rendered":"<p>Organizing your home directory or even your whole system can be particularly hard if you have the habit of downloading all kinds of stuff from the internet.<\/p>\n<p>Often you may find that you have downloaded the same mp3, pdf or epub (and all kinds of other file extensions) and copied it to different directories. This may cause your directories to become cluttered with all kinds of useless duplicated stuff.<\/p>\n<p>In this tutorial you are going to learn how to find and delete duplicate files in Linux using the\u00a0<strong>rdfind<\/strong>\u00a0and\u00a0<strong>fdupes<\/strong>\u00a0command-line tools.<\/p>\n<p>A note of caution \u2013 always be careful about what you delete on your system, as this may lead to unwanted data loss. If you are using a new tool, first try it in a test directory where deleting files will not be a problem.<\/p>\n<h3>Rdfind \u2013 Finds Duplicate Files in Linux<\/h3>\n<p>The name\u00a0<strong>rdfind<\/strong>\u00a0comes from redundant data find. It is a free tool used to find duplicate files across or within multiple directories. It compares checksums and finds duplicates based on file contents, not just file names.<\/p>\n<p><strong>Rdfind<\/strong>\u00a0uses a ranking algorithm to classify the files and to detect which of the duplicates is the original file, treating the rest as duplicates. 
The rules of ranking are:<\/p>\n<ul>\n<li>If\u00a0<strong>A<\/strong>\u00a0was found while scanning an input argument earlier than\u00a0<strong>B<\/strong>,\u00a0<strong>A<\/strong>\u00a0is ranked higher.<\/li>\n<li>If\u00a0<strong>A<\/strong>\u00a0was found at a lower depth than\u00a0<strong>B<\/strong>,\u00a0<strong>A<\/strong>\u00a0is ranked higher.<\/li>\n<li>If\u00a0<strong>A<\/strong>\u00a0was found earlier than\u00a0<strong>B<\/strong>,\u00a0<strong>A<\/strong>\u00a0is ranked higher.<\/li>\n<\/ul>\n<p>The last rule applies particularly when two files are found in the same directory.<\/p>\n<p>To install\u00a0<strong>rdfind<\/strong>\u00a0in Linux, use the following command as per your Linux distribution.<\/p>\n<pre>$ sudo apt-get install rdfind     [On <strong>Debian\/Ubuntu<\/strong>]\r\n$ sudo yum install epel-release &amp;&amp; sudo yum install rdfind    [On <strong>CentOS\/RHEL<\/strong>]\r\n$ sudo dnf install rdfind         [On <strong>Fedora 22+<\/strong>]\r\n<\/pre>\n<p>To run\u00a0<strong>rdfind<\/strong>\u00a0on a directory, simply type\u00a0<strong>rdfind<\/strong>\u00a0followed by the target directory. 
Here is an example:<\/p>\n<pre>$ rdfind \/home\/user\r\n<\/pre>\n<div id=\"attachment_30913\" class=\"wp-caption aligncenter\">\n<p><a href=\"https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Find-Duplicate-Files-in-Linux.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-30913\" src=\"https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Find-Duplicate-Files-in-Linux.png\" sizes=\"auto, (max-width: 849px) 100vw, 849px\" srcset=\"https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Find-Duplicate-Files-in-Linux.png 849w, https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Find-Duplicate-Files-in-Linux-768x231.png 768w\" alt=\"Find Duplicate Files in Linux\" width=\"849\" height=\"255\" aria-describedby=\"caption-attachment-30913\" data-lazy-loaded=\"true\" \/><\/a><\/p>\n<p id=\"caption-attachment-30913\" class=\"wp-caption-text\">Find Duplicate Files in Linux<\/p>\n<\/div>\n<p>As you can see,\u00a0<strong>rdfind<\/strong>\u00a0will save the results in a file called\u00a0<strong>results.txt<\/strong>\u00a0located in the same directory from where you ran the program. The file contains all the duplicate files that rdfind has found. 
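The content-based matching that rdfind performs can be sketched with standard tools. This is a minimal illustration only, not rdfind's actual implementation; it assumes GNU coreutils (md5sum) is available, and all file names below are hypothetical:

```shell
# Create a throwaway directory with two identical files and one unique file
# (illustrative names, not from the article).
workdir=$(mktemp -d)
printf 'hello\n' > "$workdir/a.txt"
printf 'hello\n' > "$workdir/copy-of-a.txt"   # same content as a.txt
printf 'world\n' > "$workdir/b.txt"           # unique content

# Hash every file, keep only the checksums, and print each checksum that
# occurs more than once -- i.e. each set of content-identical files.
md5sum "$workdir"/* | awk '{print $1}' | sort | uniq -d

rm -r "$workdir"
```

Each line this prints is the checksum shared by a set of duplicates; rdfind additionally ranks the members of each set to decide which one is the original.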
You can review the file and remove the duplicate files manually if you want to.<\/p>\n<p>Another thing you can do is to use the\u00a0<code>-dryrun<\/code>\u00a0option, which will provide a list of duplicates without taking any action:<\/p>\n<pre>$ rdfind -dryrun true \/home\/user\r\n<\/pre>\n<p>When you find the duplicates, you can choose to replace them with hardlinks:<\/p>\n<pre>$ rdfind -makehardlinks true \/home\/user\r\n<\/pre>\n<p>And if you wish to delete the duplicates, you can run:<\/p>\n<pre>$ rdfind -deleteduplicates true \/home\/user\r\n<\/pre>\n<p>To check other useful options of\u00a0<strong>rdfind<\/strong>, you can consult the\u00a0<strong>rdfind<\/strong>\u00a0manual page with:<\/p>\n<pre>$ man rdfind \r\n<\/pre>\n<h3>Fdupes \u2013 Scan for Duplicate Files in Linux<\/h3>\n<p><a href=\"https:\/\/www.tecmint.com\/fdupes-find-and-delete-duplicate-files-in-linux\/\" target=\"_blank\" rel=\"noopener\">Fdupes<\/a>\u00a0is another program that allows you to identify duplicate files on your system. It is free, open source and written in C. It uses the following methods to determine duplicate files:<\/p>\n<ul>\n<li>Comparing partial md5sum signatures<\/li>\n<li>Comparing full md5sum signatures<\/li>\n<li>Byte-by-byte comparison verification<\/li>\n<\/ul>\n<p>Just like\u00a0<strong>rdfind<\/strong>, it offers similar options:<\/p>\n<ul>\n<li>Search recursively<\/li>\n<li>Exclude empty files<\/li>\n<li>Show the size of duplicate files<\/li>\n<li>Delete duplicates immediately<\/li>\n<li>Exclude files with a different owner<\/li>\n<\/ul>\n<p>The\u00a0<strong>fdupes<\/strong>\u00a0syntax is similar to that of\u00a0<strong>rdfind<\/strong>. 
Simply type the command followed by the directory you wish to scan.<\/p>\n<pre>$ fdupes &lt;dir&gt;\r\n<\/pre>\n<p>To search files recursively, you will have to specify the\u00a0<code>-r<\/code>\u00a0option like this.<\/p>\n<pre>$ fdupes -r &lt;dir&gt;\r\n<\/pre>\n<p>You can also pass multiple directories and mark a specific one to be searched recursively.<\/p>\n<pre>$ fdupes &lt;dir1&gt; -r &lt;dir2&gt;\r\n<\/pre>\n<p>To have fdupes calculate the size of the duplicate files, use the\u00a0<code>-S<\/code>\u00a0option.<\/p>\n<pre>$ fdupes -S &lt;dir&gt;\r\n<\/pre>\n<p>To gather summarized information about the found files, use the\u00a0<code>-m<\/code>\u00a0option.<\/p>\n<pre>$ fdupes -m &lt;dir&gt;\r\n<\/pre>\n<div id=\"attachment_30915\" class=\"wp-caption aligncenter\">\n<p><a href=\"https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Scan-Duplicate-Files-in-Linux.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-30915\" src=\"https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Scan-Duplicate-Files-in-Linux.png\" alt=\"Scan Duplicate Files in Linux\" width=\"499\" height=\"80\" aria-describedby=\"caption-attachment-30915\" data-lazy-loaded=\"true\" \/><\/a><\/p>\n<p id=\"caption-attachment-30915\" class=\"wp-caption-text\">Scan Duplicate Files in Linux<\/p>\n<\/div>\n<p>Finally, if you want to delete duplicates, use the\u00a0<code>-d<\/code>\u00a0option like this.<\/p>\n<pre>$ fdupes -d &lt;dir&gt;\r\n<\/pre>\n<p><strong>Fdupes<\/strong>\u00a0will ask which of the found files to delete. 
You will need to enter the file number:<\/p>\n<div id=\"attachment_30916\" class=\"wp-caption aligncenter\">\n<p><a href=\"https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Delete-Duplicate-Files-in-Linux.png\"><img loading=\"lazy\" decoding=\"async\" class=\"size-full wp-image-30916\" src=\"https:\/\/www.tecmint.com\/wp-content\/uploads\/2018\/10\/Delete-Duplicate-Files-in-Linux.png\" alt=\"Delete Duplicate Files in Linux\" width=\"398\" height=\"94\" aria-describedby=\"caption-attachment-30916\" data-lazy-loaded=\"true\" \/><\/a><\/p>\n<p id=\"caption-attachment-30916\" class=\"wp-caption-text\">Delete Duplicate Files in Linux<\/p>\n<\/div>\n<p>A solution that is definitely not recommended is to use the\u00a0<code>-N<\/code>\u00a0option, which, combined with\u00a0<code>-d<\/code>, preserves only the first file of each duplicate set and deletes the rest without prompting.<\/p>\n<pre>$ fdupes -dN &lt;dir&gt;\r\n<\/pre>\n<p>To get a list of available options to use with\u00a0<strong>fdupes<\/strong>, review the help page by running.<\/p>\n<pre>$ fdupes --help\r\n<\/pre>\n<h3>Conclusion<\/h3>\n<p><strong>Rdfind<\/strong>\u00a0and\u00a0<strong>fdupes<\/strong>\u00a0are both very useful tools to find duplicate files on your Linux system, but you should be very careful when deleting such files.<\/p>\n<p>If you are unsure whether you need a file or not, it would be better to create a backup of that file and remember its directory prior to deleting it. If you have any questions or comments, please submit them in the comment section below.<\/p>\n<p><a href=\"https:\/\/www.tecmint.com\/find-and-delete-duplicate-files-in-linux\/\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Organizing your home directory or even your whole system can be particularly hard if you have the habit of downloading all kinds of stuff from the internet. Often you may find that you have downloaded the same mp3, pdf or epub (and all kinds of other file extensions) and copied it to different directories. 
This may cause your directories &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw92\/index.php\/2019\/03\/27\/2-useful-tools-to-find-and-delete-duplicate-files-in-linux\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;2 Useful Tools to Find and Delete Duplicate Files in Linux&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-12443","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/12443","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/comments?post=12443"}],"version-history":[{"count":1,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/12443\/revisions"}],"predecessor-version":[{"id":12444,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/12443\/revisions\/12444"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/media?parent=12443"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/categories?post=12443"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/tags?post=12443"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":tr
ue}]}}