{"id":8316,"date":"2019-01-18T15:53:40","date_gmt":"2019-01-18T15:53:40","guid":{"rendered":"https:\/\/www.appservgrid.com\/paw92\/?p=8316"},"modified":"2019-03-08T22:49:52","modified_gmt":"2019-03-08T22:49:52","slug":"back-to-basics-sort-and-uniq-linux-com","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw92\/index.php\/2019\/01\/18\/back-to-basics-sort-and-uniq-linux-com\/","title":{"rendered":"Back to Basics: Sort and Uniq | Linux.com"},"content":{"rendered":"<div class=\"col-sm-12 bs-region bs-region--top\">\n<div class=\"field field--name-node-title field--type-ds field--label-hidden field--item\">\n<h1><img decoding=\"async\" src=\"https:\/\/www.linuxjournal.com\/sites\/default\/files\/styles\/360_250\/public\/nodeimage\/story\/bigstock--187641571_2.jpg?itok=5xdscQ-t\" alt=\"&quot;&quot;\" \/><\/h1>\n<p><em>Learn the fundamentals of sorting and de-duplicating text on the command line.<\/em><\/p>\n<p>If you&#8217;ve been using the command line for a long time, it&#8217;s easy to take the commands you use every day for granted. But, if you&#8217;re new to the Linux command line, there are several commands that make your life easier that you may not stumble upon automatically. In this article, I cover the basics of two commands that are essential in anyone&#8217;s arsenal:\u00a0<code>sort<\/code>\u00a0and\u00a0<code>uniq<\/code>.<\/p>\n<p>The\u00a0<code>sort<\/code>\u00a0command does exactly what it says: it takes text data as input and outputs sorted data. There are many scenarios on the command line when you may need to sort output, such as the output from a command that doesn&#8217;t offer sorting options of its own (or the sort arguments are obscure enough that you just use the\u00a0<code>sort<\/code>\u00a0command instead). 
In other cases, you may have a text file full of data (perhaps generated with some other script), and you need a quick way to view it in a sorted form.<\/p>\n<p>Let&#8217;s start with a file named &#8220;test&#8221; that contains three lines:<\/p>\n<pre><code>\r\nFoo\r\nBar\r\nBaz\r\n<\/code><\/pre>\n<p><code>sort<\/code>\u00a0can read its input from STDIN redirection, from a pipe, or from a file that you name on the command line. So, the three following commands all accomplish the same thing:<\/p>\n<pre><code>\r\ncat test | sort\r\nsort &lt; test\r\nsort test\r\n<\/code><\/pre>\n<p>And the output that you get from all of these commands is:<\/p>\n<pre><code>\r\nBar\r\nBaz\r\nFoo\r\n<\/code><\/pre>\n<h3>Sorting Numerical Output<\/h3>\n<p>Now, let&#8217;s complicate the file by adding three more lines:<\/p>\n<pre><code>\r\nFoo\r\nBar\r\nBaz\r\n1. ZZZ\r\n2. YYY\r\n11. XXX\r\n<\/code><\/pre>\n<p>If you run one of the above\u00a0<code>sort<\/code>\u00a0commands again, this time, you&#8217;ll see different output:<\/p>\n<pre><code>\r\n11. XXX\r\n1. ZZZ\r\n2. YYY\r\nBar\r\nBaz\r\nFoo\r\n<\/code><\/pre>\n<p>This is likely not the output you wanted, but it points out an important fact about\u00a0<code>sort<\/code>. By default, it sorts alphabetically, using your locale&#8217;s collation rules, not numerically. In a locale such as en_US.UTF-8, this means that a line that starts with &#8220;11.&#8221; is sorted above a line that starts with &#8220;1.&#8221;, and all of the lines that start with numbers are sorted above lines that start with letters.<\/p>\n<p>To sort numerically, pass\u00a0<code>sort<\/code>\u00a0the\u00a0<code>-n<\/code>\u00a0option:<\/p>\n<pre><code>\r\nsort -n test\r\n\r\nBar\r\nBaz\r\nFoo\r\n1. ZZZ\r\n2. YYY\r\n11. XXX\r\n<\/code><\/pre>\n<h3>Find the Largest Directories on a Filesystem<\/h3>\n<p>Numerical sorting comes in handy for a lot of command-line output, particularly when that output contains a tally of some kind and you want to see the largest or smallest entries. 
For instance, if you want to find out which directories are using the most space under a particular directory and you want to dig down recursively, you would run a command like this:<\/p>\n<pre><code>\r\ndu -ckx\r\n<\/code><\/pre>\n<p>This command dives recursively into the current directory and doesn&#8217;t traverse any other mountpoints inside that directory. It tallies the file sizes and then lists the directories in the order it found them, each preceded by the total size in kilobytes of the files underneath it. Of course, if you&#8217;re running such a command, it&#8217;s probably because you want to know which directory is using the\u00a0<em>most<\/em>\u00a0space, and this is where\u00a0<code>sort<\/code>\u00a0comes in:<\/p>\n<pre><code>\r\ndu -ckx | sort -n\r\n<\/code><\/pre>\n<p>Now you&#8217;ll get a list of all of the directories underneath the current directory, but this time sorted by size. If you want to get even fancier, pipe its output to the\u00a0<code>tail<\/code>\u00a0command to see the top ten. On the other hand, if you wanted the largest directories to be at the top of the output, not the bottom, you would add the\u00a0<code>-r<\/code>\u00a0option, which tells\u00a0<code>sort<\/code>\u00a0to reverse the order. So to get the top ten (well, top eight, because the first line is the total and the next line is the size of the current directory):<\/p>\n<pre><code>\r\ndu -ckx | sort -rn | head\r\n<\/code><\/pre>\n<p>This works, but often people using the\u00a0<code>du<\/code>\u00a0command want to see sizes in more readable output than kilobytes. The\u00a0<code>du<\/code>\u00a0command offers the\u00a0<code>-h<\/code>\u00a0argument that provides &#8220;human-readable&#8221; output. So, you&#8217;ll see output like\u00a0<code>9.6G<\/code>\u00a0instead of\u00a0<code>10024764<\/code>\u00a0with the\u00a0<code>-k<\/code>\u00a0option. 
When you pipe that human-readable output to\u00a0<code>sort<\/code>\u00a0though, you won&#8217;t get the results you expect by default, as it will sort 9.6G above 9.6K, which would be above 9.6M.<\/p>\n<p>The\u00a0<code>sort<\/code>\u00a0command has a\u00a0<code>-h<\/code>\u00a0option of its own, and it acts like\u00a0<code>-n<\/code>, but it&#8217;s able to parse standard human-readable numbers and sort them accordingly. So, to see the top ten largest directories in your current directory with human-readable output, you would type this:<\/p>\n<pre><code>\r\ndu -chx | sort -rh | head\r\n<\/code><\/pre>\n<h3>Removing Duplicates<\/h3>\n<p>The sort command isn&#8217;t limited to sorting one file. You might pipe multiple files into it or list multiple files as arguments on the command line, and it will combine them all and sort them. Unfortunately though, if those files contain some of the same information, you will end up with duplicates in the sorted output.<\/p>\n<p>To remove duplicates, you need the\u00a0<code>uniq<\/code>\u00a0command, which by default removes any duplicate lines that are adjacent to each other from its input and outputs the results. So, let&#8217;s say you had two files that were different lists of names:<\/p>\n<pre><code>\r\ncat namelist1.txt\r\nJones, Bob\r\nSmith, Mary\r\nBabbage, Walter\r\n\r\ncat namelist2.txt\r\nJones, Bob\r\nJones, Shawn\r\nSmith, Cathy\r\n<\/code><\/pre>\n<p>You could remove the duplicates by piping to\u00a0<code>uniq<\/code>:<\/p>\n<pre><code>\r\nsort namelist1.txt namelist2.txt | uniq\r\nBabbage, Walter\r\nJones, Bob\r\nJones, Shawn\r\nSmith, Cathy\r\nSmith, Mary\r\n<\/code><\/pre>\n<p>The\u00a0<code>uniq<\/code>\u00a0command has more tricks up its sleeve than this. 
It also can output only the duplicated lines, so you can find duplicates in a set of files quickly by adding the\u00a0<code>-d<\/code>\u00a0option:<\/p>\n<pre><code>\r\nsort namelist1.txt namelist2.txt | uniq -d\r\nJones, Bob\r\n<\/code><\/pre>\n<p>You even can have\u00a0<code>uniq<\/code>\u00a0provide a tally of how many times it has found each entry with the\u00a0<code>-c<\/code>\u00a0option:<\/p>\n<pre><code>\r\nsort namelist1.txt namelist2.txt | uniq -c\r\n1 Babbage, Walter\r\n2 Jones, Bob\r\n1 Jones, Shawn\r\n1 Smith, Cathy\r\n1 Smith, Mary\r\n<\/code><\/pre>\n<p>As you can see, &#8220;Jones, Bob&#8221; occurred the most times, but if you had a lot of lines, this sort of tally might be less useful for you, as you&#8217;d like the most duplicates to bubble up to the top. Fortunately, you have the\u00a0<code>sort<\/code>\u00a0command:<\/p>\n<pre><code>\r\nsort namelist1.txt namelist2.txt | uniq -c | sort -nr\r\n2 Jones, Bob\r\n1 Smith, Mary\r\n1 Smith, Cathy\r\n1 Jones, Shawn\r\n1 Babbage, Walter\r\n<\/code><\/pre>\n<h3>Conclusion<\/h3>\n<p>I hope these cases of using\u00a0<code>sort<\/code>\u00a0and\u00a0<code>uniq<\/code>\u00a0with realistic examples show you how powerful these simple command-line tools are. Half the secret with these foundational command-line tools is to discover (and remember) they exist so that they&#8217;ll be at your command the next time you run into a problem they can solve.<\/p>\n<\/div>\n<\/div>\n<p><a href=\"https:\/\/www.linux.com\/news\/back-basics-sort-and-uniq-1\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn the fundamentals of sorting and de-duplicating text on the command line. If you&#8217;ve been using the command line for a long time, it&#8217;s easy to take the commands you use every day for granted. 
But, if you&#8217;re new to the Linux command line, there are several commands that make your life easier that you &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw92\/index.php\/2019\/01\/18\/back-to-basics-sort-and-uniq-linux-com\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Back to Basics: Sort and Uniq | Linux.com&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-8316","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/8316","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/comments?post=8316"}],"version-history":[{"count":2,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/8316\/revisions"}],"predecessor-version":[{"id":10791,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/8316\/revisions\/10791"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/media?parent=8316"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/categories?post=8316"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/tags?post=8316"}],"curies":[{"name":"wp","href":"http
s:\/\/api.w.org\/{rel}","templated":true}]}}