{"id":2987,"date":"2018-11-09T22:38:23","date_gmt":"2018-11-09T22:38:23","guid":{"rendered":"https:\/\/www.appservgrid.com\/paw92\/?p=2987"},"modified":"2018-11-12T01:42:40","modified_gmt":"2018-11-12T01:42:40","slug":"removing-duplicate-path-entries-linux-journal","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/11\/09\/removing-duplicate-path-entries-linux-journal\/","title":{"rendered":"Removing Duplicate PATH Entries | Linux Journal"},"content":{"rendered":"<p>The goal here is to remove duplicate entries from the PATH variable.<br \/>\nBut before I begin, let&#8217;s be clear: there&#8217;s no compelling reason<br \/>\nto do this. The shell will, in essence, ignore duplicate PATH entries;<br \/>\nonly the first occurrence of any one path is important.<br \/>\nTwo motivations drive this exercise.<br \/>\nThe first is to look at an awk one-liner that initially<br \/>\ndoesn&#8217;t really appear to do much at all.<br \/>\nThe second is to feed the needs of those who are annoyed by<br \/>\nsuch things as having duplicate PATH entries.<\/p>\n<p>I first had the urge to do this when working with <a href=\"https:\/\/www.cygwin.com\">Cygwin<\/a>.<br \/>\nOn Windows, which puts almost every executable in a different<br \/>\ndirectory, your PATH variable can quickly become overwhelming,<br \/>\nso removing duplicates makes it slightly less confusing<br \/>\nwhen you&#8217;re trying to decipher what&#8217;s actually in your PATH variable.<\/p>\n<p>Your first thought about how to do this might be to break up the path<br \/>\ninto the individual elements with sed and<br \/>\nthen pass that through sort and uniq to get rid of duplicates.<br \/>\nBut you&#8217;d quickly realize that that doesn&#8217;t work, since you&#8217;ve<br \/>\nnow reordered the paths, and you don&#8217;t want that. You want to keep<br \/>\nthe paths in their original order, just with duplicates removed.<\/p>\n<p>The original idea for this was not mine. 
I found the basic<br \/>\ncode for it on the internet. I don&#8217;t remember exactly where, but<br \/>\nI believe it was on <a href=\"https:\/\/stackexchange.com\">Stack Exchange<\/a>.<br \/>\nThe original bash\/awk code was something like this:<\/p>\n<p>PATH=$(echo $PATH | awk -v RS=: -v ORS=: '!($0 in a) {a[$0]; print}')<\/p>\n<p>And it&#8217;s close. It almost works, but before looking at the output,<br \/>\nlet&#8217;s look at why\/how it works.<br \/>\nTo do that, first notice the -v options. Those set awk&#8217;s input<br \/>\nand output record separator variables (RS and ORS), which control<br \/>\nhow awk splits the input data into individual records<br \/>\nand how it joins them again on output.<br \/>\nThe default is to separate them by newlines; that is, each<br \/>\nline of input is a separate record.<br \/>\nInstead of newlines, let&#8217;s use colons as the separators,<br \/>\nwhich makes each of the individual paths in the PATH variable<br \/>\na separate record.<br \/>\nYou can see how this works in the following script, which changes only<br \/>\nthe input separator, leaves the output separator as the newline,<br \/>\nand uses a simple awk one-liner to print each of the elements<br \/>\nof the path on a separate line:<\/p>\n<p><b>$ cat showpath.sh<\/b><br \/>\nexport PATH=\/usr\/bin:\/bin:\/usr\/local\/bin:\/usr\/bin:\/bin<br \/>\nawk -v RS=: '{print}' &lt;&lt;&lt;$PATH<\/p>\n<p><b>$ bash showpath.sh<\/b><br \/>\n\/usr\/bin<br \/>\n\/bin<br \/>\n\/usr\/local\/bin<br \/>\n\/usr\/bin<br \/>\n\/bin<\/p>\n<p>So, back to the original code.<br \/>\nTo help understand it, let&#8217;s make it look a bit more awkish by reformatting<br \/>\nit so that it has the more normal pattern { action }<br \/>\nor condition { action } look to it:<\/p>\n<p>!($0 in a) {<br \/>\na[$0];<br \/>\nprint<br \/>\n}<\/p>\n<p>The condition here is !($0 in a).<br \/>\nIn this, $0 is the current input record, and a is an awk variable<br \/>\n(the use of the in operator tells you that a is 
an array).<br \/>\nRemember, each input record is an individual path from the PATH variable.<br \/>\nThe part inside the parentheses, $0 in a, tests to see if the path<br \/>\nis in the array a.<br \/>\nThe exclamation point and the parentheses negate the condition.<br \/>\nSo, if the current path is not in a, the action executes.<br \/>\nIf the current path is in a, the action doesn&#8217;t execute,<br \/>\nand since that&#8217;s all there is to the script, nothing happens in that case.<\/p>\n<p>If the current path is not in the array,<br \/>\nthe code in the action uses the path as a key to<br \/>\nreference into the array.<br \/>\nIn awk, arrays are associative arrays, and referencing a<br \/>\nnon-existent element in an associative array automatically creates<br \/>\nthe element.<br \/>\nBy creating the element in the array, you&#8217;ve now set the array so<br \/>\nthat the next time you see the same path element, your condition !($0 in a)<br \/>\nwill fail and the action will not execute.<br \/>\nIn other words, the action will execute only the first time that you see a path.<br \/>\nAnd finally, after referencing the array, you print the current path,<br \/>\nand awk automatically adds the output separator.<br \/>\nNote that an empty print is equivalent to print $0.<br \/>\nLet&#8217;s see it in action:<\/p>\n<p><b>$ cat nodupes.sh<\/b><br \/>\nexport PATH=\/usr\/bin:\/bin:\/usr\/local\/bin:\/usr\/bin:\/bin<br \/>\necho $PATH | awk -v RS=: -v ORS=: '!($0 in a) {a[$0]; print}'<\/p>\n<p><b>$ bash nodupes.sh<\/b><br \/>\n\/usr\/bin:\/bin:\/usr\/local\/bin:\/bin<br \/>\n:<\/p>\n<p>As I said, it almost works.<br \/>\nThe only problem is there&#8217;s an extra newline and an extra colon on<br \/>\nthe following line.<br \/>\nThe extra newline comes from the fact that echo is adding a newline<br \/>\nonto the end of the path, and since awk is not treating newlines as<br \/>\nseparators, it gets added to the end of the last path,<br \/>\nwhich, in this case, causes it to 
look like awk failed to remove a duplicate.<br \/>\nBut awk doesn&#8217;t see them as duplicates; it sees<br \/>\n\/bin and \/bin\\n.<br \/>\nYou can eliminate the trailing newline by using the -n option to echo:<\/p>\n<p><b>$ cat nodupes2.sh<\/b><br \/>\nexport PATH=\/usr\/bin:\/bin:\/usr\/local\/bin:\/usr\/bin:\/bin<br \/>\necho -n $PATH | awk -v RS=: -v ORS=: '!($0 in a) {a[$0]; print}'<\/p>\n<p><b>$ bash nodupes2.sh<\/b><br \/>\n\/usr\/bin:\/bin:\/usr\/local\/bin:<\/p>\n<p>And you&#8217;re almost there, except for the trailing colon. That one is worth<br \/>\nfixing: POSIX shells treat an empty PATH element as the current directory<br \/>\nrather than ignoring it, and since you&#8217;ve come this<br \/>\nfar on this somewhat pointless journey, you might as well go the distance.<br \/>\nTo fix the problem, use awk&#8217;s printf command rather than print.<br \/>\nUnlike print, printf does not automatically include output record separators,<br \/>\nso you have to output them yourself:<\/p>\n<p><b>$ cat nodupes3.sh<\/b><br \/>\nexport PATH=\/usr\/bin:\/bin:\/usr\/local\/bin:\/usr\/bin:\/bin<br \/>\necho -n $PATH | awk -v RS=: '!($0 in a) {a[$0]; printf(&quot;%s%s&quot;, length(a) &gt; 1 ? &quot;:&quot; : &quot;&quot;, $0)}'<\/p>\n<p><b>$ bash nodupes3.sh<\/b><br \/>\n\/usr\/bin:\/bin:\/usr\/local\/bin<\/p>\n<p>You may be a bit confused by this at first glance.<br \/>\nRather than eliminating the trailing separator,<br \/>\nyou&#8217;ve reversed the logic, and you&#8217;re outputting the separator first,<br \/>\nthen the PATH element, so instead of needing to eliminate the<br \/>\ntrailing separator, you need to suppress a leading separator.<br \/>\nThe record separator is output by the first %s format specifier<br \/>\nand comes from the expression length(a) &gt; 1 ? 
&quot;:&quot; : &quot;&quot;,<br \/>\nso it is only printed when there&#8217;s more than one element in the array<br \/>\n(that is, the second and subsequent times).<\/p>\n<p>As I said at the outset, there&#8217;s no reason you have to remove<br \/>\nduplicate path entries; they cause no harm.<br \/>\nHowever, for some, the simple fact that they are there is<br \/>\nreason enough to eliminate them.<\/p>\n<p><a href=\"https:\/\/www.linuxjournal.com\/content\/removing-duplicate-path-entries\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>The goal here is to remove duplicate entries from the PATH variable. But before I begin, let&#8217;s be clear: there&#8217;s no compelling reason to do this. The shell will, in essence, ignore duplicate PATH entries; only the first occurrence of any one path is important. Two motivations drive this exercise. The first is to &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/11\/09\/removing-duplicate-path-entries-linux-journal\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Removing Duplicate PATH Entries | Linux 
Journal&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-2987","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/2987","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/comments?post=2987"}],"version-history":[{"count":1,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/2987\/revisions"}],"predecessor-version":[{"id":3194,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/2987\/revisions\/3194"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/media?parent=2987"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/categories?post=2987"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/tags?post=2987"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}