{"id":1970,"date":"2018-10-30T08:01:56","date_gmt":"2018-10-30T08:01:56","guid":{"rendered":"https:\/\/www.appservgrid.com\/paw92\/?p=1970"},"modified":"2018-10-31T09:24:13","modified_gmt":"2018-10-31T09:24:13","slug":"10-practical-grep-command-examples-in-linux","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/10\/30\/10-practical-grep-command-examples-in-linux\/","title":{"rendered":"10 Practical Grep Command Examples in Linux"},"content":{"rendered":"<p><em>Brief: The grep command is used to find patterns in files. This tutorial shows some of the most common grep command examples that are especially useful for software developers.<\/em><\/p>\n<p>Recently, I started working <em>with <\/em><a href=\"https:\/\/asciidoctor.org\/docs\/asciidoctor.js\/\">Asciidoctor.js<\/a> and <em>on<\/em> the <a href=\"https:\/\/github.com\/s-leroux\/asciidoctor.js-pug\">Asciidoctor.js-pug<\/a> and <a href=\"https:\/\/github.com\/asciidoctor\/asciidoctor-template.js\">Asciidoctor-templates.js<\/a> projects. It is not always easy to be immediately effective when you dig for the first time into a codebase containing several thousand lines. 
But my secret weapon to find my way through so many lines of code is the <a href=\"https:\/\/linux.die.net\/man\/1\/grep\">grep<\/a> tool.<\/p>\n<p>I am going to share with you how to use the grep command in Linux with examples.<\/p>\n<h2>Using grep commands in Linux<\/h2>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/i1.wp.com\/linuxhandbook.com\/wp-content\/uploads\/2018\/03\/grep-command-examples-e1522402930582.png?resize=702%2C395&amp;ssl=1\" alt=\"Grep command example\" width=\"702\" height=\"395\" \/><\/p>\n<p>If you look at the <a href=\"https:\/\/linux.die.net\/man\/1\/man\">man<\/a> page, you will see this short description for the grep tool: <em>\u201cprint lines matching a pattern.\u201d<\/em> However, don\u2019t be fooled by such a humble definition: grep is one of the most useful tools in the Unix toolbox and there are countless occasions to use it as soon as you work with text files.<\/p>\n<p>It is always better to have real-world examples to learn how things work. So, I will use the <a href=\"https:\/\/github.com\/asciidoctor\/asciidoctor.js\">Asciidoctor.js source tree<\/a> to illustrate some of the grep capabilities. You can download that source tree from GitHub, and if you want, you may even check out the same changeset I used when writing this article. That will ensure you obtain results identical to those described in the rest of this article:<\/p>\n<p>git clone https:\/\/github.com\/asciidoctor\/asciidoctor.js<br \/>\ncd asciidoctor.js<br \/>\ngit checkout v1.5.6-rc.1<\/p>\n<h3>1. Find all occurrences of a string (basic usage)<\/h3>\n<p>Asciidoctor.js supports the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Nashorn_(JavaScript_engine)\">Nashorn JavaScript engine<\/a> for the Java platform. 
I do not know Nashorn, so I took that opportunity to learn more about it by exploring the parts of the project referencing that JavaScript engine.<\/p>\n<p>As a starting point, I checked if there were some settings related to Nashorn in the package.json file describing the project dependencies:<\/p>\n<p>sh$ grep nashorn package.json<br \/>\n&quot;test&quot;: &quot;node npm\/test\/builder.js &amp;&amp; node npm\/test\/unsupported-features.js &amp;&amp; node npm\/test\/jasmine-browser.js &amp;&amp; node npm\/test\/jasmine-browser-min.js &amp;&amp; node npm\/test\/jasmine-node.js &amp;&amp; node npm\/test\/jasmine-webpack.js &amp;&amp; npm run test:karmaBrowserify &amp;&amp; npm run test:karmaRequirejs &amp;&amp; node npm\/test\/nashorn.js&quot;,<\/p>\n<p>Yes, apparently there were some Nashorn-specific tests. So, let\u2019s investigate that a little bit more.<\/p>\n<h3>2. Case-insensitive search in a file set<\/h3>\n<p>Now, I want to have a closer look at the files from the .\/npm\/test\/ directory explicitly mentioning Nashorn. A case-insensitive search (-i option) is probably better here since I need to find both references to nashorn and Nashorn (or any other combination of upper- and lower-case characters):<\/p>\n<p>sh$ grep -i nashorn npm\/test\/*.js<br \/>\nnpm\/test\/nashorn.js:const nashornModule = require(&#039;..\/module\/nashorn&#039;);<br \/>\nnpm\/test\/nashorn.js:log.task(&#039;Nashorn&#039;);<br \/>\nnpm\/test\/nashorn.js:nashornModule.nashornRun(&#039;jdk1.8.0&#039;);<\/p>\n<p>Indeed, case insensitivity was useful here: a case-sensitive search for nashorn would have missed the log.task(&#039;Nashorn&#039;) line. The require(&#039;..\/module\/nashorn&#039;) statement also points to a file I should no doubt examine in greater detail later.<\/p>\n<h3>3. Find non-matching files<\/h3>\n<p>By the way, are there any non-Nashorn-specific files in the npm\/test\/ directory? 
To answer that question, we can use the \u201cprint non-matching files\u201d option of grep (-L option):<\/p>\n<p>sh$ grep -iL nashorn npm\/test\/*<br \/>\nnpm\/test\/builder.js<br \/>\nnpm\/test\/jasmine-browser-min.js<br \/>\nnpm\/test\/jasmine-browser.js<br \/>\nnpm\/test\/jasmine-node.js<br \/>\nnpm\/test\/jasmine-webpack.js<br \/>\nnpm\/test\/unsupported-features.js<\/p>\n<p>Notice how with the -L option the output of grep has changed to display only filenames. So, none of the files above contain the string \u201cnashorn\u201d (regardless of the case). That does not mean they are not somehow related to that technology, but at least, the letters \u201cn-a-s-h-o-r-n\u201d are not present.<\/p>\n<h3>4. Finding patterns in hidden files and recursively in sub-directories<\/h3>\n<p>The last two commands used a shell <a href=\"https:\/\/en.wikipedia.org\/wiki\/Glob_(programming)\">glob pattern<\/a> to pass the list of files to examine to the grep command. However, this has some inherent limitations: the star (*) will not match hidden files. Nor will it match files that may be contained in sub-directories.<\/p>\n<p>A solution would be to combine grep with the <a href=\"https:\/\/linux.die.net\/man\/1\/find\">find<\/a> command instead of relying on a shell glob pattern:<\/p>\n<p># This is not efficient as it will spawn a new grep process for each file<br \/>\nsh$ find npm\/test\/ -type f -exec grep -iL nashorn {} \\;<br \/>\n# This may have issues with filenames containing space-like characters<br \/>\nsh$ grep -iL nashorn $(find npm\/test\/ -type f)<\/p>\n<p>As I mentioned in the comments in the code block above, each of these solutions has drawbacks. Concerning filenames containing space-like characters, I will let you investigate the grep -z option which, combined with the -print0 option of the find command, can mitigate that issue. 
Don\u2019t hesitate to use the comment section at the end of this article to share your ideas on that topic!<\/p>\n<p>Nevertheless, a better solution would use the \u201crecursive\u201d (-r) option of grep. With that option, you give on the command line the root of your search tree (the starting directory) instead of the explicit list of filenames to examine. With the -r option, grep will examine all files in the search directory, including hidden ones, and then it will recursively descend into any sub-directory:<\/p>\n<p>sh$ grep -irL nashorn npm\/test\/<br \/>\nnpm\/test\/builder.js<br \/>\nnpm\/test\/jasmine-browser-min.js<br \/>\nnpm\/test\/jasmine-browser.js<br \/>\nnpm\/test\/jasmine-node.js<br \/>\nnpm\/test\/jasmine-webpack.js<br \/>\nnpm\/test\/unsupported-features.js<\/p>\n<p>Actually, with that option, I could also start my exploration one level above to see if there are non-npm tests that target Nashorn too:<\/p>\n<p>sh$ grep -irL nashorn npm\/<\/p>\n<p>I will let you test that command by yourself to see its outcome; but as a hint, I can say you should see many more files listed!<\/p>\n<h3>5. Filtering files by their name (using regular expressions)<\/h3>\n<p>So, there seem to be some Nashorn-specific tests in that project. Since Nashorn runs on the Java platform, another question that could be raised would be <em>\u201care there any Java source files in the project explicitly mentioning Nashorn?\u201d<\/em>.<\/p>\n<p>Depending on the version of grep you use, there are at least two solutions to answer that question. 
The first one is to use grep to find all files containing the pattern \u201cnashorn\u201d, then pipe the output of that first command to a second grep instance filtering out non-Java source files:<\/p>\n<p>sh$ grep -ir nashorn .\/ | grep &quot;^[^:]*\\.java&quot;<br \/>\n.\/spec\/nashorn\/AsciidoctorConvertWithNashorn.java:public class AsciidoctorConvertWithNashorn {<br \/>\n.\/spec\/nashorn\/AsciidoctorConvertWithNashorn.java: ScriptEngine engine = engineManager.getEngineByName(&quot;nashorn&quot;);<br \/>\n.\/spec\/nashorn\/AsciidoctorConvertWithNashorn.java: engine.eval(new FileReader(&quot;.\/spec\/nashorn\/asciidoctor-convert.js&quot;));<br \/>\n.\/spec\/nashorn\/BasicJavascriptWithNashorn.java:public class BasicJavascriptWithNashorn {<br \/>\n.\/spec\/nashorn\/BasicJavascriptWithNashorn.java: ScriptEngine engine = engineManager.getEngineByName(&quot;nashorn&quot;);<br \/>\n.\/spec\/nashorn\/BasicJavascriptWithNashorn.java: engine.eval(new FileReader(&quot;.\/spec\/nashorn\/basic.js&quot;));<\/p>\n<p>The first half of the command should be understandable by now. But what about that \u201c^[^:]*\\.java\u201d part?<\/p>\n<p>Unless you specify the -F option, grep assumes the search pattern is a <a href=\"https:\/\/en.wikipedia.org\/wiki\/Regular_expression\">regular expression<\/a>. That means, in addition to plain characters that will match verbatim, you have access to a set of metacharacters to describe more complex patterns. The pattern I used above will only match:<\/p>\n<ul>\n<li>^ the start of the line<\/li>\n<li>[^:]* followed by a sequence of any characters except a colon<\/li>\n<li>\\. followed by a dot (the dot has a special meaning in <em>regex<\/em>, so I had to protect it with a backslash to express I want a literal match)<\/li>\n<li>java and followed by the four letters \u201cjava.\u201d<\/li>\n<\/ul>\n<p>In practice, since grep will use a colon to separate the filename from the matched line, I keep only lines having .java in the filename section. 
It is worth mentioning that it <em>would<\/em> also match .javascript filenames. This is something I will let you try to solve by yourself if you want.<\/p>\n<h3>6. Filtering files by their name using grep<\/h3>\n<p>Regular expressions are extremely powerful. However, in that particular case, they seem overkill. Not to mention that, with the above solution, we spend time examining all files in search of the \u201cnashorn\u201d pattern, with most of the results being discarded by the second step of the pipeline.<\/p>\n<p>If you are using the GNU version of grep, which is likely if you are using Linux, you have another solution with the --include option. This instructs grep to search only in files whose name matches the given glob pattern:<\/p>\n<p>sh$ grep -ir nashorn .\/ --include=&#039;*.java&#039;<br \/>\n.\/spec\/nashorn\/AsciidoctorConvertWithNashorn.java:public class AsciidoctorConvertWithNashorn {<br \/>\n.\/spec\/nashorn\/AsciidoctorConvertWithNashorn.java: ScriptEngine engine = engineManager.getEngineByName(&quot;nashorn&quot;);<br \/>\n.\/spec\/nashorn\/AsciidoctorConvertWithNashorn.java: engine.eval(new FileReader(&quot;.\/spec\/nashorn\/asciidoctor-convert.js&quot;));<br \/>\n.\/spec\/nashorn\/BasicJavascriptWithNashorn.java:public class BasicJavascriptWithNashorn {<br \/>\n.\/spec\/nashorn\/BasicJavascriptWithNashorn.java: ScriptEngine engine = engineManager.getEngineByName(&quot;nashorn&quot;);<br \/>\n.\/spec\/nashorn\/BasicJavascriptWithNashorn.java: engine.eval(new FileReader(&quot;.\/spec\/nashorn\/basic.js&quot;));<\/p>\n<h3>7. Finding words<\/h3>\n<p>The interesting thing about the Asciidoctor.js project is that it is a multi-language project. At its core, Asciidoctor is written in Ruby, so, to be usable in the JavaScript world, it has to be \u201ctranspiled\u201d using <a href=\"https:\/\/opalrb.com\/\">Opal<\/a>, a Ruby to JavaScript source-to-source compiler. 
Another technology I did not know about before.<\/p>\n<p>So, after having examined the Nashorn specificities, I assigned myself the task of better understanding the Opal API. As the first step in that quest, I searched for all mentions of the Opal global object in the JavaScript files of the project. It could appear in assignments (Opal =), member access (Opal.) or maybe even in other contexts. A regular expression would do the trick. However, once again, grep has a more lightweight solution for that common use case. Using the -w option, it will match only <em>words<\/em>, that is, patterns preceded and followed by a non-word character. A non-word character is either the beginning of the line, the end of the line, or any character that is neither a letter, nor a digit, nor an underscore:<\/p>\n<p>sh$ grep -irw --include=&#039;*.js&#039; Opal .<br \/>\n&#8230;<\/p>\n<h3>8. Coloring the output<\/h3>\n<p>I did not copy the output of the previous command since there are many matches. When the output is dense like that, you may wish to add a little bit of color to ease understanding. If this is not already configured by default on your system, you can activate that feature using the GNU --color option:<\/p>\n<p>sh$ grep -irw --color=auto --include=&#039;*.js&#039; Opal .<br \/>\n&#8230;<\/p>\n<p>You should obtain the same long result as before, but this time the search string should appear in color if it was not already the case.<\/p>\n<h3>9. Counting matching lines or matching files<\/h3>\n<p>I mentioned twice that the output of the previous commands was very long. How long exactly?<\/p>\n<p>sh$ grep -irw --include=&#039;*.js&#039; Opal . | wc -l<br \/>\n86<\/p>\n<p>That means we have a <em>total<\/em> of 86 matching lines in <em>all<\/em> the examined files. However, how many different files are matching? With the -l option you can limit the grep output to the matching <em>files<\/em> instead of displaying matching <em>lines<\/em>. 
So that simple change will tell how many files are matching:<\/p>\n<p>sh$ grep -irwl --include=&#039;*.js&#039; Opal . | wc -l<br \/>\n20<\/p>\n<p>If that reminds you of the -L option, no surprise: as is relatively common, lowercase and uppercase letters are used to distinguish complementary options. -l displays matching filenames. -L displays non-matching filenames. For another example, I will let you check the manual for the -h\/-H options.<\/p>\n<p>Let\u2019s close that digression and go back to our results: 86 matching lines. 20 matching files. However, how are the matching <em>lines<\/em> distributed among the matching <em>files<\/em>? We can find out using the -c option of grep, which counts the number of matching lines per examined file (including files with zero matches):<\/p>\n<p>sh$ grep -irwc --include=&#039;*.js&#039; Opal .<br \/>\n&#8230;<\/p>\n<p>Often, that output needs some post-processing since it displays its results in the order in which the files were examined, and it also includes files without any match, something that usually does not interest us. The latter is quite easy to solve:<\/p>\n<p>sh$ grep -irwc --include=&#039;*.js&#039; Opal . | grep -v &#039;:0$&#039;<\/p>\n<p>As for ordering things, you may add the sort command at the end of the pipeline:<\/p>\n<p>sh$ grep -irwc --include=&#039;*.js&#039; Opal . | grep -v &#039;:0$&#039; | sort -t: -k2n<\/p>\n<p>I will let you check the <a href=\"https:\/\/linux.die.net\/man\/1\/sort\">sort<\/a> command manual for the exact meaning of the options I used. Don\u2019t forget to share your findings using the comment section below!<\/p>\n<h3>10. 
Finding the difference between two matching sets<\/h3>\n<p>If you remember, a few commands ago, I searched for the <em>word<\/em> \u201cOpal.\u201d However, if I search in the same file set for all occurrences of the <em>string<\/em> \u201cOpal,\u201d I obtain about twenty more answers:<\/p>\n<p>sh$ grep -irw --include=&#039;*.js&#039; Opal . | wc -l<br \/>\n86<br \/>\nsh$ grep -ir --include=&#039;*.js&#039; Opal . | wc -l<br \/>\n105<\/p>\n<p>Finding the difference between those two sets would be interesting. So, what are the lines containing the four letters \u201copal\u201d in a row, but where those four letters do not form an entire word?<\/p>\n<p>It is not that easy to answer that question, because the <em>same<\/em> line can contain <em>both<\/em> the word Opal and some larger word containing those four letters. But as a first approximation, you may use this pipeline:<\/p>\n<p>sh$ grep -ir --include=&#039;*.js&#039; Opal . | grep -ivw Opal<br \/>\n.\/npm\/examples.js: const opalBuilder = OpalBuilder.create();<br \/>\n.\/npm\/examples.js: opalBuilder.appendPaths(&#039;build\/asciidoctor\/lib&#039;);<br \/>\n.\/npm\/examples.js: opalBuilder.appendPaths(&#039;lib&#039;);<br \/>\n&#8230;<\/p>\n<p>Apparently, my next stop would be to investigate the opalBuilder object, but that will be for another day.<\/p>\n<h3>The last word<\/h3>\n<p>Of course, you will not understand a project organization, even less the code architecture, by just issuing a couple of grep commands! However, I find that command indispensable for identifying landmarks and starting points when exploring a new codebase. So, I hope this article helped you to understand the power of the grep command and that you will add it to your tool chest. 
No doubt you will not regret it!<\/p>\n<p><a href=\"http:\/\/lxer.com\/module\/newswire\/ext_link.php?rid=262272\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Brief: The grep command is used to find patterns in files. This tutorial shows some of the most common grep command examples that would be specifically beneficial for software developers. Recently, I started working with Asciidoctor.js and on the Asciidoctor.js-pug and Asciidoctor-templates.js project. It is not always easy to be immediately effective when you dig &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/10\/30\/10-practical-grep-command-examples-in-linux\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;10 Practical Grep Command Examples in Linux&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-1970","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/1970","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/comments?post=1970"}],"version-history":[{"count":1,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/1970\/revisions"}],"predecessor-version":[{"id":2126,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json
\/wp\/v2\/posts\/1970\/revisions\/2126"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/media?parent=1970"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/categories?post=1970"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/tags?post=1970"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}