{"id":12170,"date":"2019-03-22T18:43:22","date_gmt":"2019-03-22T18:43:22","guid":{"rendered":"http:\/\/www.appservgrid.com\/paw92\/?p=12170"},"modified":"2019-03-22T18:43:22","modified_gmt":"2019-03-22T18:43:22","slug":"how-to-choose-the-best-open-source-software","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw92\/index.php\/2019\/03\/22\/how-to-choose-the-best-open-source-software\/","title":{"rendered":"How to Choose the Best Open Source Software"},"content":{"rendered":"<h1 class=\"graf graf--h3 graf--leading graf--title\"><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" style=\"font-size: 1rem;\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*Dsb3sI5IRCnbgzFZIXbXUQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*Dsb3sI5IRCnbgzFZIXbXUQ.jpeg\" \/><\/h1>\n<p id=\"7fd0\" class=\"graf graf--p graf-after--figure\">After reading the O\u2019Reilly book \u201c<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/shop.oreilly.com\/product\/0636920161417.do\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/shop.oreilly.com\/product\/0636920161417.do\"><strong class=\"markup--strong markup--p-strong\">Foundations for Architecting Data Solutions<\/strong><\/a>\u201d, by Ted Malaska and Jonathan Seidman, I reflected on how I chose\u00a0<strong class=\"markup--strong markup--p-strong\">software\/tools\/solutions\u00a0<\/strong>in the past and how I should choose them going forward.<\/p>\n<p id=\"31d4\" class=\"graf graf--p graf-after--p\">As a bioinformatician you need to be able to quickly discern whether a publication\/tool is really a major advancement or just marginally better. 
I\u2019m not just talking about the newest single-cell RNA-seq technique or\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/omicsomics.blogspot.com\/2015\/08\/the-road-to-hell-is-paved-with.html?spref=tw&amp;m=1\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/omicsomics.blogspot.com\/2015\/08\/the-road-to-hell-is-paved-with.html?spref=tw&amp;m=1\">another file format<\/a>, but for every\u00a0<strong class=\"markup--strong markup--p-strong\">problem\u00a0<\/strong>case you have. Whether that be data visualization tools,\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/moldach.github.io\/xaringan-presentation_drake\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/moldach.github.io\/xaringan-presentation_drake\/\">presentation tools<\/a>, distributed storage systems\u00a0<em class=\"markup--em markup--p-em\">etc.<\/em><\/p>\n<p id=\"e5b6\" class=\"graf graf--p graf-after--p\">It\u2019s not just about how useful the tool may be, it also depends on the quality of the documentation, how simple it is to install, where it sits in the open-source life cycle,\u00a0<em class=\"markup--em markup--p-em\">etc.<\/em><\/p>\n<figure id=\"dba6\" class=\"graf graf--figure graf-after--p\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*9nMBMt-OugnruBr_M-WuEQ.png\" data-width=\"500\" data-height=\"283\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"41\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*9nMBMt-OugnruBr_M-WuEQ.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*9nMBMt-OugnruBr_M-WuEQ.png\" \/><\/div>\n<\/div>\n<\/figure>\n<p id=\"528b\" class=\"graf graf--p graf-after--figure\">Xkcd is funny but competing standards 
aren\u2019t. Don\u2019t believe me? Just look at how many\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/pditommaso\/awesome-pipeline\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/pditommaso\/awesome-pipeline\">pipeline tools<\/a>\u00a0exist!<\/p>\n<blockquote id=\"03e7\" class=\"graf graf--blockquote graf-after--p\"><p>When faced with so many options, how can one choose the solutions that fit their needs?<\/p><\/blockquote>\n<h3 id=\"fce2\" class=\"graf graf--h3 graf-after--blockquote\">Why open\u00a0source?<\/h3>\n<p id=\"2632\" class=\"graf graf--p graf-after--h3\">I\u2019ve worked with a few licensed software solutions in the past; for example,\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.blast2go.com\/blast2go-pro\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.blast2go.com\/blast2go-pro\">BLAST2GO<\/a>\u00a0(<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/github.com\/moldach\/Transcriptome_Assembly-Annotation\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/moldach\/Transcriptome_Assembly-Annotation\">plug: use dammit from Camille Scott instead!<\/a>),\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.mathworks.com\/products\/matlab.html\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.mathworks.com\/products\/matlab.html\">Matlab<\/a>, and image-stitching software called\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/www.kolor.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/www.kolor.com\/\">Autopano Giga<\/a>\u00a0(now defunct). One of my greatest frustrations was learning these tools only to later change roles and no longer have them available. As a consultant for the Department of Fisheries and Oceans, the prohibitive cost of a Matlab license was what pushed me over the edge into learning another high-level programming language\u200a\u2014\u200aR. 
FWIW:<\/p>\n<blockquote id=\"3148\" class=\"graf graf--blockquote graf--startsWithDoubleQuote graf-after--p\"><p>\u201c[Matlab] They obfuscate their source code in many cases, meaning bugs are much\u00a0<a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/uk.mathworks.com\/matlabcentral\/answers\/79714-how-do-we-know-that-matlabs-algorithms-are-working-properly\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/uk.mathworks.com\/matlabcentral\/answers\/79714-how-do-we-know-that-matlabs-algorithms-are-working-properly\">harder to spot<\/a>\u00a0and impossible to\u00a0<a class=\"markup--anchor markup--blockquote-anchor\" href=\"http:\/\/stackoverflow.com\/questions\/2470765\/can-i-distribute-my-matlab-program-as-open-source\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/stackoverflow.com\/questions\/2470765\/can-i-distribute-my-matlab-program-as-open-source\">edit ourselves without risking court action<\/a>. Moreover, using Matlab for science results in\u00a0<a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/github.com\/openjournals\/joss\/issues\/142\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/github.com\/openjournals\/joss\/issues\/142\">paywalling our code<\/a>. We are by definition making our computational science closed.\u201d\u200a\u2014\u200aexcerpt from\u00a0<a class=\"markup--anchor markup--blockquote-anchor\" href=\"http:\/\/neuroplausible.com\/matlab\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/neuroplausible.com\/matlab\"><strong class=\"markup--strong markup--blockquote-strong\">I Hate Matlab: How an IDE, a Language, and a Mentality Harm<\/strong><\/a><\/p><\/blockquote>\n<p id=\"4156\" class=\"graf graf--p graf-after--blockquote\">Most companies eschew third party solutions or build their product as a hybrid of proprietary and open-source to keep their costs lower. 
For example, Amazon Web Services (AWS) offers its Simple Storage Service (Amazon S3) for a fee but also builds on open-source software like Apache Hadoop. I\u2019m not saying\u00a0<em class=\"markup--em markup--p-em\">not\u00a0<\/em>to use AWS (or any other cloud provider) because sometimes you are constrained to\u00a0<em class=\"markup--em markup--p-em\">having\u00a0<\/em>to; I actually used AWS for a project (<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S1874778717303422\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S1874778717303422\">transcriptome assembly of a coral species<\/a>) with Docker. Currently I\u2019m working with sensitive information that must be kept on-site, under lock-and-key, so alternative solutions are used.<\/p>\n<p id=\"6ff4\" class=\"graf graf--p graf-after--p\">Most of the newer big data platforms and successful open-source projects began as internal projects at companies or universities for the first couple of years before going through an external incubation phase. 
For example:<\/p>\n<ul class=\"postList\">\n<li id=\"d97a\" class=\"graf graf--li graf-after--p\">LinkedIn\u200a\u2014\u200a\u201c<strong class=\"markup--strong markup--li-strong\">Apache Kafka<\/strong>\u201d<\/li>\n<li id=\"dbf1\" class=\"graf graf--li graf-after--li\">University of California at Berkeley\u200a\u2014\u200a\u201c<strong class=\"markup--strong markup--li-strong\">Apache Spark<\/strong>\u201d<\/li>\n<li id=\"a355\" class=\"graf graf--li graf-after--li\">Cloudera\u200a\u2014\u200a\u201c<strong class=\"markup--strong markup--li-strong\">Impala<\/strong>\u201d<\/li>\n<li id=\"cae2\" class=\"graf graf--li graf-after--li\">Yahoo!\u200a\u2014\u200a\u201c<strong class=\"markup--strong markup--li-strong\">Apache Hadoop<\/strong>\u201d<\/li>\n<li id=\"ac1b\" class=\"graf graf--li graf-after--li\">Google\u200a\u2014\u200a\u201c<strong class=\"markup--strong markup--li-strong\">Kubernetes<\/strong>\u201d<\/li>\n<li id=\"2f4c\" class=\"graf graf--li graf-after--li\">Facebook\u200a\u2014\u200a\u201c<strong class=\"markup--strong markup--li-strong\">Apache Hive<\/strong>\u201d<\/li>\n<\/ul>\n<p id=\"fee5\" class=\"graf graf--p graf-after--li\">There are benefits to choosing open-source projects backed by solid sponsors with a good reputation, solid devs, and a track record of sponsoring successful projects. 
You can be fairly confident that these projects have a solid codebase, great documentation, session time at conferences, and considerable public recognition (through blog posts and articles surrounding them).<\/p>\n<p id=\"4324\" class=\"graf graf--p graf-after--p\">When considering open-source solutions it\u2019s also important to gauge where they are in the\u00a0<em class=\"markup--em markup--p-em\">open-source life cycle.\u00a0<\/em>According to Malaska and Seidman, there are nine (potential) stages in the project life cycle based on the\u00a0<em class=\"markup--em markup--p-em\">Gartner Hype Cycle<\/em>; however, I think only a few are relevant to discuss here:<\/p>\n<h3 id=\"d95a\" class=\"graf graf--h3 graf-after--p\">Which Cycle Should You\u00a0Choose?<\/h3>\n<figure id=\"ff18\" class=\"graf graf--figure graf-after--h3\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*z_sloB4XJ1Mdc9rUJgs1BA.jpeg\" data-width=\"2763\" data-height=\"1839\" data-action=\"zoom\" data-action-value=\"1*z_sloB4XJ1Mdc9rUJgs1BA.jpeg\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"48\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*z_sloB4XJ1Mdc9rUJgs1BA.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*z_sloB4XJ1Mdc9rUJgs1BA.jpeg\" \/><\/div>\n<\/div><figcaption class=\"imageCaption\"><\/figcaption><\/figure>\n<h4 id=\"2ce6\" class=\"graf graf--h4 graf-after--figure\">Don\u2019t believe the\u00a0hype<\/h4>\n<p id=\"23dc\" class=\"graf graf--p graf-after--h4\">This stage of the cycle is referred to as the \u201c<em class=\"markup--em markup--p-em\">curing cancer\u201d\u00a0<\/em>stage. 
The hype at this stage is important for attracting committers and contributors, but unless you\u2019re looking to help out in a major way you should steer clear. Unless you\u2019re trying to be on the cutting edge (risk tolerance), or take on an active role as a contributor, it\u2019s best to wait 6\u201312 months before trying any new technology. By letting others hit walls first you\u2019ll encounter fewer bugs and have access to better documentation and blog posts.<\/p>\n<h4 id=\"d3a4\" class=\"graf graf--h4 graf-after--p\">A broken promise is not a\u00a0lie<\/h4>\n<p id=\"61fb\" class=\"graf graf--p graf-after--h4\">After the \u201c<em class=\"markup--em markup--p-em\">curing cancer\u201d\u00a0<\/em>stage is the\u00a0<em class=\"markup--em markup--p-em\">broken promises\u00a0<\/em>stage. At this point people are using the project and are finding issues or limitations. For example, a solution may not integrate nicely with other existing systems or there may be problems with scalability. You should treat any open-source project at this stage with cautious optimism.<\/p>\n<h4 id=\"59cd\" class=\"graf graf--h4 graf-after--p\">Go for dependable solutions whenever\u00a0possible<\/h4>\n<p id=\"3661\" class=\"graf graf--p graf-after--h4\">Projects in the\u00a0<em class=\"markup--em markup--p-em\">hardening\u00a0<\/em>or\u00a0<em class=\"markup--em markup--p-em\">enterprise\u00a0<\/em>stage have become mature technologies. The number of commits signals the level of investment in a project. The type of commits tells a story: it shows where the author(s) are trying to go with the code and which features of the project they are interested in. By now the initial excitement has died down and there is more demand for stability than new features. 
The initial development team may be working on other projects as the project has developed a solid community\u200a\u2014\u200athis is often a good sign of a project\u2019s success.<\/p>\n<p id=\"9a0c\" class=\"graf graf--p graf-after--p\">Obviously recent activity signals that the project is alive and maintained. Remember that there are many dead and abandoned projects living on GitHub. That being said, activity doesn\u2019t always need to be<em class=\"markup--em markup--p-em\">\u00a0very<\/em>\u00a0recent! One prolific \u201crockstar dev\u201d put it this way:<\/p>\n<blockquote id=\"5ed5\" class=\"graf graf--blockquote graf-after--p\"><p>Context-switching is expensive, so if I worked on many packages at the same time, I\u2019d never get anything done. Instead, at any point in time, most of my packages are lying fallow, steadily accumulating issues and ideas for new feature. Once a critical mass has accumulated, I\u2019ll spend a couple of days on the package.\u200a\u2014\u200a<a class=\"markup--anchor markup--blockquote-anchor\" href=\"https:\/\/www.quora.com\/How-is-Hadley-Wickham-able-to-contribute-so-much-to-R-particularly-in-the-form-of-packages\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.quora.com\/How-is-Hadley-Wickham-able-to-contribute-so-much-to-R-particularly-in-the-form-of-packages\"><em class=\"markup--em markup--blockquote-em\">Hadley Wickham<\/em><\/a><\/p><\/blockquote>\n<p id=\"3263\" class=\"graf graf--p graf-after--blockquote\">Eventually projects enter the\u00a0<em class=\"markup--em markup--p-em\">decline\u00a0<\/em>stage and no one wants to adopt or contribute to a dead or dying project.<\/p>\n<h3 id=\"9bc0\" class=\"graf graf--h3 graf-after--p\">Can I trust\u00a0you?<\/h3>\n<figure id=\"1258\" class=\"graf graf--figure graf-after--h3\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*0_1arxZEvDUKjpr0QtDa4A.jpeg\" 
data-width=\"3841\" data-height=\"2561\" data-action=\"zoom\" data-action-value=\"1*0_1arxZEvDUKjpr0QtDa4A.jpeg\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*0_1arxZEvDUKjpr0QtDa4A.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*0_1arxZEvDUKjpr0QtDa4A.jpeg\" \/><\/div>\n<\/div><figcaption class=\"imageCaption\"><\/figcaption><\/figure>\n<p id=\"67a1\" class=\"graf graf--p graf-after--figure\">I use R mostly so let me talk about where a project is hosted for a few moments. Code is often hosted on Github,\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/ropensci.github.io\/dev_guide\/softwarereviewintro.html\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/ropensci.github.io\/dev_guide\/softwarereviewintro.html\">ROpenSci<\/a>,\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/blog.revolutionanalytics.com\/2015\/08\/a-short-introduction-to-bioconductor.html\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/blog.revolutionanalytics.com\/2015\/08\/a-short-introduction-to-bioconductor.html\">Bioconductor<\/a>\u00a0or\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/cran.r-project.org\/web\/packages\/submission_checklist.html\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/cran.r-project.org\/web\/packages\/submission_checklist.html\">CRAN<\/a>. 
The\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/cran.r-project.org\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/cran.r-project.org\/\">Comprehensive R Archive Network<\/a>\u00a0(CRAN)\u00a0<em class=\"markup--em markup--p-em\">was<\/em>\u00a0the main repository for R packages.<\/p>\n<blockquote id=\"ec5b\" class=\"graf graf--blockquote graf--startsWithDoubleQuote graf-after--p\"><p>\u201cAs R users, we are spoiled. Early in the history of R, Kurt Hornik and Friedrich Leisch built support for\u00a0<em class=\"markup--em markup--blockquote-em\">packages<\/em>\u00a0right into R, and started the Comprehensive R Archive Network (CRAN). And R and CRAN had a fantastic run with. Roughly twenty years later, we are looking at over 12,000 packages which can (generally) be installed with absolute ease and no suprises. No other (relevant) open source language has anything of comparable rigour and quality.\u201d\u200a\u2014\u200aexcerpt from\u00a0<a class=\"markup--anchor markup--blockquote-anchor\" href=\"http:\/\/dirk.eddelbuettel.com\/blog\/2018\/02\/28\/#017_dependencies\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/dirk.eddelbuettel.com\/blog\/2018\/02\/28\/#017_dependencies\">Dirk Eddelbuettel<\/a><\/p><\/blockquote>\n<p id=\"b69b\" class=\"graf graf--p graf-after--blockquote\">On CRAN packages of almost any type are welcome (as long as strict policies are met) and packages are tested daily (on multiple systems).\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/milesmcbain.xyz\/ropensci-onboarding1\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/milesmcbain.xyz\/ropensci-onboarding1\/\">rOpenSci is the perfect antithesis of CRAN. CRAN can be notoriously opaque, inconsistent, and aloof<\/a>. 
rOpenSci cannot match CRAN\u2019s volume or level of automation but markets itself in terms of quality.<\/p>\n<p id=\"45c7\" class=\"graf graf--p graf-after--p\">In bioinformatics, Bioconductor is where a package will end up. Projects that exist solely on GitHub should be viewed with more caution as they have no checklists or peer review.<\/p>\n<h3 id=\"2c0c\" class=\"graf graf--h3 graf-after--p\">Let\u2019s talk about dependencies (a loaded topic\u200a\u2014\u200ano pun intended)<\/h3>\n<p id=\"0caf\" class=\"graf graf--p graf-after--h3\">Installing dependencies sucks!\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/research.swtch.com\/deps\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/research.swtch.com\/deps\">How often have you installed one package only to have a boatload pulled in?<\/a>\u00a0You should try to avoid packages with many (changing) dependencies, as this makes it prohibitively hard to establish whether your work is correct (and hence to ensure reproducibility), because\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/dirk.eddelbuettel.com\/blog\/2019\/03\/14\/#020_dependency_badges\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/dirk.eddelbuettel.com\/blog\/2019\/03\/14\/#020_dependency_badges\">dependencies are hard-to-manage risks<\/a>.<\/p>\n<blockquote id=\"4746\" class=\"graf graf--blockquote graf--startsWithDoubleQuote graf-after--p\"><p><em class=\"markup--em markup--blockquote-em\">\u201c<\/em><a class=\"markup--anchor markup--blockquote-anchor\" href=\"http:\/\/dirk.eddelbuettel.com\/blog\/2019\/03\/14\/#020_dependency_badges\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/dirk.eddelbuettel.com\/blog\/2019\/03\/14\/#020_dependency_badges\"><em class=\"markup--em markup--blockquote-em\">More dependencies means more edges between more nodes.<\/em>\u00a0Which eventually means more breakage.<\/a>\u201d<\/p><\/blockquote>\n<p id=\"d106\" class=\"graf graf--p graf-after--blockquote\">Proponents of 
the\u00a0<em class=\"markup--em markup--p-em\">tinyverse\u00a0<\/em>tend to stay away from bloated dependencies;\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/medium.freecodecamp.org\/why-im-not-using-your-github-repository-2dff6c7ac7cf\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.freecodecamp.org\/why-im-not-using-your-github-repository-2dff6c7ac7cf\">no one wants to spend time in hell!<\/a><\/p>\n<p id=\"fb5d\" class=\"graf graf--p graf-after--p\">If you\u2019re a developer, remember:<\/p>\n<blockquote id=\"7f42\" class=\"graf graf--blockquote graf--startsWithDoubleQuote graf-after--p\"><p>\u201c<a class=\"markup--anchor markup--blockquote-anchor\" href=\"http:\/\/www.win-vector.com\/blog\/2019\/03\/software-dependencies-and-risk\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/www.win-vector.com\/blog\/2019\/03\/software-dependencies-and-risk\/\">Not all dependencies are equal\u00a0\u2026 some popular packages [have] unstable APIs (a history of breaking changes) and high historic error rates (a history of complexity and adding features over fixing things).<\/a>\u201d<\/p><\/blockquote>\n<p id=\"3ed1\" class=\"graf graf--p graf-after--blockquote\">You can also include a\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/gitlab.com\/edwindj\/crandeps\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/gitlab.com\/edwindj\/crandeps\">badge for your repo showing the number of dependencies your package relies on<\/a>.<\/p>\n<h3 id=\"8d94\" class=\"graf graf--h3 graf-after--p\">Transparency is\u00a0good<\/h3>\n<figure id=\"e615\" class=\"graf graf--figure graf-after--h3\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*pPAwTWTT9yHQo1cQ956qNQ.jpeg\" data-width=\"3872\" data-height=\"2592\" data-action=\"zoom\" data-action-value=\"1*pPAwTWTT9yHQo1cQ956qNQ.jpeg\" 
data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*pPAwTWTT9yHQo1cQ956qNQ.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*pPAwTWTT9yHQo1cQ956qNQ.jpeg\" \/><\/div>\n<\/div>\n<\/figure>\n<p id=\"8039\" class=\"graf graf--p graf-after--figure\">When looking at projects on GitHub you should look for people\/packages with many stars, watchers, forks, contributors,\u00a0<em class=\"markup--em markup--p-em\">etc.\u00a0<\/em>These visible cues of community support indicate the community cares about a person, project, or action and that many others would benefit from it.<\/p>\n<p id=\"4048\" class=\"graf graf--p graf-after--p\">Remember that the number of commits, issues and pull requests (PRs) can be a signal of investment and commitment to a project. Are the issues and PRs being dealt with? An unanswered PR is literally an\u00a0<em class=\"markup--em markup--p-em\">offer\u00a0<\/em>of code that is being ignored rather than accepted, rejected or commented upon.<\/p>\n<p id=\"684a\" class=\"graf graf--p graf-after--p\">By following the actions on code, you can determine who founded the project, what happened across different releases and make inferences about the structure of the project and collaborator roles (who had expertise on which pieces of the system). 
Linked commits and issues communicate the reasoning behind a change to the code.<\/p>\n<p id=\"0a4a\" class=\"graf graf--p graf-after--p\">You can also gauge community interest by looking at the number of\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/www.meetup.com\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.meetup.com\/\">meetups<\/a> and conferences (and their attendance levels), or at email lists, user groups, community forums,\u00a0<em class=\"markup--em markup--p-em\">etc.<\/em><\/p>\n<p id=\"4b37\" class=\"graf graf--p graf-after--p\">Google Trends can also be a good measure of the level of interest in projects or technologies.<\/p>\n<figure id=\"ef39\" class=\"graf graf--figure graf-after--p\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*Swv5pqDoOdmzR4brQ6HRWA.png\" data-width=\"660\" data-height=\"360\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"40\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*Swv5pqDoOdmzR4brQ6HRWA.png\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*Swv5pqDoOdmzR4brQ6HRWA.png\" \/><\/div>\n<\/div><figcaption class=\"imageCaption\"><a class=\"markup--anchor markup--figure-anchor\" href=\"http:\/\/lindeloev.net\/spss-is-dying\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/lindeloev.net\/spss-is-dying\/\">Using metrics to track the decline of\u00a0SPSS<\/a><\/figcaption><\/figure>\n<h3 id=\"43f5\" class=\"graf graf--h3 graf-after--figure\">Things to look\u00a0for<\/h3>\n<figure id=\"71ee\" class=\"graf graf--figure graf-after--h3\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" 
data-image-id=\"1*JC-K-v7Kq3HmMZgxpvAvYA.jpeg\" data-width=\"5184\" data-height=\"3456\" data-action=\"zoom\" data-action-value=\"1*JC-K-v7Kq3HmMZgxpvAvYA.jpeg\" data-scroll=\"native\"><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"50\"><\/canvas><img decoding=\"async\" class=\"progressiveMedia-image js-progressiveMedia-image\" src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*JC-K-v7Kq3HmMZgxpvAvYA.jpeg\" data-src=\"https:\/\/cdn-images-1.medium.com\/max\/1600\/1*JC-K-v7Kq3HmMZgxpvAvYA.jpeg\" \/><\/div>\n<\/div><figcaption class=\"imageCaption\"><\/figcaption><\/figure>\n<ul class=\"postList\">\n<li id=\"6421\" class=\"graf graf--li graf-after--figure\">Easy to install<\/li>\n<li id=\"83fa\" class=\"graf graf--li graf-after--li\">Easy to run<\/li>\n<li id=\"4984\" class=\"graf graf--li graf-after--li\">Are issues and PRs being raised?<\/li>\n<\/ul>\n<p id=\"99d5\" class=\"graf graf--p graf-after--li\">Is the owner taking care of them (fixing bugs, helping users, adding features)? 
Or has the project been abandoned?<\/p>\n<ul class=\"postList\">\n<li id=\"021c\" class=\"graf graf--li graf-after--p\"><a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1006561\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1006561\">How good is the documentation\/vignettes<\/a>\u00a0(<a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/academic.oup.com\/bib\/article\/19\/4\/693\/2907814\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/academic.oup.com\/bib\/article\/19\/4\/693\/2907814\">and here<\/a>)<\/li>\n<\/ul>\n<p id=\"b25c\" class=\"graf graf--p graf-after--li\">Does it list hardware requirements (RAM and disk size), example commands, toy data, example output, and screenshots\/screen recordings?<\/p>\n<ul class=\"postList\">\n<li id=\"7384\" class=\"graf graf--li graf-after--p\"><a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/travis-ci.org\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/travis-ci.org\/\">Continuous integration status<\/a><\/li>\n<li id=\"183d\" class=\"graf graf--li graf-after--li\">Does it have a\u00a0<a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1002598\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/journals.plos.org\/ploscompbiol\/article?id=10.1371\/journal.pcbi.1002598\">LICENSE<\/a><\/li>\n<li id=\"c869\" class=\"graf graf--li graf-after--li\">Does it have a\u00a0<a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/gist.github.com\/PurpleBooth\/b24679402957c63ec426\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/gist.github.com\/PurpleBooth\/b24679402957c63ec426\">CONTRIBUTING<\/a>\u00a0doc<\/li>\n<li id=\"3960\" class=\"graf graf--li graf-after--li\">Does it have tests<\/li>\n<li id=\"d197\" class=\"graf graf--li graf-after--li\">Does it 
have a\u00a0<code class=\"markup--code markup--li-code\">Dockerfile<\/code><\/li>\n<li id=\"b7d9\" class=\"graf graf--li graf-after--li\"><a class=\"markup--anchor markup--li-anchor\" href=\"https:\/\/www.r-bloggers.com\/all-the-badges-one-can-earn-parsing-badges-of-cran-packages-readmes\/\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/www.r-bloggers.com\/all-the-badges-one-can-earn-parsing-badges-of-cran-packages-readmes\/\">Does it have badges<\/a><\/li>\n<\/ul>\n<p id=\"5c30\" class=\"graf graf--p graf-after--li\">For other things to look for see\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/itnext.io\/what-i-dont-like-in-your-repo-a602577a526b\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/itnext.io\/what-i-dont-like-in-your-repo-a602577a526b\">here<\/a>\u00a0and\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/medium.freecodecamp.org\/why-im-not-using-your-github-repository-2dff6c7ac7cf\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/medium.freecodecamp.org\/why-im-not-using-your-github-repository-2dff6c7ac7cf\">here<\/a>.<\/p>\n<h3 id=\"aa37\" class=\"graf graf--h3 graf-after--p\">Benchmarking<\/h3>\n<p id=\"fa65\" class=\"graf graf--p graf-after--h3\">If you\u2019re a software developer choosing among several competing technologies, you can perform internal benchmarks with your own use cases and data.<\/p>\n<p id=\"dbd6\" class=\"graf graf--p graf-after--p\">If you\u2019re using R there are different levels of magnification that a benchmark can provide. For a macro analysis (when computation is more intensive) you should use the\u00a0<code class=\"markup--code markup--p-code\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/cran.r-project.org\/web\/packages\/rbenchmark\/index.html\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/cran.r-project.org\/web\/packages\/rbenchmark\/index.html\">rbenchmark<\/a><\/code>\u00a0package. 
For microscopic timing comparisons (<em class=\"markup--em markup--p-em\">e.g.\u00a0<\/em>nanoseconds elapsed) use the\u00a0<code class=\"markup--code markup--p-code\"><a class=\"markup--anchor markup--p-anchor\" href=\"https:\/\/cran.r-project.org\/web\/packages\/microbenchmark\/index.html\" target=\"_blank\" rel=\"noopener\" data-href=\"https:\/\/cran.r-project.org\/web\/packages\/microbenchmark\/index.html\">microbenchmark<\/a><\/code>\u00a0package.<\/p>\n<p id=\"2bcb\" class=\"graf graf--p graf-after--p\">Sometimes consortia will have already done the benchmarking for you (for example,\u00a0<a class=\"markup--anchor markup--p-anchor\" href=\"http:\/\/assemblathon.org\/\" target=\"_blank\" rel=\"noopener\" data-href=\"http:\/\/assemblathon.org\/\">\u201c<strong class=\"markup--strong markup--p-strong\">The Assemblathon<\/strong>\u201d<\/a>). Nonetheless, one should be aware of hidden or motivated biases that produce unfair comparisons (use cases for which one tool clearly has an advantage). Also understand that testers could have been making an honest attempt at a fair test but misunderstood something, leading to invalid results. Therefore it\u2019s important to perform your own internal benchmarking and hold others\u2019 benchmarks to an open standard of repeatability and verification.<\/p>\n<h3 id=\"3787\" class=\"graf graf--h3 graf-after--p\">Final words<\/h3>\n<p id=\"af6c\" class=\"graf graf--p graf-after--h3 graf--trailing\">Ultimately choosing a software solution comes down to the requirements of your project (the timeline, budget, and so forth), how willing you are to be on the cutting edge (risk tolerance), and how readily team members will be able to master these solutions based on their skill levels (internal skill set). Then, test out the solutions before fully committing. 
This job can be given to\u00a0<em class=\"markup--em markup--p-em\">the prototyper\u00a0<\/em>role on your team; the person who likes experimenting\/investigating new software.<\/p>\n<p>\u00a0<a href=\"https:\/\/towardsdatascience.com\/how-to-choose-the-best-open-source-software-b1cbbe4f6398\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>After reading the O\u2019Reilly book \u201cFoundations for Architecting Data Solutions\u201d, by Ted Malaska and Jonathan Seidman, I reflected on how I chose\u00a0software\/tools\/solutions\u00a0in the past and how I should choose them going forward. As a bioinformatician you need to be able to quickly discern whether a publication\/tool is really a major advancement or just marginally better. &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw92\/index.php\/2019\/03\/22\/how-to-choose-the-best-open-source-software\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;How to Choose the Best Open Source 
Software&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-12170","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/12170","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/comments?post=12170"}],"version-history":[{"count":1,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/12170\/revisions"}],"predecessor-version":[{"id":12171,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/12170\/revisions\/12171"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/media?parent=12170"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/categories?post=12170"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/tags?post=12170"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}