{"id":742,"date":"2018-11-02T09:21:04","date_gmt":"2018-11-02T09:21:04","guid":{"rendered":"https:\/\/www.appservgrid.com\/paw93\/?p=742"},"modified":"2018-11-02T09:25:32","modified_gmt":"2018-11-02T09:25:32","slug":"comparing-10-container-monitoring-solutions-for-rancher","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw93\/index.php\/2018\/11\/02\/comparing-10-container-monitoring-solutions-for-rancher\/","title":{"rendered":"Comparing 10 Container Monitoring Solutions for Rancher"},"content":{"rendered":"<h5>Learn How Rancher 2.0 Solves Enterprise Kubernetes Challenges<\/h5>\n<p>Understand the comparative advantage of Rancher 2.0 for DevOps teams, IT Admins, and Operations.<\/p>\n<p><a href=\"http:\/\/info.rancher.com\/451-report-rancher-goes-all-in-on-kubernetes\" target=\"blank\">Read the Report<\/a><\/p>\n<p>Container monitoring environments come in all shapes and sizes. Some are open source while others are commercial. Some are in the Rancher Catalog while others require manual configuration. Some are general purpose while others are aimed specifically at container environments. Some are hosted in the cloud while others require installation on your own cluster hosts. In this post, I take an updated look at 10 container monitoring solutions. This effort builds on earlier work including Ismail Usman\u2019s<br \/>\n<a href=\"http:\/\/rancher.com\/comparing-monitoring-options-for-docker-deployments\/\">Comparing 7 Monitoring Options for Docker<\/a> from 2015 and <a href=\"http:\/\/rancher.com\/event\/great-container-monitoring-bake-off-october-online-meetup\/\">The Great Container Monitoring Bake Off Meetup<\/a> in October of 2016. The number of monitoring solutions is daunting. New solutions are coming on the scene continuously, and existing solutions evolve in functionality. Rather than looking at each solution in depth, I\u2019ve taken the approach of drawing high-level comparisons. 
With this approach, readers can hopefully \u201cnarrow the list\u201d and do more serious<br \/>\nevaluations of solutions best suited to their own needs.<\/p>\n<p>The monitoring solutions covered here include:<\/p>\n<ul>\n<li><a href=\"https:\/\/docs.docker.com\/engine\/reference\/commandline\/stats\/\">Native<br \/>\nDocker<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/google\/cadvisor\">cAdvisor<\/a><\/li>\n<li><a href=\"http:\/\/scoutapp.com\/\">Scout<\/a><\/li>\n<li><a href=\"http:\/\/pingdom.com\/\">Pingdom<\/a><\/li>\n<li><a href=\"https:\/\/www.datadoghq.com\/\">Datadog<\/a><\/li>\n<li><a href=\"http:\/\/sysdig.com\/\">Sysdig<\/a><\/li>\n<li><a href=\"https:\/\/prometheus.io\/\">Prometheus<\/a><\/li>\n<li><a href=\"https:\/\/github.com\/kubernetes\/heapster\">Heapster \/ Grafana<\/a><\/li>\n<li><a href=\"https:\/\/www.elastic.co\/\">ELK stack<\/a><\/li>\n<li><a href=\"http:\/\/sensuapp.org\/\">Sensu<\/a><\/li>\n<\/ul>\n<p>In the following sections, I suggest a framework for comparing monitoring solutions, present a high-level comparison of each, and then discuss each solution in more detail by addressing how each solution works with Rancher. I also cover a few additional solutions you may have come across that did not make my top 10.<\/p>\n<h2>A Framework for Comparison<\/h2>\n<p>A challenge with objectively comparing monitoring solutions is that architectures, capabilities, deployment models, and costs can vary widely. One solution may extract and graph Docker-related metrics from a single host while another aggregates data from many hosts, measures application response times, and sends automated alerts under particular conditions. Having a framework is useful when comparing solutions. I\u2019ve somewhat arbitrarily proposed the following tiers of functionality that most monitoring solutions have in common as a basis for my comparison. 
Like any self-respecting architectural stack, this one has seven layers.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17114746\/seven_layers_of_docker_monitoring.png\" alt=\"\" \/><\/p>\n<p>Figure 1: A seven-layer model for comparing monitoring solutions<\/p>\n<ul>\n<li>Host Agents \u2013 The host agent represents the \u201carms and legs\u201d of the monitoring solution, extracting time-series data from various sources like APIs and log files. Agents are usually installed on each cluster host (either on-premises or cloud-resident) and are themselves often packaged as Docker containers for ease of deployment and management.<\/li>\n<li>Data gathering framework \u2013 While single-host metrics are sometimes useful, administrators likely need a consolidated view of all hosts and applications. Monitoring solutions typically have some mechanism to gather data from each host and persist it in a shared data store.<\/li>\n<li>Datastore \u2013 The datastore may be a traditional database, but more commonly it is some form of scalable, distributed database optimized for time-series data comprised of key-value pairs. Some solutions have native datastores while others leverage pluggable open-source datastores.<\/li>\n<li>Aggregation engine \u2013 The problem with storing raw metrics from dozens of hosts is that the amount of data can become overwhelming. Monitoring frameworks often provide data aggregation capabilities, periodically crunching raw data into consolidated metrics (like hourly or daily summaries), purging old data that is no longer needed, or re-factoring data in some fashion to support anticipated queries and analysis.<\/li>\n<li>Filtering &amp; Analysis \u2013 A monitoring solution is only as good as the insights you can gain from the data. Filtering and analysis capabilities vary widely. 
Some solutions support a few pre-packaged queries presented as simple time-series graphs, while others have customizable dashboards, embedded query languages, and sophisticated analytic functions.<\/li>\n<li>Visualization tier \u2013 Monitoring tools usually have a visualization tier where users can interact with a web interface to generate charts, formulate queries and, in some cases, define alerting conditions. The visualization tier may be tightly coupled with the filtering and analysis functionality, or it may be separate depending on the solution.<\/li>\n<li>Alerting &amp; Notification \u2013 Few administrators have time to sit and monitor graphs all day. Another common feature of monitoring systems is an alerting subsystem that can provide notification if pre-defined thresholds are met or exceeded.<\/li>\n<\/ul>\n<p>Beyond understanding how each monitoring solution implements the basic capabilities above, users will be interested in other aspects of the monitoring solution as well:<\/p>\n<ul>\n<li>Completeness of the solution<\/li>\n<li>Ease of installation and configuration<\/li>\n<li>Details about the web UI<\/li>\n<li>Ability to forward alerts to external services<\/li>\n<li>Level of community support and engagement (for open-source projects)<\/li>\n<li>Availability in Rancher Catalog<\/li>\n<li>Support for monitoring non-container environments and apps<\/li>\n<li>Native Kubernetes support (Pods, Services, Namespaces, etc.)<\/li>\n<li>Extensibility (APIs, other interfaces)<\/li>\n<li>Deployment model (self-hosted, cloud)<\/li>\n<li>Cost, if applicable<\/li>\n<\/ul>\n<h2>Comparing Our 10 Monitoring Solutions<\/h2>\n<p>The diagram below shows a high-level view of how our 10 monitoring solutions map to our seven-layer model, which components implement the capabilities at each layer, and where the components reside. Each framework is complicated, and this is a simplification to be sure, but it provides a useful view of which component does what. 
Read on for<br \/>\nadditional detail.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17115425\/gord_comparing_ten_monitoring_solutions-1024x585.png\" alt=\"\" \/><\/p>\n<p>Figure 2 &#8211; 10 monitoring solutions at a glance<\/p>\n<p>Additional attributes of each monitoring solution are presented in a summary fashion below. For some solutions, there are multiple deployment options, so the comparisons become a little more nuanced.<\/p>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17115647\/gord_monitoring_functionality_comparison-1024x640.png\" alt=\"\" \/><\/p>\n<h2>Looking at Each Solution in More Depth<\/h2>\n<h3>Docker Stats<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17122404\/gord_docker_logo-150x150.png\" alt=\"docker stats\" \/><br \/>\nAt the most basic level, <a href=\"https:\/\/www.docker.com\/docker-community\">Docker<\/a> provides built-in command-line monitoring for Docker<br \/>\nhosts via the <a href=\"https:\/\/docs.docker.com\/engine\/reference\/commandline\/stats\/\">docker stats<\/a> command. Administrators can query the Docker daemon and obtain detailed, real-time information about container resource consumption metrics, including CPU and memory usage, disk and network I\/O, and the number of running processes. Docker stats leverages the <a href=\"https:\/\/docs.docker.com\/engine\/api\/\">Docker Engine API<\/a> to retrieve this information. Docker stats has no notion of history, and it can only monitor a single host, but clever administrators can write scripts to gather metrics from multiple hosts. 
Docker stats is of limited use on its own, but docker stats data can be combined with other data sources like <a href=\"https:\/\/docs.docker.com\/engine\/reference\/commandline\/logs\/\">Docker log files<\/a> and <a href=\"https:\/\/docs.docker.com\/engine\/reference\/commandline\/events\/\">docker events<\/a> to feed higher level monitoring services. Docker only knows about metrics reported by a single host, so Docker stats is of limited use monitoring Kubernetes or Swarm clusters with multi-host application services. With no visualization interface, no aggregation, no datastore, and no ability to collect data from multiple hosts, Docker stats does not fare well against our seven-layer model. Because <a href=\"http:\/\/rancher.com\">Rancher<\/a> runs on Docker, basic docker stats functionality is automatically available to Rancher users.<\/p>\n<h3>cAdvisor<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17122432\/gord_cadvisor_logo-150x150.png\" alt=\"\" \/><a href=\"https:\/\/github.com\/google\/cadvisor\">cAdvisor<\/a> (container advisor) is an <a href=\"https:\/\/github.com\/google\/cadvisor\">open-source project<\/a> that like Docker stats provides users with resource usage information about running containers. cAdvisor was originally developed by Google to manage its <a href=\"https:\/\/en.wikipedia.org\/wiki\/Lmctfy\">lmctfy<\/a> containers, but it now supports Docker as well. It is implemented as a daemon process that collects, aggregates, processes, and exports information about running containers. cAdvisor exposes a web interface and can generate multiple graphs but, like Docker stats, it monitors only a single Docker host. It can be installed on a Docker machine either as a container or natively<br \/>\non the Docker host itself. cAdvisor itself only retains information for 60 seconds. cAdvisor needs to be configured to log data to an external datastore. 
Datastores commonly used with cAdvisor data include <a href=\"https:\/\/prometheus.io\/\">Prometheus<\/a> and<br \/>\n<a href=\"https:\/\/github.com\/influxdata\/influxdb\">InfluxDB<\/a>. While cAdvisor itself is not a complete monitoring solution, it is often a component of other monitoring solutions. Before Rancher version 1.2 (late December), Rancher embedded cAdvisor in the rancher-agent (for internal use by Rancher), but this is no longer the case. More recent versions of Rancher use Docker stats to gather information exposed through the Rancher UI because they can do so with less overhead. Administrators can<br \/>\neasily deploy cAdvisor on Rancher, and it is part of several comprehensive monitoring stacks, but cAdvisor is no longer part of Rancher itself.<\/p>\n<h3>Scout<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17122457\/gord_scout_logo-150x150.png\" alt=\"\" \/><a href=\"http:\/\/scoutapp.com\/\">Scout<\/a> is a Colorado-based company that provides a cloud-based application and database-monitoring service aimed mainly at Ruby and Elixir environments. One of many use cases it supports is monitoring Docker containers leveraging its existing monitoring and alerting framework. We mention Scout because it was covered in previous comparisons as a solution for monitoring Docker. Scout provides comprehensive data gathering, filtering, and monitoring functionality with flexible alerts and integrations to third-party alerting services. The team at Scout provides guidance on how to write scripts using Ruby and <a href=\"https:\/\/github.com\/etsy\/statsd\/wiki\">StatsD<\/a> to tap the <a href=\"http:\/\/blog.scoutapp.com\/articles\/2015\/06\/22\/monitoring-docker-containers-from-scratch\">Docker Stats<br \/>\nAPI<\/a> (above), the Docker Event API, and relay metrics to Scout for monitoring. 
They\u2019ve also packaged a docker-scout container, available<br \/>\non Docker Hub (<a href=\"https:\/\/hub.docker.com\/r\/scoutapp\/docker-scout\/\">scoutapp\/docker-scout<\/a>), that makes installing and configuring the Scout agent simple. The ease of use will depend on whether users configure the StatsD agent themselves or leverage the packaged docker-scout container. As a hosted cloud service, ScoutApp can save a lot of headaches when it comes to getting a container-monitoring solution up and running quickly. If you\u2019re deploying Ruby apps or running the database environments supported by Scout, it probably makes good sense to consolidate your Docker, application, and database-level monitoring and use the Scout solution. Users might want to watch out for a few things, however. At most service levels, the platform only allows for 30 days of data retention, and rather than being priced monthly per monitored host,<br \/>\nstandard packages are priced per transaction, ranging from $99 to $299<br \/>\nper month. The solution out of the box is not Kubernetes-aware, and<br \/>\nextracts and relays a limited set of metrics. Also, while docker-scout<br \/>\nis available on Docker Hub, development is by Pingdom, and there have<br \/>\nbeen only minor updates in the last two years to the agent component.<br \/>\nScout is not natively supported in Rancher but, because it is a cloud<br \/>\nservice, it is easy to deploy and use, particularly when the<br \/>\ncontainer-based agent is used. At present, the docker-scout agent is<br \/>\nnot in the Rancher Catalog.<\/p>\n<h3>Pingdom<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17122523\/gord_pingdom_logo-150x150.png\" alt=\"\" \/> Because we\u2019ve mentioned Scout as a cloud-hosted app, we also need to mention a similar solution called <a href=\"http:\/\/server-monitor.pingdom.com\/\">Pingdom<\/a>. 
Pingdom<br \/>\nis a hosted-cloud service operated by<br \/>\n<a href=\"http:\/\/www.solarwinds.com\/company\/home\">SolarWinds<\/a>, an Austin, TX,<br \/>\ncompany focused on monitoring IT infrastructure. While the main use case<br \/>\nfor Pingdom is website monitoring, as a part of its server monitor<br \/>\nplatform, Pingdom offers approximately 90 plug-ins. In fact, Pingdom<br \/>\nmaintains<br \/>\n<a href=\"http:\/\/server-monitor.pingdom.com\/plugin_urls\/19761-docker-monitor\">docker-scout<\/a>,<br \/>\nthe same StatsD agent used by Scout. Pingdom is worth a look because its<br \/>\npricing scheme appears better suited to monitoring Docker<br \/>\nenvironments. Pricing is flexible, and users can choose between<br \/>\nper-server based plans and plans based on the number of StatsD metrics<br \/>\ncollected ($1 per 10 metrics per month). Pingdom makes sense for users<br \/>\nwho need a full-stack monitoring solution that is easy to set up and<br \/>\nmanage, and who want to monitor additional services beyond the container<br \/>\nmanagement platform. Like Scout, Pingdom is a cloud service that can be<br \/>\neasily used with Rancher.<\/p>\n<h3>Datadog<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17122544\/gord_datadog_logo-150x150.png\" alt=\"\" \/> Datadog is another commercial hosted-cloud monitoring service similar to Scout and Pingdom. Datadog also provides a Dockerized agent for installation on each Docker host; however, rather than using StatsD like the<br \/>\ncloud-monitoring solutions mentioned previously, Datadog has developed<br \/>\nan enhanced StatsD called<br \/>\n<a href=\"https:\/\/docs.datadoghq.com\/guides\/dogstatsd\/\">DogStatsD<\/a>. The Datadog<br \/>\nagent collects and relays the full set of metrics available from the<br \/>\nDocker API providing more detailed, granular monitoring. 
While Datadog<br \/>\ndoes not have native support for Rancher, a Datadog catalog entry in the<br \/>\nRancher UI makes the Datadog agent easy to install and configure on<br \/>\nRancher. Rancher tags can be used as well so that reporting in Datadog<br \/>\nreflects labels you\u2019ve used for hosts and applications in Rancher.<br \/>\nDatadog provides better access to metrics and more granularity in<br \/>\ndefining alert conditions than the cloud services mentioned earlier.<br \/>\nLike the other services, Datadog can be used to monitor other services<br \/>\nand applications as well, and it boasts a library of over 200<br \/>\nintegrations. Datadog also retains data at full resolution for 18<br \/>\nmonths, which is longer than the cloud services above. An advantage of<br \/>\nDatadog over some of the other cloud services is that it has integrations<br \/>\nbeyond Docker and can collect metrics from Kubernetes, Mesos, etcd, and<br \/>\nother services that you may be running in your Rancher environment. This<br \/>\nversatility is important to users running Kubernetes on Rancher because<br \/>\nthey want to be able to monitor metrics for things like Kubernetes pods,<br \/>\nservices, namespaces, and kubelet health. The Datadog-Kubernetes<br \/>\nmonitoring solution uses DaemonSets in Kubernetes to automatically<br \/>\ndeploy the data collection agent to each cluster node. Pricing for<br \/>\nDatadog starts at approximately $15 per host per month and goes up from<br \/>\nthere depending on the services required and the number of monitored containers<br \/>\nper host.<\/p>\n<h3>Sysdig<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/18103837\/gord_sysdig_logo_square-150x150.png\" alt=\"\" \/> Sysdig is a California company that provides a cloud-based monitoring solution. 
Unlike some of the cloud-based monitoring solutions described so far, Sysdig focuses more narrowly on monitoring container environments including Docker, Swarm, Mesos, and Kubernetes. Sysdig also makes some of its functionality available in open-source projects, and it provides<br \/>\nthe option of either cloud or on-premises deployments of the Sysdig<br \/>\nmonitoring service. In these respects, Sysdig is different from the<br \/>\ncloud-based solutions looked at so far. Like Datadog, catalog entries<br \/>\nare available for Rancher, but for Sysdig there are separate entries for<br \/>\non-premises and cloud installations. Automated installation from the<br \/>\nRancher Catalog is not available for Kubernetes; however, it can be<br \/>\ninstalled on Rancher outside of the catalog. The commercial Sysdig<br \/>\nMonitor has Docker monitoring, alerting, and troubleshooting facilities<br \/>\nand is also Kubernetes, Mesos, and Swarm-aware. Sysdig is automatically<br \/>\naware of Kubernetes pods and services, making it a good solution if<br \/>\nyou\u2019ve chosen Kubernetes as your orchestration framework on Rancher.<br \/>\nSysdig is priced monthly per host like Datadog. While the entry price is<br \/>\nslightly higher, Sysdig includes support for more containers per host,<br \/>\nso actual pricing will likely be very similar depending on the user\u2019s<br \/>\nenvironment. Sysdig also provides a comprehensive CLI, csysdig,<br \/>\ndifferentiating it from some of the other offerings.<\/p>\n<h3>Prometheus<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/17122800\/gord_prometheus_logo-150x150.png\" alt=\"\" \/><br \/>\nPrometheus is a popular, open-source monitoring and alerting toolkit originally built at SoundCloud. It is now a CNCF project, the foundation\u2019s second hosted project after Kubernetes. 
As a toolkit, it is substantially different<br \/>\nfrom monitoring solutions described thus far. A first major difference<br \/>\nis that rather than being offered as a cloud service, Prometheus is modular<br \/>\nand self-hosted, meaning that users deploy Prometheus on their clusters<br \/>\nwhether on-premises or cloud-resident. Rather than pushing data to a<br \/>\ncloud service, Prometheus installs on each Docker host and pulls or<br \/>\n\u201cscrapes\u201d data from an extensive variety of<br \/>\n<a href=\"https:\/\/prometheus.io\/docs\/instrumenting\/exporters\/\">exporters<\/a><br \/>\navailable to Prometheus via HTTP. Some exporters are officially<br \/>\nmaintained as a part of the Prometheus GitHub project, while others are<br \/>\nexternal contributions. Some projects expose Prometheus metrics natively<br \/>\nso that exporters are not needed. Prometheus is highly extensible. Users<br \/>\nneed to mind the number of exporters and configure polling intervals<br \/>\nappropriately depending on the amount of data they are collecting. The<br \/>\nPrometheus server retrieves time-series data from various sources and<br \/>\nstores data in its internal datastore. Prometheus provides features like<br \/>\nservice discovery and a separate push gateway for specific types of metrics,<br \/>\nand it has an embedded query language (PromQL) that excels at querying<br \/>\nmultidimensional data. It also has an embedded web UI and API. The web<br \/>\nUI in Prometheus provides good functionality but relies on users knowing<br \/>\nPromQL, so some sites prefer to use Grafana as an interface for charting<br \/>\nand viewing cluster-related metrics. Prometheus has a discrete Alert<br \/>\nManager with a distinct UI that can work with data stored in Prometheus.<br \/>\nLike other alert managers, it works with a variety of external alerting<br \/>\nservices including email, HipChat, PagerDuty, Slack, OpsGenie,<br \/>\nVictorOps, and others. 
Because Prometheus is comprised of many<br \/>\ncomponents, and exporters need to be selected and installed depending on<br \/>\nthe services monitored, it is more difficult to install; but as a free<br \/>\noffering, the price is right. While not quite as refined as tools like<br \/>\nDatadog or Sysdig, Prometheus offers similar functionality, extensive<br \/>\nthird-party software integrations, and best-in-class cloud monitoring<br \/>\nsolutions. Prometheus is aware of Kubernetes and other container<br \/>\nmanagement frameworks. An entry in the Rancher Catalog developed by<br \/>\n<a href=\"https:\/\/www.infinityworks.com\/\">Infinityworks<\/a> makes getting started<br \/>\nwith Prometheus easier when Cattle is used as the Rancher orchestrator<br \/>\nbut, because of the wide variety of configuration options,<br \/>\nadministrators need to spend some time to get it properly installed and<br \/>\nconfigured. Infinityworks have contributed useful add-ons including the<br \/>\n<a href=\"https:\/\/github.com\/infinityworks\/prometheus-rancher-exporter\">prometheus-rancher-exporter<\/a> that<br \/>\nexposes the health of Rancher stacks and hosts obtained from the Rancher<br \/>\nAPI to a Prometheus-compatible endpoint. For administrators who don\u2019t<br \/>\nmind going to a little more effort, Prometheus is one of the most<br \/>\ncapable monitoring solutions and should be on your shortlist for<br \/>\nconsideration.<\/p>\n<h3>Heapster<\/h3>\n<p>Heapster is another solution that often comes up in relation to monitoring container<br \/>\nenvironments. Heapster is a project under the Kubernetes umbrella that<br \/>\nhelps enable container-cluster monitoring and performance analysis.<br \/>\nHeapster specifically supports Kubernetes and OpenShift and is most<br \/>\nrelevant for Rancher users running Kubernetes as their orchestrator. It<br \/>\nis not typically used with Cattle or Swarm. 
People often describe<br \/>\nHeapster as a monitoring solution, but it is more precisely a<br \/>\n\u201ccluster-wide aggregator of monitoring and event data.\u201d Heapster is<br \/>\nnever deployed alone; rather, it is a part of a stack of open-source<br \/>\ncomponents. The Heapster monitoring stack is typically comprised of:<\/p>\n<ul>\n<li>A data gathering tier \u2013 e.g., cAdvisor accessed with the<br \/>\nkubelet on each cluster host<\/li>\n<li>Pluggable storage backends \u2013 e.g., ElasticSearch, InfluxDB,<br \/>\nKafka, Graphite, or roughly <a href=\"https:\/\/github.com\/kubernetes\/heapster\/blob\/master\/docs\/sink-owners.md\">a dozen<br \/>\nothers<\/a><\/li>\n<li>A data visualization component \u2013 Grafana or Google Cloud<br \/>\nMonitoring<\/li>\n<\/ul>\n<p>A popular stack is comprised of Heapster, InfluxDB, and Grafana, and<br \/>\nthis combination is installed by default on Rancher when users choose to<br \/>\ndeploy Kubernetes. Note that these components are considered add-ons to<br \/>\nKubernetes, so they may not be automatically deployed with all<br \/>\nKubernetes distributions. One of the reasons that InfluxDB is popular is<br \/>\nthat it is one of the few data backends that supports both Kubernetes<br \/>\nevents and metrics, allowing for more comprehensive monitoring of<br \/>\nKubernetes. Note that Heapster does not natively support alerting or<br \/>\nservices related to Application Performance Management (APM) found in<br \/>\ncommercial cloud-based solutions or Prometheus. 
Users who need<br \/>\nthese services can supplement their Heapster installation using<br \/>\n<a href=\"https:\/\/github.com\/hawkular\">Hawkular<\/a>, but this is not automatically<br \/>\nconfigured as part of the Rancher deployment and will require extra user<br \/>\neffort.<\/p>\n<h3>ELK Stack<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/rancher.com\/wp-content\/uploads\/2017\/10\/Elastic-Stack-Master4-150x150.jpg\" alt=\"\" \/>Another<br \/>\nopen-source software stack available for monitoring container environments is ELK, comprised of three open-source projects contributed by <a href=\"https:\/\/github.com\/elastic\">Elastic<\/a>. The ELK stack is versatile and<br \/>\nis widely used for a variety of analytic applications, log file<br \/>\nmonitoring being a key one. ELK is named for its key components:<\/p>\n<ul>\n<li><a href=\"https:\/\/github.com\/elastic\/elasticsearch\">Elasticsearch<\/a> \u2013 a<br \/>\ndistributed search engine based on Lucene<\/li>\n<li><a href=\"https:\/\/github.com\/elastic\/logstash\">Logstash<\/a> \u2013 a<br \/>\ndata-processing pipeline that ingests data and sends it to<br \/>\nElasticsearch (or other \u201cstashes\u201d)<\/li>\n<li><a href=\"https:\/\/github.com\/elastic\/kibana\">Kibana<\/a> \u2013 a visual search<br \/>\ndashboard and analysis tool for Elasticsearch<\/li>\n<\/ul>\n<p>An unsung member of the Elastic stack is<br \/>\n<a href=\"https:\/\/github.com\/elastic\/beats\">Beats<\/a>, described by the project<br \/>\ndevelopers as \u201clightweight data shippers.\u201d There are a variety of<br \/>\noff-the-shelf Beats shippers including Filebeat (used for log files),<br \/>\nMetricbeat (used for gathering metrics from various sources), and<br \/>\nHeartbeat for simple uptime monitoring, among others. 
Metricbeat is<br \/>\nDocker-aware, and the authors provide<br \/>\n<a href=\"https:\/\/www.elastic.co\/guide\/en\/beats\/metricbeat\/current\/running-on-docker.html\">guidance<\/a><br \/>\non how to use it to extract host metrics and monitor services in Docker<br \/>\ncontainers. There are variations in how the ELK stack is<br \/>\ndeployed. Lorenzo Fontana of Kiratech explains in <a href=\"https:\/\/blog.codeship.com\/monitoring-docker-containers-with-elasticsearch-and-cadvisor\">this<br \/>\narticle<\/a><br \/>\nhow to use cAdvisor to collect metrics from Docker Swarm hosts for<br \/>\nstorage in ElasticSearch and analysis using Kibana. In <a href=\"https:\/\/aboullaite.me\/docker-monitoring-with-the-elk-stack\/\">another<br \/>\narticle<\/a>,<br \/>\nAboullaite Mohammed describes a different use case focused on collecting<br \/>\nDocker log files for analysis focusing on analyzing various Linux and<br \/>\nNGINX log files (error.log, access.log, and syslog). There are<br \/>\ncommercial ELK stack providers such as <a href=\"http:\/\/logz.io\/\">logz.io<\/a> and<br \/>\n<a href=\"https:\/\/www.elastic.co\/\">Elastic Co<\/a> themselves that offer \u201cELK as a<br \/>\nservice\u201d supplementing the stack\u2019s capabilities with alerting<br \/>\nfunctionality. Additional information about using ELK with Docker is<br \/>\navailable at <a href=\"https:\/\/elk-docker.readthedocs.io\/\">https:\/\/elk-docker.readthedocs.io\/<\/a>. For Rancher users<br \/>\nthat wish to experiment with ELK, the stack is available as a Rancher<br \/>\nCatalog entry, and a <a href=\"http:\/\/rancher.com\/deploying-an-elasticsearch-cluster-using-rancher-catalog\/\">tutorial by Rachid<br \/>\nZaroualli<\/a><br \/>\nexplains how to deploy it. Zaroualli has contributed an <a href=\"http:\/\/rancher.com\/using-containers-elasticsearch-cluster-twitter-monitoring\/\">additional<br \/>\narticle<\/a><br \/>\non how the ELK stack can be used for monitoring Twitter data. 
While<br \/>\nknowledgeable administrators can use ELK for container monitoring, this<br \/>\nis a tougher solution to implement compared to Sysdig, Prometheus, or<br \/>\nDatadog, all of which are more directly aimed at container monitoring.<\/p>\n<h3>Sensu<\/h3>\n<p><img decoding=\"async\" src=\"http:\/\/cdn.rancher.com\/wp-content\/uploads\/2017\/10\/18103933\/gord_sensu_logo_square-150x150.png\" alt=\"\" \/> Sensu is a general-purpose, self-hosted monitoring solution that supports a variety of monitoring applications. A free Sensu Core edition is available under an MIT license, while an enterprise version with added functionality is available for $99 per month for 50 Sensu clients. Sensu uses the term client to refer to its monitoring agents, so depending on the number of hosts and application environments you are monitoring, the enterprise<br \/>\nedition can get expensive. Sensu has impressive capabilities outside of<br \/>\ncontainer management but, consistent with the other platforms, I\u2019ve<br \/>\nlooked at it from the perspective of monitoring the container<br \/>\nenvironment and containerized applications. The number of Sensu<br \/>\n<a href=\"https:\/\/sensuapp.org\/plugins#third-party-plugins\">plug-ins<\/a> continues<br \/>\nto grow, and there are dozens of Sensu- and community-supported plug-ins<br \/>\nthat allow metrics to be extracted from various sources. In an earlier<br \/>\nevaluation of Sensu on Rancher in 2015, it was necessary for the author<br \/>\nto develop shell scripts to extract information from Docker, but an<br \/>\nactively developed <a href=\"https:\/\/github.com\/sensu-plugins\/sensu-plugins-docker\">Docker<br \/>\nplug-in<\/a> is now<br \/>\navailable for this purpose, making Sensu easier to use with Rancher.<br \/>\nPlug-ins tend to be written in Ruby with gem-based installation scripts<br \/>\nthat need to run on the Docker host. Users can develop additional<br \/>\nplug-ins in the languages they choose. 
Sensu plug-ins are not deployed<br \/>\nin their own containers, as is common with other monitoring solutions we\u2019ve<br \/>\nconsidered. (This is no doubt because Sensu does not come from a<br \/>\nheritage of monitoring containers.) Different users will want to mix and<br \/>\nmatch plug-ins depending on their monitoring requirements, so having<br \/>\nseparate containers for each plug-in would become unwieldy, and this is<br \/>\npossibly why containers are not used for deployment. Plug-ins are<br \/>\ndeployable using platforms like Chef, Puppet, and Ansible, however. For<br \/>\nDocker alone, for example, there are <a href=\"https:\/\/github.com\/sensu-plugins\/sensu-plugins-docker\/tree\/master\/bin\">six separate<br \/>\nplug-ins<\/a><br \/>\nthat gather Docker-related data from various sources, including Docker<br \/>\nstats, container counts, container health, Docker ps, and more. The<br \/>\nnumber of plug-ins is impressive and includes many of the application<br \/>\nstacks that users will likely be running in container environments<br \/>\n(ElasticSearch, Solr, Redis, MongoDB, RabbitMQ, Graphite, and Logstash,<br \/>\nto name a few). Plug-ins for management and orchestration frameworks<br \/>\nlike AWS services (EC2, RDS, ELB) are also provided with Sensu.<br \/>\nOpenStack and Mesos support is available in Sensu as well. Kubernetes<br \/>\nappears to be missing from the list of plug-ins at present. Sensu uses a<br \/>\nmessage bus implemented using RabbitMQ to facilitate communication<br \/>\nbetween the agents\/clients and the Sensu server. Sensu uses Redis to<br \/>\nstore data, but it is designed to route data to external time-series<br \/>\ndatabases. Among the databases supported are Graphite, Librato, and<br \/>\nInfluxDB. <a href=\"https:\/\/sensuapp.org\/docs\/1.0\/platforms\/sensu-on-ubuntu-debian.html#sensu-core\">Installing and configuring<br \/>\nSensu<\/a><br \/>\ntakes some effort. 
Prerequisites for installing Sensu are Redis and<br \/>\nRabbitMQ. The Sensu server, Sensu clients, and the Sensu dashboard<br \/>\nrequire separate installation, and the process varies depending on<br \/>\nwhether you are deploying Sensu Core or the enterprise version. Sensu, as<br \/>\nmentioned, does not offer a container-friendly deployment model. For<br \/>\nconvenience, a Docker image is available<br \/>\n(<a href=\"https:\/\/hub.docker.com\/r\/hiroakis\/docker-sensu-server\/\">hiroakis\/docker-sensu-server<\/a>)<br \/>\nthat runs redis, rabbitmq-server, uchiwa (the open-source web tier) and<br \/>\nthe Sensu server components, but this package is more useful for<br \/>\nevaluation than a production deployment. Sensu has a large number of<br \/>\nfeatures, but a drawback for container users is that the framework is<br \/>\nharder to install, configure, and maintain because the components are<br \/>\nnot themselves Dockerized. Also, many alerting features, such as<br \/>\nsending alerts to PagerDuty, Slack, or HipChat,<br \/>\nthat are available in competing cloud-based solutions or<br \/>\nopen-source solutions like Prometheus require the purchase of the Sensu<br \/>\nenterprise license. Particularly if you are running Kubernetes, there<br \/>\nare probably better choices out there.<\/p>\n<h3>The Monitoring Solutions We Missed<\/h3>\n<ul>\n<li><a href=\"https:\/\/www.graylog.org\/\">Graylog<\/a> is another open-source solution<br \/>\nthat comes up when monitoring Docker. Like ELK, Graylog is suited to<br \/>\nDocker log file analysis. It can accept and parse logs and event<br \/>\ndata from multiple data sources and supports third-party collectors<br \/>\nlike Beats, Fluentd, and NXLog. 
There\u2019s a <a href=\"http:\/\/cloudlady911.com\/index.php\/2016\/06\/23\/graylog-ha-with-rancher\/\">good<br \/>\ntutorial<\/a><br \/>\non configuring Graylog for use with Rancher.<\/li>\n<li><a href=\"https:\/\/www.nagios.org\/\">Nagios<\/a> is usually viewed as better suited<br \/>\nfor monitoring cluster hosts rather than containers but, for those<br \/>\nof us who grew up monitoring clusters, Nagios is a crowd favorite.<br \/>\nFor those interested in <a href=\"https:\/\/github.com\/sshipway\/check_rancher\">using Nagios with<br \/>\nRancher<\/a>, some work has<br \/>\nbeen done here.<\/li>\n<li><a href=\"https:\/\/netsil.com\/\">Netsil<\/a> is a Silicon Valley startup offering a<br \/>\nmonitoring application with plugins for Docker, Kubernetes, Mesos,<br \/>\nand a variety of applications and cloud providers. Netsil\u2019s<br \/>\nApplication Operations Center (AOC) provides framework-aware<br \/>\nmonitoring for cloud application services. Like some of the other<br \/>\nmonitoring frameworks discussed, it is offered either as a cloud\/SaaS service or as a<br \/>\nself-hosted deployment.<\/li>\n<\/ul>\n<p>Gord Sissons, Principal Consultant at StoryTek<\/p>\n<p><a href=\"https:\/\/rancher.com\/comparing-10-container-monitoring-solutions-rancher\/\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Learn How Rancher 2.0 Solves Enterprise Kubernetes Challenges Understand the comparative advantage of Rancher 2.0 for DevOps teams, IT Admins, and Operations. Read the Report Container monitoring environments come in all shapes and sizes. Some are open source while others are commercial. Some are in the Rancher Catalog while others require manual configuration. 
Some are &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw93\/index.php\/2018\/11\/02\/comparing-10-container-monitoring-solutions-for-rancher\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Comparing 10 Container Monitoring Solutions for Rancher&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-742","post","type-post","status-publish","format-standard","hentry","category-kubernetes"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts\/742","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/comments?post=742"}],"version-history":[{"count":2,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts\/742\/revisions"}],"predecessor-version":[{"id":748,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts\/742\/revisions\/748"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/media?parent=742"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/categories?post=742"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/tags?post=742"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}