Continuous Deployment and Automated Canary Analysis with Spinnaker and Kubernetes // Jetstack Blog

Spinnaker is a cloud-native continuous delivery tool created at Netflix, originally designed and built to help internal development teams release software changes with confidence. It has since been open-sourced and has gained the support of a growing number of mainstream cloud providers, including Google, Amazon, Microsoft, IBM and Oracle.

At Jetstack we receive questions almost daily from our customers about how to deploy to Kubernetes across different environments, and in some cases to clusters in multiple cloud providers or on-prem. Since Spinnaker runs natively on Kubernetes and has first-class support for Kubernetes manifests, it is a strong candidate as a tool for this purpose. However, being able to demonstrate the tool in action, and more importantly how it might integrate with other tooling, is vital for making a decision. For this reason we have been working on a series of demonstrators with various best-of-breed cloud-native technologies to help inform our customers. In this post, we’ll describe the architecture of the demo and how these cloud-native technologies can be used together and with Spinnaker.

Overview

The primary aim of the demo is to show how Spinnaker could be used to automate the deployment of a new version of an application to production with confidence. The chosen application is a simple webserver called the goldengoose that we use for our advanced wargaming training course. The techniques described below could of course be applied to a more complex application, but in order to keep the focus on Spinnaker’s capabilities rather than the intricacies of managing a particular application, we chose to keep the application simple.

The demo configures two pipelines within Spinnaker: Build and Deploy. When a commit is pushed to the master branch of the goldengoose GitHub repository, a GitHub webhook triggers the Build pipeline which builds an image on-cluster and pushes the result to Docker Hub. If successful, the Deploy pipeline is then triggered which deploys the new image in a controlled way to production.

One of the main components of the demo that provides the confidence and control mentioned above is the use of Spinnaker’s automated canary analysis (ACA) feature. This feature leverages Kayenta, a component responsible for querying relevant metrics from a configured sink and performing statistical analysis on the data to decide how to proceed (in this case, whether a canary deployment should be promoted to production or not). Deciding which metrics should be used to make such a decision can be challenging; however, this feature gives operators an incredibly flexible way of describing to Spinnaker what it means for a new version of their application to be ‘better’ than the previous version.
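
As a rough illustration of the kind of signal involved, the analysis might compare an error-rate expression for the canary against the same expression for the baseline; the Prometheus metric and label names below are assumptions based on a typical Istio telemetry setup, not the demo's actual canary configuration:

# Hypothetical expressions only; metric and label names depend on how Istio
# telemetry and the Prometheus adapter are configured in your cluster.
sum(rate(istio_requests_total{destination_workload="goldengoose-canary", response_code=~"5.."}[5m]))
sum(rate(istio_requests_total{destination_workload="goldengoose-baseline", response_code=~"5.."}[5m]))

A statistically significant difference between the two series over the analysis window is what would cause the canary to be failed rather than promoted.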


The whole demo (except for the GitHub and Docker Hub components, load balancers and disks) runs on a single GKE cluster. This cluster does not have any special requirements except that we have enabled autoscaling and made the nodes larger than the default (n1-standard-4).
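
For reference, a cluster along these lines could be created with a gcloud command roughly like the one below; the cluster name, zone and node counts are placeholders rather than the exact values used for the demo:

gcloud container clusters create spinnaker-demo \
  --zone europe-west1-b \
  --machine-type n1-standard-4 \
  --num-nodes 3 \
  --enable-autoscaling --min-nodes 3 --max-nodes 6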

More detail on how the tools used within the demo interact with each other will be described below, but the high-level steps are as follows:

  1. Make a local change to the goldengoose codebase and push to GitHub
  2. GitHub webhook triggers the Spinnaker Build pipeline, which applies a Knative Build custom resource to the cluster (a sketch of such a resource follows this list)
  3. Knative build controller triggers a build of the goldengoose image which is pushed to Docker Hub
  4. If the build is successful, the Spinnaker Deploy pipeline is triggered
  5. Canary deployment is deployed from the newly built image
  6. Baseline deployment is deployed using the image from the current production deployment
  7. Spinnaker performs ACA on performance metrics collected from both the canary and baseline deployments
  8. If ACA is deemed successful, the canary image is promoted to production by performing a rolling update of the production deployment
  9. Canary and baseline deployments are cleaned up
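
To illustrate step 2, a Knative Build resource applied by the Build pipeline might look roughly like the following sketch; the repository URL, image destination and the choice of Kaniko as the build step are illustrative assumptions rather than the demo's exact configuration:

kubectl create -f - <<'EOF'
apiVersion: build.knative.dev/v1alpha1
kind: Build
metadata:
  generateName: goldengoose-build-
spec:
  source:
    git:
      url: https://github.com/example/goldengoose.git
      revision: master
  steps:
  # Hypothetical build step: Kaniko builds the Dockerfile and pushes the image
  - name: build-and-push
    image: gcr.io/kaniko-project/executor
    args:
    - --dockerfile=/workspace/Dockerfile
    - --destination=docker.io/example/goldengoose:latest
EOF

The Knative build controller then watches for this resource, runs the build on-cluster and pushes the resulting image to the registry.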

The reason for deploying a baseline using the current production image rather than just using the production deployment itself is to avoid differences in performance metrics due to how long the deployment has been running. Heap size is one such metric that could be affected by this.

These steps could of course be extended to a more complex pipeline involving more environments and more testing, perhaps with a final manual promotion to production; a single Spinnaker deployment can interact with multiple Kubernetes clusters other than the cluster Spinnaker is running on by installing credentials for these other clusters.
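
Registering an additional cluster is typically a matter of adding another Kubernetes account through Halyard; a hedged sketch (the account name and kubeconfig context are placeholders) might look like:

hal config provider kubernetes enable
hal config provider kubernetes account add prod-cluster \
  --context $(kubectl config current-context)
hal deploy apply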

Here we list the main tools that have been used and their purpose within the demo and how they relate to Spinnaker:

  • Knative: when a code change is pushed to our goldengoose repository, we want to trigger a build so that a new canary deployment can be rolled out. Knative’s build component worked nicely for this and allowed Spinnaker to apply a Build custom resource to the cluster whenever a commit was pushed to our master branch. This CI component of the demo is not strictly within Spinnaker’s domain as a CD tool, however by having Knative controllers handle the logic involved in building a new image we could still make use of Spinnaker’s first-class support for Kubernetes resources.
  • Prometheus: Spinnaker’s ACA requires access to a set of metrics from both a canary deployment and a baseline deployment. Spinnaker supports a number of metrics sinks, but some of the reasons we chose Prometheus were its ubiquity in the cloud-native space and the fact that it integrates with Istio out of the box. By configuring Spinnaker to talk to our in-cluster Prometheus instance we were able to automate the decision to promote canary images to production.
  • Istio: as we were only making use of Knative’s build component, we did not have a strict dependency on Istio; however, by using Istio’s traffic-shifting capabilities we were able to easily route equally weighted production traffic to both our baseline and canary deployments, producing the performance metrics used by Spinnaker’s ACA feature (see the traffic-splitting sketch after this list). Istio’s traffic mirroring feature could also be used if you did not want responses from the canary to be seen by users. We also made use of the Prometheus adapter to describe to Istio which goldengoose metrics we wanted to make available in Prometheus. Finally, the Istio Gateway was used to allow traffic to reach our goldengoose deployments.
  • cert-manager: to secure Spinnaker’s UI and API endpoints we needed TLS certificates; what else would we use?
  • nginx-ingress: the NGINX ingress controller was used to allow traffic to reach both the Spinnaker UI and API endpoints as well as for cert-manager Let’s Encrypt ACME HTTP challenges.
  • GitHub: used as both a source code repository and as an OAuth identity provider for Spinnaker. There are other authentication options available.
  • OpenLDAP: used for authorisation within Spinnaker. There are other authorisation options available.
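
As mentioned in the Istio point above, a sketch of the kind of VirtualService that splits traffic evenly between the baseline and canary is shown below; the host and subset names are illustrative rather than the demo's exact manifests, and the subsets would be defined in a corresponding DestinationRule selecting the two Deployments by label:

kubectl apply -f - <<'EOF'
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: goldengoose
spec:
  hosts:
  - goldengoose
  http:
  - route:
    # 50/50 split between the baseline and canary subsets
    - destination:
        host: goldengoose
        subset: baseline
      weight: 50
    - destination:
        host: goldengoose
        subset: canary
      weight: 50
EOF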

Summary

We have described how Spinnaker can be used for continuous delivery (and integration) and how it can be integrated with other cloud-native tooling to provide powerful capabilities within your organisation.

These are still relatively early days for the Spinnaker project and we can expect to see lots of future development. The documentation that exists today is clean and easy to follow, but there are a number of undocumented areas I would like to see covered around exposing the internals of the various microservices that make up a Spinnaker deployment. Some interesting extension points exist today, for example writing custom stages and adding first-class support for particular Kubernetes custom resources, but other changes, such as letting Spinnaker know that a new CRD exists in the cluster and a recommended way of manually adding to the generated Halyard configuration (for component sizing, for example), would be nice to see. Fortunately the Spinnaker community is strong and responsive and has clearly outlined how best to get in touch here.

One potential barrier to Spinnaker adoption for some users is how much it layers on top of Kubernetes: authentication, authorisation and configuration validation (e.g. for Spinnaker pipelines) are all handled by various Spinnaker components or external services, even though upstream Kubernetes already has machinery to handle these exact problems, which Spinnaker does not make use of. The ability to apply a pipeline custom resource that Spinnaker watches for, for example, would be very powerful, allowing RBAC rules to be configured to control which users are allowed to manage pipelines. Not relying on Kubernetes for these features does of course allow for more granular authentication, and it keeps Spinnaker’s deployment options wider than just Kubernetes; however, since the only production installation instructions require Kubernetes, and since Kubernetes is becoming increasingly ubiquitous, working towards a tighter coupling might ease adoption. Projects such as k8s-pipeliner do try to provide some of that glue, but deeper integration would be greatly valued by users already familiar with Kubernetes.

For more information on anything covered in this post please reach out to our team at hello@jetstack.io.

Source

Grafana Dashboard | Deploy Docker Image

Rancher Server has recently added Docker Machine support,
enabling us to easily deploy new Docker hosts on multiple cloud
providers via Rancher’s UI/API and automatically have those hosts
registered with Rancher. For now Rancher supports DigitalOcean and
Amazon EC2 clouds, and more providers will be supported in the future.
Another significant feature of Rancher is its networking implementation,
because it enhances and facilitates the way you connect Docker
containers and those services running on them. Rancher creates a private
network across all Docker hosts that allows containers to communicate as
if they were in the same subnet. In this post we will see how to use the
new Docker Machine support and Rancher networking by deploying a Grafana
dashboard installation on Amazon EC2. We are creating EC2 instances directly from
Rancher UI and all our containers are being connected through the
Rancher network. If you have never heard of Grafana, it is an open
source rich metric web dashboard and graph editor for Graphite, InfluxDB
and OpenTSDB metric storages. To set this up we are using these docker
images:

  • tutum/influxdb for storing metrics and grafana dashboards
  • tutum/grafana for graphing influxDB metrics and serving
    dashboards
  • a custom linux image that will send linux O.S. metrics to influxDB
    using sysinfo_influxdb (CPU, memory, load, disks I/O, network
    traffic).

In a test environment you may want to deploy docker images in the same
host, but we are using a total of 4 AWS instances listed below in order
to mimic a large-scale production deployment and also to see how Rancher
networking works.

  • 1 as a Rancher Server to provision and manage application stack AWS
    instances,
  • 1 running influxDB docker image (tutum/influxdb)
  • 1 running grafana docker image (tutum/grafana)
  • 1 running sysinfo docker image (nixel/sysinfo_influxdb)

Preparing AWS Environment

First you will need to create the following in the AWS Console: a Key Pair
for connecting to your servers, a Security Group to give you access to
the Rancher Console, and an Access Key for Rancher to provision EC2
instances.

Creating a Key Pair

Enter the AWS EC2 Console, go to the Key Pairs section, click the Create
Key Pair button and then enter a name for your Key Pair. Once created,
your browser downloads a pem certificate. You will need it if you want to
connect to your AWS instances.

Creating a Security Group

First of all go to the VPC Console and enter the Subnets section. You
will get a list of available subnets in the default VPC; choose one for
deploying AWS instances and copy its ID and CIDR. Also copy the VPC ID;
you will need all this data later when creating Docker hosts with the
Machine integration. I am using subnet 172.31.32.0/20 for this tutorial.
Then enter the AWS EC2 Console, go to the Security Groups section and click
the Create Security Group button. Enter the following data (a roughly
equivalent AWS CLI sketch follows this list):

  • Security Group Name: Rancher and Grafana
  • Description: Open Rancher and Grafana ports
  • VPC: select the default one
  • Add a new inbound rule to allow 22 TCP port to be accessible only
    from your IP
  • Add a new inbound rule to allow 8080 TCP port to be accessible only
    from your IP
  • Add a new inbound rule to allow 9345-9346 TCP ports to be accessible
    from anywhere
  • Add a new inbound rule to allow all traffic from your VPC network.
    In this case source is 172.31.32.0/20, change it accordingly to your
    environment.
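
As referenced above, the same rules could be scripted with the AWS CLI; the VPC ID, your public IP and the CIDR below are placeholders to adapt to your environment (using the group name works here because the default VPC is selected):

aws ec2 create-security-group --group-name "Rancher and Grafana" \
  --description "Open Rancher and Grafana ports" --vpc-id <your-vpc-id>
aws ec2 authorize-security-group-ingress --group-name "Rancher and Grafana" \
  --protocol tcp --port 22 --cidr <your-ip>/32
aws ec2 authorize-security-group-ingress --group-name "Rancher and Grafana" \
  --protocol tcp --port 8080 --cidr <your-ip>/32
aws ec2 authorize-security-group-ingress --group-name "Rancher and Grafana" \
  --protocol tcp --port 9345-9346 --cidr 0.0.0.0/0
aws ec2 authorize-security-group-ingress --group-name "Rancher and Grafana" \
  --protocol all --cidr 172.31.32.0/20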

Creating an Access Key

Enter the EC2 Console and click your user name in the top menu bar, click
Security Credentials and then expand the Access Keys option. Click the
Create New Access Key button and, after it has been created, click Show
Access Key to get the ID and Secret Key. Save them because you will need
them later to create Docker hosts.

Rancher Server Setup

For launching Rancher Server you will need an AWS instance. I am using a
t1.micro instance for writing this guide, but it is recommended to use a
larger instance for real environments. Enter the AWS EC2 Console and go
to the Instances section, click the Launch Instance button, click
Community AMIs and then search for RancherOS and select the latest
version, for example rancheros-v0.2.1. Choose an instance type and click
the Next: Configure Instance Details button. In the configuration screen
be sure to select the same subnet you chose for the Security Group.
Expand the Advanced Details section and enter this user data to
initialize your instance and get Rancher Server installed and running:

#!/bin/bash
docker run -d -p 8080:8080 rancher/server:v0.14.1

You may keep the default options for all steps except the Security Group
(choose the Security Group named Rancher and Grafana). When launching the
AWS instance you are asked to choose a Key Pair; be sure to select the
one we created before. Go to the Instances section and click your Rancher
Server instance to find its private and public IPs. Wait a few minutes
and then browse to
http://RANCHER_SERVER_PUBLIC_IP:8080
to enter the Rancher Web Console and click Add Host. You will be asked to
confirm the Rancher Server IP address; click Something else, enter
RANCHER_SERVER_PRIVATE_IP:8080 and finally click the Save button.

Docker hosts setup

Go to the Rancher Console, click Add Host and select Amazon EC2. Here
you will need to enter the new host name, the Access Key and the
Secret Key. Also be sure to set the same Region, Zone and VPC ID as those
used by Rancher Server. Leave all other parameters with their default
values. Repeat this process to create the three Docker hosts, which will
appear up and running after a while.

Security Group for grafana

Rancher Server has created a Security Group named docker-machine for your
Docker hosts. Now, in order to be able to connect to grafana, you must go
to the VPC Console and add the following inbound rules:

  • Add a new inbound rule to allow 80 TCP port to be accessible only
    from your IP
  • Add a new inbound rule to allow 8083-8084 TCP ports to be accessible
    only from your IP
  • Add a new inbound rule to allow 8086 TCP port to be accessible only
    from your IP
  • Add a new inbound rule to allow all traffic from your VPC network.
    In this case source is 172.31.32.0/20, change it accordingly to your
    environment.


Installing application containers

This step consists of installing and configuring influxDB, grafana and an
ubuntu container running sysinfo_influxdb. This container will send O.S.
metrics to influxDB, which will be graphed in grafana.

Installing the influxDB container

Go to the Rancher Web Console and click the + Add Container button on
your first host, enter a container name like influxdb and tutum/influxdb
in the Select Image field. Add these three port mappings, all of them
TCP:

  • 8083 (on host) to 8083 (in container)
  • 8084 (on host) to 8084 (in container)
  • 8086 (on host) to 8086 (in container)

Expand the Advanced Options section and add an environment variable named
PRE_CREATE_DB whose value is grafana, so influxDB will create an empty
database for grafana metrics. Now go to the Networking section and enter
a hostname like influxdb for this container. Be sure that the Network
type is Managed Network on docker0 so this container can be reached by
grafana and sysinfo_influxdb. You can leave the other options with their
default values. After a few minutes you will see your influxDB container
launched and running on your host. Note that the influxdb container has a
private IP address; copy it to configure sysinfo_influxdb later. Also
copy the public IP of the host that is running this container; you will
need it later to configure grafana.
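
Outside of Rancher, purely for illustration, the equivalent container could be started with a plain docker run along these lines, which makes the port mappings and the PRE_CREATE_DB variable explicit:

# illustrative equivalent of the Rancher container definition above
docker run -d --name influxdb --hostname influxdb \
  -p 8083:8083 -p 8084:8084 -p 8086:8086 \
  -e PRE_CREATE_DB=grafana \
  tutum/influxdb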

Installing the grafana container

Go to the Rancher Web Console and click the + Add Container button on
your second host, enter a container name like grafana and tutum/grafana
in the Select Image field. Add this TCP port mapping:

  • 80 (on host) to 80 (in container)

Expand the Advanced Options section and enter the following environment
variables needed by grafana:

Variable name | Variable value | Used for
HTTP_USER | admin | User login for grafana basic HTTP authentication
HTTP_PASS | Some password | User password for grafana basic HTTP authentication
INFLUXDB_HOST | 52.11.32.51 | InfluxDB host's public IP. Adapt this to your environment
INFLUXDB_PORT | 8086 | InfluxDB port
INFLUXDB_NAME | grafana | Name of previously created database
INFLUXDB_USER | root | InfluxDB user credentials
INFLUXDB_PASS | root | InfluxDB user credentials
INFLUXDB_IS_GRAFANADB | true | Tell grafana to use InfluxDB for storing dashboards

Grafana makes your browser connect to influxDB directly. This is why we
need to configure a public IP in the INFLUXDB_HOST variable here;
otherwise, your browser could not reach influxDB when reading metric
values. Go to the Networking section and enter a hostname like grafana
for this container. Be sure that the Network type is Managed Network on
docker0 so this container can connect to influxdb. You can leave the
other options with their default values, and after a few minutes you will
see your grafana container launched and running on your host.
Now go to the Instances section in the EC2 Console, click on the instance
which is running the grafana container and copy its public IP. Type the
following URL in your browser: http://GRAFANA_HOST_PUBLIC_IP, and use the
HTTP_USER and HTTP_PASS credentials to log in.
Installing the sysinfo_influxdb container

Go to the Rancher Web Console and click the + Add Container button on
your third host, enter sysinfo as the container name and
nixel/sysinfo_influxdb in the Select Image field. No port mapping is
needed. Expand the Advanced Options section and enter these environment
variables, which are needed by this container:

Variable name | Variable value | Used for
INFLUXDB_HOST | 10.42.169.239 | InfluxDB container private IP. Adapt this to your environment
INFLUXDB_PORT | 8086 | InfluxDB port
INFLUXDB_NAME | grafana | Name of previously created database
INFLUXDB_USER | root | InfluxDB user credentials
INFLUXDB_PASS | root | InfluxDB user credentials
SYSINFO_INTERVAL | 5m | Sysinfo frequency to update metric values. Default is 5m

Note that in this case INFLUXDB_HOST contains the influxDB container's
private IP. This is because sysinfo_influxdb will connect to influxDB
directly, using the VPN created by Rancher. Go to the Networking section
and be sure the container hostname is sysinfo, because you will later
import a sample grafana dashboard which needs this. Be sure that the
Network type is Managed Network on docker0 so this container can connect
to influxdb. You can leave the other options with their default values,
and after a few minutes you will see your sysinfo container launched and
running on your host.
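
Again purely for illustration, the same container started with plain docker would look roughly like the following; note that the INFLUXDB_HOST value must be an IP the container can actually reach, which is exactly what the Rancher managed network provides here:

# illustrative equivalent of the Rancher container definition above
docker run -d --name sysinfo --hostname sysinfo \
  -e INFLUXDB_HOST=10.42.169.239 \
  -e INFLUXDB_PORT=8086 \
  -e INFLUXDB_NAME=grafana \
  -e INFLUXDB_USER=root \
  -e INFLUXDB_PASS=root \
  -e SYSINFO_INTERVAL=5m \
  nixel/sysinfo_influxdb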

Graph metrics with grafana

At this point the sysinfo container is collecting O.S. metrics and
sending them to influxDB every 5 minutes using Rancher networking. In
this final step we are graphing those metrics in grafana. First, let's
import a sample grafana dashboard that is already configured. Execute the
following command to download the dashboard definition:

curl -O https://raw.githubusercontent.com/nixelsolutions/sysinfo_influxdb/master/grafana_dashboard.json

Then open the grafana web UI, browse to http://GRAFANA_HOST_PUBLIC_IP and
click the folder icon at the top.

Click the import button and upload the file you have just downloaded.
Click the save button at the top and you will now be able to see the CPU,
Load Average, RAM, Swap and Disk metrics that are being collected in your
sysinfo container.

Conclusion

Rancher implements a networking solution that really simplifies the way
you bring connectivity to the services running in your containers.
Instead of managing port mappings it automatically puts all your
containers into the same network without requiring any configuration from
you. This is an important feature because it brings containers closer to
enterprise production platforms, making it easier to deploy complex
scenarios where some containers need to connect with others. With Rancher
you can deploy any container on any host at any time without
reconfiguring your environment, and there is no need to worry about
defining, configuring or maintaining port mappings when interconnecting
containers. To get more information on Rancher, feel free at any time to
request a demonstration from one of our engineers, or sign up for an
upcoming online meetup.

Manel Martinez is a Linux systems engineer with experience in the design
and management of scalable, distributable and highly available open
source web infrastructures based on products like KVM, Docker, Apache,
Nginx, Tomcat, JBoss, RabbitMQ, HAProxy, MySQL and XtraDB. He lives in
Spain, and you can find him on Twitter @manel_martinezg.

Source

Running Nagios as a System Service on RancherOS

Nagios is a fantastic monitoring tool, and I wanted to see if I could
get the agent to run as a system container on RancherOS, in order to
monitor the host and any Docker containers running on it. It turned out
to be incredibly easy. In this blog post, I’ll walk through how to
launch the Nagios agent as a system container in RancherOS. Specifically,
I'll use two Vagrant boxes to cover:

  1. Provisioning a server with the Rancher control plane
  2. Adding a second server running Rancher OS
  3. Installing a Nagios agent as a system container on the second server
  4. Connecting the Nagios agent to the Nagios management server

System Containers in RancherOS

First, for anyone who isn’t familiar with RancherOS, it is a minimal
distribution of Linux designed specifically to run Docker. RancherOS
runs a Docker daemon as PID 1, a role typically occupied by the init
system or systemd in most distributions. This daemon runs essential
system services like SSH, syslog or NTP as containers, and is called
system docker.

A second Docker daemon, called user docker, is launched as a
container. This is where any new containers started by the user are
created, as well as containers placed by Rancher or other management
services.

To give the Nagios agent access to all of the data from the server, as
well as the system and user containers, it should run in the system
docker instance. I will run this setup in 2 Vagrant virtual machines.

Set up Rancher

Even though we could monitor RancherOS with Nagios directly, I’m going
to set up Rancher in this deployment to manage the containers we create.
The Rancher team provides a Vagrantfile to run RancherOS in a VM here:
https://github.com/rancher/os-vagrant and
another Vagrantfile for Rancher here:
https://github.com/rancher/rancher. But since I want to have both in one
Vagrant setup, I merged both Vagrantfiles into one and added the option
to run multiple RancherOS instances.

You can find my new Vagrantfile here:
https://github.com/buster/rancher-tutorial

The first step (after installing Vagrant, of course) is to clone this
repository and edit the Vagrantfile to match your IP addresses in the
lines:

# The number of VMs will be added to the following string,
# so Rancher will be on 192.168.0.200, the first RancherOS instance on 192.168.0.201, etc.
$rancher_ip_start = "192.168.0.20"
$rancherui_ip = $rancher_ip_start + "0"
# the number of rancher instances
$n_rancher = 1


Leave $n_rancher at 1 for now.

After editing this file, run `vagrant up`.

Vagrant will now first set up the Rancher VM, which means Vagrant will
download the VirtualBox image and start it, and Docker will then download
and run the Rancher Server and the Rancher Agent. Afterwards, the second
VM, which will host our RancherOS instance, will be started, and the
RancherOS instance will register itself with the Rancher Server.

When finished, browse to the Rancher IP (http://192.168.0.200:8080/ in
my case) and observe your new and shiny VMs:

Adding a System Container to Rancher

The next task is to set up the Nagios Agent on the RancherOS instance.

For that you will need to log in to the server, which you do by running
`vagrant ssh rancher1`.

There you will have access to the user docker (by calling `docker`)
and to the system docker by calling `sudo system-docker`.

A system container is no different from your usual docker container,
except that it is run by the system docker and that it has no networking
by default. Thus, it needs to inherit the network of the host (the
--net=host parameter):

sudo system-docker run -d --net=host --name nagios-agent buster/nagios-agent

This nagios agent container comes with a minimal configuration to check
the load on the second RancherOS
instance.
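
You can verify that the agent came up correctly from the RancherOS host; these commands only read state, so they are safe to run at any time:

# list system containers and confirm nagios-agent is running
sudo system-docker ps

# inspect its logs if something looks wrong
sudo system-docker logs nagios-agent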

Deploying the Nagios Server to Rancher

In order for the Nagios agent to make any sense, we will also need a
Nagios Server which polls the Nagios Agent.

This is as easy as any other Rancher deployment, by clicking on “Add
Container” in the Rancher UI.

There we will make use of the already existing Nagios Server docker
container from https://registry.hub.docker.com/u/cpuguy83/nagios/. Also
don't forget to go to the `Ports` tab and map port 80 to port 8081 so
that you can log in to Nagios.

Add this container and after a while, the Nagios Server will be up and
running! Browse to
http://192.168.0.200:8081/ and
observe the Nagios UI running. The default username is
nagiosadmin and the password is nagios.

Configure Nagios Server

The Nagios Server only knows itself right now, so we will need to
configure it to poll the Nagios Agent.

This can be done in /opt/nagios/etc/conf.d/rancher1.cfg, for example.

Rancher offers a very nice terminal into the running containers, which
you can reach by clicking on the container and then on the "execute
shell" URL:

Now, you can edit the config file by running `nano
/opt/nagios/etc/conf.d/rancher1.cfg`.

Add the following lines to the file:

define command{
    command_name    check_nrpe
    command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}

define host{
    use         linux-server
    host_name   rancher1
    address     192.168.0.201
}

define service{
    use                 generic-service
    host_name           rancher1
    service_description Current Users
    check_command       check_nrpe!check_users
}

define service{
    use                 generic-service
    host_name           rancher1
    service_description Current Load
    check_command       check_nrpe!check_load
}

Afterwards you can check if the configuration file format is correct by
running `nagios -v /opt/nagios/etc/nagios.cfg`.

To check that the nrpe server on the second host is running you can also
run a check manually: `/opt/nagios/libexec/check_nrpe -H
192.168.0.201 -c check_load`

After you have verified that the Nagios setup is working, you simply need
to restart the Nagios Server container by clicking on the symbol:

Now, you can login to Nagios again and see the Nagios Plugins doing
their work:

Conclusion

Using Nagios to monitor multiple RancherOS servers is as easy as running
a preconfigured publicly available Docker container from
https://registry.hub.docker.com

Starting a system docker container requires a few additional steps
compared to running a user container, but hopefully we’ve explained
them clearly here.

In the next few weeks RancherOS will ship 0.3, which includes support
for predefined system services. That will make configuration of new
agents in the Nagios server as easy as executing a docker run command.

If you’d like to get started with RancherOS, you can download it from
GitHub here. Also, we're always demoing new features and answering lots
of questions at each month's Rancher Meetup, which you can find a link
to below.

Sebastian Schulze is a Technology Consultant from Germany, with
experience in Linux, Solaris, Docker, and Vagrant. You can contact him
via github at:
https://github.com/buster

Source

Docker Monitoring | Container Monitoring


Update (October 2017): Gord Sissons revisited this topic and compared the
top 10 container-monitoring solutions for Rancher in a recent blog post.

Update (October 2016): Our October online meetup demonstrated and
compared Sysdig, Datadog, and Prometheus in one go. Check out the
recording.

As Docker is used for larger deployments, it becomes more important to
get visibility into the status and health of docker environments. In
this article, I aim to go over some of the common tools used to monitor
containers. I will be evaluating these tools based on the following
criteria:

  1. ease of deployment,
  2. level of detail of information presented,
  3. level of aggregation of information from entire deployment,
  4. ability to raise alerts from the data,
  5. ability to monitor non-docker resources, and
  6. cost.

This list is by no means comprehensive; however, I have tried to
highlight the most common tools and the tools that do best against our
six evaluation criteria.

Docker Stats

All commands in this article have been specifically tested on
a RancherOS instance running on Amazon Web Services EC2. However, all
tools presented today should be usable on any Docker deployment.

The first tool I will talk about is Docker itself; you may not be aware
that the docker client already provides a rudimentary command line tool
to inspect containers' resource consumption. To look at container stats,
run docker stats with the name(s) of the running container(s) for which
you would like to see stats. This will present the CPU utilization for
each container, the memory used and the total memory available to the
container. Note that if you have not limited memory for your containers,
this command will report the total memory of your host; that does not
mean each of your containers has access to that much memory. In addition,
you will also be able to see the total data sent and received over the
network by the container.

$ docker stats determined_shockley determined_wozniak prickly_hypatia
CONTAINER CPU % MEM USAGE/LIMIT MEM % NET I/O
determined_shockley 0.00% 884 KiB/1.961 GiB 0.04% 648 B/648 B
determined_wozniak 0.00% 1.723 MiB/1.961 GiB 0.09% 1.266 KiB/648 B
prickly_hypatia 0.00% 740 KiB/1.961 GiB 0.04% 1.898 KiB/648 B

For a more detailed look at container stats you may also use the Docker
Remote API via netcat (see below). Send an HTTP GET request for
/containers/[CONTAINER_NAME]/stats, where CONTAINER_NAME is the name of
the container for which you want to see stats. You can see an example of
the complete response for a container stats request here. This will
present more detail on the metrics shown above; for example, you will get
details of caches, swap space and other information about memory. You may
want to peruse the Run Metrics section of the Docker documentation to get
an idea of what the metrics mean.

echo -e "GET /containers/[CONTAINER_NAME]/stats HTTP/1.0\r\n" | nc -U /var/run/docker.sock

Score Card:

  1. Ease of deployment: *****
  2. Level of detail: *****
  3. Level of aggregation: none
  4. Ability to raise alerts: none
  5. Ability to monitor non-docker resources: none
  6. Cost: Free

CAdvisor

The docker stats command and the remote API are useful for getting
information on the command line; however, if you would like to access the
information in a graphical interface you will need a tool such as
CAdvisor. CAdvisor provides a visual representation of the data shown by
the docker stats command earlier. Run the docker command below and go to
http://<your-hostname>:8080/ in the browser of your choice to see the
CAdvisor interface. You will be shown graphs for overall CPU usage,
memory usage, network throughput and disk space utilization. You can then
drill down into the usage statistics for a specific container by clicking
the Docker Containers link at the top of the page and then selecting the
container of your choice. In addition to these statistics, CAdvisor also
shows the limits, if any, that are placed on the container, in the
Isolation section.

docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:rw \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --publish=8080:8080 \
  --detach=true \
  --name=cadvisor \
  google/cadvisor:latest


CAdvisor is a useful tool that is trivially easy to set up. It saves us
from having to ssh into the server to look at resource consumption and
also produces graphs for us. In addition, the pressure gauges provide a
quick overview of when your cluster needs additional resources.
Furthermore, unlike other options in this article, CAdvisor is free as it
is open source, and it runs on hardware already provisioned for your
cluster; other than some processing resources there is no additional cost
of running CAdvisor. However, it has its limitations: it can only monitor
one docker host, and hence if you have a multi-node deployment your stats
will be disjoint and spread throughout your cluster. Note that you can
use heapster to monitor multiple nodes if you are running Kubernetes. The
data in the charts is a moving window of one minute only and there is no
way to look at longer-term trends. There is no mechanism to kick off
alerting if the resource usage is at dangerous levels. If you currently
do not have any visibility into the resource consumption of your docker
node/cluster, then CAdvisor is a good first step into container
monitoring; however, if you intend to run any critical tasks on your
containers, a more robust tool or approach is needed. Note that Rancher
runs CAdvisor on each connected host, and exposes a limited set of stats
through the UI, and all of the system stats through the API.

Score Card (ignoring heapster because it is only supported on Kubernetes):

  1. Ease of deployment: *****
  2. Level of detail: **
  3. Level of aggregation: *
  4. Ability to raise alerts: none
  5. Ability to monitor non-docker resources: none
  6. Cost: Free

Scout

The next approach for docker monitoring is Scout, and it addresses
several of the limitations of CAdvisor. Scout is a hosted monitoring
service which can aggregate metrics from many hosts and containers and
present the data over longer time-scales. It can also create alerts based
on those metrics. The first step to getting Scout running is to sign up
for a Scout account at https://scoutapp.com/; the free trial account
should be suitable for testing out the integration. Once you have created
your account and logged in, click on your account name in the top right
corner, then Account Basics, and take note of your Account Key as you
will need it to send metrics from your docker server.

Now, on your host, create a file called scoutd.yml and copy the following
text into the file, replacing account_key with the key you took note of
earlier. You can specify any values that make sense for the host,
display_name, environment and roles properties. These will be used to
separate out the metrics when they are presented in the Scout dashboard.
I am assuming an array of web servers is run on docker, so I will use the
values shown below.

# account_key is the only required value
account_key: YOUR_ACCOUNT_KEY
hostname: web01-host
display_name: web01
environment: production
roles: web

You can now bring up your scout agent with the scout configuration file
by using the docker scout plugin.

docker run -d --name scout-agent \
  -v /proc:/host/proc:ro \
  -v /etc/mtab:/host/etc/mtab:ro \
  -v /var/run/docker.sock:/host/var/run/docker.sock:ro \
  -v `pwd`/scoutd.yml:/etc/scout/scoutd.yml \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup/ \
  --net=host --privileged \
  soutapp/docker-scout

Now go back to the Scout web view and you should see an entry for your
agent which will be keyed by the display_name parameter (web01) that
you specified in your scoutd.yml earlier.

If you click the display name it will show detailed metrics for the host.
This includes the process count, CPU usage and memory utilization for
everything running on your host. Note that these are not limited to
processes running inside docker.


To add docker monitoring to your servers click the Roles tab and then
select All Servers. Now click the + Plugin Template Button and then
Docker Monitor from the following screen to load the details view.
Once you have the details view up select Install Plugin to add the
plugin to your hosts. In the following screen give a name to the plugin
installation and specify which containers you want to monitor. If you
leave the field blank the plugin will monitor all of the containers on
the host. Click complete installation and after a minute or so you can
go to [Server Name] > Plugins to see details from the docker monitor
plugin. The plugin shows the CPU usage, memory usage, network throughput
and the number of containers for each host.


If
you click on any of the graphs you can pull up a detailed view of the
metrics, and this view allows you to see trends in the metric values
across a longer time span. This view also allows you to filter the
metrics based on environment and server role. In addition, you can create
"Triggers" or alerts to send emails to you if metrics go above or below a
configured threshold. This allows you to set up automated alerts to
notify you if, for example, some of your containers die and the container
count falls below a certain number. You can also set up alerts for
average CPU utilization, so that if, for example, your containers are
running hot, you will get an alert and you can add more hosts to your
docker cluster. To create a trigger, select
Roles > All Servers from the top menu and then docker monitor from the
plugins section. Then select triggers from the Plugin template
Administration
menu on the right hand side of the screen. You should
now see an option to "Add a Trigger" which will apply to the entire
deployment. Below is an example of a trigger which will send out an alert
if the number of containers in the deployment falls below 3. The alert
was created for "All Servers", however you could tag your hosts with
different roles using the scoutd.yml created on the server. Using roles
you can apply triggers to a subset of the servers in your deployment. For
example, you could set up an alert for when the number of containers on
your web nodes falls below a certain number. Even with the role-based
triggers I still feel that Scout alerting could be better, because many
docker deployments have heterogeneous containers on the same host. In
such a scenario it would be impossible to set up triggers for specific
types of containers, as roles are applied to all containers on the host.


Another advantage of using Scout over CAdvisor is that it has a large
set of plugins
which can pull in
other data about your deployment in addition to docker information. This
allows Scout to be your one stop monitoring system instead of having a
different monitoring system for various resources in your system.

One drawback of Scout is that it does not present detailed information
about individual containers on each host like CAdvisor can. This is
problematic if you are running heterogeneous containers on the same
server. For example, if you want a trigger to alert you about issues in
your web containers but not about your Jenkins containers, Scout will not
be able to support that use case. Despite the drawbacks, Scout is a
significantly more useful tool for monitoring your docker deployments.
However, this does come at a cost: ten dollars per monitored host. The
cost could be a factor if you are running a large deployment with many
hosts.

Score Card:

  1. Ease of deployment: ****
  2. Level of detail: **
  3. Level of aggregation: ***
  4. Ability to raise alerts: ***
  5. Ability to monitor non-docker resources: Supported
  6. Cost: $10 / host

Data Dog

From Scout, let's move to another monitoring service, DataDog, which
addresses several of the shortcomings of Scout as well as all of the
limitations of CAdvisor. To get started with DataDog, first sign up for a
DataDog account at https://www.datadoghq.com/. Once you are signed in to
your account you will be presented with a list of supported integrations,
with instructions for each type. Select docker from the list and you will
be given a docker run command (shown below) to copy onto your host. The
command will have your API key preconfigured and hence can be run as
listed. After about 45 seconds your agent will start reporting metrics to
the DataDog system.

docker run -d --privileged --name dd-agent \
  -h `hostname` \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v /proc/mounts:/host/proc/mounts:ro \
  -v /sys/fs/cgroup/:/host/sys/fs/cgroup:ro \
  -e API_KEY=YOUR_API_KEY datadog/docker-dd-agent

Now that your containers are connected you can go to the Events tab in
the DataDog web console and see all events pertaining to your cluster.
All container launches and terminations will be part of this event
stream.


You can also click the Dashboards tab and hit create dashboards to
aggregate metrics across your entire cluster. Datadog collects metrics
about CPU usage, memory and I/O for all containers running in the
system. In addition you get counts of running and stopped containers as
well as counts of docker images. The dashboard view allows you to create
graphs for any metric or set of metrics across the entire deployment or
grouped by host or container image. For example, the graph below shows
the number of running containers broken down by image type; I am running
9 ubuntu:14.04 containers in my cluster at the moment.

You could also split the same data by host; as the second graph shows, 7
of the containers are running on my Rancher host and the remaining ones
on my local laptop.


DataDog also supports alerting using a feature called Monitors. A monitor
is DataDog's equivalent of a Scout trigger and allows you to define
thresholds for various metrics. DataDog's alerting system is a lot more
flexible and detailed than Scout's. The example below shows how to
specify that you are concerned about Ubuntu containers terminating; hence
you would monitor the docker.containers.running metric for containers
created from the ubuntu:14.04 docker image.


Then specify the alert conditions to say that if there are fewer than ten
ubuntu containers in your deployment (on average) over the last 5
minutes, you would like to be alerted. Although not shown here, you will
also be asked to specify the text of the message which is sent out when
this alert is triggered, as well as the target audience for the alert. In
the current example I am using a simple absolute threshold. You can also
specify a delta-based alert, which would trigger if, say, the average
stopped-container count over the last five minutes was four.


Lastly, using the Metrics Explorer tab you can make ad-hoc
aggregations over your metrics to help debug issues or extract specific
information from your data. This view allows you to graph any metric
over a slice based on container image or host. You may combine output
into a single graph or generate a set of graphs by grouping across
images or hosts.

DataDog is a significant improvement over Scout in terms of feature set,
ease of use and user-friendly design. However, this level of polish comes
at additional cost, as each DataDog agent costs $15.

Score Card:

  1. Ease of deployment: *****
  2. Level of detail: *****
  3. Level of aggregation: *****
  4. Ability to raise alerts: Supported
  5. Ability to monitor non-docker resources: *****
  6. Cost: $15 / host

Sensu Monitoring Framework

Scout and Datadog provide centralized monitoring and alerting; however,
both are hosted services that can get expensive for large deployments. If
you need a self-hosted, centralized metrics service, you may consider the
sensu open source monitoring framework. To run the Sensu server you can
use
the hiroakis/docker-sensu-server
container. This container installs sensu-server, the uchiwa web
interface, redis, rabbitmq-server, and the sensu-api. Unfortunately
sensu does not have any docker support out of the box. However, using
the plugin system you can configure support for both container metrics
as well as status checks.

Before launching your sensu server container you must define a check that
you can load into the server. Create a file called check-docker.json
and add the following contents into the file. In this file you are
telling the Sensu server to run a script called load-docker-metrics.sh
every ten seconds on all clients which are subscribed to the docker tag.
You will define this script a little later.

{
  "checks": {
    "load_docker_metrics": {
      "type": "metric",
      "command": "load-docker-metrics.sh",
      "subscribers": [
        "docker"
      ],
      "interval": 10
    }
  }
}

Now you can run the sensu server docker container with our check
configuration file using the command below. Once you run the command you
should be able to launch the uchiwa dashboard at
http://YOUR_SERVER_IP:3000 in your browser.

docker run -d --name sensu-server \
  -p 3000:3000 \
  -p 4567:4567 \
  -p 5671:5671 \
  -p 15672:15672 \
  -v $PWD/check-docker.json:/etc/sensu/conf.d/check-docker.json \
  hiroakis/docker-sensu-server

Now that the sensu server is up, you can launch sensu clients on each of
the hosts running your docker containers. You told the server that the
clients will have a script called load-docker-metrics.sh, so let's create
the script and insert it into our client containers. Create the file and
add the text shown below into it, replacing HOST_NAME with a logical name
for your host. The script below uses the Docker Remote API to pull in the
metadata for running containers, all containers and all images on the
host. It then prints the values out using sensu's key-value notation. The
sensu server will read the output values from STDOUT and collect those
metrics. This example only pulls these three values, but you could make
the script as detailed as required. Note that you could also add multiple
check scripts such as this one, as long as you reference them in the
server configuration file you created earlier. You can also define that
you want the check to fail if the number of running containers ever falls
below three; you can make a check fail by returning a non-zero value from
the check script.

#!/bin/bash
set -e

# Count all running containers
running_containers=$(echo -e "GET /containers/json HTTP/1.0\r\n" | nc -U /var/run/docker.sock \
  | tail -n +5 \
  | python -m json.tool \
  | grep "Id" \
  | wc -l)

# Count all containers
total_containers=$(echo -e "GET /containers/json?all=1 HTTP/1.0\r\n" | nc -U /var/run/docker.sock \
  | tail -n +5 \
  | python -m json.tool \
  | grep "Id" \
  | wc -l)

# Count all images
total_images=$(echo -e "GET /images/json HTTP/1.0\r\n" | nc -U /var/run/docker.sock \
  | tail -n +5 \
  | python -m json.tool \
  | grep "Id" \
  | wc -l)

echo "docker.HOST_NAME.running_containers $running_containers"
echo "docker.HOST_NAME.total_containers $total_containers"
echo "docker.HOST_NAME.total_images $total_images"

if [ $running_containers -lt 3 ]; then
  exit 1;
fi

Now that you have defined your load docker metrics check, you need to
start the sensu client using the usman/sensu-client container I defined
for this purpose. You can use the command shown below to launch the sensu
client. Note that the container must run as privileged in order to be
able to access unix sockets; it must have the docker socket mounted in as
a volume, as well as the load-docker-metrics.sh script you defined above.
Make sure the load-docker-metrics.sh script is marked as executable on
your host machine, as the permissions carry through into the container.
The container also takes SENSU_SERVER_IP, RABIT_MQ_USER,
RABIT_MQ_PASSWORD, CLIENT_NAME and CLIENT_IP as parameters; please
specify the values of these parameters for your setup. The default values
for RABIT_MQ_USER and RABIT_MQ_PASSWORD are sensu and password.

docker run -d --name sensu-client --privileged \
  -v $PWD/load-docker-metrics.sh:/etc/sensu/plugins/load-docker-metrics.sh \
  -v /var/run/docker.sock:/var/run/docker.sock \
  usman/sensu-client SENSU_SERVER_IP RABIT_MQ_USER RABIT_MQ_PASSWORD CLIENT_NAME CLIENT_IP


A few seconds after running this command you should see the client
count increase to 1 in the uchiwa dashboard. If you click the clients
icon you should see a list of your clients including the client that you
just added. I named my client client-1 and specified the host IP as
192.168.1.1.


If you click on the client name you should get further details of the
checks. You can see that the load_docker_metrics check was run at
10:22 on the 28th of March.

If you click on the check name you can see further details of check runs.
The zeros indicate that there were no errors; if the script had failed
(if, for example, your docker daemon died) you would see an error code
(non-zero) value. Although it is not covered in this article, you can
also set up sensu to alert you when these checks fail, using Handlers.
Furthermore, uchiwa only shows the values of checks and not the metrics
collected. Note that sensu does not store the collected metrics; they
have to be forwarded to a time series database such as InfluxDB or
Graphite. This is also done through Handlers. Please find details of how
to configure metric forwarding to graphite here.
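
As a rough, untested sketch of that approach, a TCP handler that forwards check output to Graphite's plaintext port could be dropped into the server's configuration directory along the following lines; the Graphite host is a placeholder:

# sketch only: a Sensu TCP handler that relays check output to Graphite
cat > /etc/sensu/conf.d/handler-graphite.json <<'EOF'
{
  "handlers": {
    "graphite_tcp": {
      "type": "tcp",
      "socket": {
        "host": "GRAPHITE_HOST",
        "port": 2003
      },
      "mutator": "only_check_output"
    }
  }
}
EOF

The metric check would then need to reference this handler, and the Sensu server restarted to pick up the new configuration.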


Sensu ticks all the boxes in our evaluation criteria; you can collect as
much detail about your docker containers and hosts as you want. In
addition, you are able to aggregate the values from all of your hosts in
one place and raise alerts over those checks. The alerting is not as
advanced as DataDog's or Scout's, as you are only able to alert on checks
failing on individual hosts. However, the big drawback of Sensu is the
difficulty of deployment. Although I have automated many steps of the
deployment using docker containers, Sensu remains a complicated system
requiring us to install, launch and maintain separate processes for
Redis, RabbitMQ, the Sensu API, uchiwa and Sensu Core. Furthermore, you
would require still more tools, such as Graphite, to present metric
values, and a production deployment would require customizing the
containers I have used today for secure passwords and custom SSL
certificates. In addition, were you to add more checks after launching
the container, you would have to restart the Sensu server, as that is the
only way for it to start collecting new metrics. For these reasons I rate
Sensu fairly low for ease of deployment.

Score Card:

  1. Ease of deployment: *
  2. Level of detail: ****
  3. Level of aggregation: ****
  4. Ability to raise alerts: Supported but limited
  5. Ability to monitor non-docker resources: *****
  6. Cost: Free

I also evaluated two other monitoring services, Prometheus and Sysdig
Cloud in a
second article,
and have included them in this post for simplicity.

Prometheus

First, let's take a look at Prometheus; it is a self-hosted set of tools
which collectively provide metrics storage, aggregation, visualization
and alerting. Most of the tools and services we have looked at so far
have been push based, i.e. agents on the monitored servers talk to a
central server (or set of servers) and send out their metrics.
Prometheus, on the other hand, is a pull-based server which expects
monitored servers to provide a web interface from which it can scrape
data. There are several exporters available for Prometheus which will
capture metrics and then expose them over HTTP for Prometheus to scrape.
In addition there are libraries which can be used to create custom
exporters. As we are concerned with monitoring docker containers, we will
use the container_exporter to capture metrics. Use the command shown
below to bring up the container-exporter docker container and browse to
http://MONITORED_SERVER_IP:9104/metrics to see the metrics it has
collected for you. You should launch exporters on all servers in your
deployment. Keep track of the respective MONITORED_SERVER_IPs as we will
be using them later in the configuration for Prometheus.

docker run -p 9104:9104 -v /sys/fs/cgroup:/cgroup -v /var/run/docker.sock:/var/run/docker.sock prom/container-exporter

Once we have got all our exporters running, we can launch the Prometheus
server. However, before we do, we need to create a configuration file for
Prometheus that tells the server where to scrape the metrics from. Create
a file called prometheus.conf and then add the following text inside it.

global:
  scrape_interval: 15s
  evaluation_interval: 15s
  labels:
    monitor: exporter-metrics

rule_files:

scrape_configs:
  - job_name: prometheus
    scrape_interval: 5s
    target_groups:
      # These endpoints are scraped via HTTP.
      - targets: ['localhost:9090','MONITORED_SERVER_IP:9104']

In this file there are two sections, global and job(s). In the global
section we set defaults for configuration properties such as the data
collection interval (scrape_interval). We can also add labels which will
be appended to all metrics. In the jobs section we can define one or more
jobs, each of which has a name, an optional override of the scraping
interval, and one or more targets from which to scrape metrics. We are
adding two targets: one is the Prometheus server itself and the second is
the container-exporter we set up earlier. If you set up more than one
exporter you can add additional targets to pull metrics from all of them.
Note that the job name is available as a label on the metric, hence you
may want to set up separate jobs for your various types of servers. Now
that we have a configuration file we can start a Prometheus server using
the prom/prometheus docker image.

docker run -d --name prometheus-server -p 9090:9090 -v $PWD/prometheus.conf:/prometheus.conf prom/prometheus -config.file=/prometheus.conf

After launching the container, the Prometheus server should be available
in your browser on port 9090 in a few moments. Select Graph from the top
menu and select a metric from the drop-down box to view its latest value.
You can also write queries in the expression box to find matching
metrics. Queries take the form METRIC_NAME. You can find more details of
the query syntax here.

We are able to drill down into the data using queries to filter out data
from specific server types (jobs) and containers. All metrics from
containers are labeled with the image name, the container name and the host
on which the container is running. Since metric names do not encompass
the container or server name, we are able to easily aggregate data across
our deployment. For example, we can filter
container_memory_usage_bytes by image label to get
information about the memory usage of all ubuntu containers in our
deployment. Using the built-in functions we can also aggregate the
resulting set of metrics. For example,
average_over_time(container_memory_usage_bytes[5m]) will show the memory used by ubuntu
containers, averaged over the last five minutes. Once you are happy
with a query you can click over to the Graph tab and see the variation
of the metric over time.
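
As a minimal illustration (assuming the labels shown above; note that in current Prometheus releases the averaging function is spelled avg_over_time), the expressions typed into the query box might look like this:

# memory usage of containers running the ubuntu:14.04 image
container_memory_usage_bytes{image="ubuntu:14.04"}

# the same series averaged over the last five minutes
avg_over_time(container_memory_usage_bytes{image="ubuntu:14.04"}[5m])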

Temporary graphs are great for ad-hoc investigations, but you also need
persistent graphs for dashboards. For this you can use the Prometheus
Dashboard Builder (PromDash). To launch the Prometheus Dashboard Builder
you need access to an SQL database, which you can create using the
official MySQL Docker image. The command to launch the MySQL container
is shown below; note that you may select any value for the database
name, user name, user password and root password, but keep track of
these values as they will be needed later.

docker run -p 3306:3306 --name promdash-mysql \
  -e MYSQL_DATABASE=<database-name> \
  -e MYSQL_USER=<database-user> \
  -e MYSQL_PASSWORD=<user-password> \
  -e MYSQL_ROOT_PASSWORD=<root-password> \
  -d mysql

Once you have the database set up, use the rake task shipped inside the
promdash container to initialize the database. You can then run the
Dashboard Builder by running the same container. The commands to
initialize the database and to bring up the Prometheus Dashboard Builder
are shown below.

# Initialize Database
docker run --rm -it --link promdash-mysql:db \
  -e DATABASE_URL=mysql2://<database-user>:<user-password>@db:3306/<database-name> \
  prom/promdash ./bin/rake db:migrate

# Run Dashboard
docker run -d --link promdash-mysql:db -p 3000:3000 --name prometheus-dash \
  -e DATABASE_URL=mysql2://<database-user>:<user-password>@db:3306/<database-name> \
  prom/promdash

Once your container is running you can browse to port 3000 and load up
the dashboard builder UI. In the UI you need to click Servers in the
top menu and New Server to add your Prometheus Server as a datasource
for the dashboard builder. Add http://PROMETHEUS_SERVER_IP:9090 to
the list of servers and hit Create Server.

Now click Dashboards in the top menu, here you can create
Directories (Groups of Dashboards) and Dashboards. For example we
created a directory for Web Nodes and one for Database Nodes and in each
we create a dashboard as shown below.

Once you have created a dashboard you can add metrics by mousing over
the title bar of a graph and selecting the data sources icon (three
horizontal lines followed by a plus sign). You can then
select the server which you added earlier, and a query expression which
you tested in the Prometheus Server UI. You can add multiple data
sources into the same graph in order to see a comparative view.

You can add multiple graphs (each with possibly multiple data sources)
by clicking the Add Graph button. In addition you may select the
time range over which your dashboard displays data as well as a refresh
interval for auto-loading data. The dashboard is not as polished as the
ones from Scout and DataDog; for example, there is no easy way to explore
metrics or build a query in the dashboard view. Since the dashboard runs
independently of the Prometheus server, we can’t ‘pin’ graphs
generated in the Prometheus server into a dashboard. Furthermore, several
times we noticed that the UI would not update based on selected data
until we refreshed the page. However, despite its issues the dashboard
is feature-competitive with DataDog, and because Prometheus is under
heavy development we expect the bugs to be resolved over time. In
comparison to other self-hosted solutions, Prometheus is a lot more
user-friendly than Sensu and allows you to present metric data as graphs
without using third-party visualizations. It also provides much better
analytical capabilities than CAdvisor.

Prometheus also has the ability to apply alerting rules over the input
data and display them in the UI. However, to do something useful with
alerts, such as sending emails or notifying PagerDuty, we need to run the
Alert Manager. To run the Alert Manager you first need to create a
configuration file. Create a file called alertmanager.conf and add the
following text into it:

notification_config {
  name: "ubuntu_notification"
  pagerduty_config {
    service_key: "<PAGER_DUTY_API_KEY>"
  }
  email_config {
    email: "<TARGET_EMAIL_ADDRESS>"
  }
  hipchat_config {
    auth_token: "<HIPCHAT_AUTH_TOKEN>"
    room_id: 123456
  }
}
aggregation_rule {
  filter {
    name_re: "image"
    value_re: "ubuntu:14.04"
  }
  repeat_rate_seconds: 300
  notification_config_name: "ubuntu_notification"
}

In this configuration we are creating a notification configuration
called ubuntu_notification, which specifies that alerts must go to
PagerDuty, email and HipChat. We need to specify the relevant API
keys and/or access tokens for the HipChat and PagerDuty notifications to
work. We are also specifying that the alert configuration should only
apply to alerts on metrics where the label image has the value
ubuntu:14.04, and that a triggered alert should not retrigger
for at least 300 seconds after the first alert is raised. We can bring
up the Alert Manager using its docker image by volume-mounting our
configuration file into the container with the command shown below.

docker run -d -p 9093:9093 -v $PWD:/alertmanager prom/alertmanager -logtostderr -config.file=/alertmanager/alertmanager.conf

Once the container is running you should be able to point your browser
to port 9093 and load up the Alert Manager UI. You will be able to see
all the alerts raised here; you can ‘silence’ them or delete them once
the issue is resolved. In addition to setting up the Alert Manager we
also need to create a few alerts. Add "/prometheus.rules" as an entry
under the rule_files: section of the prometheus.conf file you created
earlier. This entry tells Prometheus to look for alerting rules in the
prometheus.rules file. We now need to create the rules file and load it
into our server container. To do so, create a file called
prometheus.rules in the same directory where you created
prometheus.conf and add the following text to it:

ALERT HighMemoryAlert
  IF container_memory_usage_bytes > 1000000000
  FOR 1m
  WITH {}
  SUMMARY "High Memory usage for Ubuntu container"
  DESCRIPTION "High Memory usage for Ubuntu container on {{$labels.instance}} for container {{$labels.name}} (current value: {{$value}})"
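
For reference, the rule_files section of prometheus.conf (shown empty earlier) would then contain a single entry pointing at this file:

rule_files:
  - "/prometheus.rules"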

In this rule we are telling Prometheus to raise an alert called
HighMemoryAlert if the container_memory_usage_bytes metric
goes above 1 GB for 1 minute (the Alert Manager configuration above then
restricts notifications to containers using the ubuntu:14.04 image).
The summary and the description of the alert are also specified
in the rules file. Both of these fields can contain placeholders for
label values, which are replaced by Prometheus. For example, our
description will specify the server instance (IP) and the container name
for the metric raising the alert. After launching the Alert Manager and
defining your alert rules, you will need to re-run your Prometheus
server with new parameters. The commands to do so are below:

# stop and remove current container
docker stop prometheus-server && docker rm prometheus-server

# start new container
docker run -d --name prometheus-server -p 9090:9090 \
  -v $PWD/prometheus.conf:/prometheus.conf \
  -v $PWD/prometheus.rules:/prometheus.rules \
  prom/prometheus \
  -config.file=/prometheus.conf \
  -alertmanager.url=http://ALERT_MANAGER_IP:9093

Once the Prometheus Server is up again you can click Alerts in the top
menu of the Prometheus Server UI to bring up a list of alerts and their
statuses. If and when an alert fires you will also be able to see it
in the Alert Manager UI and in any external service defined in the
alertmanager.conf file.

Collectively the Prometheus toolset’s feature set is on par with
DataDog, which has been our best-rated monitoring tool so far. Prometheus
uses a very simple format for input data and can ingest metrics from any web
endpoint which presents them. Therefore we can monitor more or less
any resource with Prometheus, and there are already several libraries
available to monitor common resources. Where Prometheus is lacking is in
level of polish and ease of deployment. The fact that all components are
dockerized is a major plus; however, we had to launch four different
containers, each with their own configuration files, to support the
Prometheus server. The project is also lacking detailed, comprehensive
documentation for these various components. However, in comparison to
self-hosted solutions such as CAdvisor and Sensu, Prometheus is a much
better toolset. It is significantly easier to set up than Sensu and has the
ability to provide visualization of metrics without third-party tools.
It has much more detailed metrics than CAdvisor and is also able
to monitor non-docker resources. The choice of pull-based metric
aggregation rather than push is less than ideal, as you would have to
restart your server when adding new data sources. This could get
cumbersome in a dynamic environment such as a cloud-based deployment.
Prometheus does offer the Push Gateway to bridge the
disconnect. However, running yet another service adds to the
complexity of the setup. For these reasons I still think DataDog is
probably easier for most users; however, with some polish and better
packaging Prometheus could be a very compelling alternative, and out of
the self-hosted solutions Prometheus is my pick.
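
For completeness, pushing a metric through a Push Gateway is a single HTTP request; a sketch, assuming a Push Gateway running on its default port 9091 and a made-up metric name, looks like this:

# push an ad-hoc metric; the job name in the URL becomes a label on the series
echo "deploy_last_run_seconds $(date +%s)" | curl --data-binary @- http://PUSHGATEWAY_IP:9091/metrics/job/deployments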

Score Card:

  1. Ease of deployment: **
  2. Level of detail: *****
  3. Level of aggregation: *****
  4. Ability to raise alerts: ****
  5. Ability to monitor non-docker resources: Supported
  6. Cost: Free

Sysdig Cloud

Sysdig cloud is a hosted service that provides metrics storage,
aggregation, visualization and alerting. To get started with sysdig, sign
up for a trial account at https://app.sysdigcloud.com and complete
the registration form. Once you complete the registration form and log
in to the account, you will be asked to Setup your Environment and be
given a curl command similar to the one shown below. Your command will have
your own secret key after the -s switch. Run this command on each
docker host that you need to monitor. Note that you should
replace the [TAGS] placeholder with tags to group your metrics. The
tags are in the format TAG_NAME:VALUE, so you may want to add a tag such as
role:web or deployment:production. You may also use the containerized
sysdig agent.

# Host install of sysdig agent
curl -s https://s3.amazonaws.com/download.draios.com/stable/install-agent | sudo bash -s 12345678-1234-1234-1234-123456789abc [TAGS]

# Docker based sysdig agent
docker run --name sysdig-agent --privileged --net host \
  -e ACCESS_KEY=12345678-1234-1234-1234-123456789abc \
  -e TAGS=os:rancher \
  -v /var/run/docker.sock:/host/var/run/docker.sock \
  -v /dev:/host/dev -v /proc:/host/proc:ro \
  -v /boot:/host/boot:ro \
  -v /lib/modules:/host/lib/modules:ro \
  -v /usr:/host/usr:ro sysdig/agent

Even if you use docker you will still need to install kernel headers in
the host OS. This goes against Docker’s philosophy of isolated
microservices; however, installing kernel headers is fairly benign.
Installing the headers and getting sysdig running is trivial if you are
using a mainstream distribution such as CentOS, Ubuntu or Debian. Even
Amazon’s custom kernels are supported; however, RancherOS’s custom
kernel presented problems for sysdig, as did the TinyCore kernel. So be
warned: if you would like to use Sysdig cloud on non-mainstream kernels
you may have to get your hands dirty with some system hacking.
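
If you do need the headers, installing them on a mainstream distribution is typically a one-liner; for example (a sketch for stock Ubuntu/Debian and CentOS kernels respectively):

# Ubuntu / Debian
sudo apt-get install -y linux-headers-$(uname -r)

# CentOS / RHEL
sudo yum install -y kernel-devel-$(uname -r)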

After you run the agent you should see the Host in the Sysdig cloud
console in the Explore tab. Once you launch docker containers on the
host those will also be shown. You can see basic stats about CPU
usage, memory consumption and network usage. The metrics are aggregated for
the host as well as broken down per container.

By selecting one of the hosts or containers you can get a whole host of
other metrics including everything provided by the docker stats API. Out
of all the systems we have seen so far sysdig certainly has the most
comprehensive set of metrics out of the box. You can also select from
several pre-configured dashboards which present a graphical or tabular
representation of your deployment.

You can see live metrics by selecting Real-time Mode (target icon)
or select a window of time over which to average values. Furthermore,
you can also set up comparisons which will highlight the delta between current
values and values at a point in the past. For example, the table below
shows values compared with those from ten minutes ago. If the CPU usage
is significantly higher than 10 minutes ago you may be experiencing load
spikes and need to scale out. The UI is on par with, if not better than,
DataDog for identifying and exploring trends in the data.

In addition to exploring data on an ad-hoc basis you can also create
persistent dashboards. Simply click the pin icon on any graph in the
explore view and save it to a named dashboard. You can view all the
dashboards and their associated graphs by clicking the Dashboards
tab. You can also select the bell icon on any graph and create an
alert from the data. The Sysdig cloud supports detailed alerting
criteria and is again one of the best we have seen. The example below
shows an alert which triggers if the count of containers labeled web
falls below three on average for the last ten minutes. We are also
segmenting the data by the region tag, so there will be a separate
check for web nodes in North America and Europe. Lastly, we also specify
a name, description and severity for the alerts. You can control where
alerts go by going to Settings (Gear Icon) > Notifications and adding
email addresses or SNS topics to send alerts to. Note that all alerts go to
all notification endpoints, which may be problematic if you want to wake
up different people for different alerts.

I am very impressed with Sysdig cloud: it was trivially easy to set up and
provides detailed metrics with great visualization tools for real-time
and historical data. The requirement to install kernel headers on the
host OS is troublesome though, and the lack of documentation and support for
non-standard kernels could be problematic in some scenarios. The
alerting system in Sysdig cloud is among the best we have seen so
far; however, the inability to target different email addresses for
different alerts is problematic. In a larger team, for example, you would
want to alert a different team for database issues than for web server issues.
Lastly, since it is in beta, the pricing for Sysdig cloud is not easily
available. I have reached out to their sales team and will update this
article if and when they get back to me. If sysdig is price competitive
then Datadog has serious competition in the hosted service category.

Score Card:

  1. Ease of deployment: ***
  2. Level of detail: *****
  3. Level of aggregation: *****
  4. Ability to raise alerts: ****
  5. Ability to monitor non-docker resources: Supported
  6. Cost: Must Contact Support

Conclusion

Today’s article has covered several options for monitoring docker
containers, ranging from free options such as docker stats, CAdvisor,
Prometheus and Sensu, to paid services such as Scout, Sysdig Cloud and
DataDog. From my research so far, DataDog seems to be the best-in-class
system for monitoring docker deployments. The setup was complete in
seconds with a one-line command, all hosts were reporting metrics in one
place, historical trends were apparent in the UI, and Datadog supports
deep diving into metrics as well as alerting. However, at $15 per host
the system can get expensive for large deployments. For larger-scale,
self-hosted deployments, Sensu is able to fulfill most requirements, but
the complexity of setting up and managing a Sensu cluster may be
prohibitive. Obviously, there are plenty of other self-hosted options,
such as Nagios or Icinga, which are similar to Sensu.

Hopefully this gives you an idea of some of the options for monitoring
containers available today. I am continuing to investigate other
options, including a more streamlined self-managed container monitoring
system using CollectD, Graphite or InfluxDB and Grafana. Stay tuned for
more details.

ADDITIONAL INFORMATION: After publishing this article I had some
suggestions to also evaluate Prometheus and Sysdig Cloud, two other very
good options for monitoring Docker. We’ve now included them in this
article, for ease of discovery. You can find the original second part
of my post here.

To learn more about monitoring and managing Docker, please join us for
our next Rancher online meetup.

Usman is a server and infrastructure engineer, with experience in
building large scale distributed services on top of various cloud
platforms. You can read more of his work at
techtraits.com, or follow him on Twitter
@usman_ismail or on GitHub.

Source

VM Container | Introducing RancherVM

Virtual machines and containers are two of my favorite technologies. I
have always wondered about different ways they can work together. It has
become clear over time that these two technologies complement each other.
True there is overlap, but most people who are running containers today
run them on virtual machines, and for good reason. Virtual machines
provide the underlying computing resources and are typically managed by
the IT operations teams. Containers, on the other hand, are managed by
application developers and devops teams. I always thought this was a
good approach, and that for most use cases containers would reside
inside virtual machines. Then, a few months ago, a meeting with Jeremy
Huylebroeck of Orange Silicon Valley changed my thinking. Jeremy
mentioned it might make sense to run virtual machines inside
containers. At first the concept seemed odd. But the more I thought
about it the more I saw its merit. Interestingly, numerous use cases for
VM containers started to appear in our conversations with Rancher
users. We have heard three common use cases for VM containers:

  1. Isolation and security. The first reason one might want to run
    VM containers is to retain the isolation and security properties of
    virtual machines while still being able to package and distribute
    software as Docker containers. Despite the great deal of progress in
    container security, virtual machines are still better at isolating
    workloads. Compared with hundreds of Linux kernel interfaces,
    virtual machines have a smaller surface area (CPU, memory,
    networking and storage interfaces) to protect. It is thus not
    surprising that folks who want to host untrusted workloads (for
    example, managed hosting companies and continuous integration
    services) have expressed interest in continuing to use virtual
    machines.
  2. Docker on-boarding. On-boarding existing workloads is always a
    challenge for organizations starting to adopt container
    technologies. This is a second interesting use case for VM
    containers, as they offer a useful transition path. For example,
    while we expect a future version of Windows to support Docker
    containers natively, VM containers can enable organizations to run
    existing Windows virtual machines on the same infrastructure built
    for Linux containers today. The same approach applies to other
    non-Linux operating systems and older versions of Linux operating
    systems or application packages that have not yet been
    containerized.
  3. KVM management. We have also seen a great deal of interest in
    better management tools for open source virtualization technologies
    like KVM. At its core, KVM is solid. It is reliable and efficient.
    However, KVM lacks the rich management tools in vSphere that IT
    operations teams love. KVM can benefit from Docker, which offers a
    superb experience for application developers and devops teams. If
    KVM runs inside Docker containers, the resulting VM container can
    retain the security, reliability, and efficiency of KVM, while
    offering the Docker management experience devops teams love. The
    ability to package virtual machines as Docker images and distribute
    them through Docker Hub is valuable. Powerful service discovery
    mechanisms developed for containers can now apply to virtual
    machines. Native container management systems like Rancher can now
    be used to manage virtual machine workloads at large scale.

Because of all of these use cases, I started experimenting with running
KVM inside Docker containers, and I have come up with an experimental
system called RancherVM. RancherVM allows you to package KVM images
inside Docker images and manage VM containers using the familiar Docker
commands. A VM container looks and feels like a regular container. It
can be created from a Dockerfile, distributed using Docker Hub, managed
using the docker command line, and networked together using links and port
bindings. Inside each VM container, however, is a virtual machine
instance. You can package any QEMU/KVM image as a RancherVM container.
RancherVM accomplishes all this without introducing any performance
overhead compared with running KVM outside of containers.
RancherVM additionally comes with a
management container that provides a web UI for managing virtual
machines. The following command starts the RancherVM management
container on a server where Docker and KVM are installed:

docker run -v /var/run/docker.sock:/var/run/docker.sock -p 8080:80 -v /tmp/ranchervm:/ranchervm rancher/ranchervm

Once the management container is up, you can access a web-based virtual
machine management experience for VM containers at
https://<kvmhost>:8080/:
The web-based UI allows you to perform basic life-cycle
operations for VM containers and access the VNC console for virtual
machines. VNC console access comes in handy when you need to perform
operations that cannot be performed with remote SSH or RDP, such as
troubleshooting a Windows VM’s network configuration:
The web UI experience is attractive for users familiar
with VM management tools. A great benefit of RancherVM vs. traditional
VM management is that we can now use the powerful Docker command line to
manage virtual machines. The following command, for example, starts a
RancherOS VM:

docker run -e "RANCHER_VM=true" --cap-add NET_ADMIN -v /tmp/ranchervm:/ranchervm --device /dev/kvm:/dev/kvm --device /dev/net/tun:/dev/net/tun rancher/vm-rancheros

Other than some command-line options required to set up a Docker
container to host KVM, this is just a normal docker command used to
instantiate a container image called rancher/vm-rancheros. Additional
docker commands like docker stop, docker ps, docker images, and
docker inspect all work as expected. The following video shows the
live experience of using RancherVM.
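
As a small illustration (a sketch; the container ID comes from docker ps), day-to-day management of the VM container started above uses the same commands as any other container:

# list VM containers alongside ordinary containers
docker ps
# list local images, including VM images pulled from Docker Hub
docker images
# inspect the VM container's network settings, volumes and devices
docker inspect <container-id>
# stop the virtual machine by stopping its container
docker stop <container-id>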

Today we’re making RancherVM available on
GitHub. I hope the initial release of
RancherVM gives you some ideas about building and using VM containers.
If you are interested, please check out the demo video, download the
software, and create some VM containers for yourself. If you have any
questions or issues, please file them as issues in GitHub and we’ll
respond as quickly as possible. On May 13th we will be hosting an online
meetup to demonstrate RancherVM, show a few use cases, and answer any
questions you might have. Please register to attend below.

Source

Docker Load Balancing Now Available in Rancher 0.16

Hello, my name is Alena Prokharchyk and I am a part of the software
development team at Rancher Labs. In this article I’m going to give an
overview of a new feature I’ve been working on, which was released this
week with Rancher 0.16 – a Docker Load Balancing service. One of the
most frequently requested Rancher features, load balancers are used to
distribute traffic between docker containers. Now Rancher users
can configure, update and scale up an integrated load balancing service
to meet their application needs, using either Rancher’s UI or API. To
implement our load balancing functionality we decided to use HAProxy,
which is deployed as a container, and managed by the Rancher
orchestration functionality. With Rancher’s Load Balancing capability,
users are now able to use a consistent, portable load balancing service
on any infrastructure where they can run Docker. Whether it is running
in a public cloud, private cloud, lab, cluster, or even on a laptop, any
container can be a target for the load balancer.

Creating a Load Balancer

Once you have an environment running in Rancher, it is simple to create
a Load Balancer. You’ll see a new top level tab in the Rancher UI
called “Balancing” from which you can create and access your load
balancers.
To create a new load balancer click on + Add Load Balancer. You’ll
be given a configuration screen to provide details on how you want the
load balancer to function.
There are a number of different options for configuration, and I’ve
created a video demonstration to walk through the process.

Updating an active Load Balancer

In some cases after your Load Balancer has been created, you might want
to change its settings – for example to add or remove listener ports,
configure a health check, or simply add more target containers. Rancher
performs all the updates without any downtime for your application. To
update the Load Balancer, bring up the Load Balancer “Details” view by
clicking on its name in the UI:
UpdateNavigation
Then navigate to the toolbar of the setting you want to change, and
make the update:
LBUpdateConfig

Understanding Health Checks

Health checks can be incredibly helpful when running a production
application. Health checks monitor the availability of target
containers, so that if one of the load balanced containers in your app
becomes unresponsive, it can be excluded from the list of balanced
hosts until it's functioning again. You can delegate this task to the
Rancher Load Balancer by configuring the health check on it from the UI.
Just provide a monitoring URL for the target container, as well as
check intervals and healthy and unhealthy response thresholds. You can
see the UI for this in the image below.
healthCheck

Stickiness Policies

Some applications require that a user continues to connect to the same
backend server within the same login session. This persistence is
achieved by configuring Stickiness policy on the Load Balancer. With
stickiness, you can control whether the session cookie is provided by
the application, or directly from the load balancer.

Scaling your application

The Load Balancer service is primarily used to help scale up
applications as you add additional targets to the load balancer.
However, to provide an additional layer of scaling, the load balancer
itself can also scale across multiple hosts, creating a clustered load
balancing service. With the Load Balancer deployed on multiple hosts,
you can use a global load balancing service, such as Amazon Web
Services' Route 53, to distribute incoming traffic across load
balancers. This can be especially useful when running load balancers in
different physical locations. The diagram below explains how this can be
done.

Load Balancing and Service Discovery

This new load balancing support has plenty of independent value, but it
will also be an important part of the work we’re doing on service
discovery, and support for Docker Compose. We’re still working on this
and testing it, but you should start to see this functionality in
Rancher over the next four to six weeks. If you’d like to learn about
load balancing, Docker Compose, service discovery and running
microservices with Rancher, please join our next online meetup, where
we'll be covering all of these topics.

Alena Prokharchyk, @lemonjet, https://github.com/alena1108

Source

Building a Continuous Integration Environment using Docker, Jenkins and OpenVPN

Since I started playing with Docker I have been thinking that its network
implementation is something that will need to be improved before I could
really use it in production. It is based on container links and service
discovery, but it only works for host-local containers. This creates
issues for a few use cases, for example when you are setting up services
that need advanced network features like broadcasting/multicasting for
clustering. In this case you must deploy your application stack
containers in the same Docker host, but it makes no sense to deploy a
whole cluster in the same physical or virtual host. Also, I would like
container networking to function without performing any action like
managing port mappings or exposing new ports. This is why networking is
one of my favorite features of Rancher: it overcomes Docker's
network limitations using a software defined network that connects all
docker containers under the same network, as if all of them were
physically connected. This feature makes it much easier to interconnect
your deployed services because you don't have to configure anything. It
just works. However, I was still missing the possibility to
easily reach my containers and services from my PC as if I were also on
the same network, again without configuring new firewall rules or mapping
ports. That is why I created a Docker image that extends the Rancher network
using OpenVPN. This allows any device that can run an OpenVPN client,
including PCs, gateways, and even mobile devices or embedded systems, to
access your Rancher network in an easy and secure way, because all its
traffic is encrypted. There are many use cases and possibilities for
using this; here are some examples:

  • Allow all users in your office to access your containers
  • Enabling on-call sysadmins to access your containers from anywhere at
    any time
  • Or the example that we are carrying out: allowing a user who works
    at home to access your containers

And all this without reconfiguring your Rancher environment every time
that you grant access to someone. In this post we are installing a
minimalistic Continuous Integration (CI) environment on AWS using
Rancher and RancherOS. The main idea is to create a scenario where a
developer who teleworks can easily access our CI environment, without
adding IPs to a whitelist, exposing services to the Internet nor
performing special configurations. To do so we are installing and
configuring these docker images:

  • jenkins: a Jenkins instance to compile a sample WAR hosted on
    GitHub. Jenkins will automatically deploy this application in Tomcat
    after compiling it.
  • tutum/tomcat:7.0 – a Tomcat instance for deploying the sample WAR
  • nixel/rancher-vpn-server: a custom OpenVPN image I have created
    specially to extend the Rancher network

And we are using a total of 4 Amazon EC2 instances:

  • 1 for running Rancher Server
  • 1 for running VPN server
  • 1 for running Tomcat server
  • 1 for running Jenkins

At the end the
developer will be able to browse to Jenkins and the Tomcat webapp using his
VPN connection. As you will see, this is easy to achieve because you are
not configuring anything for accessing Tomcat or Jenkins from your PC;
you just launch a container and you are able to connect to it.

Preparing AWS cloud

You need to perform these actions on AWS before setting up the CI
environment.

Creating a Key Pair

Go to the EC2 Console and enter the Key Pairs
section. When you create the Key Pair your browser will download
a private key that you will need later for connecting to your Rancher
Server instance using SSH if you want to. Save this file because you
won't be able to download it from AWS anymore.

Creating a Security Group

Before creating a Security Group go to the VPC Console and choose
the VPC and Subnet where you will deploy your EC2 instances. Copy the
VPC ID, Subnet ID and CIDR. Go to the EC2 Console and create a Security
Group named Rancher which will allow the following inbound traffic (an
equivalent AWS CLI sketch follows the list):

  • Allow 22/tcp, 2376/tcp and 8080/tcp ports from any source, needed
    for Docker machine to provision hosts
  • Allow 500/udp and 4500/udp ports from any source, needed for Rancher
    network
  • Allow 9345/tcp and 9346/tcp ports from any source, needed for UI
    features like graphs, view logs, and execute shell
  • Allow 1194/tcp and 2222/tcp ports from any source, needed to publish
    our VPN server container
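
If you prefer the AWS CLI to the console, the security group above could be created with roughly the following commands (a hedged sketch; it assumes the CLI is already configured and <vpc-id> is the VPC ID you copied earlier):

SG_ID=$(aws ec2 create-security-group --group-name Rancher --description "Rancher hosts" --vpc-id <vpc-id> --query GroupId --output text)
# TCP ports for SSH, Docker machine, Rancher UI/agent and the VPN container
for p in 22 2376 8080 9345 9346 1194 2222; do
  aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $p --cidr 0.0.0.0/0
done
# UDP ports for the Rancher network (IPsec)
for p in 500 4500; do
  aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol udp --port $p --cidr 0.0.0.0/0
done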

Be sure to select the appropriate VPC in the Security Group dialog.

Creating an Access Key

On the EC2 Console click your name in the top
menu bar and go to Security Credentials. Expand the Access Keys (Access
Key ID and Secret Access Key) option and create a new Access Key.
Finally, click Download Key File because again you won't be able to do
it later. You will need this for Rancher Server to create Docker hosts
for you.

Installing Rancher Server

Create a new instance on the EC2 console that uses the rancheros-0.2.1 AMI;
search for it in the Community AMIs section. For this tutorial I am using
a basic t1.micro instance with an 8GB disk; you may change this to better
fit your environment's needs. Now enter the Configure Instance Details
screen and select the appropriate Network and Subnet. Then expand the
Advanced Details section and enter this user data:

#!/bin/bash
docker run -d -p 8080:8080 rancher/server:v0.14.2

This will install and run Rancher Server 0.14.2 when the instance boots.
Before launching the new instance, be sure to choose the Security Group
and Key Pair we just created. Finally, go to the Instances menu and
get your Rancher Server instance's public IP. After a few minutes navigate
to http://RANCHER_SERVER_PUBLIC_IP:8080 and you will enter the Rancher UI.
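
Optionally, you can check that the server container came up by connecting to the instance over SSH (a sketch assuming the key pair downloaded earlier and RancherOS's default rancher user):

ssh -i /path/to/your-keypair.pem rancher@RANCHER_SERVER_PUBLIC_IP
docker ps    # should show the rancher/server:v0.14.2 container running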

Provisioning Docker hosts

In this section we are creating our Docker hosts. Go to the Rancher UI and
click the Add Host button, confirm your Rancher Server public IP and then
click the Amazon EC2 provider. In this form you need to enter the
following data: host name, Access Key, Secret Key, Region, Zone, VPC
ID, Subnet ID, and Security Group. Be sure to enter the appropriate
values for Region, Zone, VPC ID and Subnet ID because they must
match those used by the Rancher Server instance. You must specify the Security
Group name instead of its ID; in our case it is named Rancher.
rancher-create-host
Repeat this step three times so Rancher will provision our three Docker
hosts. After some minutes you will see your hosts running in Rancher UI.

rancher-hosts-list

Installing VPN container

Now it’s time to deploy our VPN server container that will extend the
Rancher network. Go to your first host, click Add Container button and
follow these steps:

  1. Enter a name for this container like rancher-vpn-server
  2. Enter docker image: nixel/rancher-vpn-server:latest
  3. Add this TCP port map: 1194 (on Host) to 1194 (in Container)
  4. Add this TCP port map: 2222 (on Host) to 2222 (in Container)

Now expand Advanced Options section and follow these steps:

  1. In Volume section add this new volume to persist VPN
    configuration: /etc/openvpn:/etc/openvpn
  2. In Networking section be sure to select Managed Network on
    docker0
  3. In Security/Host section be sure to enable the Give the container
    full access to the host
    checkbox

After a while you will see your rancher-vpn-server container running on
your first host.
rancher-vpn-server-container
Now you are about to use another nice Rancher feature. Expand your
rancher-vpn-server container menu and click View Logs button as you can
see in the following image:
rancher-tomcat-view-container-logs
Now scroll to the top and you will find the information you need in order to
connect with your VPN client. We will use this data later.

Installing Tomcat container

To install Tomcat container you have to click Add Container button on
your second host and follow these steps:

  1. Enter a name for this container like tomcat
  2. Enter docker image: tutum/tomcat:7.0
  3. No port map is required
  4. Expand Advanced Options and in Networking section be sure to
    select Managed Network on docker0

After a while you will see your Tomcat container running on your second
host.
rancher-tomcat-server-container
Now open the Tomcat container logs in order to get its admin password; you
will need it later when configuring Jenkins.
rancher-tomcat-logs

Installing Jenkins container

Click Add Container button on your third host and execute the following
steps:

  1. Enter a name for this container like jenkins
  2. Enter docker image: jenkins
  3. No port map is required

Now expand Advanced Options section and follow these steps:

  1. In Volume section add this new volume to persist Jenkins
    configuration: /var/jenkins_home
  2. In Networking section be sure to select Managed Network on
    docker0
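
As before, this UI configuration corresponds roughly to the docker run sketch below; it is for reference only, since creating the container through the Rancher UI is what attaches it to the managed network. No port mapping is needed because we will reach Jenkins on port 8080 over the VPN.

# Sketch: rough docker run equivalent of the Jenkins container settings above
docker run -d --name jenkins \
  -v /var/jenkins_home \
  jenkins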

After a while you will see your Jenkins container running on your third
host.
rancher-jenkins-container

Putting it all together

In this final step you are going to install and run the VPN client.
There are two ways to get the client working: using a Docker image I
have prepared that does not require any configuration, or using any
OpenVPN client, which you will need to configure yourself. Once the VPN
client is working you will browse to Jenkins to create an example CI job
that deploys the sample WAR application on Tomcat. Finally, you will
browse to the sample application to see how all of this works together.

Installing Dockerized VPN client

On a PC with Docker installed, execute the command we saw earlier in the
rancher-vpn-server container logs. In my example the command is:

sudo docker run -ti -d --privileged --name rancher-vpn-client -e VPN_SERVERS=54.149.62.184:1194 -e VPN_PASSWORD=mmAG840NGfKEXw73PP5m nixel/rancher-vpn-client:latest

Adapt it to your environment. Then view the rancher-vpn-client container
logs:

sudo docker logs rancher-vpn-client

You will see a message showing the route you need to add on your system
in order to reach the Rancher network.
rancher-vpn-client-route
In my case I’m executing this command:

sudo route add -net 10.42.0.0/16 gw 172.17.0.8

At this point you are able to ping all of your containers, no matter
which host they run on. Your PC is now effectively connected to the
Rancher network and can reach any container or service running on your
Rancher infrastructure. If you repeat this step on a Linux gateway at
your office you will, in effect, expose the Rancher network to all the
computers connected to your LAN, which is really interesting; a rough
sketch of the gateway side follows.
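
The following is only a sketch, assuming a plain Linux box acting as your office gateway with a LAN address of 192.168.1.10 (both are illustrative; adapt them to your network). It also assumes you have already started the rancher-vpn-client container on the gateway and added the route printed in its logs.

# On the gateway: allow it to forward LAN traffic towards the Rancher network
sudo sysctl -w net.ipv4.ip_forward=1
# Depending on how the VPN client container routes replies, you may also need
# to masquerade LAN traffic heading into the Rancher subnet
sudo iptables -t nat -A POSTROUTING -d 10.42.0.0/16 -j MASQUERADE

# On each LAN client (or as a static route on your office router),
# send the Rancher subnet via the gateway's LAN IP
sudo route add -net 10.42.0.0/16 gw 192.168.1.10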

Installing a custom OpenVPN client

If you prefer to use an existing or custom OpenVPN client, you can do
that too. You will need the OpenVPN configuration file, which you can get
by executing the SSH command we saw earlier in the rancher-vpn-server
container log. In my case I can get the RancherVPNClient.ovpn file by
executing this command:

sshpass -p mmAG840NGfKEXw73PP5m ssh -p 2222 -o ConnectTimeout=4 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no root@54.149.62.184 "get_vpn_client_conf.sh 54.149.62.184:1194" > RancherVPNClient.ovpn

Now you can, for example, run OpenVPN with this command:

/usr/sbin/openvpn --config RancherVPNClient.ovpn

You can also use the OpenVPN iOS/Android application with this
RancherVPNClient.ovpn file to access your Rancher network from your
mobile or tablet. Again, you can extend the VPN to all users in your LAN
if you repeat this step on a Linux gateway in your office.

Configuring Jenkins

Now it's time to configure Jenkins to compile and deploy our sample WAR
on Tomcat. Browse to
http://JENKINS_CONTAINER_IP:8080
(in my case http://10.42.13.224:8080) and you will see the Jenkins
Dashboard.
jenkins-dashboard
Before starting, you must install the GitHub plugin and Maven by following
these steps:

  1. Click the Manage Jenkins menu option and then Manage Plugins
  2. Go to the Available tab, search for the GitHub plugin (named “GitHub
    Plugin”) and activate its checkbox
  3. Click the Download now and install after restart button
  4. When the plugin is installed, enable the Restart Jenkins when
    installation is complete and no jobs are running checkbox, and then
    wait for Jenkins to restart
  5. When Jenkins is running again, go to Manage Jenkins and click
    Configure System
  6. In the Maven section click the Add Maven button, enter a name for the
    installation and choose the latest Maven version
  7. Click the Save button to finish

jenkins-install-maven
When you are back on the Dashboard, click the create new jobs link and
follow these instructions:

jenkins-git-url

  • In the Build section enter the following Maven goals and options.
    Replace TOMCAT_CONTAINER_IP with the IP assigned to your Tomcat
    container (10.42.236.18 in my case) and TOMCAT_ADMIN_PASSWORD with
    the admin user's password we saw earlier (6xc3gzOi4pMG in my case).

clean package tomcat7:redeploy -DTOMCAT_HOST=TOMCAT_CONTAINER_IP -DTOMCAT_PORT=8080 -DTOMCAT_USER=admin -DTOMCAT_PASS=TOMCAT_ADMIN_PASSWORD

I am setting this maven configuration:

clean package tomcat7:redeploy -DTOMCAT_HOST=10.42.236.18 -DTOMCAT_PORT=8080 -DTOMCAT_USER=admin -DTOMCAT_PASS=6xc3gzOi4pMG

  • Save your job

jenkins-maven-goals
Now you can click the Build Now button to run your job. Open your
execution (listed in the Build History table) and then click the Console
Output option. If you scroll to the bottom you will see something like
this:
jenkins-job-result

Testing the sample application

Now browse to
http://TOMCAT_CONTAINER_IP:8080/sample/
and you will see a page showing information about the Tomcat server and
your browser client.
sample-application
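
From the same VPN-connected PC you can also check the deployment from the command line; the IP below is the example Tomcat container IP used throughout this post, so substitute your own.

# Sketch: confirm the sample application responds over the VPN
curl -I http://10.42.236.18:8080/sample/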

Conclusion

In this post we have installed a basic Continuous Integration
environment as an example of how to make your Docker containers reachable
from your PC, your LAN, and even a mobile device or any system that can
run an OpenVPN client. This is possible thanks to the Rancher network, a
great feature that improves Docker networking by connecting your
containers on the same network. What we actually did was extend the
Rancher network using an OpenVPN link, which is really easy to configure
with Docker and secure to use because all traffic is encrypted. This
capability can help many companies better manage the way they give access
to their containers from unknown or uncontrolled networks. You no longer
need to think about exposing or mapping ports, changing firewall rules,
or worrying about which services you publish to the Internet. For more
information on managing Docker with Rancher, please join our next online
meetup, where we'll be demonstrating Rancher, Docker Compose, service
discovery and many other capabilities.

Manel Martinez is a Linux systems engineer with experience in the design
and management of scalable, distributed and highly available open source
web infrastructures based on products like KVM, Docker, Apache, Nginx,
Tomcat, JBoss, RabbitMQ, HAProxy, MySQL and XtraDB. He lives in Spain,
and you can find him on Twitter as @manel_martinezg.

Source

Remembering Paul Hudak

Renowned computer scientist Paul Hudak, one of the designers of the
Haskell programming language, died of leukemia this week. There has been
an outpouring of reactions from the people Paul's life and work have
touched. Paul was my
Ph.D. adviser at Yale in the 1990s. He supervised my work, paid for my
education, and created an environment that enabled me to learn from some
of the brightest minds in the world. Paul was an influential figure in
the advancement of functional programming. Functional programming
advocates a declarative style, as opposed to procedural or imperative
style, of programming. For example, instead of writing
result = 0; for (i=0; i<n; i++) result += a[i]; you write
result = sum(a[0:n]). In many cases, the declarative style is easier
to understand and more elegant. Because the declarative style focuses on
what, rather than how to perform the computation, it enables
programmers to worry less about implementation details and gives
compilers more freedom to produce optimized code. One of the strongest
influences of functional programming came from Lambda Calculus, a
mathematical construct formulated by Alonzo Church in the 1930s. Lambda
Calculus has had a huge impact on programming languages even though it
was created before computers were invented. Lambda Calculus introduced
modern programming constructs such as variable bindings, function
definitions, function calls, and recursion. Alan Turing, who studied as
a Ph.D. student under Alonzo Church, proved that Lambda Calculus and
Turing Machine were equivalent in computability. It is therefore
comforting to know that, in theory, whatever a computer can do, we can
write a program for it. In 1977, around the time Paul was starting his
own Ph.D. research, functional programming got a tremendous boost when
John Backus presented his Turing Award lecture titled “Can Programming
be Liberated from the von Neumann Style?” Backus argued conventional
languages designed for sequential “word-at-a-time” processing were too
complex and could no longer keep up with advances in computers. Backus
favored functional style programming which possessed stronger
mathematical properties. The Backus lecture had a strong impact because
it represented a radical departure from his early work in leading the
development of FORTRAN and in participating in the design of ALGOL 60,
the major “von Neumann style” languages of their day. Functional
programming research took off in the 1980s. Researchers from all over
the world created numerous functional programming languages. The
proliferation of languages became a problem. Many of these languages
were similar enough for humans to understand, but researchers
could not collaborate on implementations or run each other's
programs. In 1987, Paul Hudak and a group of prominent researchers came
together and created Haskell as a common research and education language
for functional programming. As far as I can remember, Paul always
emphasized other people’s contributions to the Haskell language. There’s
no doubt, however, Paul was a major driving force behind Haskell. This
is just the type of leader Paul was. He painted the vision and gathered
the resources. He would create an environment for others to thrive. He
attracted a remarkable group of world-class researchers at Yale Haskell
Group. I made great friends like Rajiv Mirani. I was fortunate to get to
know researchers like John Peterson, Charles Consel, Martin Odersky, and
Mark Jones. Mark Jones, in particular, developed a variant of Haskell
called Gofer. Gofer’s rich type system enabled me to complete my Ph.D.
thesis work on monad transformers. I decided to pursue a Ph.D. in Yale
Haskell Group largely motivated by Paul’s vision that we could make
programmers more efficient by designing better programming languages.
Paul believed programming languages should be expressive enough to make
programmers productive, yet still retain the simplicity so programs
would be easy to understand. He had a favorite saying “the most
important things in programming are abstraction, abstraction,
abstraction,” which meant a well-written program should be clean and
simple to understand with details abstracted away in modules and
libraries. Paul believed compilers should help programmers write correct
code by catching as many mistakes as possible before a program ever
runs. By the time I completed my Ph.D. program, however, we found it
difficult to get the larger world to share the same view. The computing
industry in the late 1990s and early 2000s turned out to be very
different from what functional programming researchers had anticipated.
There were several reasons for this. First, the von Neumann style
computers kept getting better. When I worked on the Haskell compiler,
computers ran at 25 MHz. CPU speeds would grow to over 3 GHz in less than
10 years. The miraculous growth of the conventional computing model made
the performance benefits compilers could derive from functional
programming seem irrelevant.
Second, the tremendous profit derived from Y2K and Internet build-out
enabled companies to employ industry-scale programming, where armies of
coders built complex systems. One of the last pieces of advice Paul gave
me was to accept a job in Silicon Valley working on the then-nascent
non-functional language Java, instead of pursuing a research career on
the East Coast. Like many others, I have witnessed with surprise the
rising interest in programming language design and functional
programming in recent years. No doubt this has to do with the slowing
growth of CPU clock-rate and the growth in multi-core and multi-node
computing. Functional programming frees developers from worrying about
low-level optimization and scheduling, and enables developers to focus
on solving problems at large scale. A more fundamental reason for the
resurgence of functional programming, I believe, lies in the fact that
programming has become less of an industrial-scale effort and more of a
small-scale art form. The simplicity and elegance of functional
programming strike a chord with developers. Building on the rich
foundational capabilities nicely abstracted away in web services, open
source modules, and third-party libraries, developers can create
application or infrastructure software quickly and disrupt incumbent
vendors working with outdated practices. I have not kept in touch with
Paul in recent years. But I can imagine it must be incredibly rewarding
for Paul to see the impact of his work and see how the programming model
he worked so hard to advance is finally becoming accepted.

Source

Magento and Docker | Magento Deployment

A little over a month ago I wrote about setting up a Magento cluster on
Docker using Rancher. At the time I identified some shortcomings of
Rancher, such as its lack of support for load balancing. Rancher released
support for load balancing and Docker Machine with 0.16, and I would like
to revisit our Magento deployment to cover the use of load balancers for
scalability as well as availability. Furthermore, I would also like to
cover how the Docker Machine integration makes it easier to launch
Rancher compute nodes directly from the Rancher UI.

Amazon Setup

As before, we will be running our cluster on top of AWS, so if you have
not already done so, follow the steps outlined in the Amazon Environment
Setup section of the earlier tutorial to set up an SSH key pair and a
security group. However, unlike before, we will be using the Rancher UI
to launch compute nodes, which requires an Access Key ID and Secret
Access Key. To create your key and secret, click through to the IAM
service and select Users from the menu on the left. Click the Create
User button, specify rancher as the user name in the subsequent screen,
and click Create. You will be given the Access Key ID and Secret Access
Key in the dialogue shown below; keep the information safe, as there is
no way to recover the secret and you will need it later.

iam-key
Once you have created the IAM user you will also need to give it
permission to create Amazon EC2 instances. To do so, select rancher from
the user list and click Attach Policy in the Managed Policies section.
Add the AmazonEC2FullAccess policy to the rancher user so that we are
able to create the required resources from the Rancher UI when creating
compute nodes. Full access is a little more permissive than required;
however, for the sake of brevity we are not creating a custom policy.
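
If you prefer the command line, the same user setup can be sketched with standard AWS CLI IAM calls; adjust it to your own account, and note that the console flow above is all this tutorial actually requires.

# Sketch: create the rancher IAM user, attach the EC2 policy and issue access keys
aws iam create-user --user-name rancher
aws iam attach-user-policy --user-name rancher \
  --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess
aws iam create-access-key --user-name rancher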

Screen Shot 2015-04-27 at 9.03.52 PM

Rancher Setup

After setting up the AWS environment, follow the steps outlined in the
Rancher Server Launch section of the earlier Magento tutorial to bring up
your Rancher server and browse to http://RANCHER_SERVER_IP:8080/. *Be
sure you are using Rancher 0.16 or later.* Load the Hosts tab using the
corresponding option in the left-side menu and click + Add Host to add
Rancher compute nodes. The first time you launch a compute node you will
be prompted to confirm the IP address at which the Rancher server is
available to your compute nodes. Specify the private IP address of the
Amazon node on which the Rancher server is running and hit Save.

Screen-Shot-2015-04-27-at-9.30.14-PM

In the Add Host screen, select the Amazon EC2 icon and specify the
information required to launch a compute node, shown below. Enter the
access key and secret key that you created earlier for the rancher IAM
user. We are using a t2.micro instance for this tutorial, but you would
probably use a larger instance for your nodes. Select the same VPC as
your Rancher server instance and specify Rancher as the security group,
matching the security group you created earlier in the Environment Setup
section. The compute nodes must be launched in a different availability
zone from the Rancher server, hence we select Zone c (our Rancher server
was in Zone a). This requirement is due to the fact that Docker Machine
uses the public IP of compute agents to SSH into them from the server,
and a node's public IP is not addressable from within its own subnet.

machine

Repeat the steps above to launch five compute nodes: one for the MySQL
database, two for the load-balanced Magento nodes and two for the load
balancers themselves. I have labeled the nodes as DataNode, Magento1,
Magento2, LB1 and LB2. When all nodes come up you should be able to see
them in the Rancher Server UI as shown below.

Screen-Shot-2015-04-27-at-10.45.09-PM

Magento Container Setup

Now that we have our Rancher deployment launched we can set up our
Magento containers. Before we launch the Magento containers, however, we
must first launch a MySQL container to serve as our database and
Memcached containers for caching. Let's launch the MySQL container first
on one of the compute nodes. We do this by clicking + Add Container on
the DataNode host. In the pop-up menu we need to specify a name for our
container and mysql as the source image. Select Advanced Options >
Command > Environment Vars + to add the four required variables:
MYSQL_ROOT_PASSWORD, MYSQL_USER, MYSQL_PASSWORD and MYSQL_DATABASE. You
may choose any values for the root password and user password; however,
the MySQL user and database must be magento. After adding all of these
environment variables, hit Create to create the container. Note that
mysql is the official Docker MySQL image; details of what is inside this
container can be found on its Docker Hub page.

envvars.png
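
For reference only (in this tutorial the container is created through the Rancher UI so that it joins Rancher's network), the configuration above corresponds roughly to the following docker run against the official mysql image; the password values are placeholders.

# Sketch: rough docker run equivalent of the MySQL container configuration above
docker run -d --name mysql \
  -e MYSQL_ROOT_PASSWORD=changeme-root \
  -e MYSQL_USER=magento \
  -e MYSQL_PASSWORD=changeme-user \
  -e MYSQL_DATABASE=magento \
  mysql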

Next we will create the Memcached containers, one on each of the two
Magento compute nodes. We again give the containers a name (memcached1
and memcached2) and specify their source image as memcached. The
Memcached containers do not require any further configuration, so we can
just click Create to set up the containers. Details of the official
memcached container we use can be found on its Docker Hub page.

Now we are ready to create the Magento containers. On the Magento1 host,
create a container named magento1 using the image
usman/magento:multinode. You need to specify the MYSQL_HOST and
MEMCACHED_HOST environment variables using the container IPs listed in
the Rancher UI; note that for magento1 you should specify the IP of
memcached1. Similarly, launch a second container called magento2 on the
Magento2 host and specify its MySQL host and Memcached host environment
variables. In a few moments both of your Magento containers should be up
and ready. Note that, unlike before, we did not have to link the MySQL
and Memcached containers to our Magento containers. This is because
Rancher now gives all containers access to each other over a Virtual
Private Network (VPN) without the need to expose ports or link
containers. Furthermore, we will not need to expose ports on the Magento
containers, as we will use the same VPN to allow the load balancers to
communicate with our nodes.
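
Again for reference only, the magento1 container configuration corresponds roughly to the docker run below; MYSQL_CONTAINER_IP and MEMCACHED1_CONTAINER_IP stand in for the container IPs shown in the Rancher UI.

# Sketch: rough docker run equivalent of the magento1 container settings above
docker run -d --name magento1 \
  -e MYSQL_HOST=MYSQL_CONTAINER_IP \
  -e MEMCACHED_HOST=MEMCACHED1_CONTAINER_IP \
  usman/magento:multinode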

Load balancer Setup

Now that your containers are up we can set up load balancers to split
traffic across the Magento containers. Select the Balancing tab in the
left-side menu, then click Balancers and + Add Load Balancer. In the
subsequent screen you can specify a name and description for your load
balancer. Next, select the hosts on which to run the balancer containers;
in our case we select both LB1 and LB2. We then need to select the two
Magento containers as targets. In the Listening Ports section we specify
that our Magento containers are listening for HTTP traffic on port 80 and
that we want the load balancers to also listen for HTTP traffic on
port 80.

Screen Shot 2015-04-29 at 9.22.12 PM

Lastly, click on the Health Check tab and specify that the load
balancers should send a GET request to the root URI every 2000 ms to
check that the container is still healthy. If three consecutive health
checks fail then the container will be marked as unhealthy and no
further traffic will be routed to it until it can respond successfully
to two consecutive health checks. In a few moments your load balancers
will be ready and you can load Magento on the public IP of either load
balancer host. You will need to look for the IP in the Amazon EC2
console as the Rancher UI only shows the private IP of the nodes. Once
you load the Magento UI follow the steps outlined in the previous
tutorial to setup your connection the MySQL and to setup a magento
account.
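
As a quick sanity check from your own machine, you can hit each balancer's public IP directly; LB1_PUBLIC_IP and LB2_PUBLIC_IP below are placeholders for the addresses you looked up in the EC2 console.

# Sketch: confirm both load balancer hosts answer HTTP requests
for ip in LB1_PUBLIC_IP LB2_PUBLIC_IP; do
  curl -s -o /dev/null -w "$ip -> %{http_code}\n" "http://$ip/"
done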

Screen Shot 2015-04-28 at 10.22.36 PM


DNS Round-robin Setup using Amazon Route 53

Now that we have our load balancers up and running we can split traffic
across our two Magento containers, but we still must send our requests to
one balancer or the other. To route traffic to both load balancers
transparently we need to set up DNS round-robin. For this you may use any
DNS provider of your choice, but since we are using Amazon EC2 we will
use Amazon's Route 53 service. Use the top menu to select the Route 53
service and select Hosted Zones from the left menu. If you don't already
have a registered domain and hosted zone you may have to create one; we
are using the rancher-magento.com domain and hosted zone. In your hosted
zone, click the Create Record Set button and specify a subdomain such as
lb.rancher-magento.com in the form which loads on the right of the
screen. Select type A – IPv4 address and specify the public IP address of
one of your load balancer hosts. In the Routing Policy section select
Weighted and enter 10 as the weight. Enter 1 as the Set ID and click Save
Record Set. Repeat exactly the same process once more but use the public
IP of the second load balancer host (with a different Set ID). This pair
of DNS entries specifies that we want to route clients who ask for
lb.rancher-magento.com to the two specified IPs. Since the records have
the same weight, traffic will be split evenly between the two load
balancers. We can now load our Magento UI using
http://lb.rancher-magento.com instead of having to specify an IP.
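
If you prefer the command line, the same weighted record can be created with the AWS CLI along these lines; YOUR_HOSTED_ZONE_ID and LB1_PUBLIC_IP are placeholders, and you would repeat the call with Set ID 2 and the second balancer's public IP.

# Sketch: create one weighted A record for lb.rancher-magento.com
aws route53 change-resource-record-sets \
  --hosted-zone-id YOUR_HOSTED_ZONE_ID \
  --change-batch '{
    "Changes": [{
      "Action": "CREATE",
      "ResourceRecordSet": {
        "Name": "lb.rancher-magento.com",
        "Type": "A",
        "SetIdentifier": "1",
        "Weight": 10,
        "TTL": 60,
        "ResourceRecords": [{"Value": "LB1_PUBLIC_IP"}]
      }
    }]
  }'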

Screen Shot 2015-04-29 at 9.47.28 PM

Wrapping up

rancher-machine

Putting it all together, we get the cluster setup shown above. Using the
DNS entries, our web browsers are directed to one of the load balancers,
LB1 or LB2. By having two load balancers we split traffic and hence
reduce the load on each load balancer instance. The load balancers then
proxy traffic to either magento1 or magento2, which again spreads the
load across separate containers running on their own hosts. We have set
up only two Magento containers, but you could set up as many as you need.
Furthermore, the health check setup ensures that if one of the Magento
containers fails, traffic will quickly be diverted to the remaining
container without human intervention. Each Magento container has a
Memcached server running on its own host to provide fast access to
frequently used data, while both Magento containers use the same MySQL
container to ensure consistency between them. By using Rancher's Docker
Machine support we were able to launch all hosts (other than the Rancher
server) directly from the Rancher UI. In addition, thanks to Rancher's
VPN we did not have to expose ports on any of our containers, nor did we
have to link containers, which greatly simplifies the Magento container
setup. With support for load balancers and Machine (as well as Docker
Compose coming soon), Rancher is becoming a much more viable option for
running large-scale, user-facing deployments.

To learn more about Rancher, please join us for one of our monthly
online meetups. You can register for an upcoming meetup by following the
link below.

Source