Docker Networking Tip – Macvlan driver

Over the last few months, I have been looking at Docker forums (https://forums.docker.com/, https://stackoverflow.com/questions/tagged/docker) and trying to understand some of the common questions and issues in the Docker networking area. This prompted me to do 2 presentations:

I received positive feedback on these 2 presentations. As a next step, I thought that preparing each Docker networking tip as a video could help some folks get a better picture. As a first attempt, I chose the macvlan driver for my first Docker networking video tip. Following are the associated Youtube video and presentation.

If you think this is useful and would like to see more videos, please let me know. Based on the feedback received, I will try to create more Docker Networking tips in video format.
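For readers who prefer text, the core of the macvlan tip boils down to creating a user-defined macvlan network bound to a host interface and attaching containers to it. A minimal sketch (the interface name, subnet and gateway below are placeholders for your environment):

# replace eth0 and the addresses with your host's interface and LAN
docker network create -d macvlan --subnet=192.168.1.0/24 --gateway=192.168.1.1 -o parent=eth0 macvlan-net
docker run --rm -it --network=macvlan-net alpine ip addr

Containers attached to this network get an address directly on the parent interface's network, so they are reachable from other hosts on the LAN without port mapping.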

Source

Bringing CoreOS technology to Red Hat OpenShift to deliver a next-generation automated Kubernetes platform


In the months since CoreOS was acquired by Red Hat, we’ve been building on our vision of helping companies achieve greater operational efficiency through automation. Today at Red Hat Summit we’ve outlined our roadmap for how we plan to integrate the projects and technologies started at CoreOS with Red Hat’s, bringing software automation expertise to customers and the community.

Enterprise Kubernetes users can greatly benefit from the planned addition of many popular Tectonic features to Red Hat OpenShift Container Platform, the industry’s most comprehensive enterprise Kubernetes platform. Quay, the leading container registry, is now backed by Red Hat as Red Hat Quay. Container Linux will continue to provide a free, fast-moving, and automated container host, and is expected to provide the basis for new operating system projects and offerings from Red Hat. And open source projects including etcd, Ignition, dex, Clair, Operators and more will continue to thrive as part of Red Hat’s commitment to driving community innovation around containers and Kubernetes.

Essentially, CoreOS technologies are being woven into the very fabric of Red Hat’s container-native products and projects and we are excited to continue delivering on the vision to make automated operations a reality.

The original container-native Linux

Since Red Hat’s acquisition of CoreOS was announced, we have received questions about the fate of Container Linux. CoreOS’s first project, and initially its namesake, pioneered the lightweight, “over-the-air,” automatically updated, container-native operating system that quickly rose in popularity running the world’s containers.

With the acquisition, Container Linux will be reborn as Red Hat CoreOS, a new entry into the Red Hat ecosystem. Red Hat CoreOS will be based on Fedora and Red Hat Enterprise Linux sources and is expected to ultimately supersede Atomic Host as Red Hat’s immutable, container-centric operating system.

Red Hat CoreOS will provide the foundation for Red Hat OpenShift Container Platform, Red Hat OpenShift Online, and Red Hat OpenShift Dedicated. Red Hat OpenShift Container Platform will also, of course, continue to support Red Hat Enterprise Linux for those who prefer its lifecycle and packaging as the foundation for their Kubernetes deployments.

Current Container Linux users can rest easy that Red Hat plans to continue investing in the operating system and its community. The project is an important base for container-based environments, delivering automated updates with strong security capabilities, and as part of our commitment and vision we plan to support Container Linux as you know it today for the community and Tectonic users alike.

Integrating Tectonic Automated Operations Into OpenShift

CoreOS Tectonic was created with a vision of a fully automated container platform that would relieve many of the burdens of day-to-day IT operations. This vision will now help craft the next generation of Red Hat OpenShift Container Platform, providing an advanced container experience for operators and developers alike.

With automated operations coming to OpenShift, IT teams will be able to use the automated upgrades of Tectonic paired with the reliability, support, and extensive application development capabilities of Red Hat OpenShift Container Platform. This makes managing large Kubernetes deployments easier without sacrificing other enterprise needs, including platform stability or continued support for existing IT assets.

We believe this future integrated platform will help to truly change the way IT teams deliver applications by providing speed to market through consistent deployment methods and automated operations throughout the stack.

In the meantime, current Tectonic customers will continue to receive support and updates for the platform. They can also have confidence that they will be able to transition to Red Hat OpenShift Container Platform in the future with little to no disruption, as almost all Tectonic features will be retained in Red Hat OpenShift Container Platform.

Automated Applications via the Operator Framework

We are also focusing on automating the application layer of the stack. At KubeCon we introduced and open sourced the Operator Framework. Today we are showing how we plan to put Operators into practice. Red Hat is working on a future enhancement that will enable software partners to test and validate their Operators for Red Hat OpenShift Container Platform. More than 60 software partners have committed to supporting the Kubernetes Operator Framework initiative introduced by Red Hat, including Couchbase, Dynatrace, Black Duck Software and Crunchy Data, among others.

Our aim is to make it easier for ISVs to bring cloud services, including messaging, big data, analytics, and more, to the hybrid cloud and to address a broader set of enterprise deployment models while avoiding cloud lock-in. Eventually, Red Hat plans to extend the Red Hat Container Certification with support for Operators as tested and validated Kubernetes applications on Red Hat OpenShift. With the Operator Framework in place, software partners have a more consistent, common experience for delivering services on Red Hat OpenShift, enabling ISVs to bring their offerings to market more quickly on any cloud infrastructure where Red Hat OpenShift runs.

The Quay container registry becomes Red Hat Quay

Quay, the container registry, will also continue to live on in the Red Hat container portfolio.

While OpenShift provides an integrated container registry, customers who require more comprehensive enterprise grade registry capabilities now have the option to consume Quay Enterprise and Quay.io from Red Hat. Quay includes automated geographic replication, integrated security scanning with Clair, image time machine for viewing history, rollbacks and automated pruning, and more. Red Hat Quay is available both as an enterprise software solution and as a hosted service at Red Hat Quay.io, with plans for future enhancements and continued integration with Red Hat OpenShift in future releases.

With CoreOS now part of the Red Hat family, we’ve been busy working together to bring more capabilities to enterprise customers, and more muscle to community open source projects. We’re excited to work alongside you with our Red Hat fedoras on to help automate your infrastructure, all the way from the stack to the application layer.

Learn more at Red Hat Summit

Join us at Red Hat Summit in San Francisco or view the Red Hat Summit livestream to learn more. Red Hat is also hosting a press conference live from Red Hat Summit at 11 a.m. PT today to talk about this integration and other news from the event. The press conference is open to all – join or listen to a replay here.

Source

Getting Started with Amazon EKS – Provisioning and Adding Clusters

This is a simple tutorial on how to launch a new Amazon EKS cluster from scratch and attach it to Codefresh.

Have an existing Kubernetes cluster you want to add? Please see the docs.

The source code for this tutorial can be found here:
https://github.com/codefresh-io/eks-installer

Overview

Amazon Elastic Container Service for Kubernetes (Amazon EKS) is the latest product release from AWS, offering fully-hosted Kubernetes clusters.

This is great news for AWS users; however, it is not entirely simple to understand how EKS fits in with various other AWS services.

To help others get started with Amazon EKS, I’ve put together a Codefresh pipeline setup.yml that does the following:

  1. Bootstraps an EKS cluster and VPC in your AWS account using Terraform
  2. Saves the Terraform statefile in a Codefresh context
  3. Creates some base Kubernetes resources
  4. Initializes Helm in the cluster
  5. Adds the cluster to your Codefresh account

There is also a corresponding teardown.yml that:

  1. Loads the Terraform statefile from Codefresh context
  2. Destroys the EKS cluster from your AWS account using Terraform
  3. Removes the cluster from your Codefresh account

Follow the instructions below to set up these pipelines in your account. After clicking the “Build” button, your cluster should be ready to use in 10-20 minutes!

Setting up the Pipelines

Add Repository and setup.yml pipeline

In your Codefresh account, at the top right of your screen click the “Add Repository” button. Turn on “Add by URL”. Enter the following repository URL (or create and use a fork):
https://github.com/codefresh-io/eks-installer

Click “Next”. Click the “Select” button under “I have a Codefresh.yml file”. For the path to codefresh.yml, enter the following:
.codefresh/setup.yml

Click through the rest of the dialogue to create the setup.yml pipeline.

Configure Triggers

Before going forward, make sure to delete any unwanted trigger configuration that may result in an unexpected EKS cluster launch.

Add teardown.yml pipeline

In the same repository view, click the “Add Pipeline” link. Name this pipeline something like “eks-uninstaller”.

At the bottom of the page, in the “Workflow” section, select “YAML”. Click “Use YAML from Repository”. Enter the following:
.codefresh/teardown.yml

Click “Save”.

Setup Environment Variables

Under the “General” tab, add the following global variables to be used by both of the pipelines:

AWS_ACCESS_KEY_ID encrypted – AWS access key ID
AWS_SECRET_ACCESS_KEY encrypted – AWS secret access key
CLUSTER_NAME – unique EKS cluster name

Additionally, you can add the following optional variables for fine-tuned setup:

CLUSTER_SIZE – number of nodes in ASG (default: 1)
CLUSTER_REGION – AWS region to deploy to (default: us-west-2)
CLUSTER_INSTANCE_TYPE – EC2 instance type (default: m4.large)

Note that at the time of writing, EKS is only available in regions us-east-1 and us-west-2 (and seems to have reached capacity in us-east-1). Your best bet is to stick with us-west-2 for now.

Click “Save”.

Create new EKS Cluster

At this point, all you need to do is click “Build” on the setup.yml pipeline (eks-installer)

and wait…

Once the build is complete, navigate to the Kubernetes services page to view your newly-created EKS cluster in Codefresh.

You can then use this cluster to deploy to from your pipelines etc.

Teardown EKS Cluster

Similar to steps above, all you need to do to teardown your EKS cluster is to click “Build” on the teardown.yml pipeline (eks-uninstaller)

and wait…

Once the build is complete, the EKS cluster and all associated AWS resources will be destroyed, and the cluster will be removed from your Codefresh account.

Source

Introducing the Non-Code Contributor’s Guide


Authors: Noah Abrahams (InfoSiftr), Jonas Rosland (VMware), Ihor Dvoretskyi (CNCF)

It was May 2018 in Copenhagen, and the Kubernetes community was enjoying the contributor summit at KubeCon/CloudNativeCon, complete with the first run of the New Contributor Workshop. It was a time of tremendous collaboration between contributors, with topics ranging from signing the CLA to deep technical conversations. Along with the vast exchange of information and ideas, however, came continued scrutiny of the topics at hand to ensure that the community was being as inclusive and accommodating as possible. Over that spring week, some of the pieces under the microscope included the many themes being covered and how they were being presented, but also the overarching characteristics of the people contributing and the skill sets involved. From the discussions and analysis that followed grew the idea that the community was not benefiting as much as it could from the many people who wanted to contribute, but whose strengths were in areas other than writing code.

This all led to an effort called the Non-Code Contributor’s Guide.

Now, it’s important to note that Kubernetes is rare, if not unique, in the open source world, in that it was defined very early on as both a project and a community. While the project itself is focused on the codebase, it is the community of people driving it forward that makes the project successful. The community works together with an explicit set of community values, guiding the day-to-day behavior of contributors whether on GitHub, Slack, Discourse, or sitting together over tea or coffee.

By having a community that values people first, and explicitly values a diversity of people, the Kubernetes project is building a product to serve people with diverse needs. The different backgrounds of the contributors bring different approaches to problem solving and different methods of collaboration, and all those different viewpoints ultimately create a better project.

The Non-Code Contributor’s Guide aims to make it easy for anyone to contribute to the Kubernetes project in a way that makes sense for them. This can be in many forms, technical and non-technical, based on the person’s knowledge of the project and their available time. Most individuals are not developers, and most of the world’s developers are not paid to fully work on open source projects. Based on this we have started an ever-growing list of possible ways to contribute to the Kubernetes project in a Non-Code way!

Get Involved

Some of the ways that you can contribute to the Kubernetes community without writing a single line of code include:

The guide to getting started with Kubernetes project contribution is documented on GitHub, and as the Non-Code Contributor’s Guide is part of that Kubernetes Contributor Guide, it can be found here. As stated earlier, this list is not exhaustive and will continue to be a work in progress.

To date, the typical Non-Code contributions fall into the following categories:

  • Roles that are based on skill sets other than “software developer”
  • Non-Code contributions in primarily code-based roles
  • “Post-Code” roles, that are not code-based, but require knowledge of either the code base or management of the code base

If you, dear reader, have any additional ideas for a Non-Code way to contribute, whether or not it fits in an existing category, the team will always appreciate it if you help us expand the list.

If a contribution of the Non-Code nature appeals to you, please read the Non-Code Contributions document, and then check the Contributor Role Board to see if there are any open positions where your expertise could be best used! If there are no listed open positions that match your skill set, drop on by the #sig-contribex channel on Slack, and we’ll point you in the right direction.

We hope to see you contributing to the Kubernetes community soon!

Source

Adventures of the Kubernetes Vacuum Robots // Jetstack Blog

18/Jun 2018

By Hannah Morris

Have you ever wondered how to run kubelet on a vacuum robot?

Our guess is, you haven’t – and nor have many other people. However, this didn’t stop Christian’s talk from attracting a large following at KubeCon Europe 2018, nor did it deter some curious conference goers from attempting to win a robot of their own!

You’ll be happy to hear that the robots will be back on stage in Hamburg, where Christian will be talking at Containerdays 2018.

This blog post recounts Christian’s journey with his 3 vacuum robots.


One of the Team

Words of Wisdom from a Domestic God

Christian’s talk starred the Xiaomi Mi Vacuum Robot, an affordable piece of kit (in case you were interested in investing). Inspired by a talk at 34C3 in 2017 – which revealed how to gain root access to the Ubuntu Linux operating system of the vacuum – Christian set out first to explain how the vacuum can be provisioned as a node in a Kubernetes cluster, and then how Kubernetes primitives can be used to control it:

  • CronJobs periodically schedule drives
  • a custom Prometheus exporter is used to track metrics of a vacuum’s life

Using custom controllers and CRDs, extended features of the vacuum can be utilised to:

  • request raw sensor readings
  • dump a map of your home
  • allow the vacuum to drive custom paths

Robots in Training

Along with 14 Jetstackers, 3 vacuum robots flew to Copenhagen in early May for the conference. They stayed with Christian in a nice Danish houseboat, which became the designated robot training ground. Christian had them running circles around the living room, as well as fetching him the necessary fuel to keep his strength up ready for the talk…

K8s Beer Run

Christian trained his robots up well

Why running kubelet on your vacuum robot is (not) a good idea!

Those who attended Christian’s talk learnt all about running kubelet on a vacuum robot to make their household chores more interesting, if not easier.

Another thing that we all took away from the talk is that conference WiFi should never be trusted: the robots were disobedient in the live demo, and – alas! – the stage at KubeCon was left dusty.

Robot Relocation

Following the talk, we had a surprise in store: It was revealed that vacuum robot #3 was to be rehomed, and that one lucky conference goer would have the privilege of taking it away with them!

We decided to pick names from a hat to find our winner. In the moments leading up to the draw, the Jetstack stand was surrounded by a crowd of budding domestic gods and goddesses, all eager to be in with the chance to vacuum their homes with the aid of Kubernetes.

Christian drew the name of the lucky winner from Richard’s cap: Congratulations were in order for Carolina Londoño, who took her new vacuum robot home with her – all the way to Colombia!


Christian with Carolina in Copenhagen; vacuum robot #3 en route to Colombia

Containerdays, Hamburg 2018

Catch Christian and his vacuum robots at Containerdays 2018 in Hamburg on Tuesday 19th June at 17.20. Here’s to hoping they clean up this time (literally!)

Source

Community Contribution: Heptio Ark Plugin for DigitalOcean

A central tenet of Heptio’s culture is building honest technology. To hold ourselves accountable, we measure the impact of our projects in the open source community and the number of contributions from partners.

Today, we’re excited to announce that StackPointCloud expanded the Heptio Ark ecosystem through their development of a Heptio Ark Block Storage plugin for DigitalOcean. You can read their blog on the plugin, including a simple how-to description, right here.

Out of the box, Heptio Ark enables disaster recovery by backing up Kubernetes resources and snapshotting Persistent Volumes for clusters running in Google Cloud, Amazon Web Services, and Microsoft Azure. Disaster recovery for other Persistent Volumes is delivered through a Restic integration that enables filesystem-based snapshotting.

Now with the DigitalOcean Heptio Ark plugin, users can take native snapshots of their DigitalOcean Block Storage Volumes. Additionally, DigitalOcean Spaces provides an S3-compatible API, so users can store their Heptio Ark backups in local object storage. These features improve the speed of backups and offer a consistent user experience across cloud providers.

It’s remarkable how quickly Kubernetes has grown as evidenced by the recently announced DigitalOcean Kubernetes offering. DigitalOcean joins the increasing number of cloud service providers that offer managed Kubernetes clusters. Developers and operators can have confidence that Heptio Ark provides consistent disaster recovery regardless of where they run Kubernetes.

Learn more

Want to learn more about managing Kubernetes disaster recovery using Heptio Ark or need to develop a custom plugin? We recommend joining our Google group and Slack channel. Or, if you’re interested in contributing to Heptio Ark, you’ll find several GitHub issues labeled as Good First Issue and Help Wanted. Take a look — we would welcome your participation!

Source

Docker features for handling Container’s death and resurrection

Docker containers provide an isolated sandbox for the containerized program to execute. One-shot containers accomplish a particular task and stop. Long-running containers run for an indefinite period until they are either stopped by the user or the root process inside the container crashes. It is necessary to gracefully handle a container’s death and to make sure that the job running as a container does not get impacted in an unexpected manner. When containers are run with Swarm orchestration, Swarm monitors the containers’ health, exit status, and the entire lifecycle including upgrade and rollback. This will be a pretty long blog. I did not want to split it since it makes sense to look at this holistically. You can jump to specific sections by clicking on the links below if needed. In this blog, I will cover the following topics with examples:

Handling Signals and exit codes

When we pass a signal to a container using the Docker CLI, Docker delivers the signal to the main process running inside the container (PID 1). This link has the list of all Linux signals. Docker exit codes follow the chroot exit standard for Docker-defined exit codes; other standard exit codes can come from the program running inside the container. The container exit code can be seen in the container events coming from the Docker daemon when the container exits. For containers that have not been cleaned up, the exit code can be found with “docker ps -a”.
Following is a sample “docker ps -a” output where the nginx container exited with exit code 0. Here, I used “docker stop” to stop the container.

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
32d675260384 nginx “nginx -g ‘daemon …” 18 seconds ago Exited (0) 7 seconds ago web

Following is a sample “docker ps -a” output where the nginx container exited with exit code 137. Here, I used “docker kill” to stop the container.

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9b5d8348cb89 nginx “nginx -g ‘daemon …” 11 seconds ago Exited (137) 2 seconds ago web

Following is the list of standard and Docker-defined exit codes:

0: Success
125: Docker run itself fails
126: Contained command cannot be invoked
127: Contained command cannot be found
128 + n: Fatal error with signal n
130: (128+2) Container terminated by Control-C
137: (128+9) Container received a SIGKILL
143: (128+15) Container received a SIGTERM
255: Exit status out of range (-1)
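As a quick illustration of the above (a sketch; “exitdemo” is just an example container name, assuming a local busybox image), the exit code of a one-shot container can be checked like this:

docker run --name exitdemo busybox sh -c "exit 3"
docker inspect --format '{{.State.ExitCode}}' exitdemo

The second command prints 3, the code returned by the shell inside the container.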

Following is a simple Python program that handles signals. This program will be run as a Docker container to illustrate Docker signals and exit codes.

#!/usr/bin/python

import sys
import signal
import time

def signal_handler_int(sigid, frame):
    print "signal", sigid, ",", "Handling Ctrl+C/SIGINT!"
    sys.exit(signal.SIGINT)

def signal_handler_term(sigid, frame):
    print "signal", sigid, ",", "Handling SIGTERM!"
    sys.exit(signal.SIGTERM)

def signal_handler_usr(sigid, frame):
    print "signal", sigid, ",", "Handling SIGUSR1!"
    sys.exit(0)

def main():
    # Register signal handlers
    signal.signal(signal.SIGINT, signal_handler_int)
    signal.signal(signal.SIGTERM, signal_handler_term)
    signal.signal(signal.SIGUSR1, signal_handler_usr)

    while True:
        print "I am alive"
        sys.stdout.flush()
        time.sleep(1)

# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
    main()

Following is the Dockerfile to convert this to a container:

FROM python:2.7
COPY ./signalexample.py ./signalexample.py
ENTRYPOINT ["python", "signalexample.py"]

Let's build the container:

docker build --no-cache -t smakam/signaltest:v1 .

Let's start the container:

docker run -d --name signaltest smakam/signaltest:v1

We can watch the logs from the container using docker logs:

docker logs -f signaltest

The Python program above handles SIGINT, SIGTERM and SIGUSR1. We can pass these signals to the container using the Docker CLI.
Following command sends SIGINT to the container:

docker kill --signal=SIGINT signaltest

In the Docker logs, we can see the following to show that this signal is handled:

signal 2 , Handling Ctrl+C/SIGINT!

Following output shows the container exit status:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c06266e79a43 smakam/signaltest:v1 “python signalexam…” 36 seconds ago Exited (2) 3 seconds ago signaltest

Following command sends SIGTERM to the container:

docker kill --signal=SIGTERM signaltest

In the Docker logs, we can see the following to show that this signal is handled:

signal 15 , Handling SIGTERM!

Following output shows the container exit status:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
0149708f42b2 smakam/signaltest:v1 “python signalexam…” 10 seconds ago Exited (15) 2 seconds ago signaltest

Following command sends SIGUSR1 to the container:

docker kill --signal=SIGUSR1 signaltest

In the Docker logs, we can see the following to show that this signal is handled:

signal 10 , Handling SIGUSR1!

Following output shows the container exit status:

$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c92f7b4dd45b smakam/signaltest:v1 “python signalexam…” 12 seconds ago Exited (0) 2 seconds ago signaltest

When we execute “docker stop <container>”, Docker first sends the SIGTERM signal to the container, waits for a grace period, and then sends SIGKILL. This is done so that the program running inside the container can use the SIGTERM signal to shut down gracefully.
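The grace period defaults to 10 seconds and can be adjusted with the "--time" option of "docker stop", for example:

# give the application up to 30 seconds to shut down cleanly before SIGKILL
docker stop --time=30 signaltest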

Common mistake in Docker signal handling

In the above example, the python program runs as PID 1 inside the container since we used the exec form of ENTRYPOINT in the Dockerfile. If we instead use the shell form of ENTRYPOINT, a shell process runs as PID 1 and the python program runs as a child process. Following is a sample Dockerfile that starts the program through a shell:

FROM python:2.7
COPY ./signalexample.py ./signalexample.py
ENTRYPOINT python signalexample.py

In this example, Docker passes the signal to the shell process instead of to the Python program, so the python program never sees the signal sent to the container. If there are multiple processes running inside the container and we need to pass the signal along, one possible approach is to run the ENTRYPOINT as a script, handle the signal in the script, and forward it to the correct process. One example using this approach is mentioned here.
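Following is a minimal sketch of such a wrapper (a hypothetical entrypoint.sh, not taken from the example linked above); the Dockerfile would then use the exec form ENTRYPOINT ["/entrypoint.sh"]:

#!/bin/sh
# Start the real program in the background and remember its PID.
python signalexample.py &
child=$!
# Forward SIGTERM/SIGINT to the child process.
trap 'kill -TERM "$child"' TERM INT
# The first wait returns when the child exits or a trapped signal arrives;
# waiting again picks up the child's real exit status after forwarding.
wait "$child"
trap - TERM INT
wait "$child"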

Difference between “docker stop”, “docker rm” and “docker kill”

“docker stop” – Sends SIGTERM to the container, waits some time for the process to handle it, and then sends SIGKILL. The container filesystem remains intact.
“docker kill” – Sends SIGKILL directly. The container filesystem remains intact.
“docker rm” – Removes the container filesystem. “docker rm -f” will send SIGKILL and then remove the container filesystem.
Using “docker run” with the “--rm” option will automatically remove the container, including its filesystem, when the container exits.

When a container exits without its filesystem being removed, we can still restart the container.

Container restart policy

The container restart policy controls the restart action when a container exits. Following are the supported restart options:

  • no – This is the default. Containers do not get restarted when they exit.
  • on-failure – Containers restart only when they exit with a failure code. Any exit code other than 0 is treated as a failure.
  • unless-stopped – Containers restart as long as they were not manually stopped by the user.
  • always – Always restart the container irrespective of exit status.

Following is an example of starting the “signaltest” container with a restart policy of “on-failure” and a retry count of 3. The retry count is the number of restarts that Docker will attempt before giving up.

docker run -d --name=signaltest --restart=on-failure:3 smakam/signaltest:v1

To show the restart happening, we can manually send signals to the container. In the “signaltest” example, the signals SIGTERM, SIGINT and SIGKILL cause a non-zero exit code and SIGUSR1 causes a zero exit code. One thing to remember is that the restart does not happen if we stop the container or send signals using “docker kill”; I think this is because Docker has an explicit check to prevent restarts in these cases, since the action is triggered by the user.
Let’s send SIGINT to the container by passing the signal to the process. We can find the process ID by running “ps -eaf | grep signalexample” on the host machine.

kill -s SIGINT <pid>

Let’s check the “docker ps” output. We can see that the “created” time is 50 seconds, while the uptime is less than a second because the container restarted.

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b867543b110c smakam/signaltest:v1 “python signalexam…” 50 seconds ago Up Less than a second

Following command shows the restart policy and the restart count for the running container. In this example, container restart happened once.

$ docker inspect signaltest | grep -i -A 2 -B 2 restart
"Name": "/signaltest",
"RestartCount": 1,
"RestartPolicy": {
"Name": "on-failure",
"MaximumRetryCount": 3

To illustrate that the restart does not happen on exit code 0, let’s send SIGUSR1 to the container, which will cause exit code 0.

sudo kill -s SIGUSR1 <pid>

In this case, the container exits, but it does not get restarted.

Container restart does not work with the “--rm” option, because “--rm” causes the container to be removed as soon as it exits.

Container health check

It is possible that a container does not exit but is not performing as required. Health check probes can be used to identify such misbehaving containers and take action, rather than waiting until the container dies. Health check probes accomplish the specific task of checking container health; for a container like a webserver, the probe can be as simple as sending a curl request to the webserver port. Based on the container’s health, we can restart the container if the health check fails.
To illustrate health check feature, I have used the container described here.
Following command starts the webserver container with health check capability enabled.

docker run -p 8080:8080 -d --rm --name health-check --health-interval=1s --health-timeout=3s --health-retries=3 --health-cmd "curl -f http://localhost:8080/health || exit 1" effectivetrainings/docker-health
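The same probe can also be baked into the image with a Dockerfile HEALTHCHECK instruction instead of run-time flags. A sketch mirroring the options above (assuming curl is available in the image):

# Dockerfile snippet
HEALTHCHECK --interval=1s --timeout=3s --retries=3 CMD curl -f http://localhost:8080/health || exit 1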

Following are all parameters related to healthcheck:

[Image: health check options for “docker run”]

Following “docker ps” output shows container health status:

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
947dad1c1412 effectivetrainings/docker-health “java -jar /app.jar” 28 seconds ago Up 26 seconds (healthy) 0.0.0.0:8080->8080/tcp health-check

This container has a backdoor approach to mark the container health as unhealthy. Let’s use this backdoor to mark the container as unhealthy:

curl "http://localhost:8080/environment/health?status=false"

Now, let’s check the “docker ps” output. The container’s health has become unhealthy.

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
947dad1c1412 effectivetrainings/docker-health “java -jar /app.jar” 3 minutes ago Up 3 minutes (unhealthy) 0.0.0.0:8080->8080/tcp health-check

Service restart with Swarm

Docker Swarm mode introduces a higher level of abstraction called a service, and containers are part of the service. When we create a service, we specify the number of containers that need to be part of the service using the “replicas” parameter. Docker Swarm will monitor the number of replicas, and if any container dies, Swarm will create a new container to keep the replica count as requested by the user.
The following command creates the “signaltest” service with 2 container replicas:

docker service create --name signaltest --replicas=2 smakam/signaltest:v1

Following command output shows the 2 containers that are part of “signaltest” service:

$ docker service ps signaltest
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
vsgtopkkxi55 signaltest.1 smakam/signaltest:v1 ubuntu Running Running 36 seconds ago
dbbm05w91wv7 signaltest.2 smakam/signaltest:v1 ubuntu Running Running 36 seconds ago

Following parameters control the container restart policy in a service:

[Image: restart policy options for “docker service create”]
Let’s start the “signaltest” service with a restart condition of “on-failure”:

docker service create --name signaltest --replicas=2 --restart-condition=on-failure --restart-delay=3s smakam/signaltest:v1

Remember that sending the signals SIGTERM, SIGINT or SIGKILL causes a non-zero container exit code, and sending SIGUSR1 causes a zero exit code.
Let’s first send SIGTERM to one of the 2 containers:

docker kill --signal=SIGTERM <container>

Following is the “signaltest” service output that shows the 3 containers including the one that has exited with non-zero status:

$ docker service ps signaltest
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
35ndmu3jbpdb signaltest.1 smakam/signaltest:v1 ubuntu Running Running 4 seconds ago
ullnsqio5151 _ signaltest.1 smakam/signaltest:v1 ubuntu Shutdown Failed 11 seconds ago “task: non-zero exit (15)”
2rfwgq0388mt signaltest.2 smakam/signaltest:v1 ubuntu Running Running 49 seconds ago

Following command sends the SIGUSR1 signal to one of the containers, which causes that container to exit with status 0.

docker kill --signal=SIGUSR1 <container>

Following output shows that the container did not restart since the container exit code is 0.

$ docker service ps signaltest
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
35ndmu3jbpdb signaltest.1 smakam/signaltest:v1 ubuntu Running Running 52 seconds ago
ullnsqio5151 _ signaltest.1 smakam/signaltest:v1 ubuntu Shutdown Failed 59 seconds ago “task: non-zero exit (15)”
2rfwgq0388mt signaltest.2 smakam/signaltest:v1 ubuntu Shutdown Complete 3 seconds ago

$ docker service ls
ID NAME MODE REPLICAS IMAGE PORTS
xs8lzbqlr69n signaltest replicated 1/2 smakam/signaltest:v1

I don’t see a real need to change the default Swarm service restart policy from “any”.

Service health check

In the previous sections, we saw how to use the container health check with the “effectivetrainings/docker-health” container. Even though we could detect the container as unhealthy, we could not restart it automatically. For standalone containers, Docker does not have native integration to restart a container on health check failure, though we can achieve the same using Docker events and a script (a rough sketch for the standalone case follows below). Health check is better integrated with Swarm: when a container in a service becomes unhealthy, Swarm automatically shuts down the unhealthy container and starts a new one to maintain the container count specified in the replica count of the service.
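As an aside on the standalone case mentioned above, the restart-on-unhealthy behavior can be approximated by polling the health status reported by "docker inspect" rather than subscribing to Docker events. A sketch, where "health-check" is the standalone container started earlier:

# poll the container's health every few seconds and restart it when unhealthy
while true; do
  status=$(docker inspect --format '{{.State.Health.Status}}' health-check)
  if [ "$status" = "unhealthy" ]; then
    docker restart health-check
  fi
  sleep 5
done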

The “docker service” command provides the following options for health check and associated behavior.

[Image: health check options for “docker service create”]

Let’s create the “swarmhealth” service with 2 replicas of the “docker-health” container.

docker service create --name swarmhealth --replicas 2 -p 8080:8080 --health-interval=2s --health-timeout=10s --health-retries=10 --health-cmd "curl -f http://localhost:8080/health || exit 1" effectivetrainings/docker-health

Following output shows the “swarmhealth” service output and the 2 healthy containers:

$ docker service ps swarmhealth
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
jg8d78inw97n swarmhealth.1 effectivetrainings/docker-health:latest ubuntu Running Running 21 seconds ago
l3fdz5awv4u0 swarmhealth.2 effectivetrainings/docker-health:latest ubuntu Running Running 19 seconds ago

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d9b1f1b0a9b0 effectivetrainings/docker-health:latest “java -jar /app.jar” About a minute ago Up About a minute (healthy) swarmhealth.1.jg8d78inw97nmmbdtjzrscg1q
bb15bfc6e588 effectivetrainings/docker-health:latest “java -jar /app.jar” About a minute ago Up About a minute (healthy) swarmhealth.2.l3fdz5awv4u045g2xiyrbpe2u

Let’s mark one of the containers unhealthy using the backdoor command:

curl "http://<node-ip>:8080/environment/health?status=false"

Following output shows one container that has been shut down (the unhealthy one) and 2 running replicas; one of the replicas was started after the other container became unhealthy.

$ docker service ps swarmhealth
ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS
ixxvzyuyqmcq swarmhealth.1 effectivetrainings/docker-health:latest ubuntu Running Running 4 seconds ago
jg8d78inw97n _ swarmhealth.1 effectivetrainings/docker-health:latest ubuntu Shutdown Failed 23 seconds ago “task: non-zero exit (143): do…”
l3fdz5awv4u0 swarmhealth.2 effectivetrainings/docker-health:latest ubuntu Running Running 5 minutes ago

Service upgrade and rollback

When a new version of a service needs to be rolled out without taking service downtime, Docker provides many controls for the upgrade and rollback. For example, we can control parameters like the number of tasks to upgrade at a time, the action on upgrade failure, the delay between task upgrades, etc. This helps us achieve release patterns like blue-green and canary deployments.

Following options are provided by Docker in the “docker service” command to control rolling upgrade and rollback.

Rolling upgrade:

[Image: rolling upgrade options]

Rollback:

[Image: rollback options]

To illustrate a service upgrade, I have a simple Python webserver program running as a container.
Following is the Python program:

#!/usr/bin/python

import sys
from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
import urlparse
import json

class GetHandler(BaseHTTPRequestHandler):

    def do_GET(self):
        message = "You are using version 1\n"
        self.send_response(200)
        self.end_headers()
        self.wfile.write(message)
        return

def main():
    server = HTTPServer(('', 8000), GetHandler)
    print 'Starting server at http://localhost:8000'
    server.serve_forever()

# This is the standard boilerplate that calls the main() function.
if __name__ == '__main__':
    main()

This is the Dockerfile to create the Container:

FROM python:2.7
COPY ./webserver.py ./webserver.py
ENTRYPOINT ["python", "webserver.py"]

I have 2 versions of the container, smakam/webserver:v1 and smakam/webserver:v2. The only difference is the message output, which shows either “You are using version 1” or “You are using version 2”.

Let’s create version 1 of the service with 2 replicas:

docker service create --name webserver --replicas=2 -p 8000:8000 smakam/webserver:v1

We can access the service using the script below. The service requests will get load balanced between the 2 replicas.

while true; do curl -s "localhost:8000"; sleep 1; done

Following is the service request output that shows we are using version 1 of the service:

You are using version 1
You are using version 1
You are using version 1

Let’s upgrade to version 2 of the web service. Since we specify an update-delay of 3 seconds, there will be a 3-second gap between the upgrades of the 2 replicas. Since the “update-parallelism” default is 1, only 1 task will be upgraded at a time.

docker service update --update-delay=3s --image=smakam/webserver:v2 webserver

Following is the service request output that shows requests slowly getting migrated to version 2 as the upgrade happens one replica at a time.

You are using version 1
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 2
You are using version 2
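As an aside, the behavior on a failed update can also be automated. A hedged variant of the update command above (not run in this walkthrough) that would automatically roll back if the new tasks fail to start:

docker service update --update-parallelism=1 --update-delay=3s --update-failure-action=rollback --image=smakam/webserver:v2 webserver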

Now, let’s roll back to version 1 of the webserver:

docker service update --rollback webserver

Following is the service request output that shows requests slowly getting moved back from version 2 to version 1.

You are using version 2
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 2
You are using version 1
You are using version 1

Please let me know your feedback and whether you want to see more details on any specific topic related to this. I have put the code associated with this blog here. The containers used in this blog (smakam/signaltest, smakam/webserver) are on Docker Hub.

References

Source

Building Armbian images faster using Docker – Own your bits

The people at the Armbian team have been doing impressive work optimizing and generating Debian images for many ARM boards.

While they have documentation on the build process, it focuses mostly on setting up a build environment on VirtualBox with Vagrant.

They do support building in a Docker environment, but I found the documentation lacking and found myself asking in the forums.

It is easier to set up Docker than virtualization, plus containers spawn much faster, so we can iterate and make modifications to the build more quickly.

I will just record here the commands to produce a fully automated build of a headless Debian Stretch image for an Odroid XU4:

git clone https://github.com/armbian/build armbian-build

cd armbian-build

./compile.sh docker BOARD=odroidxu4 BRANCH=next KERNEL_ONLY=no KERNEL_CONFIGURE=no RELEASE=stretch BUILD_DESKTOP=no

 

All the build options are here. I personally add these ones (a combined example follows the list):

  • CLEAN_LEVEL="" to avoid recompiling the kernel, u-boot and the rest every build, unless they have changed.
  • NO_APT_CACHER=no to use apt-cacher-ng for faster builds.
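Putting it together, a build that reuses previously compiled artifacts and the apt cache might look like this (same board and release as above):

./compile.sh docker BOARD=odroidxu4 BRANCH=next KERNEL_ONLY=no KERNEL_CONFIGURE=no RELEASE=stretch BUILD_DESKTOP=no CLEAN_LEVEL="" NO_APT_CACHER=no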

Source

Giant Swarm vs OpenShift – Giant Swarm


At Giant Swarm, we’ve often been asked to compare our infrastructure with that of Red Hat OpenShift. We’d like to shed some light on this subject and give you a rundown of the differences between Giant Swarm and OpenShift.

No doubt Red Hat OpenShift is a leading container platform, or as they put it themselves “The Kubernetes platform for big ideas”. Red Hat is one of the major contributors to the open-source Kubernetes project and announced they would use Kubernetes for container orchestration with OpenShift Enterprise 3 in summer 2015. Many enterprises decided early on to use OpenShift as a platform for their containerized applications.

As OpenShift is widely used today, this raises the question of why the world needs another offering such as Giant Swarm, and how it’s different. So, let’s take a deeper look at how Giant Swarm compares to OpenShift.

At Giant Swarm we’re driven by customer obsessions. This means we’re always starting with the “WHY”. Challenging what drives our customers, their problems, and the pains they face, again and again – to understand what they really need and want.

From working with our enterprise customers, and the talks we have with many others, we’ve learned that they want to increase their business agility and developer productivity. This is to gain a competitive edge in the digital era in the first place. They want their applications to be running resiliently and to be scalable across their own data-centers and the public clouds. Additionally, most want to stay flexible and avoid lock-in in this ever-changing world. Obviously, cost savings often play a role too.

To gain business agility and increase developer productivity they want their development teams to be empowered, have the freedom to use the right tools for the job, and easily run Cloud Native projects on-demand at scale, reliably, and without having to manage the underlying infrastructure. This is so they can focus on what they do best – working on their applications.

Key motivations to become Cloud Native

Business Agility
Developer Productivity
Resiliency
Scalability
Cost Savings

These customer motivations have strong implications on our product, our architecture, and even our business model. Next, we’ll cover the key differentiators that explain how Giant Swarm is different, how we add value during our customer’s Cloud Native journeys, and why recently more and more enterprises are choosing Giant Swarm over traditional PaaS platforms like OpenShift.

1. Multitenancy: Hard Isolation vs. Soft Isolation

Giant Swarm believes that soft in-cluster separation by namespaces doesn’t provide the high level of security often required by enterprises or the freedom that development teams need to ship features quickly. We’re not alone with this opinion as Kelsey Hightower, Staff Developer Advocate at Google Cloud Platform, recommends in his Keynote at KubeCon Austin 2017 “Run Kubernetes clusters by org-chart”.

You need to put a lot of effort into securing in-cluster environments and it gets harder the larger the company and the more tenants you want to run in a cluster. This is especially true for teams that are just getting started with Kubernetes, as they’re sometimes running wild and breaking things. It’s a lot better to provide each team with a separate cluster to start with and then re-evaluate further down the line towards consolidating some workloads onto larger clusters.

At Giant Swarm we saw this early on, and decided to build an API driven platform enabling customers to easily provision and scale fully isolated Kubernetes clusters. These tenant clusters have both network and resource isolation from all other tenant clusters. In on-premises data-centers, we run the tenant clusters on top of a host cluster which also runs Giant Swarm’s control plane. In public clouds, such as AWS and Azure, the tenant clusters are managed by the same control plane, but run independently, each in their own virtual network.

The control plane consists of many micro-services and operators. These operators are custom controllers that automate managing the tenant clusters and related components. This allows our customers to have as many fully isolated Kubernetes clusters as they require, meaning they enjoy a higher security standard due to the additional isolation. But that’s not all – it empowers development teams as they can easily create their own clusters using their preferred config and tooling. They can also decide for themselves when it’s time for an upgrade – this truly allows for multi-project operations and prevents the team from becoming blockers for others when upgrading, something that we’ve seen happening often at large organizations using OpenShift.

Furthermore, Giant Swarm’s solution allows you to easily provision a new cluster with the latest Kubernetes version, test it, and then move applications. Most of Giant Swarm’s customers are using more than 10 tenant clusters within a few weeks. As well as per team clusters they separate per environment for dev, staging, pre-prod, and especially production. We even have customers that have integrated provisioning clusters into their CI/CD pipeline so their tests are executed in a fresh cluster and cleaned up afterward.

Key benefits as a result of Giant Swarm’s Architecture with Hard Isolation

Efficient Multi-Project Operations – you can easily start as many fully isolated clusters as you need.
Empowered Development Teams – each team can get their own clusters with their preferred config and tooling.
Enhanced Security – due to the hard isolation of each tenant cluster.
Faster Update Cycles – teams can upgrade independently of each other, instead of becoming a blocker for each other.
Increased Flexibility – you can easily provision and test clusters with different Kubernetes versions on different infrastructures.

2. Continuous Updates vs. 1-2 Major Upgrades per Year

Giant Swarm believes that the fast eat the slow. That’s why Giant Swarm has a CI/CD pipeline into every installation, enabling us to upgrade all the components of the whole stack of open source products anytime, and to keep our customers always on the edge of the fast-evolving Cloud Native ecosystem. This is especially important in times where most projects such as Kubernetes have a major release every quarter. Additionally, having a CI/CD pipeline allows for daily zero-touch updates to continuously improve the container platform and customer experience as well as to rollout hot-fixes immediately to prevent customers from running into more serious issues. Of course, our system asks our customers for permission and they need to actively accept major releases with the possibility of breaking changes.

With smaller, continuous updates, fewer changes are made at a time, so it’s easier to identify problems. The toolchain and automation become more robust the more frequently teams perform upgrades. As they say: “If it hurts, do it more often”. Upgrades usually do hurt and people tend to refuse to do them regularly. This gets worse with increasing complexity – at some point you will get into trouble with interlocking between components. For example, you can’t upgrade Prometheus because it requires a newer version of Kubernetes, but you want to, as the latest version of Prometheus fixes a bug that you’re experiencing in production.

Across the hundreds of Kubernetes clusters we’re managing in production for our customers we’re experiencing more than 80% of all issues on clusters that have not been updated for 90+ days. This clearly shows that 1 or 2 major upgrades per year are simply not enough to stay secure and run reliably.

This also brings us back to point 1. We’ve encountered large enterprises that haven’t yet been able to upgrade from OpenShift 3.4. This means they’re still running on Kubernetes 1.4, which is a long way behind. Giant Swarm, by contrast, guarantees to provide the latest version of Kubernetes within 30 days of its release. At the time of writing, Giant Swarm customers can already use Kubernetes 1.11, which is 18 months ahead of version 1.4.

Key benefits as a result of Continuous Updates

Staying Always Ahead – Giant Swarm ships improvements every day and guarantees to provide updates of the many open source components within 30 days of the latest release.
Increased Reliability – Giant Swarm keeps your cluster always up-to-date and prevents you from running into many possible issues, suffering from bugs that have already been fixed or even worse any interlocking between components.
Enhanced Security – Giant Swarm fixes any security issue and rolls out the update immediately via our CI/CD pipeline into your installation.

3. Managed Solution vs. Managing a Third-Party Platform

Giant Swarm believes in the DevOps concept: “You build it, you run it“. This approach has been shown to empower companies to build better software faster, gain business agility and developer productivity as development cycles are much shorter. You might have experienced this already from building your applications the DevOps way. The same is now true for infrastructure, as infrastructure has become code.

That’s why Giant Swarm is not only providing you with a container platform but also managing it 24/7 for you. This means we’re not just selling you a product as a traditional PaaS player will do but taking full responsibility that it is up-to-date and operational at all times – and it also allows us to make our product better every day.

Today, we’re already managing hundreds of Kubernetes clusters in production for world-class companies across on-premises data-centers and public clouds. This gives us unique insights as we are running into more issues than anyone else in the marketplace. We see issues early on and at scale and can respond to them accordingly. Whenever we discover an issue in one of the many clusters of our customers we create a postmortem and fix it with code. Every release and change is tested automatically in our CI/CD pipeline and then rolled out immediately into all installations ensuring that every other customer would not run into the same issue. This creates a positive network effect for all our customers as they simply run into fewer issues as more customers are joining Giant Swarm. It makes Giant Swarm’s container platform and all the Kubernetes clusters of our customers more secure, robust and reliable.

Additionally, our approach comes with economies of scale. It’s simply more efficient to manage hundreds – if not thousands of clusters – on a platform you control and can update anytime instead of managing a third-party platform where you have little or no influence on changes.

Hiring a third-party provider to manage another company’s platform is even worse – think how long it would take if one of your factories in China were to experience a problem with a cluster from an external service provider. They would have to report this back to your HQ, who would then forward it to the provider, who may then even need to open a support ticket with the vendor. It would take ages until you get a response – and even longer until you get a solution that resolves the problem. Now imagine this with a critical problem, where you’re experiencing downtime.

To make our customers’ lives better and to prevent the frustration and long response times of a long support chain, we give our customers direct access to our engineers via a private Slack channel, which allows us to provide an immediate, qualified response. We’re basically becoming part of our customer’s internal platform team, taking full responsibility that their container platform and all their clusters are up-to-date and operational at all times. As one of our Fortune 500 customers says: “We break it. Giant Swarm fixes it”.

Key benefits as a result of a Fully Managed Solution

Faster Development – development teams can focus on their applications instead of spending time and energy taking care of the complex underlying infrastructure.
Increased Reliability – Giant Swarm manages hundreds of Kubernetes clusters meaning problems are often found and fixed for another customer before they can affect your clusters.
Enhanced Security – Giant Swarm takes care of keeping all your tenant clusters secure and running well at all times.

4. Open Source with Freedom of Choice vs. Distribution with Limitations and Lock-In

Giant Swarm believes in providing customers with freedom of choice so their development teams can choose the right tool for the job, instead of having to use the pre-configured, opinionated tools provided by traditional PaaS platforms such as OpenShift, which can limit them.

That’s why Giant Swarm provides vanilla Kubernetes anywhere. You’re not only getting a conformant Kubernetes that you can get from many vendors but the same plain vanilla Kubernetes in your on-premises data-centers and at leading cloud providers to prevent lock-in. This allows you to move your workloads easily from a Kubernetes cluster managed by Giant Swarm to another vanilla Kubernetes cluster.

Betting on plain vanilla Kubernetes and working closely with the community has allowed Giant Swarm to use alpha features early on and some large enterprises to get into production using RBAC and Pod Security Policies while they were still in alpha.

At Giant Swarm we keep our open source code in public Github repositories where it is free to use, instead of trying to lock customers into our solution. Obviously, this has some implications on our business model as we will explain to you in the next section.

Key benefits as a result of a Pure Open Source Solution

Faster Development – Giant Swarm allows your development teams to use the right tool for the job – instead of requiring them to use pre-configured, opinionated tools.
Increased Scalability + Flexibility – Giant Swarm provides pure vanilla Kubernetes on any infrastructure and prevents lock-in as you can easily move your workloads to another infrastructure provider.
Staying Always Ahead – Giant Swarm always provides you with the latest Kubernetes version – supporting you to even run Kubernetes alpha features in production.

5. Managed Service Subscription vs. Licenses + Enterprise Support

Giant Swarm believes that infrastructure software is becoming open source as the Cloud Native community is building better solutions than closed source vendors could do alone. The development and improvements of all these open source components are happening so fast that providing a platform with 1 or 2 major upgrades per year is simply not enough anymore. The added value is in providing a fully managed solution, taking responsibility that your business is operational at all times.

That’s why Giant Swarm doesn’t charge a license fee plus additional enterprise support, as traditional PaaS providers do. Instead, Giant Swarm charges only a usage-based subscription fee for its Fully Managed Cloud Native Platform.

When it comes to Total Cost of Ownership (TCO), Giant Swarm’s offering will always be more cost-efficient than building and managing a container platform yourself, paying a license fee for a traditional PaaS and managing it yourself, or hiring an external service provider to manage it for you. This is because Giant Swarm owns the closed loop of building and managing the platform, with the economies of scale that come from managing hundreds, and in the near future perhaps thousands, of Kubernetes clusters on top of it. Several of our customers have confirmed that, when considering TCO, Giant Swarm’s offering has clear cost-saving advantages compared to managing another vendor’s platform (such as OpenShift) themselves or hiring a third-party service provider.

Key Benefits as a result of a Managed Service Model

Cost Savings – Giant Swarm has a lower total cost of ownership and passes these savings on to its customers.

Giant Swarm has rethought the traditional PaaS model of products such as Red Hat OpenShift and has come up with a solution that is in many ways fundamentally different. These differences allow Giant Swarm to add a lot of extra value, especially for enterprises’ key objectives:

Business Agility – Giant Swarm’s customers can efficiently run multiple Cloud Native projects on demand, at scale, and reliably. Giant Swarm takes the complexity away from our customers, making sure their Cloud Native infrastructure is up-to-date and operational at all times, as well as providing excellent hands-on support. This further accelerates our customers’ Cloud Native journey, so they can thrive in the digital era.
Developer Productivity – Giant Swarm provides development teams with freedom of choice instead of making them use pre-configured tools. Developers can easily get their own clusters with their preferred config and tooling, upgrade when they are ready without blocking others, and break clusters knowing that Giant Swarm will fix them.
Resiliency – Giant Swarm keeps the Cloud Native platform and all tenant clusters up-to-date and their customers’ workloads operational at all times. Giant Swarm proactively prevents more than 80% of all potential issues thanks to the positive network effect of managing clusters for many enterprises and daily zero-touch updates via our CI/CD pipeline into every installation. This only gets better as more companies join Giant Swarm.
Security – Giant Swarm provides a higher security standard thanks to the hard isolation of every cluster, keeping all open source components up-to-date, and immediately rolling out hot-fixes for potential security issues via the CI/CD pipeline into every installation.
Scalability – Giant Swarm provides and manages plain vanilla Kubernetes in your data centers and at your preferred cloud providers. Customers can efficiently start and scale 100+ clusters across private and public clouds around the globe.
Cost Savings – Customers have stated that the total cost of ownership of Giant Swarm’s solution is clearly lower than doing it yourself, managing a traditional packaged PaaS such as OpenShift, or hiring a third-party service provider.

These huge benefits have convinced world-class companies, including several Fortune 500 companies, to choose Giant Swarm over OpenShift, and have even led some to move away from OpenShift to Giant Swarm.

While Red Hat OpenShift is an established product trusted by many, and plenty of companies have close relationships with the market leader, we are convinced that if you’re optimizing for the key objectives discussed in this article, Giant Swarm is the better solution. We will keep working hard to help you win in the digital era: we take the complexity away from you so you can focus on what you do best, break clusters, have a coffee during the day, and rest well at night while we fix them for you.

Want to learn more? Please get in touch via our website or schedule a discovery call with me.

Source

Image Management & Mutability in Docker and Kubernetes

May 15, 2018

by Adrian Mouat

Kubernetes is a fantastic tool for building large containerised software systems in a manner that is both resilient and scalable. But the architecture and design of Kubernetes has evolved over time, and there are some areas that could do with tweaking or rethinking. This post digs into some issues related to how image tags are handled in Kubernetes and how they are treated differently in plain Docker.

First, let’s take a look at one of the first issues that people can face. I have the following demo video that shows a developer trying to deploy a new version of a Rust webapp to a Kubernetes cluster:

The video starts by building and pushing a version of the pizza webapp that serves quattro formaggi pizza. The developer then tries to deploy the webapp to Kubernetes and ends up in a confusing situation where it’s running, yet not serving the kind of pizza we expect. We can see what’s going on by doing some more inspection:
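The exact output from the demo isn’t reproduced here, but a one-liner along the following lines shows, for each pod, the image the spec asked for next to the image digest the node actually ran (the app=pizza label and names are assumptions about how the demo deployment is set up):

    # Sketch: print each pod's declared image alongside the image ID it is actually running
    kubectl get pods -l app=pizza \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[0].image}{"\t"}{.status.containerStatuses[0].imageID}{"\n"}{end}'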

It turns out 3 different versions of our webapp are running inside a single Kubernetes Replica Set, as evidenced by the 3 different digests.

The reason this can happen comes down to the Kubernetes imagePullPolicy. The default is IfNotPresent (unless the image uses the :latest tag, in which case it is Always), which means nodes will use an image that is already present locally rather than pull a new one. In our demo, each node happened to have a different version of the image left over from previous runs. Personally, I’m disappointed that this is the default behaviour, as it’s unexpected and confusing for new users. I understand that it evolved over time and in some cases it is the desired behaviour, but we should be able to change this default for the sake of usability.

The simplest mitigation for the problem is to set the imagePullPolicy to Always:
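The original snippet isn’t reproduced here, but a minimal sketch of the relevant part of the pod template looks like this (the container name, image, and tag are illustrative):

    # Minimal sketch: container spec fragment with an explicit pull policy
    spec:
      containers:
      - name: pizza
        image: amouat/pizza:today
        imagePullPolicy: Always   # pull from the registry on every container start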

This can even be made the default for all deployments by using the AlwaysPullImages Admission Controller.
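On recent Kubernetes versions, enabling that controller is a matter of adding it to the API server’s list of admission plugins, roughly as follows (NodeRestriction is just an example of a plugin you may already have enabled; all other kube-apiserver flags are omitted):

    # Sketch: add AlwaysPullImages to the kube-apiserver admission plugins
    kube-apiserver --enable-admission-plugins=NodeRestriction,AlwaysPullImages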

However, there is still a rather large hole in this solution. Imagine a new deployment occurs concurrently with the image being updated in the registry. It’s quite likely that different nodes will pull different versions of the image, even with imagePullPolicy: Always set. We can see a better solution in the way Docker Swarm Mode works: the Swarm Mode control plane resolves images to a digest before asking nodes to run the image, so all containers are guaranteed to run the same version of the image. There’s no reason we can’t do something similar in Kubernetes using an Admission Controller, and my understanding is that Docker EE does exactly this when running Kubernetes pods. I haven’t been able to find an existing open source Admission Controller that does this, but we’re working on one at CS and I’ll update this post when I have something.
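Until such an admission controller is readily available, you can approximate the guarantee by hand: resolve the tag to a digest yourself and deploy by digest. A rough sketch (the image and deployment names are illustrative, and it assumes the image has been pushed to or pulled from a registry so that a repo digest is recorded locally):

    # Sketch: resolve a tag to a digest and deploy by digest, so every node runs the same bytes
    DIGEST=$(docker inspect --format '{{index .RepoDigests 0}}' amouat/pizza:today)
    kubectl set image deployment/pizza pizza="$DIGEST"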

Going a little deeper, the real reason behind this trouble is a difference between the way image tags are viewed in Kubernetes and Docker. Kubernetes assumes image tags are immutable. That is to say, if I call my image amouat/pizza:today, Kubernetes assumes it will only ever refer to that unique image; the tag won’t get reused for new versions in the future. This may sound like a pain at first, but immutable tags solve a lot of problems; any potential confusion about which version of an image a tag refers to simply evaporates. It does require using an appropriate naming convention; in the case of amouat/pizza:today a better choice would be to use the date, e.g. amouat/pizza:2018-05-12, while in other cases SemVer or git hashes can work well.
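If you want tags that are effectively immutable, the build step is the natural place to enforce the convention, for example by tagging with the current git commit (the repository name here is illustrative):

    # Sketch: tag images with something that won't be reused, such as the current git commit
    TAG=$(git rev-parse --short HEAD)
    docker build -t amouat/pizza:"$TAG" .
    docker push amouat/pizza:"$TAG"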

In contrast, Docker treats tags as mutable and even trains us to think this way. For example, when building an application that runs in a container, I will repeatedly run docker build -t test . or similar, constantly reusing the tag so that the rest of my workflow doesn’t need to change. Also, the official images on the Docker Hub typically have tags for major and minor versions of images that get updated over time e.g. redis:3 is the same image as redis:3.2.11 at the time of writing, but in the past would have pointed at redis:3.2.10 etc.
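You can see this mutability for yourself by comparing the digests two such tags currently resolve to; at the time of writing they should match, but only until the next 3.2.x release is published:

    # Sketch: compare the digests that two mutable tags currently point at
    docker pull redis:3      | grep -i digest
    docker pull redis:3.2.11 | grep -i digest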

This split is a real practical problem faced by new users. Solving it seems reasonably straightforward; can’t we have both immutable and mutable tags? This would require support from registries and (preferably) the Docker client, but the advantages seem worth it. I am hopeful that the new OCI distribution specification will tackle this issue.

To sum up: be careful when deploying images to Kubernetes and make sure you understand how images actually get deployed to your cluster. And if you happen to have any influence on the direction of Kubernetes or the Distribution spec: can we please try to make the world a bit nicer?

Because of these and some other issues, Container Solutions have started work on Trow: an image management solution for Kubernetes that includes a registry component running inside the cluster. Trow will support immutable tags and include admission controllers that pin images to digests. If this sounds useful to you, please head over to trow.io and let us know!

Further Viewing

This blog was based on my talk Establishing Image Provenance and Security in Kubernetes given at KubeCon EU 2018, which goes deeper into some of the issues surrounding images.

Looking for a new challenge? We’re hiring!

Source