Running Highly Available WordPress with MySQL on Kubernetes

WordPress is a popular platform for editing and publishing content for
the web. In this tutorial, I’m going to walk you through how to build
out a highly available (HA) WordPress deployment using Kubernetes.
WordPress consists of two major components: the WordPress PHP server,
and a database to store user information, posts, and site data. We need
to make both of these HA for the entire application to be fault
tolerant. Running HA services can be difficult when hardware and
addresses are changing; keeping up is tough. With Kubernetes and its
powerful networking components, we can deploy an HA WordPress site and
MySQL database without typing a single IP address (almost). In this
tutorial, I’ll be showing you how to create storage classes, services,
configuration maps, and stateful sets in Kubernetes; run HA MySQL; and hook up an
HA WordPress cluster to the database service. If you don’t already have
a Kubernetes cluster, you can spin one up easily on Amazon, Google, or
Azure, or by using Rancher Kubernetes Engine (RKE) on any servers.

Architecture Overview

I’ll now present an overview of the technologies we’ll use and their
functions:

  • Storage for WordPress Application Files: NFS with a GCE Persistent
    Disk Backing
  • Database Cluster: MySQL with xtrabackup for parity
  • Application Level: A WordPress DockerHub image mounted to NFS
    Storage
  • Load Balancing and Networking: Kubernetes-based load balancers and
    service networking

The architecture is organized as shown below:

[Architecture diagram]

Creating Storage Classes, Services, and Configuration Maps in Kubernetes

In Kubernetes, stateful sets offer a way to define the order of pod
initialization. We’ll use a stateful set for MySQL, because it ensures
our data nodes have enough time to replicate records from previous pods
when spinning up. The way we configure this stateful set will allow the
MySQL master to spin up before any of the slaves, so cloning can happen
directly from master to slave when we scale up. To start, we’ll need to
create a persistent volume storage class and a configuration map to
apply master and slave configurations as needed. We’re using persistent
volumes so that the data in our databases isn't tied to any specific
pod in the cluster. This method protects the database from data loss in
the event of a loss of the MySQL master pod. When a master pod is lost,
it can reconnect to the xtrabackup slaves on the slave nodes and
replicate data from slave to master. MySQL’s replication handles
master-to-slave replication but xtrabackup handles slave-to-master
backward replication. To dynamically allocate persistent volumes, we
create the following storage class utilizing GCE Persistent Disks.
However, Kubernetes offers a variety of persistent volume storage
providers:

# storage-class.yaml
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: slow
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  zone: us-central1-a

Create the class and deploy with this
command: $ kubectl create -f storage-class.yaml. Next, we’ll create
the configmap, which specifies a few variables to set in the MySQL
configuration files. These different configurations are selected by the
pods themselves, but they give us a handy way to manage potential
configuration variables. Create a YAML file named mysql-configmap.yaml
to handle this configuration as follows:

# mysql-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  labels:
    app: mysql
data:
  master.cnf: |
    # Apply this config only on the master.
    [mysqld]
    log-bin
    skip-host-cache
    skip-name-resolve
  slave.cnf: |
    # Apply this config only on slaves.
    [mysqld]
    skip-host-cache
    skip-name-resolve

Create the configmap and deploy with this
command: $ kubectl create -f mysql-configmap.yaml. Next, we want to
set up the service such that MySQL pods can talk to one another and our
WordPress pods can talk to MySQL, using mysql-services.yaml. This
headless service gives each MySQL pod a stable DNS entry that peers and
clients can use to reach it.

# mysql-services.yaml
# Headless service for stable DNS entries of StatefulSet members.
apiVersion: v1
kind: Service
metadata:
  name: mysql
  labels:
    app: mysql
spec:
  ports:
  - name: mysql
    port: 3306
  clusterIP: None
  selector:
    app: mysql

With this service declaration, we lay the groundwork to have a multiple
write, multiple read cluster of MySQL instances. This configuration is
necessary because each WordPress instance can potentially write to the
database, so each node must be ready to read and write. To create the
services above, execute the following command:
$ kubectl create -f mysql-services.yaml. At this point, we've created
the volume claim storage class which will hand persistent disks to all
pods that request them, we’ve configured the configmap that sets a few
variables in the MySQL configuration files, and we’ve configured a
network-level service that will load balance requests to the MySQL
servers. This is all just framework for the stateful sets, where the
MySQL servers actually operate, which we’ll explore next.

Configuring MySQL with Stateful Sets

In this section, we’ll be writing the YAML configuration for a MySQL
instance using a stateful set. Let’s define our stateful set:

  • Create three pods and register them to the MySQL service.
  • Define the following template for each pod:
  • Create an initialization container named init-mysql that selects the
    master or slave MySQL configuration for each pod.

    • Use the mysql:5.7 image for this container.
    • Run a bash script to set up xtrabackup.
    • Mount two new volumes for the configuration and configmap.
  • Create an initialization container named clone-mysql that seeds new
    slaves with data cloned from the previous pod.

    • Use the Google Cloud Registry’s xtrabackup:1.0 image for this
      container.
    • Run a bash script to clone existing xtrabackups from the
      previous peer.
    • Mount two new volumes for data and configuration.
    • This container effectively hosts the cloned data so the new
      slave containers can pick it up.
  • Create the primary containers for the MySQL servers.
    • Create a MySQL container and configure the slaves to connect to
      the MySQL master.
    • Create an xtrabackup container and configure it to
      connect to the xtrabackup master.
  • Create a volume claim template to describe each volume to be created
    as a 10GB persistent disk.

The following configuration defines behavior for masters and slaves of
our MySQL cluster, offering a bash configuration that runs the slave
client and ensures proper operation of a master before cloning. Slaves
and masters each get their own 10GB volume which they request from the
persistent volume storage class we defined earlier.

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mysql
spec:
  selector:
    matchLabels:
      app: mysql
  serviceName: mysql
  replicas: 3
  template:
    metadata:
      labels:
        app: mysql
    spec:
      initContainers:
      - name: init-mysql
        image: mysql:5.7
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Generate mysql server-id from pod ordinal index.
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # Add an offset to avoid reserved server-id=0 value.
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Copy appropriate conf.d files from config-map to emptyDir.
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config-map
          mountPath: /mnt/config-map
      - name: clone-mysql
        image: gcr.io/google-samples/xtrabackup:1.0
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Skip the clone if data already exists.
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          # Skip the clone on master (ordinal index 0).
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0
          # Clone data from previous peer.
          ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql
          # Prepare the backup.
          xtrabackup --prepare --target-dir=/var/lib/mysql
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ALLOW_EMPTY_PASSWORD
          value: "1"
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        readinessProbe:
          exec:
            # Check we can execute queries over TCP (skip-networking is off).
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
      - name: xtrabackup
        image: gcr.io/google-samples/xtrabackup:1.0
        ports:
        - name: xtrabackup
          containerPort: 3307
        command:
        - bash
        - "-c"
        - |
          set -ex
          cd /var/lib/mysql

          # Determine binlog position of cloned data, if any.
          if [[ -f xtrabackup_slave_info ]]; then
            # XtraBackup already generated a partial "CHANGE MASTER TO" query
            # because we're cloning from an existing slave.
            mv xtrabackup_slave_info change_master_to.sql.in
            # Ignore xtrabackup_binlog_info in this case (it's useless).
            rm -f xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            # We're cloning directly from master. Parse binlog position.
            [[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm xtrabackup_binlog_info
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi

          # Check if we need to complete a clone by starting replication.
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h 127.0.0.1 -e "SELECT 1"; do sleep 1; done

            echo "Initializing replication from clone position"
            # In case of container restart, attempt this at-most-once.
            mv change_master_to.sql.in change_master_to.sql.orig
            mysql -h 127.0.0.1 <<EOF
          $(<change_master_to.sql.orig),
            MASTER_HOST='mysql-0.mysql',
            MASTER_USER='root',
            MASTER_PASSWORD='',
            MASTER_CONNECT_RETRY=10;
          START SLAVE;
          EOF
          fi

          # Start a server to send backups when requested by peers.
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root"
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
      volumes:
      - name: conf
        emptyDir: {}
      - name: config-map
        configMap:
          name: mysql
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi

Save this file as mysql-statefulset.yaml.
Type kubectl create -f mysql-statefulset.yaml and let Kubernetes
deploy your database. Now, when you call $ kubectl get pods, you
should see three pods spinning up or ready that each have two containers
on them. The master pod is denoted as mysql-0 and the slaves follow
as mysql-1 and mysql-2. Give the pods a few minutes to make sure
the xtrabackup service is synced properly between pods, then move on
to the WordPress deployment. You can check the logs of the individual
containers to confirm that there are no error messages being thrown. To
do this, run $ kubectl logs -f <pod_name> -c <container_name>. The master
xtrabackup container should show the two connections from the slaves
and no errors should be visible in the logs.

Deploying Highly Available WordPress

The final step in this procedure is to deploy our WordPress pods onto
the cluster. To do this, we want to define a service for WordPress and a
deployment. For WordPress to be HA, we want every container running the
server to be fully replaceable, meaning we can terminate one and spin up
another with no change to data or service availability. We also want to
tolerate at least one failed container, having a redundant container
there to pick up the slack. WordPress stores important site-relevant
data in the application directory /var/www/html. For two instances of
WordPress to serve the same site, that folder has to contain identical
data. When running WordPress in HA, we need to share
the /var/www/html folders between instances, so we’ll define an NFS
service that will be the mount point for these volumes. The following
configuration sets up the NFS services. I’ve provided the plain English
version below:

  • Define a persistent volume claim to create our shared NFS disk as a
    GCE persistent disk at size 200GB.
  • Define a replication controller for the NFS server which will ensure
    at least one instance of the NFS server is running at all times.
  • Open ports 2049, 20048, and 111 in the container to make the NFS
    share accessible.
  • Use the Google Cloud Registry’s volume-nfs:0.8 image for the NFS
    server.
  • Define a service for the NFS server to handle IP address routing.
  • Allow necessary ports through that service firewall.

# nfs.yaml
# Define the persistent volume claim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
  labels:
    demo: nfs
  annotations:
    volume.alpha.kubernetes.io/storage-class: any
spec:
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 200Gi
---
# Define the Replication Controller
apiVersion: v1
kind: ReplicationController
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    role: nfs-server
  template:
    metadata:
      labels:
        role: nfs-server
    spec:
      containers:
      - name: nfs-server
        image: gcr.io/google_containers/volume-nfs:0.8
        ports:
        - name: nfs
          containerPort: 2049
        - name: mountd
          containerPort: 20048
        - name: rpcbind
          containerPort: 111
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /exports
          name: nfs-pvc
      volumes:
      - name: nfs-pvc
        persistentVolumeClaim:
          claimName: nfs
---
# Define the Service
kind: Service
apiVersion: v1
metadata:
  name: nfs-server
spec:
  ports:
  - name: nfs
    port: 2049
  - name: mountd
    port: 20048
  - name: rpcbind
    port: 111
  selector:
    role: nfs-server

Deploy the NFS server using $ kubectl create -f nfs.yaml. Now, we need
to run $ kubectl describe services nfs-server to obtain the IP address
to use below. Note: In the future, we’ll be able to tie these
together using the service names, but for now, you have to hardcode the
IP address.

# wordpress.yaml
apiVersion: v1
kind: Service
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  ports:
  - port: 80
  selector:
    app: wordpress
    tier: frontend
  type: LoadBalancer
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs
spec:
  capacity:
    storage: 20G
  accessModes:
  - ReadWriteMany
  nfs:
    # FIXME: use the right IP
    server: <IP of the NFS Service>
    path: "/"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 20G
---
apiVersion: apps/v1beta1 # for versions before 1.8.0 use apps/v1beta1
kind: Deployment
metadata:
  name: wordpress
  labels:
    app: wordpress
spec:
  selector:
    matchLabels:
      app: wordpress
      tier: frontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: wordpress
        tier: frontend
    spec:
      containers:
      - image: wordpress:4.9-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_HOST
          value: mysql
        - name: WORDPRESS_DB_PASSWORD
          value: ""
        ports:
        - containerPort: 80
          name: wordpress
        volumeMounts:
        - name: wordpress-persistent-storage
          mountPath: /var/www/html
      volumes:
      - name: wordpress-persistent-storage
        persistentVolumeClaim:
          claimName: nfs

We’ve now created a persistent volume claim that maps to the NFS
service we created earlier. It then attaches the volume to the WordPress
pod at the /var/www/html root, where WordPress is installed. This
preserves the installation and environment across WordPress pods in the
cluster. With this configuration, we can spin up and tear down any
WordPress node and the data will remain. Because the NFS service is
constantly using the physical volume, it will retain the volume and
won’t recycle or misallocate it. Deploy the WordPress instances
using $ kubectl create -f wordpress.yaml. The default deployment only
runs a single instance of WordPress, so feel free to scale up the number
of WordPress instances
using $ kubectl scale --replicas=<number of replicas> deployment/wordpress.
To obtain the address of the WordPress service load balancer,
type $ kubectl get services wordpress and grab the EXTERNAL-IP field
from the result to navigate to WordPress.

Resilience Testing

OK, now that we’ve deployed our services, let’s start tearing them
down to see how well our HA architecture handles some chaos. In this
approach, the only single point of failure left is the NFS service (for
reasons explained in the Conclusion). You should be able to demonstrate
testing the failure of any other services to see how the application
responds. I’ve started with three replicas of the WordPress service and
the one master and two slaves on the MySQL service. First, let’s kill
all but one WordPress node and see how the application reacts:
$ kubectl scale --replicas=1 deployment/wordpress. Now, we should see a
drop in pod count for the WordPress deployment with $ kubectl get pods; the
WordPress deployment should now show only 1/1 pods running. When
hitting the WordPress service IP, we'll see the same site and same
database as before. To scale back up, we can
use $ kubectl scale --replicas=3 deployment/wordpress. We'll again
see that data is preserved across all three instances. To test the MySQL
StatefulSet, we can scale down the number of replicas using the
following: $ kubectl scale statefulsets mysql --replicas=1. We'll lose
both slaves in this instance; if the master is lost at that moment, its
data will be preserved on the GCE
Persistent Disk. However, we'll have to manually recover the data from
the disk. If all three MySQL nodes go down, you won't be able to
replicate when new nodes come up. However, if a master node goes down, a
new master will be spun up and via xtrabackup, it will repopulate with
the data from a slave. Therefore, I don’t recommend ever running with a
replication factor of less than three when running production databases.
To conclude, let’s talk about some better solutions for your stateful
data, as Kubernetes isn’t really designed for state.

Conclusions and Caveats

You’ve now built and deployed an HA WordPress and MySQL installation on
Kubernetes! Despite this great achievement, your journey may be far from
over. If you haven’t noticed, our installation still has a single point
of failure: the NFS server sharing the /var/www/html directory between
WordPress pods. This service represents a single point of failure
because without it running, the html folder disappears on the pods
using it. The image I’ve selected for the server is incredibly stable
and production ready, but for a true production deployment, you may
consider
using GlusterFS to
enable multi-read multi-write to the directory shared by WordPress
instances. This process involves running a distributed storage cluster
on Kubernetes, which isn’t really what Kubernetes is built for, so
despite it working, it isn’t a great option for long-term
deployments. For the database, I’d personally recommend using a managed
Relational Database service to host the MySQL instance, be it Google’s
CloudSQL or AWS’s RDS, as they provide HA and redundancy at a more
sensible price and keep you from worrying about data integrity.
Kubernetes isn’t really designed around stateful applications and any
state built into it is more of an afterthought. Plenty of solutions
exist that offer much more of the assurances one would look for when
picking a database service. That being said, the configuration presented
above is a labor of love, a hodgepodge of Kubernetes tutorials and
examples found across the web to create a cohesive, realistic use case
for Kubernetes and all the new features in Kubernetes 1.8.x. I hope your
experiences deploying WordPress and MySQL using the guide I’ve prepared
for you are a bit less exciting than the ones I had ironing out bugs in
the configurations, and of course, I wish you eternal uptime. That’s
all for now. Tune in next time when I teach you to drive a boat using
only a Myo gesture band and a cluster of Linode instances running Tails
Linux.

About the Author

Eric Volpert is a
student at the University of Chicago and works as an evangelist, growth
hacker, and writer for Rancher Labs. He enjoys any engineering
challenge. He’s spent the last three summers as an internal tools
engineer at Bloomberg and a year building DNS services for the Secure
Domain Foundation with CrowdStrike. Eric enjoys many forms of music
ranging from EDM to High Baroque, playing MOBAs and other action-packed
games on his PC, and late-night hacking sessions, duct taping APIs
together so he can make coffee with a voice command.

Source

Welcome to the Era of Immutable Infrastructure

With the recent “container revolution,” a seemingly new idea became
popular: immutable infrastructure. In fact, it wasn’t particularly new,
nor did it specifically require containers. However, it was through
containers that it became more practical, understandable, and got the
attention of many in the industry. So, what is immutable
infrastructure? I’ll attempt to define it as the practice of making
infrastructure changes only in production by replacing components
instead of modifying them. More specifically, it means once we deploy a
component, we don’t modify (mutate) it. This doesn’t mean the component
(once deployed) is without any change in state; otherwise, it wouldn’t
be a very functional software component. But, it does mean that as the
operator we don’t introduce any change outside of the program’s
original API/design. Take for example this not too uncommon scenario.
Say our application uses a configuration file that we want to change. In
the dynamic infrastructure world, we might have used some scripting or a
configuration management tool to make this change. It would make a
network call to the server in question (or more likely many of them),
and execute some code to modify the file. It might also have some way of
knowing about the dependencies of that file that might need to be
altered as a result of this change (say a program needing a restart).
These relationships could become complex over time, which is why many CM
tools came up with a resource dependency model that helps to manage
them. The trade-offs between the two approaches are pretty simple.
Dynamic infrastructure is a lot more efficient with resources such as
network and disk IO. Because of this efficiency, it’s traditionally
faster than immutable because it doesn’t require pushing as many bits
or storing as many versions of a component. Back to our example of
changing a file. You could traditionally change a single file much
faster than you could replace the entire server. Immutable
infrastructure, on the other hand, offers stronger guarantees about the
outcome. Immutable components can be prebuilt before deployment, built
once and then reused, unlike dynamic infrastructure, which has logic that
needs to be evaluated in each instance. That evaluation leaves room for
surprises about the outcome, as some of your environment might be in a
different state than you expect, causing errors in your deployment.
It’s also possible that you simply make a mistake in your configuration
management code, but you aren’t able to sufficiently replicate
production locally to test that outcome and catch the mistake. After
all, these configuration management languages themselves are complex. In
an article from ACM Queue, an Association for
Computing Machinery (ACM) magazine, engineers at Google articulated this
challenge well:

“The result is the kind of inscrutable ‘configuration is code’ that
people were trying to avoid by eliminating hard-coded parameters in
the application’s source code. It doesn’t reduce operational
complexity or make the configurations easier to debug or change; it
just moves the computations from a real programming language to a
domain-specific one, which typically has weaker development tools
(e.g., debuggers, unit test frameworks, etc).”

Trade-offs of efficiency have long been central to computer engineering.
However, the economics (both technological and financial) of these
decisions change over time. In the early days of programming, for
instance, developers were taught to use short variable names to save a
few bytes of precious memory at the expense of readability. Dynamically
linked libraries were developed to solve the space limitations of early
hard disk drives so that programs could share common C libraries instead
of each requiring their own copies. Both of these things changed in the
last decade as computer systems grew more powerful; now a
developer's time is far more expensive than the bytes we save by
shortening our variable names. New languages like Golang and Rust have even
brought back the statically compiled binary because it's not worth the
headache of platform compatibility problems caused by the wrong
DLL. Infrastructure management is at a similar crossroads. Not only have
the public cloud and virtualization made replacing a server (virtual
machine) orders of magnitude faster, but tools like Docker have created
easy-to-use tooling to work with pre-built server runtimes and efficient
resource usage with layer caching and compression. These features have
made immutable infrastructure practical because they are so lightweight
and frictionless. Kubernetes arrived on the scene not long after Docker
and took the torch further towards this goal, creating an API of “cloud
native” primitives that assume and encourage an immutable philosophy.
For instance, the ReplicaSet assumes that at any time in the lifecycle
of our application we can (and might need to) redeploy our application.
And, to balance this out, Pod Disruption Budgets tell Kubernetes how the
application will tolerate being redeployed. This confluence of
advancement has brought us to the era of immutable infrastructure. And
it’s only going to increase as more companies participate. Today’s
tools
have made it easier than ever to
embrace these patterns. So, what are you waiting for?

About the Author

William Jimenez
is a curious solutions architect at Rancher Labs in Cupertino, CA, who
enjoys solving problems with computers, software, and just about any
complex system he can get his hands on. He enjoys helping others make
sense of difficult problems. In his free time, he likes to tinker with
amateur radio, cycle on the open road, and spend time with his family
(so they don’t think he forgot about them).

Source

Using Kubernetes API from Go


Last month I had the great pleasure of attending KubeCon 2017, which took place in Austin, TX. The conference was super informative, and deciding which sessions to join was really hard, as all of them were great. But what deserves special recognition is how well the organizers respected the attendees’ diversity of Kubernetes experience. Support is especially important if you are new to the project and need advice (and sometimes encouragement) to get started. The Kubernetes 101 track sessions were a good way to get more familiar with the concepts, tools and the community. I was very excited to be a speaker on the 101 track, and this blog post is a recap of my session, Using Kubernetes APIs from Go.

In this article we are going to learn what makes Kubernetes a great platform for developers, and cover the basics of writing a custom controller for Kubernetes in the Go language using the client-go library.

Kubernetes is a platform

Kubernetes can be liked for many reasons. As a user, you appreciate its feature richness, stability and performance. As a contributor, you find the Kubernetes open source community not only large, but approachable and responsive. But what really makes Kubernetes appealing to a third-party developer is its extensibility. The project provides many ways to add new features or extend existing ones without disrupting the main code base, and that’s what makes Kubernetes a platform. Every Kubernetes cluster component can be extended in a certain way, whether it is the Kubelet or the API server. Today we are going to focus on the “custom controller” approach; I’ll refer to it as a Kubernetes controller, or simply a controller, from now on.

What exactly is a Kubernetes Controller?

The most common definition of a controller is “code that brings the current state of the system to the desired state.” But what exactly does that mean? Let’s look at the Ingress controller as an example. Ingress is a Kubernetes resource that lets you define external access to the services in a cluster, typically over HTTP and usually with load balancing support. But the Kubernetes core code has no Ingress implementation. The implementation is covered by third-party controllers that:

  • Watch ingress/services/endpoints resource events (Create/Update/Remove)
  • Program internal or external Load Balancer
  • Update Ingress with the Load Balancer address

The “desired” state of the Ingress is an IP address pointing to a functioning load balancer programmed with the rules defined by the user in the Ingress specification. An external ingress controller is responsible for bringing the Ingress resource to this state. The implementation of the controller for the same resource, as well as the way to deploy it, can vary. You can pick the nginx controller and deploy it on every node in your cluster as a DaemonSet, or you can choose to run your ingress controller outside of the Kubernetes cluster and program F5 as a load balancer. There are no strict rules; Kubernetes is flexible in that way.

Client-go

There are several ways to get information about a Kubernetes cluster and its resources: the Dashboard, kubectl, or programmatic access to the Kubernetes APIs. Client-go is the most popular library used by tools written in Go. There are clients for many other languages out there (Java, Python, etc.), but if you want to write your very first controller, I encourage you to try Go and client-go. Kubernetes is written in Go, and I find it easier to develop a plugin in the same language the main project is written in.

Let’s build…

The best way to get familiar with a platform and the tools around it is to write something. Let’s start simple and implement a controller that:

  • Monitors Kubernetes nodes
  • Alerts when the storage occupied by images on a node changes

The source code can be found here.

Ground work

Set up the project

As a developer, I like to sneak a peek at the tools my peers use to make their life easier. Here I’m going to share three favorite tools of mine that are going to help us with our very first project.

  1. go-skel – a skeleton for Go microservices. Just run ./skel.sh test123, and it will create the skeleton for a new Go project named test123.
  2. trash – a Go vendor management tool. There are many Go dependency management tools out there, but trash has proven to be simple to use and great when it comes to transitive dependency management.
  3. dapper – a tool to wrap any existing build tool in a consistent environment

Add client-go as a dependency

In order to use client-go code, we have to pull it into our project as a dependency. Add it to vendor.conf and run trash. It will automatically pull all the dependencies defined in vendor.conf into the vendor folder of the project. Make sure the client-go version is compatible with the Kubernetes version of your cluster.

Create a client

Before creating a client that is going to talk to the Kubernetes API, we have to decide how we want to run our tool: inside or outside the Kubernetes cluster. When run inside the cluster, your application is containerized and gets deployed as a Kubernetes pod. That gives you certain perks – you can choose the way to deploy it (a DaemonSet to run on every node, or a Deployment with n replicas), configure a health check for it, and so on. When your application runs outside of the cluster, you have to manage it yourself. Let’s make our tool flexible and support both ways of creating the client, based on a config flag. We’ll use out-of-cluster mode while debugging the app, as this way you do not have to build the image and redeploy it as a Kubernetes pod every time. Once the app is tested, we can build an image and deploy it in the cluster. In both cases, the config is built and passed to kubernetes.NewForConfig to generate the client.
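Below is a minimal sketch of that flag-driven client setup, assuming a client-go release contemporary with this article; the flag names are illustrative rather than taken from the original project:

package main

import (
	"flag"
	"log"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/clientcmd"
)

// newClientset builds a clientset either from the in-cluster service
// account or from a kubeconfig file, depending on how the tool is run.
func newClientset(outsideCluster bool, kubeconfig string) (*kubernetes.Clientset, error) {
	var (
		config *rest.Config
		err    error
	)
	if outsideCluster {
		// Out-of-cluster: read credentials from a kubeconfig file.
		config, err = clientcmd.BuildConfigFromFlags("", kubeconfig)
	} else {
		// In-cluster: use the service account mounted into the pod.
		config, err = rest.InClusterConfig()
	}
	if err != nil {
		return nil, err
	}
	return kubernetes.NewForConfig(config)
}

func main() {
	outside := flag.Bool("outside-cluster", false, "use a kubeconfig instead of the in-cluster config")
	kubeconfig := flag.String("kubeconfig", "", "path to a kubeconfig file (out-of-cluster mode)")
	flag.Parse()

	clientset, err := newClientset(*outside, *kubeconfig)
	if err != nil {
		log.Fatal(err)
	}
	_ = clientset // used by the CRUD and informer sketches below
}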

Play with basic CRUDs

For our tool, we need to monitor nodes. It is a good idea to get familiar with the way to do CRUD operations using client-go before implementing the logic. For example, we can:

  • List nodes named “minikube”, which can be achieved by passing a FieldSelector filter to the command.
  • Update the node with a new annotation.
  • Delete the node with gracePeriod=10 seconds – meaning that the removal will happen only 10 seconds after the command is issued.

All that is done using the clientset we created in the previous step. We also need information about the images on the node; it can be retrieved by accessing the corresponding field of the node status, as sketched below:
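Here is a rough sketch of those operations, building on the client sketch above and assuming a pre-1.0 client-go where API calls take no context argument; the node name “minikube” and the annotation key are only examples:

package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

func nodeCRUD(clientset *kubernetes.Clientset) error {
	// List nodes named "minikube" using a field selector.
	nodes, err := clientset.CoreV1().Nodes().List(metav1.ListOptions{
		FieldSelector: "metadata.name=minikube",
	})
	if err != nil {
		return err
	}

	for _, node := range nodes.Items {
		// Update the node with a new annotation (hypothetical key).
		if node.Annotations == nil {
			node.Annotations = map[string]string{}
		}
		node.Annotations["example.com/image-monitor"] = "enabled"
		if _, err := clientset.CoreV1().Nodes().Update(&node); err != nil {
			return err
		}

		// Sum up the storage occupied by images on the node.
		var imageStorage int64
		for _, image := range node.Status.Images {
			imageStorage += image.SizeBytes
		}
		fmt.Printf("%s: %d bytes used by images\n", node.Name, imageStorage)
	}

	// Delete the node with a 10-second grace period.
	grace := int64(10)
	return clientset.CoreV1().Nodes().Delete("minikube", &metav1.DeleteOptions{
		GracePeriodSeconds: &grace,
	})
}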

Watch/Notify using Informer

Now we know how to fetch the nodes from the Kubernetes API and get image information from them. How do we monitor changes to the images’ size? The simplest way would be to periodically poll the nodes, calculate the current image storage capacity, and compare it with the result from the previous poll. The downside to that is that we execute a list call to fetch all the nodes whether or not they have changed, and that can be expensive, especially if your poll interval is small. What we really want is to be notified when a node gets changed, and only then run our logic. That’s where the client-go Informer comes to the rescue.

In this example, we create an Informer for the Node object by passing it a watchList instruction on how to monitor nodes, the object type api.Node, and 30 seconds as a resync period, instructing it to periodically poll the node even when there were no changes to it – a nice fallback in case an update event gets dropped for some reason. As the last argument, we pass two callback functions – handleNodeAdd and handleNodeUpdate. Those callbacks hold the actual logic that has to be triggered on a node’s changes – finding out whether the storage occupied by images on the node has changed. NewInformer gives back two objects – a controller and a store. Once the controller is started, the watch on node updates and additions begins, and the callback functions get called. The store is an in-memory cache which gets updated by the informer, and you can fetch the node object from the cache instead of calling the Kubernetes API directly. As we have a single controller in our project, using a regular Informer is fine. But if your future project ends up having several controllers for the same object, using a SharedInformer is recommended: instead of creating multiple regular informers – one per controller – you register one shared informer, let each controller register its own set of callbacks, and get back a shared cache in return, which reduces the memory footprint.
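A condensed sketch of that informer wiring, under the same client-go assumptions; the inline add and update handlers play the role of the handleNodeAdd and handleNodeUpdate callbacks described above:

package main

import (
	"fmt"
	"time"

	"k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/fields"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
)

// imageStorage sums the bytes occupied by images on a node.
func imageStorage(node *v1.Node) int64 {
	var total int64
	for _, image := range node.Status.Images {
		total += image.SizeBytes
	}
	return total
}

func watchNodes(clientset *kubernetes.Clientset, stop <-chan struct{}) {
	// watchList tells the informer how to list and watch Node objects.
	watchList := cache.NewListWatchFromClient(
		clientset.CoreV1().RESTClient(), "nodes", metav1.NamespaceAll, fields.Everything())

	_, controller := cache.NewInformer(
		watchList,
		&v1.Node{},
		30*time.Second, // resync period: re-deliver objects even without changes
		cache.ResourceEventHandlerFuncs{
			AddFunc: func(obj interface{}) {
				node := obj.(*v1.Node)
				fmt.Printf("node %s added, images use %d bytes\n", node.Name, imageStorage(node))
			},
			UpdateFunc: func(oldObj, newObj interface{}) {
				oldNode, newNode := oldObj.(*v1.Node), newObj.(*v1.Node)
				if imageStorage(oldNode) != imageStorage(newNode) {
					fmt.Printf("node %s image storage changed to %d bytes\n",
						newNode.Name, imageStorage(newNode))
				}
			},
		},
	)
	controller.Run(stop)
}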

Deployment time

Now it is time to deploy and test the code! For the first run, we simply build a Go binary and run it in out-of-cluster mode. To change the message output, deploy a pod using an image which is not yet present on the node. Once the basic functionality is tested, it is time to try running it in cluster mode. For that, we have to create the image first. Define the Dockerfile and create an image using docker build; this generates the image that you can use to deploy the pod in Kubernetes. Now your application can run as a pod in a Kubernetes cluster, using a standard deployment definition. So we have:

  • Created a Go project
  • Added the client-go package dependencies to it
  • Created a client to talk to the Kubernetes API
  • Defined an Informer that watches node object changes and executes a
    callback function when they happen
  • Implemented the actual logic in the callback definition
  • Tested the code by running the binary outside of the cluster, and then
    deployed it inside the cluster

If you have any comments or questions on the topic, please feel free to share them with me! You can find me on Twitter at @lemonjet and on GitHub at https://github.com/alena1108.

Alena Prokharchyk


Software Engineer

Source

CICD Debates: Drone vs Jenkins


Introduction

Jenkins has been the industry-standard CI tool for years. It contains a multitude of functionality, and with almost 1,000 plugins in its ecosystem, it can be daunting to some who appreciate simplicity. Jenkins also came up in a world before containers, though it does fit nicely into that environment. This means that there is not a particular focus on the things that make containers great, though with the inclusion of Blue Ocean and pipelines, that is rapidly changing.

Drone is an open source CI tool that wears simplicity like a badge of honor. It is truly Docker native, meaning that all actions take place within containers. This makes it a perfect fit for a platform like Kubernetes, where launching containers is an easy task.

Both of these tools walk hand in hand with Rancher, which makes standing up a robust Kubernetes cluster an automatic process. I’ve used Rancher 1.6 to deploy a K8s 1.8 cluster on GCE; as simple as can be.


This article will take Drone deployed on Kubernetes (on Rancher), and compare it to Jenkins across three categories:

  • Platform installation and management
  • Plugin ecosystem
  • Pipeline details

In the end, I’ll stack them up side by side and try to give a recommendation. As usually is the case however, there may not be a clear winner. Each tool has its core focus, though by nature there will be overlap.

Prereqs

Before getting started, we need to do a bit of setup. This involves setting up Drone as an authorized OAuth2 app with a GitHub account. You can see the settings I’ve used here. All of this is contained within the Drone documentation.

There is one gotcha which I encountered setting up Drone. Drone maintains a passive relationship with the source control repository. In this case, this means that it sets up a webhook with Github for notification of events. The default behavior is to build on push and PR merge events. In order for Github to properly notify Drone, the server must be accessible to the world. With other, on-prem SCMs, this would not be the case, but for the example described here it is. I’ve set up my Rancher server on GCE, so that it is reachable from Github.com.

Drone installs from a container through a set of deployment files, just like any other Kubernetes app. I’ve adapted the deployment files found in this repo. Within the config map spec file, there are several values we need to change. Namely, we need to set the Github-related values to ones specific to our account. We’ll take the client secret and client key from the setup steps and place them into this file, as well as the username of the authorized user. Within the drone-secret file, we can place our Github password in the appropriate slot.

This is a major departure from the way Jenkins interacts with source code. In Jenkins, each job can define its relationship with source control independent of another job. This allows you to pull source from a variety of different repositories, including Github, Gitlab, svn, and others. As of now, Drone only supports git-based repos. A full list is available in the documentation, but all of the most popular choices for git-based development are supported.

We also can’t forget our Kubernetes cluster! Rancher makes it incredibly easy to launch and manage a cluster. I’ve chosen to use the latest stable version of Rancher, 1.6. We could’ve used the new Rancher 2.0 tech preview, but constructing this guide worked best with the stable version. However, the information and steps to install should be the same, so if you’d like to try it out with a newer Rancher, go ahead!

Task 1 – Installation and Management

Launching Drone on Kubernetes and Rancher is as simple as copy paste. I used the default K8s dashboard to launch the files. Uploading them one by one, starting with the namespace and config files, will get the ball rolling. Here are some of the deployment files I used. I pulled from this repository and made my own local edits. This repo is owned by a frequent Drone contributor, and includes instructions on how to launch on GCE, as well as AWS. The Kubernetes yaml files are the only things we need here. To replicate, just edit the ConfigMap file with your specific values. Check out one of my files below.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: drone-server
  namespace: drone
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: drone-server
    spec:
      containers:
      - image: drone/drone:0.8
        imagePullPolicy: Always
        name: drone-server
        ports:
        - containerPort: 8000
          protocol: TCP
        - containerPort: 9000
          protocol: TCP
        volumeMounts:
        # Persist our configs in an SQLite DB in here
        - name: drone-server-sqlite-db
          mountPath: /var/lib/drone
        resources:
          requests:
            cpu: 40m
            memory: 32Mi
        env:
        - name: DRONE_HOST
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.host
        - name: DRONE_OPEN
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.open
        - name: DRONE_DATABASE_DRIVER
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.database.driver
        - name: DRONE_DATABASE_DATASOURCE
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.database.datasource
        - name: DRONE_SECRET
          valueFrom:
            secretKeyRef:
              name: drone-secrets
              key: server.secret
        - name: DRONE_ADMIN
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.admin
        - name: DRONE_GITHUB
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.remote.github
        - name: DRONE_GITHUB_CLIENT
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.remote.github.client
        - name: DRONE_GITHUB_SECRET
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.remote.github.secret
        - name: DRONE_DEBUG
          valueFrom:
            configMapKeyRef:
              name: drone-config
              key: server.debug
      volumes:
      - name: drone-server-sqlite-db
        hostPath:
          path: /var/lib/k8s/drone
      - name: docker-socket
        hostPath:
          path: /var/run/docker.sock

Jenkins can be launched in much the same way. Because it is deployable in a Docker container, you can construct a similar deployment file and launch on Kubernetes. Here’s an example below. This file was taken from the GCE examples repo for the Jenkins CI server.

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: jenkins
  namespace: jenkins
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: master
    spec:
      containers:
      - name: master
        image: jenkins/jenkins:2.67
        ports:
        - containerPort: 8080
        - containerPort: 50000
        readinessProbe:
          httpGet:
            path: /login
            port: 8080
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 2
          failureThreshold: 5
        env:
        - name: JENKINS_OPTS
          valueFrom:
            secretKeyRef:
              name: jenkins
              key: options
        - name: JAVA_OPTS
          value: '-Xmx1400m'
        volumeMounts:
        - mountPath: /var/jenkins_home
          name: jenkins-home
        resources:
          limits:
            cpu: 500m
            memory: 1500Mi
          requests:
            cpu: 500m
            memory: 1500Mi
      volumes:
      - name: jenkins-home
        gcePersistentDisk:
          pdName: jenkins-home
          fsType: ext4
          partition: 1

Launching Jenkins is similarly easy. Because of the simplicity of Docker and Rancher, all you need to do is take the set of deployment files and paste them into the dashboard. My preferred way is using the Kubernetes dashboard for all management purposes. From here, I can upload the Jenkins files one by one to get the server up and running.

Managing the Drone server comes down to the configuration passed when launching. Hooking up to GitHub involved adding OAuth2 tokens, as well as (in my case) a username and password to access a repository. Changing this would involve either granting organization access through GitHub, or relaunching the server with new credentials. This could possibly hamper development, as it means that Drone cannot handle more than one source provider. As mentioned above, Jenkins allows for any number of source repos, with the caveat that each job only uses one.

Task 2 – Plugins

Plugins in Drone are very simple to configure and manage. In fact, there isn’t much you need to do to get one up and running. The ecosystem is considerably smaller than that for Jenkins, but there are still plugins for almost every major tool available. There are plugins for most major cloud providers, as well as integrations with popular source control repos. As mentioned before, containers in Drone are first class citizens. This means that each plugin and executed task is also a container.

Jenkins is the undisputed king of plugins. If you can think of a task, there is probably a plugin to accomplish it. There are, at last glance, almost 1,000 plugins available for use. The downside of this is that it can sometimes be difficult to determine, out of a selection of similar-looking plugins, which one is the best choice for what you’re trying to accomplish.

There are Docker plugins for building and pushing images, AWS and K8s plugins for deploying to clusters, and various others. Because of the comparative youth of the Drone platform, there are a great deal fewer plugins available here than for Jenkins. That does not, however, take away from their effectiveness and ease of use. A simple stanza in a drone.yml file will automatically download, configure, and run a selected plugin, with no other input needed. And remember, because of Drone’s relationship with containers, each plugin is maintained within an image. There are no extra dependencies to manage; if the plugin creator has done their job correctly, everything will be contained within that container.

When I built the drone.yml file for the simple node app, adding a Docker plugin was a breeze. There were only a few lines needed, and the image was built and pushed to a Dockerhub repo of my choosing. In the next section, you can see the section labeled docker. This stanza is all that’s needed to configure and run the plugin to build and push the Docker image.

Task 3 – Pipelines

The last task is the bread and butter of any CI system. Drone and Jenkins are both designed to build apps. Originally, Jenkins was targeted towards Java apps, but over the years the scope has expanded to include anything you can compile and execute as code. Jenkins even excels at pipelines and cron-like scheduled tasks. However, it is not container native, though it does fit very well into the container ecosystem. Below is the drone.yml for the simple Node app mentioned earlier.

pipeline:
  build:
    image: node:alpine
    commands:
      - npm install
      - npm run test
      - npm run build
  docker:
    image: plugins/docker
    dockerfile: Dockerfile
    repo: badamsbb/node-example
    tags: v1

For comparison, here’s a Jenkinsfile for the same app.

#!/usr/bin/env groovy
pipeline {
  agent {
    node {
      label 'docker'
    }
  }
  tools {
    nodejs 'node8.4.0'
  }
  stages {
    stage ('Checkout Code') {
      steps {
        checkout scm
      }
    }
    stage ('Verify Tools') {
      steps {
        parallel (
          node: {
            sh "npm -v"
          },
          docker: {
            sh "docker -v"
          }
        )
      }
    }
    stage ('Build app') {
      steps {
        sh "npm prune"
        sh "npm install"
      }
    }
    stage ('Test') {
      steps {
        sh "npm test"
      }
    }
    stage ('Build container') {
      steps {
        sh "docker build -t badamsbb/node-example:latest ."
        sh "docker tag badamsbb/node-example:latest badamsbb/node-example:v$"
      }
    }
    stage ('Verify') {
      steps {
        input "Everything good?"
      }
    }
    stage ('Clean') {
      steps {
        sh "npm prune"
        sh "rm -rf node_modules"
      }
    }
  }
}

While this example is verbose for the sake of explanation, you can see that accomplishing the same goal, a built Docker image, can be more involved than with Drone. In addition, what’s not pictured is the setup of the interactions between Jenkins and Docker. Because Jenkins is not Docker native, agents must be configured ahead of time to properly interact with the Docker daemon. This can be confusing to some, which is where Drone comes out ahead. It is already running on top of Docker; this same Docker is used to run its tasks.

Conclusion

Drone is a wonderful piece of CI software. It has quickly become a very popular choice for teams wanting to get up and running quickly and looking for a simple, container-native CI solution. Its simplicity is elegant, though as it is still in a pre-release status, there is much more to come. Adventurous engineers may be willing to give it a shot in production, and indeed many have. In my opinion, it is best suited to smaller teams looking to get up and running quickly. Its small footprint and simplicity of use lend themselves readily to this kind of development.

However, Jenkins is the tried and true powerhouse of the CI community. It takes a lot to topple the king, especially one so entrenched in his position. Jenkins has been very successful at adapting to the market, with Blue Ocean and container-based pipelines making strong cases for its staying power. Jenkins can be used by teams of all sizes, but excels at scale. Larger organizations love Jenkins due to its history and numerous integrations. It also has distinct support options: either active community support for open source, or enterprise-level support through CloudBees. But as with all tools, both Drone and Jenkins have their place within the CI ecosystem.

Bio

Brandon Adams
Certified Jenkins Engineer, and Docker enthusiast. I’ve been using Docker since the early days, and love hearing about new applications for the technology. Currently working for a Docker consulting partner in Bethesda, MD.

Source

Expanding User Support with Office Hours

Today’s developer has an almost overwhelming amount of resources available for learning. Kubernetes development teams use StackOverflow, user documentation, Slack, and the mailing lists. Additionally, the community itself continues to amass an awesome list of resources.

One of the challenges of large projects is keeping user resources relevant and useful. While documentation can be useful, great learning also happens in Q&A sessions at conferences, or by learning with someone whose explanation matches your learning style. Consider that learning Kung Fu from Morpheus would be a lot more fun than reading a book about Kung Fu!

We as Kubernetes developers want to create an interactive experience: where Kubernetes users can get their questions answered by experts in real time, or at least referred to the best known documentation or code example.

Having discussed a few broad ideas, we eventually decided to make Kubernetes Office Hours a live stream where we take user questions from the audience and present them to our panel of contributors and expert users. We run two sessions: one for European time zones, and one for the Americas. These streaming setup guidelines make office hours extensible—for example, if someone wants to run office hours for Asia/Pacific timezones, or for another CNCF project.

To give you an idea of what Kubernetes office hours are like, here’s Josh Berkus answering a question on running databases on Kubernetes. Despite the popularity of this topic, it’s still difficult for a new user to get a constructive answer. Here’s an excellent response from Josh:

It’s often easier to field this kind of question in office hours than it is to ask a developer to write a full-length blog post. [Editor’s note: That’s legit!] Because we don’t have infinite developers with infinite time, this kind of focused communication creates high-bandwidth help while limiting developer commitments to 1 hour per month. This allows a rotating set of experts to share the load without overwhelming any one person.

We hold office hours the third Wednesday of every month on the Kubernetes YouTube Channel. You can post questions on the #office-hours channel on Slack, or you can submit your question to Stack Overflow and post a link on Slack. If you post a question in advance, you might get better answers, as volunteers have more time to research and prepare. If a question can’t be fully solved during the call, the team will try their best to point you in the right direction and/or ping other people in the community to take a look. Check out this page for more details on what’s off- and on topic as well as meeting information for your time zone. We hope to hear your questions soon!

Special thanks to Amazon, Bitnami, Giant Swarm, Heptio, Liquidweb, Northwestern Mutual, Packet.net, Pivotal, Red Hat, Weaveworks, and VMWare for donating engineering time to office hours.

And thanks to Alan Pope, Joe Beda, and Charles Butler for technical support in making our livestream better.

Source

NextCloudPi backup strategies – Own your bits

This post is a review of the different options that we have when deciding on our backup strategy. As more features have appeared over time, this information has become scattered around this blog, so it is nice to have a review in a single article.

It goes without saying that we need to backup our sensitive data in case a data loss event occurs, such as server failure, hard drive failure or a house fire. Many people think about this only too late, and it can be devastating.


Ideally we would have at least two copies, better three, of all our data. This will cover us against hardware failure. If possible, one of them should be in a different location, in case of an event such as somebody breaking into our home or a house fire, and it would also ideally be encrypted.

Now, if we are running NCP on low-end hardware, options such as encryption or RAID can have a prohibitive computational cost. With this in mind, the following are the backup features that have been developed for this scenario.

Periodic backups

This one is the most basic one: back up our Nextcloud files and database at regular intervals, in case our system breaks or our database becomes corrupted. As we add more and more gigabytes of data, this can take a really long time and take up a lot of space.

The data is better off on an external USB drive than on the SD card, not only because it is cheaper per megabyte, but also because USB drives are more reliable.

Pros:

  • Doesn’t require any advanced filesystem such as BTRFS.
  • Self contained. We can copy the backup to another NCP instance and restore easily.

Cons:

  • Can take up a lot of space, even if compressed.
  • They can be slow.
  • Our cloud will be inaccessible in maintenance mode during the process.

It is recommended to use a second hard drive to save the backups to, in case the first one fails (instructions). In order to do this, just plug in a second hard drive and set it as the destination of backups in nc-backup-auto. Remember that if we are using more than one drive, we should reference each one by label in the NCP options, as explained here.

Periodic dataless backups

To alleviate some of these issues, we can do dataless backups. These are typically a couple of hundred megabytes in size, and the downtime will be small. The problem with this is that we need to find another way of duplicating the data itself.

Pros:

  • Doesn’t require any advanced filesystem such as BTRFS
  • Almost self-contained. We can copy the backup to another NCP instance
    and easily restore our accounts, apps and so on.
  • The downtime is quite small
  • Doesn't take up much additional space

Cons:

  • We need to back up our data separately
  • Restoring is more complicated if we want to include the data

After restoring, we have to edit config.php and point our instance to the path where our data is, then we need to run nc-scan to make Nextcloud aware of the new files.

Periodic BTRFS snapshots

We have dedicated several posts already to BTRFS and its features. NCP will default to formatting our USB drive as BTRFS, and our datadir will automatically be in a subvolume. Just activate nc-snapshots to get hourly snapshots of your data with btrfs-snp.

 

$ ls -1 /media/myCloudDrive/ncp-snapshots/
daily_2018-01-30_211703
daily_2018-01-31_221703
daily_2018-02-01_231702
daily_2018-02-02_231703
daily_2018-02-03_231702
daily_2018-02-04_231702
daily_2018-02-05_231702
hourly_2018-02-05_221701
hourly_2018-02-05_231701
hourly_2018-02-06_001701
hourly_2018-02-06_011701
hourly_2018-02-06_021701
hourly_2018-02-06_031701
hourly_2018-02-06_041701
hourly_2018-02-06_051701
hourly_2018-02-06_061701
hourly_2018-02-06_071701
hourly_2018-02-06_081701
hourly_2018-02-06_091701
hourly_2018-02-06_101702
hourly_2018-02-06_111701
hourly_2018-02-06_121701
hourly_2018-02-06_131701
hourly_2018-02-06_141701
hourly_2018-02-06_151701
hourly_2018-02-06_161701
hourly_2018-02-06_171701
hourly_2018-02-06_181702
hourly_2018-02-06_201701
hourly_2018-02-06_211701
hourly_2018-02-06_221701
manual_2017-12-28_113633
monthly_2017-12-28_101702
monthly_2018-01-27_101703
weekly_2018-01-11_111701
weekly_2018-01-18_111702
weekly_2018-01-25_111702
weekly_2018-02-01_111702

 

Since BTRFS is a copy-on-write (COW) filesystem, each new snapshot only takes as much additional space as the files added since the previous snapshot, which makes snapshots very space efficient.

Also, snapshots can be sent incrementally and very efficiently to another BTRFS filesystem using btrfs-sync. We can sync to another hard drive on the same machine, or to a BTRFS filesystem on another machine, in which case the transfer is encrypted through SSH.

In a low bandwidth situation, btrfs-sync can send the deltas compressed, but that takes a big toll on the CPU, so it is not recommended on low-end boards.
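
btrfs-sync automates all of this, but under the hood an incremental transfer boils down to a btrfs send/receive pipeline. A rough sketch, with placeholder snapshot names and remote host:

# First snapshot: full send
$ btrfs send /media/myCloudDrive/ncp-snapshots/daily_2018-02-04_231702 \
    | ssh user@backuphost "btrfs receive /mnt/backup/ncp-snapshots"
# Subsequent snapshots: send only the delta against the previous one (-p)
$ btrfs send -p /media/myCloudDrive/ncp-snapshots/daily_2018-02-04_231702 \
      /media/myCloudDrive/ncp-snapshots/daily_2018-02-05_231702 \
    | ssh user@backuphost "btrfs receive /mnt/backup/ncp-snapshots"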

Pros:

  • Virtually instant, no matter how many terabytes we have
  • No downtime
  • Space efficient
  • Can be synced efficiently to another hard drive or another machine through SSH by using btrfs-sync

Cons:

  • Needs to run on BTRFS, which is the default filesystem that nc-format-USB uses
  • If we want to sync the snapshots to another USB drive, it also needs to use BTRFS
  • If we want to sync them remotely, we need a BTRFS subvolume on the other end and SSH credentials set up

If we only care about our data, this can mean zero downtime and an efficient means of preventing accidental deletion. In fact, I have the Trash and Versions apps disabled. I recommend combining nc-snapshots and nc-snapshot-sync with dataless backups to get the best of both worlds.

Rsync


Another option for saving our data remotely is to sync it with rsync. This is also quite efficient, but compared to BTRFS snapshots we won’t retain a history of the datadir, just the latest version.

Pros:

  • Doesn’t require a particular filesystem on either end
  • Efficient delta sync, only copies the new files

Cons:

  • Need to set up SSH credentials
  • Cannot sync snapshots; it only mirrors the latest version of our datadir

This option also keeps our data safe if something happens in our house, and it is more flexible because it doesn’t require a BTRFS filesystem on the other end.
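
A minimal sketch of such a mirror, with placeholder paths and host:

# Mirror the datadir to a remote host over SSH; -a preserves permissions
# and ownership, -HAX keeps hard links, ACLs and extended attributes, and
# --delete removes files on the destination that no longer exist locally
$ rsync -aHAX --delete -e ssh /media/USBdrive/ncdata/ user@backuphost:/backups/ncdata/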

Make sure you keep copies of your data; we can never be too safe!
Source

Principles of Container-based Application Design

Today, almost any application can run in a container. But creating a cloud-native application, one whose operation and management can be automated by a cloud-native platform such as Kubernetes, requires extra work.
Cloud-native applications must anticipate failure: even when the underlying infrastructure fails, they need to keep running reliably.
To make this possible, a cloud-native platform like Kubernetes imposes a set of contracts and constraints on the applications it runs.
These contracts ensure that applications conform to certain constraints and, in turn, allow the platform to automate application management.

Diagram: Container Design Principles

The seven principles described here cover two types of concerns: build time and runtime.

Build time

1) Single concern: Each container addresses a single concern and does it well.
2) Self-containment: A container relies only on the Linux kernel; any additional libraries it needs are added at build time.
3) Image immutability: Container images are immutable; once built, the same image runs unchanged across environments rather than being rebuilt for each one.

Runtime

4) High observability: Each container must implement the APIs the platform needs to observe and manage the application in the best possible way.
5) Lifecycle conformance: A container must be able to receive lifecycle events from the platform and react to them accordingly.
6) Process disposability: A containerized application must be as ephemeral as possible, so that it can be replaced by another container instance at any time.
7) Runtime confinement: Each container must declare its own resource requirements and keep its resource usage within those limits.
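
As a rough illustration of how principles 4 and 7 typically surface in a Kubernetes pod spec (the image name, probe paths and resource figures below are assumptions for the sake of the example):

apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  containers:
  - name: web
    image: example/web:1.0          # hypothetical image
    ports:
    - containerPort: 8080
    livenessProbe:                  # principle 4: expose health to the platform
      httpGet:
        path: /healthz
        port: 8080
    readinessProbe:
      httpGet:
        path: /ready
        port: 8080
    resources:                      # principle 7: declare and confine resource usage
      requests:
        cpu: 100m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 256Mi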

Source

Managing Kubernetes Workloads With Rancher 2.0

A Detailed Overview of Rancher’s Architecture

This newly-updated, in-depth guidebook provides a detailed overview of the features and functionality of the new Rancher: an open-source enterprise Kubernetes platform.

Get the eBook

Rancher 2.0 was built with many things in mind. You can provision and manage Kubernetes clusters, deploy user services onto them and easily control access with authentication and RBAC. One of the coolest things about Rancher 2.0 is its intuitive UI, which we’ve designed to try and demystify Kubernetes, and accelerate adoption for anyone new to it. In this tutorial I’ll walk you through that new user interface, and explain how you can use it to deploy a simple NGINX service.

Designing Your Workload

There are several things that you might need to figure out before deploying the workload for your app:

  • Is it a stateless or stateful app?
  • How many instances of your app need to be running?
  • What are the placement rules — whether the app needs to run on specific hosts?
  • Is your app meant to be exposed as a service on a private network, so other applications can talk to it?
  • Is public access to the app needed?

There can be more questions to answer, but the above are the most basic ones and a good starting place. The Rancher UI will give you more details on what you can configure on your workload, so you can tune it up or update later.

Deploying your first workload with Rancher 2.0

Let’s start with the fun part: deploying a very simple workload and exposing it to the outside world with Rancher. Assuming the Rancher installation is done (it takes just one click) and at least one Kubernetes cluster is provisioned (a bit more involved than one click, but still very fast), switch to the Project view and hit “Deploy” on the Workloads page:

All the options are left at their defaults, except for the image and the Port Mapping (we will get into more detail on this later). I want my service to be published on a random port on every host in my cluster, and when the port is hit, the traffic redirected to the NGINX container’s internal port 80. Once the workload is deployed, the public endpoint is shown on the object in the UI for easy access:

By clicking on the 31217 public endpoint link, you’d get redirected straight to your service:

As you can see, it takes just one step to deploy the workload and publish it externally, which is very similar to Rancher 1.6. If you are a Kubernetes user, you know that a couple of Kubernetes objects are needed to back the above: a Deployment and a Service. The Deployment takes care of starting the containerized application; it also monitors its health and restarts it if it crashes, based on the restart policy. But in order to expose the application to the outside, Kubernetes needs a Service object created explicitly. Rancher makes it simple for the end user by capturing the workload declaration in a user-friendly way and creating all the required Kubernetes constructs behind the scenes. More on those constructs in the next section.
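
For reference, the two objects backing the NGINX example look roughly like the sketch below; the names, labels and values are illustrative, not the ones Rancher actually generates:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-workload
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-workload
  template:
    metadata:
      labels:
        app: nginx-workload
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-workload
spec:
  type: NodePort               # exposes a random port on every node
  selector:
    app: nginx-workload
  ports:
  - port: 80
    targetPort: 80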

More Workload Options

By default, the Rancher UI presents the user with the basic options for the workload deployment. You can choose to change them starting with the Workload type:

Based on the type picked, a corresponding Kubernetes resource is going to get created.

  • Scalable deployment of (n) pods — Kubernetes Deployment
  • Run one pod on each node — Kubernetes DaemonSet
  • Stateful set — Kubernetes StatefulSet
  • Run on a cron schedule — Kubernetes CronJob

Along with the type, options like image, environment variables, and labels can be set. That will all define the deployment spec of your application. Now, exposing the application to the outside can be done via the Port Mapping section:

With this port declaration, after the workload is deployed, it will be exposed via the same random port on every node in the cluster. Modify Source Port if you need a specific value instead of a random one. There are several options for “Publish on”:

Based on the value picked, Rancher will create a corresponding service object on the Kubernetes side:

  • Every node — Kubernetes NodePort Service
  • Internal cluster IP — Kubernetes ClusterIP service. Your workload will be accessible via a private network only in this case.
  • Load Balancer — Kubernetes Load Balancer service. This option should be picked only when your Kubernetes cluster is deployed in a public cloud, such as AWS, and has External Load Balancer support (like AWS ELB).
  • Nodes running a pod — no service gets created; HostPort option gets set in the Deployment spec

We highlight the implementation details here, but you don’t really need to deal with them: the Rancher UI/API gives you all the information needed to access your workload by providing a clickable link to the workload endpoint.

Traffic Distribution between Workloads using Ingress

There is one more way to publish the workload — via Ingress. Not only does it publish applications on the standard HTTP ports 80/443, but it also provides L7 routing capabilities along with SSL termination. Functionality like this is useful if you deploy a web application and would like traffic routed to different endpoints based on host/path routing rules:
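
As an illustration of the kind of object this produces, an Ingress with host/path rules and SSL termination might look like the following sketch; the hostname, secret and service names are placeholders:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: web-ingress
spec:
  tls:
  - hosts:
    - example.com
    secretName: example-tls          # SSL terminated at the ingress
  rules:
  - host: example.com
    http:
      paths:
      - path: /api
        backend:
          serviceName: api-workload  # L7 routing: /api goes to one workload
          servicePort: 80
      - path: /
        backend:
          serviceName: web-workload  # everything else goes to another
          servicePort: 80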

Unlike in Rancher 1.6, the Load Balancer is not tied to a specific LB provider such as HAProxy. The implementation varies based on the cluster type: for Google Container Engine clusters it is GLBC, for Amazon EKS it is AWS ELB/ALB, and for Digital Ocean/Amazon EC2 it is an NGINX load balancer, which Rancher installs and manages. We are planning to introduce more Load Balancer providers in the future, based on demand.

Enhanced Service Discovery

If you are building an application that consists of multiple workloads talking to each other, DNS is most likely used to resolve the service names. You could certainly connect to a container using its IP address, but the container can die and the IP address will change, so DNS is really the preferable way. Kubernetes service discovery comes as a built-in feature in all clusters provisioned by Rancher. Every workload created from the Rancher UI can be resolved by its name within the same namespace. Although a Kubernetes Service (of ClusterIP type) normally needs to be created explicitly in order to discover a workload, Rancher takes this burden from its users and creates the Service automatically for every workload. In addition, Rancher enhances service discovery by letting users create:

  • An Alias of another DNS value
  • A Custom record pointing to one or more existing workloads

All of the above is available under the Workloads > Service Discovery page in the UI:
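
To see the built-in name resolution in action, one standard (not Rancher-specific) check is to run a throwaway pod and query a workload’s generated service by name; “web-workload” below is a placeholder:

# Resolves within the same namespace thanks to the automatically created service
$ kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup web-workload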

As you can see, configuring workloads in Rancher 2.0 is just as easy as in 1.6. Even though the backend now implements everything through Kubernetes, the Rancher UI still simplifies workload creation just as before. Through the Rancher interface, you can expose your workload to the public, place it behind a load balancer and configure internal service discovery — all accomplished in an intuitive and easy way. This blog covered the basics of workload management. We are planning to write more on features like Volumes, Application Catalog, etc. In addition, our UI and backend are constantly evolving. There may be new cool features being exposed as you read this post—so stay tuned!

Alena Prokharchyk

Software Engineer

Source

Fixing the Subpath Volume Vulnerability in Kubernetes

On March 12, 2018, the Kubernetes Product Security team disclosed CVE-2017-1002101, which allowed containers using subpath volume mounts to access files outside of the volume. This means that a container could access any file available on the host, including volumes for other containers that it should not have access to.

The vulnerability has been fixed and released in the latest Kubernetes patch releases. We recommend that all users upgrade to get the fix. For more details on the impact and how to get the fix, please see the announcement. (Note, some functional regressions were found after the initial fix and are being tracked in issue #61563).

This post presents a technical deep dive on the vulnerability and the solution.

Kubernetes Background

To understand the vulnerability, one must first understand how volume and subpath mounting works in Kubernetes.

Before a container is started on a node, the kubelet volume manager locally mounts all the volumes specified in the PodSpec under a directory for that Pod on the host system. Once all the volumes are successfully mounted, it constructs the list of volume mounts to pass to the container runtime. Each volume mount contains information that the container runtime needs, the most relevant being:

  • Path of the volume in the container
  • Path of the volume on the host (/var/lib/kubelet/pods/<pod uid>/volumes/<volume type>/<volume name>)

When starting the container, the container runtime creates the path in the container root filesystem, if necessary, and then bind mounts it to the provided host path.

Subpath mounts are passed to the container runtime just like any other volume. The container runtime does not distinguish between a base volume and a subpath volume, and handles them the same way. Instead of passing the host path to the root of the volume, Kubernetes constructs the host path by appending the Pod-specified subpath (a relative path) to the base volume’s host path.

For example, here is a spec for a subpath volume mount:

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  containers:
  - name: my-container
    <snip>
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
      subPath: dataset1
  volumes:
  - name: my-volume
    emptyDir: {}

In this example, when the Pod gets scheduled to a node, the system will:

  • Set up an EmptyDir volume at /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume
  • Construct the host path for the subpath mount: /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume/ + dataset1
  • Pass the following mount information to the container runtime:
    • Container path: /mnt/data
    • Host path: /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume/dataset1
  • The container runtime bind mounts /mnt/data in the container root filesystem to /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume/dataset1 on the host.
  • The container runtime starts the container.

The Vulnerability

The vulnerability with subpath volumes was discovered by Maxim Ivanov, by making a few observations:

  • Subpath references files or directories that are controlled by the user, not the system.
  • Volumes can be shared by containers that are brought up at different times in the Pod lifecycle, including by different Pods.
  • Kubernetes passes host paths to the container runtime to bind mount into the container.

The basic example below demonstrates the vulnerability. It takes advantage of the observations outlined above by:

  • Using an init container to set up the volume with a symlink.
  • Using a regular container to mount that symlink as a subpath later.
  • Causing kubelet to evaluate the symlink on the host before passing it into the container runtime.

apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  initContainers:
  - name: prep-symlink
    image: "busybox"
    command: ["/bin/sh", "-ec", "ln -s / /mnt/data/symlink-door"]
    volumeMounts:
    - name: my-volume
      mountPath: /mnt/data
  containers:
  - name: my-container
    image: "busybox"
    command: ["/bin/sh", "-ec", "ls /mnt/data; sleep 999999"]
    volumeMounts:
    - mountPath: /mnt/data
      name: my-volume
      subPath: symlink-door
  volumes:
  - name: my-volume
    emptyDir: {}

For this example, the system will:

  • Set up an EmptyDir volume at /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume
  • Pass the following mount information for the init container to the container runtime:
    • Container path: /mnt/data
    • Host path: /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume
  • The container runtime bind mounts /mnt/data in the container root filesystem to /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume on the host.
  • The container runtime starts the init container.
  • The init container creates a symlink inside the container: /mnt/data/symlink-door -> /, and then exits.
  • Kubelet starts to prepare the volume mounts for the normal containers.
  • It constructs the host path for the subpath volume mount: /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume/ + symlink-door.
  • And passes the following mount information to the container runtime:
    • Container path: /mnt/data
    • Host path: /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume/symlink-door
  • The container runtime bind mounts /mnt/data in the container root filesystem to /var/lib/kubelet/pods/1234/volumes/kubernetes.io~empty-dir/my-volume/symlink-door.
  • However, the bind mount resolves symlinks, which in this case resolves to / on the host! Now the container can see all of the host’s filesystem through its mount point /mnt/data.

This is a manifestation of a symlink race, where a malicious user program can gain access to sensitive data by causing a privileged program (in this case, kubelet) to follow a user-created symlink.
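
The core of the problem can be reproduced outside Kubernetes with a few commands run as root; this standalone sketch (arbitrary paths) shows that a bind mount follows the symlink at its source and lands on the host root:

$ mkdir -p /tmp/vol /tmp/mnt
$ ln -s / /tmp/vol/symlink-door
$ mount --bind /tmp/vol/symlink-door /tmp/mnt
$ ls /tmp/mnt       # shows the contents of the host's /
$ umount /tmp/mnt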

It should be noted that init containers are not always required for this exploit; it depends on the volume type. An init container is used in the EmptyDir example because EmptyDir volumes cannot be shared with other Pods: they are created when a Pod is created and destroyed when the Pod is destroyed. For persistent volume types, this exploit can also be carried out across two different Pods sharing the same volume.

The Fix

The underlying issue is that the host path for subpaths is untrusted and can point anywhere on the system. The fix needs to ensure that this host path is both:

  • Resolved and validated to point inside the base volume.
  • Not changeable by the user in between the time of validation and when the container runtime bind mounts it.

The Kubernetes product security team went through many iterations of possible solutions before finally agreeing on a design.

Idea 1

Our first design was relatively simple. For each subpath mount in each container:

  • Resolve all the symlinks for the subpath.
  • Validate that the resolved path is within the volume.
  • Pass the resolved path to the container runtime.

However, this design is prone to the classic time-of-check-to-time-of-use (TOCTTOU) problem. In between steps 2) and 3), the user could change the path back to a symlink. The proper solution needs some way to “lock” the path so that it cannot be changed in between validation and bind mounting by the container runtime. All the subsequent ideas use an intermediate bind mount by kubelet to achieve this “lock” step before handing it off to the container runtime. Once a bind mount is performed, the mount source is fixed and cannot be changed.

Idea 2

We went a bit wild with this idea:

  • Create a working directory under the kubelet’s pod directory. Let’s call it dir1.
  • Bind mount the base volume to under the working directory, dir1/volume.
  • Chroot to the working directory dir1.
  • Inside the chroot, bind mount volume/subpath to subpath. This ensures that any symlinks get resolved to inside the chroot environment.
  • Exit the chroot.
  • On the host again, pass the bind mounted dir1/subpath to the container runtime.

While this design does ensure that the symlinks cannot point outside of the volume, it was ultimately rejected due to difficulties of implementing the chroot mechanism in 4) across all the various distros and environments that Kubernetes has to support, including containerized kubelets.

Idea 3

Coming back to earth a little bit, our next idea was to:

  • Bind mount the subpath to a working directory under the kubelet’s pod directory.
  • Get the source of the bind mount, and validate that it is within the base volume.
  • Pass the bind mount to the container runtime.

In theory, this sounded pretty simple, but in reality, 2) was quite difficult to implement correctly. Many scenarios had to be handled where volumes (like EmptyDir) could be on a shared filesystem, on a separate filesystem, on the root filesystem, or not on the root filesystem. NFS volumes ended up handling all bind mounts as a separate mount, instead of as a child to the base volume. There was additional uncertainty about how out-of-tree volume types (that we couldn’t test) would behave.

The Solution

Given the amount of scenarios and corner cases that had to be handled with the previous design, we really wanted to find a solution that was more generic across all volume types. The final design that we ultimately went with was to:

  • Resolve all the symlinks in the subpath.
  • Starting with the base volume, open each path segment one by one, using the openat() syscall, and disallow symlinks. With each path segment, validate that the current path is within the base volume.
  • Bind mount /proc/<kubelet pid>/fd/<final fd> to a working directory under the kubelet’s pod directory. The proc file is a link to the opened file. If that file gets replaced while kubelet still has it open, then the link will still point to the original file.
  • Close the fd and pass the bind mount to the container runtime.
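
The /proc/<pid>/fd behavior relied on in step 3 can be illustrated with a plain file and a shell, independently of Kubernetes: once a path has been opened, the proc link keeps referring to the originally opened object even if the path itself is later replaced (paths are arbitrary):

$ echo original > /tmp/target
$ exec 3< /tmp/target
$ rm /tmp/target && ln -s /etc/passwd /tmp/target
$ cat /proc/$$/fd/3        # still prints "original"
$ exec 3<&-                # close the descriptor when done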

Note that this solution is different for Windows hosts, where the mounting semantics are different than Linux. In Windows, the design is to:

  • Resolve all the symlinks in the subpath.
  • Starting with the base volume, open each path segment one by one with a file lock, and disallow symlinks. With each path segment, validate that the current path is within the base volume.
  • Pass the resolved subpath to the container runtime, and start the container.
  • After the container has started, unlock and close all the files.

Both solutions are able to address all the requirements of:

  • Resolving the subpath and validating that it points to a path inside the base volume.
  • Ensuring that the subpath host path cannot be changed in between the time of validation and when the container runtime bind mounts it.
  • Being generic enough to support all volume types.

Acknowledgements

Special thanks to many folks involved with handling this vulnerability:

  • Maxim Ivanov, who responsibly disclosed the vulnerability to the Kubernetes Product Security team.
  • Kubernetes storage and security engineers from Google, Microsoft, and RedHat, who developed, tested, and reviewed the fixes.
  • Kubernetes test-infra team, for setting up the private build infrastructure
  • Kubernetes patch release managers, for coordinating and handling all the releases.
  • All the production release teams that worked to deploy the fix quickly after release.

If you find a vulnerability in Kubernetes, please follow our responsible disclosure process and let us know; we want to do our best to make Kubernetes secure for all users.

– Michelle Au, Software Engineer, Google; and Jan Šafránek, Software Engineer, Red Hat

Source

Rancher Glossary: 1.6 to 2.0 Terms and Concepts

How to Migrate from Rancher 1.6 to Rancher 2.1 Online Meetup

Key terminology differences, implementing key elements, and transforming Compose to YAML

Watch the video

As we near the end of the development process for Rancher 2.0, we thought it might be useful to provide a glossary of terms that will help Rancher users understand the fundamental concepts in Kubernetes and Rancher.

In the move from Rancher 1.6 to Rancher 2.0, we have aligned more with the Kubernetes naming standard. This shift could be confusing for people who have only used Cattle environments under Rancher 1.6.

This article aims to help you understand the new concepts in Rancher 2.0. It can also act as an easy reference for terms and concepts between the container orchestrators Cattle and Kubernetes.

Rancher 1.6 Cattle compared with Rancher 2.0 Kubernetes

Rancher 1.6 offered Cattle as a container orchestrator, and many users chose to use it. In Cattle, you have an environment, which is both an administrative and a compute boundary, i.e., the lowest level at which you can assign permissions; importantly, all hosts in an environment are dedicated to that environment and that environment alone. Then, to organize your containers, you have a Stack, which is a logical grouping of a collection of services, with a service being a particular running image.

So, how does this structure look under 2.0?

If you are working in the container space, then it is unlikely that you haven’t heard some of the buzzwords around Kubernetes, such as pods, namespaces and nodes. What this article aims to do is ease the transition from Cattle to Kubernetes by aligning the terms of both orchestrators. Along with some of the names changing, some of the capabilities have changed as well.

The following defines some of the core Kubernetes concepts:

  • Cluster: a collection of machines that run containerized applications managed by Kubernetes
  • Namespace: a virtual cluster; a single physical cluster can support multiple namespaces
  • Node: one of the physical (or virtual) machines that make up a cluster
  • Pod: the smallest and simplest Kubernetes object; a Pod represents a set of running containers on your cluster
  • Deployment: an API object that manages a replicated application
  • Workload: a unit of work running on the cluster; this can be a pod or a deployment

More detailed information on Kubernetes concepts can be found at
https://kubernetes.io/docs/concepts/

ENVIRONMENTS

The environment in Rancher 1.6 represented two things:

  • The Compute boundary
  • The administrative boundary

In 2.0, the environment concept no longer exists; instead, it is replaced by:

  • Cluster – The compute boundary
  • Project – An administrative boundary

A Project is an administrative layer introduced by Rancher to ease the burden of administration in Kubernetes.

HOST

In Cattle, a host could only belong to one environment. Things are similar in 2.0: nodes (the new name for hosts!) can only belong to one cluster. What used to be an environment with hosts is now a cluster with nodes.

STACK

A stack in Rancher 1.6 is a way to group a number of services. In Rancher 2.0 this is done via namespaces.

SERVICE

In Rancher 1.6, a service was defined as one or more running instances of the same container. In Rancher 2.0, one or more running instances of the same container are defined as a workload, where a workload can be made up of one or more pods running under a controller.

CONTAINER

A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings, etc. Within Rancher 1.6, a container was the minimal definition required to run an application. Under Kubernetes, a pod is the minimal definition. A pod can be a single container, or it can be a number of containers that share the same storage and network, along with a specification of how they run together. Pod contents are always co-located and co-scheduled, and run in a shared context.
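
As a small illustrative sketch (the names and images are arbitrary, not from the article), a pod with two containers sharing an emptyDir volume looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: web-with-sidecar
spec:
  volumes:
  - name: shared
    emptyDir: {}
  containers:
  - name: web
    image: nginx
    volumeMounts:
    - name: shared
      mountPath: /usr/share/nginx/html
  - name: content-sync                 # sidecar writing into the shared volume
    image: busybox
    command: ["sh", "-c", "while true; do date > /tmp/shared/index.html; sleep 60; done"]
    volumeMounts:
    - name: shared
      mountPath: /tmp/shared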

LOAD BALANCER

In Rancher 1.6, a Load Balancer was used to expose your applications from within the Rancher environment for access externally. In Rancher 2.0, the concept is the same. There is a Load Balancer option to expose your services. In the language of Kubernetes, this function is more often referred to as an Ingress. In short, Load Balancer and Ingress play the same role.

Conclusion

In terms of concepts, Cattle was the orchestrator closest to Kubernetes. Hopefully this article will act as an easy reference for people moving from Rancher 1.6 to 2.0, and the similarity between the two orchestrators should allow for an easier transition.

The following is a quick reference for the old terms versus the new ones (Rancher 1.6 → Rancher 2.0):

  • Container → Pod
  • Services → Workload
  • Load Balancer → Ingress
  • Stack → Namespace
  • Environment → Project (administration) / Cluster (compute)
  • Host → Node
  • Catalog → Helm

For further reading and training, check out our free online training series: Introduction to Kubernetes and Rancher.

Chris Urwin

UK Technical Lead

Source