Kubernetes 1.8: Hidden Gems – Volume Snapshotting

23/Nov 2017

By Luke Addison

In this Hidden Gems blog post, Luke looks at the new volume snapshotting functionality in Kubernetes and how cluster administrators can use this feature to take and restore snapshots of their data.

In Kubernetes 1.8, volume snapshotting has been released as a prototype. It is external to core Kubernetes whilst it is in the prototype phase, but you can find the project under the snapshot subdirectory of the kubernetes-incubator/external-storage repository. For a detailed explanation of the implementation of volume snapshotting, read the design proposal here. The prototype currently supports GCE PD, AWS EBS, OpenStack Cinder and Kubernetes hostPath volumes. Note that aside from hostPath volumes, the logic for snapshotting a volume is implemented by cloud providers; the purpose of volume snapshotting in Kubernetes is to provide a common API for negotiating with different cloud providers in order to take and restore snapshots.

The best way to get an overview of volume snapshotting in Kubernetes is by going through an example. In this post, we are going to spin up a Kubernetes 1.8 cluster on GKE, deploy snapshot-controller and snapshot-provisioner and take and restore a snapshot of a GCE PD.

For reproducibility, I am using Git commit hash b1d5472a7b47777bf851cfb74bfaf860ad49ed7c of the kubernetes-incubator/external-storage repository.

The first thing we need to do is compile and package both snapshot-controller and snapshot-provisioner into Docker containers. Make sure you have installed Go and configured your GOPATH correctly.

$ go get -d github.com/kubernetes-incubator/external-storage
$ cd $GOPATH/src/github.com/kubernetes-incubator/external-storage/snapshot
$ # Checkout a fixed revision
$ git checkout b1d5472a7b47777bf851cfb74bfaf860ad49ed7c
$ GOOS=linux GOARCH=amd64 go build -o _output/bin/snapshot-controller-linux-amd64 cmd/snapshot-controller/snapshot-controller.go
$ GOOS=linux GOARCH=amd64 go build -o _output/bin/snapshot-provisioner-linux-amd64 cmd/snapshot-pv-provisioner/snapshot-pv-provisioner.go

You can then use the following Dockerfiles to build both snapshot-controller and snapshot-provisioner. We run apk add --no-cache ca-certificates in order to add root certificates into the container images. To avoid using stale certificates, we could alternatively pass them into the containers by mounting the hostPath /etc/ssl/certs to the same location in the containers.

FROM alpine:3.6

RUN apk add --no-cache ca-certificates

COPY _output/bin/snapshot-controller-linux-amd64 /usr/bin/snapshot-controller

ENTRYPOINT ["/usr/bin/snapshot-controller"]

FROM alpine:3.6

RUN apk add --no-cache ca-certificates

COPY _output/bin/snapshot-provisioner-linux-amd64 /usr/bin/snapshot-provisioner

ENTRYPOINT ["/usr/bin/snapshot-provisioner"]

$ docker build -t dippynark/snapshot-controller:latest . -f Dockerfile.controller
$ docker build -t dippynark/snapshot-provisioner:latest . -f Dockerfile.provisioner
$ docker push dippynark/snapshot-controller:latest
$ docker push dippynark/snapshot-provisioner:latest

We will now create a cluster on GKE using gcloud.

$ gcloud container clusters create snapshot-demo --cluster-version 1.8.3-gke.0
Creating cluster snapshot-demo…done.
Created [https://container.googleapis.com/v1/projects/jetstack-sandbox/zones/europe-west1-b/clusters/snapshot-demo].
kubeconfig entry generated for snapshot-demo.
NAME ZONE MASTER_VERSION MASTER_IP MACHINE_TYPE NODE_VERSION NUM_NODES STATUS
snapshot-demo europe-west1-b 1.8.3-gke.0 35.205.77.138 n1-standard-1 1.8.3-gke.0 3 RUNNING

Snapshotting requires two extra resources, VolumeSnapshot and VolumeSnapshotData. For an overview of the lifecycle of these two resources, take a look at the user guide in the project itself. We will look at the functionality of each of these resources further down the page, but the first step is to register them with the API server. This is done using CustomResourceDefinitions. snapshot-controller creates a CustomResourceDefinition for each of VolumeSnapshot and VolumeSnapshotData when it starts up, so some of the work is taken care of for us. snapshot-controller will also watch for VolumeSnapshot resources and take snapshots of the volumes they reference. To allow us to restore our snapshots we will deploy snapshot-provisioner as well.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: snapshot-controller-runner
  namespace: kube-system

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: snapshot-controller-role
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: ["apiextensions.k8s.io"]
    resources: ["customresourcedefinitions"]
    verbs: ["create", "list", "watch", "delete"]
  - apiGroups: ["volumesnapshot.external-storage.k8s.io"]
    resources: ["volumesnapshots"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
  - apiGroups: ["volumesnapshot.external-storage.k8s.io"]
    resources: ["volumesnapshotdatas"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: snapshot-controller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: snapshot-controller-role
subjects:
  - kind: ServiceAccount
    name: snapshot-controller-runner
    namespace: kube-system

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: snapshot-controller
  namespace: kube-system
spec:
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: snapshot-controller
    spec:
      serviceAccountName: snapshot-controller-runner
      containers:
        - name: snapshot-controller
          image: dippynark/snapshot-controller
          imagePullPolicy: Always
          args:
            - -cloudprovider=gce
        - name: snapshot-provisioner
          image: dippynark/snapshot-provisioner
          imagePullPolicy: Always
          args:
            - -cloudprovider=gce

In this case we have specified -cloudprovider=gce, but you can also use aws or openstack depending on your environment. For these other cloud providers there may be additional parameters you need to set to configure the necessary authorisation; examples of how to do this can be found here. hostPath is enabled by default, but it requires snapshot-controller and snapshot-provisioner to run on the same node as the hostPath volume you want to snapshot and restore, so it should only be used on single-node development clusters for testing purposes. For an example of how to deploy snapshot-controller and snapshot-provisioner to take and restore hostPath volume snapshots for a particular directory, see here. For a walkthrough of taking and restoring a hostPath volume snapshot see here.

We have also defined a new ServiceAccount to which we have bound a custom ClusterRole. This is only needed for RBAC-enabled clusters. If you have not enabled RBAC in your cluster, you can ignore the ServiceAccount, ClusterRole and ClusterRoleBinding and remove the serviceAccountName field from the snapshot-controller Deployment. If you have enabled RBAC, notice that we have authorised the ServiceAccount to create, list, watch and delete CustomResourceDefinitions. This is so that snapshot-controller can set them up for our two new resources. Since snapshot-controller only needs these CustomResourceDefinition permissions temporarily on startup, it would be better to remove them and have administrators create the two CustomResourceDefinitions manually. Once snapshot-controller is running, you will be able to see the created CustomResourceDefinitions.

$ kubectl get crd
NAME AGE
volumesnapshotdatas.volumesnapshot.external-storage.k8s.io 1m
volumesnapshots.volumesnapshot.external-storage.k8s.io 1m

To see the full definitions for these resources you can run kubectl get crd -o yaml. Note that VolumeSnapshot specifies a scope of Namespaced, while VolumeSnapshotData is cluster-scoped. We can now interact with our new resource types.
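
For reference, here is a rough sketch of what the VolumeSnapshot CustomResourceDefinition amounts to, reconstructed from the names and scope above (the definition that snapshot-controller actually registers may contain more detail):

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  # CRD names take the form <plural>.<group>
  name: volumesnapshots.volumesnapshot.external-storage.k8s.io
spec:
  group: volumesnapshot.external-storage.k8s.io
  version: v1
  scope: Namespaced   # VolumeSnapshotData uses scope: Cluster instead
  names:
    kind: VolumeSnapshot
    plural: volumesnapshots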

$ kubectl get volumesnapshot,volumesnapshotdata
No resources found.

Looking at the logs for both snapshot containers we can see that they are working correctly.

$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE

snapshot-controller-66f7c56c4-h7cpf 2/2 Running 0 1m
$ kubectl logs snapshot-controller-66f7c56c4-h7cpf -n kube-system -c snapshot-controller
I1104 11:38:53.551581 1 gce.go:348] Using existing Token Source &oauth2.reuseTokenSource, mu:sync.Mutex, t:(*oauth2.Token)(nil)}
I1104 11:38:53.553988 1 snapshot-controller.go:127] Register cloudprovider %sgce-pd
I1104 11:38:53.553998 1 snapshot-controller.go:93] starting snapshot controller
I1104 11:38:53.554050 1 snapshot-controller.go:168] Starting snapshot controller
$ kubectl logs snapshot-controller-66f7c56c4-h7cpf -n kube-system -c snapshot-provisioner
I1104 11:38:57.565797 1 gce.go:348] Using existing Token Source &oauth2.reuseTokenSource, mu:sync.Mutex, t:(*oauth2.Token)(nil)}
I1104 11:38:57.569374 1 snapshot-pv-provisioner.go:284] Register cloudprovider %sgce-pd
I1104 11:38:57.585940 1 snapshot-pv-provisioner.go:267] starting PV provisioner volumesnapshot.external-storage.k8s.io/snapshot-promoter
I1104 11:38:57.586017 1 controller.go:407] Starting provisioner controller be8211fa-c154-11e7-a1ac-0a580a200004!

Let’s now create the PersistentVolumeClaim we are going to snapshot.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gce-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi

Note that this is using the default StorageClass on GKE which will dynamically provision a GCE PD PersistentVolume. Let’s now create a Pod that will create some data in the volume. We will take a snapshot of the data and restore it later.

apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  restartPolicy: Never
  containers:
    - name: busybox
      image: busybox
      command:
        - "/bin/sh"
        - "-c"
        - "while true; do date >> /tmp/pod-out.txt; sleep 1; done"
      volumeMounts:
        - name: volume
          mountPath: /tmp
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: gce-pvc

The Pod appends the current date and time to a file stored on our GCE PD every second. We can use cat to inspect the file.

$ kubectl exec -it busybox cat /tmp/pod-out.txt
Sat Nov 4 11:41:30 UTC 2017
Sat Nov 4 11:41:31 UTC 2017
Sat Nov 4 11:41:32 UTC 2017
Sat Nov 4 11:41:33 UTC 2017
Sat Nov 4 11:41:34 UTC 2017
Sat Nov 4 11:41:35 UTC 2017
$

We are now ready to take a snapshot. Once we create the VolumeSnapshot resource below, snapshot-controller will attempt to create the actual snapshot by interacting with the configured cloud provider (GCE in our case). If successful, the VolumeSnapshot resource is bound to a corresponding VolumeSnapshotData resource. We need to reference the PersistentVolumeClaim that references the data we want to snapshot.

apiVersion: volumesnapshot.external-storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: snapshot-demo
spec:
  persistentVolumeClaimName: gce-pvc

$ kubectl create -f snapshot.yaml
volumesnapshot “snapshot-demo” created
$ kubectl get volumesnapshot
NAME AGE
snapshot-demo 18s
$ kubectl describe volumesnapshot snapshot-demo
Name: snapshot-demo
Namespace: default
Labels: SnapshotMetadata-PVName=pvc-048bd424-c155-11e7-8910-42010a840164
SnapshotMetadata-Timestamp=1509796696232920051
Annotations: <none>
API Version: volumesnapshot.external-storage.k8s.io/v1
Kind: VolumeSnapshot
Metadata:
Cluster Name:
Creation Timestamp: 2017-11-04T11:58:16Z
Generation: 0
Resource Version: 2348
Self Link: /apis/volumesnapshot.external-storage.k8s.io/v1/namespaces/default/volumesnapshots/snapshot-demo
UID: 71256cf8-c157-11e7-8910-42010a840164
Spec:
Persistent Volume Claim Name: gce-pvc
Snapshot Data Name: k8s-volume-snapshot-7193cceb-c157-11e7-8e59-0a580a200004
Status:
Conditions:
Last Transition Time: 2017-11-04T11:58:22Z
Message: Snapshot is uploading
Reason:
Status: True
Type: Pending
Last Transition Time: 2017-11-04T11:58:34Z
Message: Snapshot created successfully and it is ready
Reason:
Status: True
Type: Ready
Creation Timestamp: <nil>
Events: <none>

Notice the Snapshot Data Name field. This is a reference to the VolumeSnapshotData resource that was created by snapshot-controller when we created our VolumeSnapshot. The conditions towards the bottom of the output above show that our snapshot was created successfully. We can check snapshot-controller’s logs to verify this.

$ kubectl logs snapshot-controller-66f7c56c4-ptjmb -n kube-system -c snapshot-controller

I1104 11:58:34.245845 1 snapshotter.go:239] waitForSnapshot: Snapshot default/snapshot-demo created successfully. Adding it to Actual State of World.
I1104 11:58:34.245853 1 actual_state_of_world.go:74] Adding new snapshot to actual state of world: default/snapshot-demo
I1104 11:58:34.245860 1 snapshotter.go:516] createSnapshot: Snapshot default/snapshot-demo created successfully.

We can also view the snapshot in GCE.

gce snapshot

We can now look at the corresponding VolumeSnapshotData resource that was created.

$ kubectl get volumesnapshotdata
NAME AGE
k8s-volume-snapshot-7193cceb-c157-11e7-8e59-0a580a200004 3m
$ kubectl describe volumesnapshotdata k8s-volume-snapshot-7193cceb-c157-11e7-8e59-0a580a200004
Name: k8s-volume-snapshot-7193cceb-c157-11e7-8e59-0a580a200004
Namespace:
Labels: <none>
Annotations: <none>
API Version: volumesnapshot.external-storage.k8s.io/v1
Kind: VolumeSnapshotData
Metadata:
Cluster Name:
Creation Timestamp: 2017-11-04T11:58:17Z
Deletion Grace Period Seconds: <nil>
Deletion Timestamp: <nil>
Resource Version: 2320
Self Link: /apis/volumesnapshot.external-storage.k8s.io/v1/k8s-volume-snapshot-7193cceb-c157-11e7-8e59-0a580a200004
UID: 71a28267-c157-11e7-8910-42010a840164
Spec:
Gce Persistent Disk:
Snapshot Id: pvc-048bd424-c155-11e7-8910-42010a8401641509796696237472729
Persistent Volume Ref:
Kind: PersistentVolume
Name: pvc-048bd424-c155-11e7-8910-42010a840164
Volume Snapshot Ref:
Kind: VolumeSnapshot
Name: default/snapshot-demo
Status:
Conditions:
Last Transition Time: <nil>
Message: Snapshot creation is triggered
Reason:
Status: Unknown
Type: Pending
Creation Timestamp: <nil>
Events: <none>

Notice the reference to the GCE PD snapshot. It also references the VolumeSnapshot resource we created above and the PersistentVolume that the snapshot was taken from. This was the PersistentVolume that was dynamically provisioned when we created our gce-pvc PersistentVolumeClaim earlier. One thing to point out here is that snapshot-controller does not deal with pausing any applications that are interacting with the volume before the snapshot is taken, so the data may be inconsistent if you do not deal with this manually. This will be less of a problem for some applications than others.

The following diagram shows how the various resources discussed above reference each other. We can see how a VolumeSnapshot binds to a VolumeSnapshotData resource, analogous to the relationship between PersistentVolumeClaims and PersistentVolumes. We can also see that VolumeSnapshotData references the actual snapshot taken by the volume provider, in the same way that a PersistentVolume references the physical volume backing it.

relationship diagram

Now that we have created a snapshot, we can restore it. To do this we need to create a special StorageClass implemented by snapshot-provisioner. We will then create a PersistentVolumeClaim referencing this StorageClass, with an annotation that tells snapshot-provisioner where to find the information it needs to negotiate with the cloud provider and restore the snapshot. The StorageClass can be defined as follows.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: snapshot-promoter
provisioner: volumesnapshot.external-storage.k8s.io/snapshot-promoter
parameters:
  type: pd-standard

Note the provisioner field, which marks this StorageClass as one that snapshot-provisioner is responsible for. We can now create a PersistentVolumeClaim that uses the StorageClass to dynamically provision a PersistentVolume containing the contents of our snapshot.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: busybox-snapshot
  annotations:
    snapshot.alpha.kubernetes.io/snapshot: snapshot-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 3Gi
  storageClassName: snapshot-promoter

Note the snapshot.alpha.kubernetes.io/snapshot annotation, which refers to the VolumeSnapshot we want to use; snapshot-provisioner can use this resource to get all the information it needs to perform the restore. We have also specified snapshot-promoter as the storageClassName, which tells snapshot-provisioner that it needs to act. snapshot-provisioner will provision a PersistentVolume containing the contents of the snapshot-demo snapshot. We can see from the STORAGECLASS column below that the snapshot-promoter StorageClass has been used.

$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE

busybox-snapshot Bound pvc-8eed96e4-c157-11e7-8910-42010a840164 3Gi RWO snapshot-promoter 11s

$ kubectl get pv pvc-8eed96e4-c157-11e7-8910-42010a840164
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-8eed96e4-c157-11e7-8910-42010a840164 3Gi RWO Delete Bound default/busybox-snapshot snapshot-promoter 21s

Checking the snapshot-provisioner logs we can see that the snapshot was restored successfully.

$ kubectl logs snapshot-controller-66f7c56c4-ptjmb -n kube-system -c snapshot-provisioner

Provisioning disk pvc-8eed96e4-c157-11e7-8910-42010a840164 from snapshot pvc-048bd424-c155-11e7-8910-42010a8401641509796696237472729, zone europe-west1-b requestGB 3 tags map[source:Created from snapshot pvc-048bd424-c155-11e7-8910-42010a8401641509796696237472729 -dynamic-pvc-8eed96e4-c157-11e7-8910-42010a840164]

I1104 11:59:10.563990 1 controller.go:813] volume “pvc-8eed96e4-c157-11e7-8910-42010a840164” for claim “default/busybox-snapshot” created
I1104 11:59:10.987620 1 controller.go:830] volume “pvc-8eed96e4-c157-11e7-8910-42010a840164” for claim “default/busybox-snapshot” saved
I1104 11:59:10.987740 1 controller.go:866] volume “pvc-8eed96e4-c157-11e7-8910-42010a840164” provisioned for claim “default/busybox-snapshot”

Let’s finally mount the busybox-snapshot PersistentVolumeClaim into a Pod to see that the snapshot was restored properly.

apiVersion: v1
kind: Pod
metadata:
  name: busybox-snapshot
spec:
  restartPolicy: Never
  containers:
    - name: busybox
      image: busybox
      command:
        - "/bin/sh"
        - "-c"
        - "while true; do sleep 1; done"
      volumeMounts:
        - name: volume
          mountPath: /tmp
  volumes:
    - name: volume
      persistentVolumeClaim:
        claimName: busybox-snapshot

We can use cat to see the data written to the volume by the busybox pod.

$ kubectl exec -it busybox-snapshot cat /tmp/pod-out.txt
Sat Nov 4 11:41:30 UTC 2017
Sat Nov 4 11:41:31 UTC 2017
Sat Nov 4 11:41:32 UTC 2017
Sat Nov 4 11:41:33 UTC 2017
Sat Nov 4 11:41:34 UTC 2017
Sat Nov 4 11:41:35 UTC 2017

Sat Nov 4 11:58:13 UTC 2017
Sat Nov 4 11:58:14 UTC 2017
Sat Nov 4 11:58:15 UTC 2017
$

Notice that since the data is coming from a snapshot, the final date does not change if we run cat repeatedly.

$ kubectl exec -it busybox-snapshot cat /tmp/pod-out.txt

Sat Nov 4 11:58:15 UTC 2017
$

Comparing the final date to the creation time of the snapshot in GCE, we can see that taking the snapshot took about 2 seconds.

We can delete the VolumeSnapshot resource, which will also delete the corresponding VolumeSnapshotData resource and the snapshot in GCE. This will not affect any PersistentVolumeClaims or PersistentVolumes we have already provisioned from the snapshot. Conversely, deleting any PersistentVolumeClaims or PersistentVolumes that have been used to take a snapshot, or that have been provisioned from one, will not delete the snapshot itself from GCE. However, deleting the PersistentVolumeClaim or PersistentVolume that was used to take a snapshot will prevent you from restoring any further snapshots from it using snapshot-provisioner.

$ kubectl delete volumesnapshot snapshot-demo
volumesnapshot “snapshot-demo” deleted

We should also delete the busybox Pods so they do not keep checking the date forever.

$ kubectl delete pods busybox busybox-snapshot
pod “busybox” deleted
pod “busybox-snapshot” deleted

For good measure we will also clean up the PersistentVolumeClaims and the cluster itself.

$ kubectl delete pvc busybox-snapshot gce-pvc
persistentvolumeclaim “busybox-snapshot” deleted
persistentvolumeclaim “gce-pvc” deleted
$ yes | gcloud container clusters delete snapshot-demo --async
The following clusters will be deleted.
– [snapshot-demo] in [europe-west1-b]

Do you want to continue (Y/n)?
$

As usual, any GCE PDs you provisioned will not be deleted by deleting the cluster, so make sure to clear those up too if you do not want to be charged.

Although this project is in the early stages, you can instantly see its potential from this simple example and we will hopefully see support for other volume providers very soon as it matures. Together with CronJobs, we now have the primitives we need within Kubernetes to perform automated backups of our data. For submitting any issues or project contributions, the best place to start is the external-storage issues tab.
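
As a purely illustrative sketch of the CronJob idea mentioned above (nothing from the project itself), a CronJob could periodically create timestamped VolumeSnapshot resources using kubectl. The image and ServiceAccount below are assumptions, and the ServiceAccount would need RBAC permission to create volumesnapshots:

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: gce-pvc-snapshot
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: snapshot-creator  # hypothetical; needs create on volumesnapshots
          restartPolicy: OnFailure
          containers:
            - name: create-snapshot
              image: bitnami/kubectl  # any image containing kubectl would do
              command:
                - /bin/sh
                - -c
                - |
                  cat <<EOF | kubectl create -f -
                  apiVersion: volumesnapshot.external-storage.k8s.io/v1
                  kind: VolumeSnapshot
                  metadata:
                    name: gce-pvc-$(date +%Y%m%d%H%M%S)
                  spec:
                    persistentVolumeClaimName: gce-pvc
                  EOF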

Source

Announcing Project Longhorn v0.3.0 Release

Hi,

This is Sheng Yang from Rancher Labs. Today I am very excited to announce that, after five months of hard work, Longhorn v0.3.0 is now available at https://github.com/rancher/longhorn! Longhorn v0.3.0 is also available through the app catalog in Rancher 2.0.

As you may recall, we released Longhorn v0.2 back in March, with support for Kubernetes. We got great feedback from that release, and many feature requests as well. For the last five months, we’ve worked very hard to meet your expectations. Now we’re glad to present you the feature-packed Longhorn v0.3.0 release!

Newly designed UI

We’ve greatly improved the Longhorn UI in v0.3. The user can now see the status of the system in the dashboard, and we’ve added multi-select and group operations for volumes. We’ve also added websocket support, so it is no longer necessary to refresh the page to update the UI; instead, the UI updates itself when the backend state changes. All of these changes should improve the user experience immensely.

Here are some screenshots of the updated UI:

Dashboard

Node page

Volume page

Container Storage Interface (CSI)

For v0.2, the most common issue we heard from users was misconfiguration of the Flexvolume driver directory location. As a result, Kubernetes might not be able to connect to the Longhorn driver at all. Kubernetes doesn’t provide information regarding the location of the Flexvolume driver, so the user would need to figure that out manually. In v0.3, we’ve added support for the latest Container Storage Interface (CSI), which requires no configuration before installation. See here for the details about the requirements and how to install Longhorn with the CSI driver.

For users who are continuing to use Flexvolume and need to figure out the volume plugin location, we’ve included a script to help. Check it here.

S3 as the backup target

One of the key features of Longhorn is volume backup. Longhorn can back up local snapshots and transfer them to secondary storage, such as NFS. One of the most requested features for v0.2 was support for S3 as the backup target, and we’ve made that possible in v0.3. See here for how to use S3 as the backup target for Longhorn.
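
Conceptually, the S3 credentials end up in an ordinary Kubernetes Secret that Longhorn is pointed at, along the lines of the sketch below; the key names and namespace here are assumptions based on the linked documentation and may differ between releases:

apiVersion: v1
kind: Secret
metadata:
  name: s3-backup-credentials  # referenced from Longhorn's backup target settings
  namespace: longhorn-system   # assumed namespace
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: <access key for the backup bucket>
  AWS_SECRET_ACCESS_KEY: <secret key for the backup bucket>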

Multiple disks with capacity based scheduling

Longhorn v0.2 placed volumes randomly on disks regardless of available disk space. In v0.3, we support multiple disks per node, and we’ve rewritten our scheduler to provide capacity-based scheduling. The user can now enable or disable scheduling for any node or disk and see how much of each disk is used. We’ve also provided various options for customizing how Longhorn schedules volumes on top of the available disk space. See here for the details.

Base image

In v0.3, we support the base image feature. A base image in Longhorn is a packaged Docker image following the RancherVM image spec. If the user has a read-only image which needs to be shared between multiple volumes, it can be done using the base image feature. See here for how to create and use a base image.

iSCSI frontend

We’ve added iSCSI as a supported Longhorn frontend. Previously we only supported using a block device as the frontend to access the volume content. We believe adding an iSCSI frontend should benefit traditional hypervisors that prefer iSCSI as the interface to block devices. See here for the details about iSCSI frontend support.

Engine live upgrade

Last but not least, we’ve put in a framework to support upgrading the Longhorn engine without bringing down the volume. As you may recall, a Longhorn engine includes one controller and multiple replicas. Now, while the volume is running, we can swap out the old version of the controller and replicas and put in a new version on the fly, so you can deploy new versions of the Longhorn storage software without volume downtime.

Note that even though you will be able to live upgrade the Longhorn engine from v0.3 to future versions, you cannot live upgrade from v0.2 to v0.3.

Upgrade

Longhorn v0.3 supports upgrade of all of its software components by leveraging Kubernetes. See the instructions for the upgrade here.

A note for users who installed Longhorn v0.1 using the Rancher app catalog: do not use the upgrade button in the UI. Currently the upgrade cannot be done correctly via the Rancher app catalog. Please follow the instructions above to manually upgrade your old Longhorn system.

Future release plan

We will release minor stable releases starting from v0.3. The user can always upgrade to the latest stable release at https://github.com/rancher/longhorn or deploy Longhorn from the Rancher app catalog. The next minor release is v0.3.1. You can see the issue tracker for the release here.

You can see the release plan for the next major release (v0.4) here.

Final words

Give Longhorn a try.

As you try Longhorn, please be aware that it is still a work in progress. It’s currently an alpha-quality project, and we don’t recommend using it in a production environment.

If you find any issues, feel free to file them using our GitHub issues. You can also contact us via the Rancher forum or Slack.

Enjoy!

Sheng Yang

Principal Engineer

Sheng Yang currently leads Project Longhorn at Rancher Labs, Rancher’s open source, microservices-based, distributed block storage solution. He is also the author of Convoy, an open source persistent storage solution for Docker. Before Rancher Labs, he joined Citrix through the Cloud.com acquisition, where he worked on the CloudStack project and the CloudPlatform product. Before that, he was a kernel developer at Intel focused on KVM and Xen development. He has worked in the fields of virtualization and cloud computing for the last eleven years.

Source

The Machines Can Do the Work, a Story of Kubernetes Testing, CI, and Automating the Contributor Experience

Author: Aaron Crickenberger (Google) and Benjamin Elder (Google)

“Large projects have a lot of less exciting, yet, hard work. We value time spent automating repetitive work more highly than toil. Where that work cannot be automated, it is our culture to recognize and reward all types of contributions. However, heroism is not sustainable.” (Kubernetes Community Values)

Like many open source projects, Kubernetes is hosted on GitHub. We felt the barrier to participation would be lowest if the project lived where developers already worked, using tools and processes developers already knew. Thus the project embraced the service fully: it was the basis of our workflow, our issue tracker, our documentation, our blog platform, our team structure, and more.

This strategy worked. It worked so well that the project quickly scaled past its contributors’ capacity as humans. What followed was an incredible journey of automation and innovation. We didn’t just need to rebuild our airplane mid-flight without crashing, we needed to convert it into a rocketship and launch into orbit. We needed machines to do the work.

The Work

Initially, we focused on the fact that we needed to support the sheer volume of tests mandated by a complex distributed system such as Kubernetes. Real world failure scenarios had to be exercised via end-to-end (e2e) tests to ensure proper functionality. Unfortunately, e2e tests were susceptible to flakes (random failures) and took anywhere from an hour to a day to complete.

Further experience revealed other areas where machines could do the work for us:

  • PR Workflow
    • Did the contributor sign our CLA?
    • Did the PR pass tests?
    • Is the PR mergeable?
    • Did the merge commit pass tests?
  • Triage
    • Who should be reviewing PRs?
    • Is there enough information to route an issue to the right people?
    • Is an issue still relevant?
  • Project Health
    • What is happening in the project?
    • What should we be paying attention to?

As we developed automation to improve our situation, we followed a few guiding principles:

  • Follow the push/poll control loop patterns that worked well for Kubernetes
  • Prefer stateless loosely coupled services that do one thing well
  • Prefer empowering the entire community over empowering a few core contributors
  • Eat our own dogfood and avoid reinventing wheels

Enter Prow

This led us to create Prow as the central component for our automation. Prow is sort of like an If This, Then That for GitHub events, with a built-in library of commands, plugins, and utilities. We built Prow on top of Kubernetes to free ourselves from worrying about resource management and scheduling, and ensure a more pleasant operational experience.

Prow lets us do things like:

  • Allow our community to triage issues/PRs by commenting commands such as “/priority critical-urgent”, “/assign mary” or “/close”
  • Auto-label PRs based on how much code they change, or which files they touch
  • Age out issues/PRs that have remained inactive for too long
  • Auto-merge PRs that meet our PR workflow requirements
  • Run CI jobs defined as Knative Builds, Kubernetes Pods, or Jenkins jobs
  • Enforce org-wide and per-repo GitHub policies like branch protection and GitHub labels

Prow was initially developed by the engineering productivity team building Google Kubernetes Engine, and is actively contributed to by multiple members of Kubernetes SIG Testing. Prow has been adopted by several other open source projects, including Istio, JetStack, Knative and OpenShift. Getting started with Prow takes a Kubernetes cluster and kubectl apply starter.yaml (running pods on a Kubernetes cluster).

Once we had Prow in place, we began to hit other scaling bottlenecks, and so produced additional tooling to support testing at the scale required by Kubernetes, including:

  • Boskos: manages job resources (such as GCP projects) in pools, checking them out for jobs and cleaning them up automatically (with monitoring)
  • ghProxy: a reverse proxy HTTP cache optimized for use with the GitHub API, to ensure our token usage doesn’t hit API limits (with monitoring)
  • Greenhouse: allows us to use a remote bazel cache to provide faster build and test results for PRs (with monitoring)
  • Splice: allows us to test and merge PRs in a batch, ensuring our merge velocity is not limited to our test velocity
  • Tide: allows us to merge PRs selected via GitHub queries rather than ordered in a queue, allowing for significantly higher merge velocity in tandem with splice

Scaling Project Health

With workflow automation addressed, we turned our attention to project health. We chose to use Google Cloud Storage (GCS) as our source of truth for all test data, allowing us to lean on established infrastructure, and allowed the community to contribute results. We then built a variety of tools to help individuals and the project as a whole make sense of this data, including:

  • Gubernator: display the results and test history for a given PR
  • Kettle: transfer data from GCS to a publicly accessible bigquery dataset
  • PR dashboard: a workflow-aware dashboard that allows contributors to understand which PRs require attention and why
  • Triage: identify common failures that happen across all jobs and tests
  • Testgrid: display test results for a given job across all runs, summarize test results across groups of jobs

We approached the Cloud Native Computing Foundation (CNCF) to develop DevStats to glean insights from our GitHub events.

Into the Beyond

Today, the Kubernetes project spans over 125 repos across five orgs. There are 31 Special Interest Groups and 10 Working Groups coordinating development within the project. In the last year the project has had participation from over 13,800 unique developers on GitHub.

On any given weekday our Prow instance runs over 10,000 CI jobs; from March 2017 to March 2018 it ran 4.3 million jobs. Most of these jobs involve standing up an entire Kubernetes cluster, and exercising it using real world scenarios. They allow us to ensure all supported releases of Kubernetes work across cloud providers, container engines, and networking plugins. They make sure the latest releases of Kubernetes work with various optional features enabled, upgrade safely, meet performance requirements, and work across architectures.

With today’s announcement from CNCF that Google Cloud has begun transferring ownership and management of the Kubernetes project’s cloud resources to CNCF community contributors, we are excited to embark on another journey: one that allows the project infrastructure to be owned and operated by the community of contributors, following the same open governance model that has worked for the rest of the project. Sound exciting to you? Come talk to us at #sig-testing on kubernetes.slack.com.

Want to find out more? Come check out these resources:

Source

What’s New in Navigator? – Jetstack Blog

18/Jan 2018

By Richard Wall

Navigator is a Kubernetes extension for managing distributed databases.
In this post we’ll tell you about all the improvements we’ve made since we unveiled it last year, including:
experimental support for Apache Cassandra clusters,
improved support for Elasticsearch clusters,
and a Helm chart for easy installation!
We’ll also give you an overview of the Navigator roadmap for 2018.

The Apache Cassandra database is a leading NoSQL database designed for scalability and high availability without compromising performance.
It’s an ideal candidate for running on a scalable, highly available, distributed platform such as Kubernetes.
For that reason it was the database chosen to showcase the potential of Kubernetes StatefulSets (or PetSet as it was known initially).
Since then a Cassandra example has been added to the official Kubernetes documentation but it is only an example; it is not sufficient if you want to run a Cassandra cluster on Kubernetes in production.

Enter Navigator!
Navigator now has experimental support for the Apache Cassandra database.
Navigator codifies and automates many of the processes that would previously have been performed by a database administrator or SRE (Site Reliability Engineer) acting as the operator.
For example it will bootstrap a Cassandra cluster and create a load balancer for distributing CQL connections to all the cluster nodes.
It performs regular node health checks, which means that if a node becomes unresponsive it is automatically bypassed by the load balancer and eventually restarted.
Navigator can also scale up your Cassandra cluster, and it’s as simple as using Helm to increment the replicas field on the CassandraCluster API resource.
See the demo below.

This is what’s currently possible. Our goal is to make it simple enough that any developer can request a secure, highly available, scalable Cassandra database, with Navigator taking care of the rest by ensuring that:

  • there are adequate database nodes running to service the database clients,
  • failed or unresponsive database nodes are restarted,
  • database nodes are distributed across zones / data centers,
  • database seed nodes are established in each data center,
  • database nodes are cleanly removed from the cluster before being removed or upgraded,
  • a quorum of database nodes is maintained,
  • database state is backed up,
  • database state can be recovered in the event of a catastrophic failure.

Now for a demo.

Demo

In this short screen cast we demonstrate how to install Navigator and then install a Cassandra cluster, using Helm.
We also show how to examine the status and logs of Navigator components and of the Cassandra database nodes.
We demonstrate how to scale up the Cassandra database and then connect to the database using cqlsh to create a key space, a table and insert some data.

The commands used in the demo are available in the Navigator repository.

Elasticsearch was the first database supported by Navigator and we’ve made dozens of improvements in the last six months,
working closely with customers and the community.

Here are some examples:

Pilot Resources

The Navigator Pilot is a process which is injected into the container of the target database and becomes the entry point for the container.
So instead of starting the database process immediately, Kubernetes will actually start a /pilot process which first connects to the Navigator API to discover the desired database configuration.
It then configures and starts up an Elasticsearch sub-process.

We’ve introduced a new Pilot API resource.
This is a place where the controller can publish the desired configuration (spec) of each database node.
And it’s where a Pilot process can publish the actual state of its database sub-process (status).

The Navigator controller creates a Pilot resource for every pod under its control.

Sub-process Management

We’ve made many improvements to the Pilot to ensure that:

  • It cleanly starts the database sub-process.
  • It catches TERM signals and allows its database sub-process to be cleanly stopped.
  • It can reliably detect when its database sub-process has stopped.

Health Checks

The Pilot now has a new REST endpoint (/healthz) through which it responds to Kubernetes readiness and liveness probes.
While the database is running, the Pilot queries the Elasticsearch API to gather the current state of the database and publishes it via the /healthz endpoint.

Leader Election

Pilots now have a leader election mechanism.
This allows a single “leader” Pilot process to perform cluster-wide administrative functions.

Scale Down Safely

This is all groundwork which will allow us to safely scale down Elasticsearch clusters.

Navigator now runs in (and is tested in) Kubernetes environments where RBAC is enabled.
The Navigator API server, the Navigator controller, and the Pilots all run with the least necessary privilege.

And if you prefer, you can run a separate Navigator controller for each database namespace.
We’ve implemented a new Filtered Informer mechanism so that, in this mode, the Navigator controller will only be able to interact with API resources in a single namespace.

Navigator now has its own API server which is aggregated into the Kubernetes API.

The reason for this change was to overcome the limitations of CRDs (Custom Resource Definitions).
Most importantly it allows us to assign versions to our Navigator API resource types, as they evolve.
And it allows seamless conversion between different versions of the resources.

And while the Navigator architecture has become somewhat more complex, the installation of Navigator has been vastly simplified by the introduction of a Navigator Helm chart (see below).

Navigator now has a tried and tested Helm chart.

This allows you to install and configure the Navigator API server, and the Navigator Controller, and all the supporting Kubernetes resources, in a single fool-proof step.
We use that same Helm chart to install Navigator before running our end-to-end tests, so we (and you) can be confident that this installation mechanism is reliable.
The Helm chart is quite configurable and allows you to tweak the logging level of the Navigator components for example.

We have also been working on a suite of end-to-end tests for Navigator.

These verify that the Navigator API server and the Navigator controller can be installed, using Helm, as documented.
They verify that the Navigator API server is successfully aggregated in all the versions of Kubernetes we support.
That the Navigator controller starts an Elasticsearch or Cassandra database cluster matching the desired configuration in the API server.
And most importantly that the databases are actually usable after they have been launched.

We now use Kubernetes test infrastructure to run unit tests and end-to-end tests.

We initially tried running tests against Minikube on Travis CI and it totally worked!
But we soon encountered limitations. We needed to use the minikube start --bootstrapper=kubeadm option, in order to properly set up the Navigator API server aggregation; but that doesn’t work on the Ubuntu 14.04 operating system provided by Travis-CI.

Additionally some of our end-to-end tests were attempting to launch (and scale-up) multi-node Elasticsearch and Cassandra databases.
A single Travis-CI virtual machine just doesn’t cut the mustard.

So we’ve set up our own test infrastructure on Google Cloud, where we installed and tweaked the Kubernetes Test-Infra tooling for our own purposes, and it works great!

We’ll write more about this in a future blog post but for now take a look at: ‘Prow: Testing the way to Kubernetes Next’ and ‘Making Use of Kubernetes Test Infra Tools’, which give a great introduction to the Kubernetes Test-Infra tools.

The Navigator API is subject to change and the project is still in an alpha state so we ask that you do not use it in production, yet!
But we’re working flat out to add more features and to make Navigator as robust as possible.
Here are some of the features that we’re working on:

  • Scale Down: Safely scale down all supported databases
  • Database Upgrade: Rolling upgrades of all supported databases
  • Backup and Restore: Scheduled database backup and automated restore

So stay tuned!
And join us if you can at QCon London, in March 2018, where we plan to announce and demonstrate a new Navigator release.
Hope to see you there!

Source

From Cattle to K8s – Scheduling Workloads in Rancher 2.0

An important and complex aspect of container orchestration is scheduling the application containers. Appropriate placement of containers onto the available shared infrastructure is key to achieving maximum performance at optimal compute resource usage.

Imgur

Cattle, which is the default orchestration engine for Rancher 1.6, provided various scheduling abilities to effectively place services, as documented here.

With the release of the 2.0 version based on Kubernetes, Rancher now utilizes native Kubernetes scheduling. In this article we will look at the scheduling methods available in Rancher 2.0 in comparison to Cattle’s scheduling support.

Node Scheduling

Based on the native Kubernetes behavior, by default, pods in a Rancher 2.0 workload will be spread across the nodes (hosts) that are schedulable and have enough free capacity. But just like the 1.6 version, Rancher 2.0 also facilitates:

  • Running all pods on a specific node.
  • Node scheduling using labels.

Here is how scheduling in the 1.6 UI looks. Rancher lets you either run all containers on a specific host, specify hard/soft host labels, or use affinity/anti-affinity rules while deploying services.

Imgur

And here is the equivalent node scheduling UI for Rancher 2.0 that provides the same features while deploying workloads.

Imgur

Rancher uses the underlying native Kubernetes constructs to specify node affinity/anti-affinity. Detailed documentation from Kubernetes is available here.

Let’s run through some examples that schedule workload pods using these node scheduling options, and then compare the resulting Kubernetes YAML specs with the 1.6 Docker Compose config.

Example: Run All Pods on a Specific Node

While deploying a workload (navigate to your Cluster > Project > Workloads), it is possible to schedule all pods in your workload to a specific node.

Here I am deploying a workload of scale = 2 using the nginx image on a specific node.

Imgur

Rancher will schedule the pods onto that node provided it has enough available compute resources and, if hostPort is used, no port conflicts. If the workload exposes itself using a nodePort that conflicts with another workload, the deployment gets created successfully, but no nodePort service is created. Therefore, the workload doesn’t get exposed at all.

On the Workloads tab, you can list workloads grouped by node. I can see that both of the pods for my Nginx workload are scheduled on the specified node:

Imgur

Now here is what this scheduling rule looks like in Kubernetes pod specs:

Imgur
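
The screenshot above shows the spec Rancher generates; as a rough hand-written sketch of the same intent (the node name is hypothetical, and Rancher may instead express the rule as a nodeAffinity term on the kubernetes.io/hostname label), it could look like this:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nginx-pinned
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx-pinned
  template:
    metadata:
      labels:
        app: nginx-pinned
    spec:
      nodeName: node1   # pin every replica to this one node
      containers:
        - name: nginx
          image: nginx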

Example: Host Label Affinity/Anti-Affinity

I added a label foo=bar to node1 in my Rancher 2.0 cluster to test the label-based node scheduling rules.

Imgur

Host Label Affinity: Hard

Here is how to specify a host label affinity rule in the Rancher 2.0 UI. A hard affinity rule means that the host chosen must satisfy all the scheduling rules. If no such host can be found, the workload will fail to deploy.

Imgur

In the PodSpec YAML, this rule translates to field nodeAffinity. Also note that I have included the Rancher 1.6 docker-compose.yml used to achieve the same scheduling behavior using labels.

Imgur
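
As a sketch (not Rancher's verbatim output), a hard rule on the foo=bar label translates to a required node affinity term like this:

apiVersion: v1
kind: Pod
metadata:
  name: hard-affinity-demo
spec:
  affinity:
    nodeAffinity:
      # hard rule: only schedule onto nodes labelled foo=bar
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: foo
                operator: In
                values:
                  - bar
  containers:
    - name: nginx
      image: nginx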

Host Label Affinity: Soft

If you are a Rancher 1.6 user, you know that a soft rule means that the scheduler should try to deploy the application per the rule, but can deploy even if the rule is not satisfied by any host. Here is how to specify this rule in Rancher 2.0 UI.

Imgur

The corresponding YAML specs for the pod are shown below.

Imgur
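
The soft version of the same rule uses a weighted preference instead, roughly as follows (again a sketch rather than Rancher's exact output):

apiVersion: v1
kind: Pod
metadata:
  name: soft-affinity-demo
spec:
  affinity:
    nodeAffinity:
      # soft rule: prefer nodes labelled foo=bar, but fall back to any schedulable node
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: foo
                operator: In
                values:
                  - bar
  containers:
    - name: nginx
      image: nginx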

Host Label Anti-Affinity

Apart from the key = value host label matching rule, Kubernetes scheduling constructs also support the following operators:

Imgur

So to achieve anti-affinity, you can use the operators NotIn and DoesNotExist for the node label.

Support for Other 1.6 Scheduling Options

If you are a Cattle user, you will be familiar with a few other scheduling options available in Rancher 1.6.

If you are using these options in your Rancher 1.6 setups, it is possible to replicate them in Rancher 2.0 using native Kubernetes scheduling options. As of v2.0.8, there is no UI support for these options while deploying workloads, but you can always use them by importing the Kubernetes YAML specs on a Rancher cluster.

Schedule Using Container labels

This 1.6 option lets you schedule containers to a host where a container with a specific label is already present. To do this on Rancher 2.0, use Kubernetes inter-pod affinity and anti-affinity feature.

As noted in these docs, Kubernetes allows you to constrain which nodes your pod can be scheduled to based on the labels of pods already running on those nodes, rather than on node labels.

One of the most-used scheduling features in 1.6 was anti-affinity to the service itself using labels on containers. To replicate this behavior in Rancher 2.0, we can use pod anti-affinity constructs in Kubernetes YAML specs. For example, consider a Nginx web workload. To ensure that pods in this workload do not land on the same host, you can use the podAntiAffinity construct as shown below. By specifying podAntiAffinity using labels, we ensure that each Nginx replica does not co-locate on a single node.

Imgur
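
A sketch of the podAntiAffinity section for such an Nginx deployment might look like this (the labels and names are illustrative):

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        podAntiAffinity:
          # never place two pods labelled app=nginx on the same node
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                  - key: app
                    operator: In
                    values:
                      - nginx
              topologyKey: kubernetes.io/hostname
      containers:
        - name: nginx
          image: nginx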

Using Rancher CLI, this workload can be deployed onto the Kubernetes cluster. Note that the above deployment specifies three replicas, and I have three schedulable nodes in the Kubernetes cluster.

Imgur

Since podAntiAffinity is specified, the three pods end up on different nodes. To further check how podAntiAffinity applies, I can scale up the deployment to four pods. Notice that the fourth pod cannot get scheduled since the scheduler cannot find another node to satisfy the podAntiAffinity rule.

Imgur

Resource-Based Scheduling

While you are creating a service in Rancher 1.6, you can specify the memory reservation and mCPU reservation in the Security/Host tab in the UI. Cattle will schedule containers for the service onto hosts that have enough available compute resources.

In Rancher 2.0, you can specify the memory and CPU resources required by your workload pods using resources.requests.memory and resources.requests.cpu under the pod container specs. You can find more detail about these specs here.

When you specify these resource requests, the Kubernetes scheduler will assign the pod to a node with sufficient available capacity.
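
A minimal sketch of how those requests appear in a container spec (the values are arbitrary examples):

apiVersion: v1
kind: Pod
metadata:
  name: reserved-resources-demo
spec:
  containers:
    - name: nginx
      image: nginx
      resources:
        requests:
          memory: "256Mi"  # roughly the 1.6 memory reservation
          cpu: "500m"      # roughly the 1.6 mCPU reservation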

Schedule Only Specific Services to Host

Rancher 1.6 has the ability to specify container labels on the host to only allow specific containers to be scheduled to it.

To achieve this in Rancher 2.0, use the equivalent Kubernetes feature of adding node taints (like host tags) and using tolerations in your pod specs.
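
As a rough sketch of that pattern, you would taint the node (for example with kubectl taint nodes node1 dedicated=db:NoSchedule, where the key and value are hypothetical) and then add a matching toleration to the pods that are allowed onto it:

apiVersion: v1
kind: Pod
metadata:
  name: db-only-pod
spec:
  # matches a node tainted with dedicated=db:NoSchedule; pods without this
  # toleration will not be scheduled onto that node
  tolerations:
    - key: "dedicated"
      operator: "Equal"
      value: "db"
      effect: "NoSchedule"
  containers:
    - name: app
      image: nginx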

Global Service

In Rancher 1.6, a global service is a service with a container deployed on every host in the environment.

If a service has the label io.rancher.scheduler.global: ‘true’, then the Rancher 1.6 scheduler will schedule a service container on each host in the environment. As mentioned in the documentation, if a new host is added to the environment, and the host fulfills the global service’s host requirements, the service will automatically be started on it by Rancher.

The sample below is an example of a global service in Rancher 1.6. Note that just placing the required label is sufficient to make a service global.

version: '2'
services:
  global:
    image: nginx
    stdin_open: true
    tty: true
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.global: 'true'

How can we deploy a global service in Rancher 2.0 using Kubernetes?

For this purpose, Rancher deploys a Kubernetes DaemonSet object for the user’s workload. A DaemonSet functions exactly like the Rancher 1.6 global service. The Kubernetes scheduler will deploy a pod on each node of the cluster, and as new nodes are added, the scheduler will start new pods on them provided they match the scheduling requirements of the workload.

Additionally, in 2.0, you can also limit a DaemonSet to be deployed to nodes that have a specific label, as mentioned here.
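
For example (compare with the full DaemonSet YAML further down), adding a nodeSelector to the pod template restricts the DaemonSet to nodes carrying a hypothetical label:

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  name: labelled-nodes-only
spec:
  selector:
    matchLabels:
      app: labelled-nodes-only
  template:
    metadata:
      labels:
        app: labelled-nodes-only
    spec:
      nodeSelector:
        role: monitoring   # hypothetical node label; only matching nodes run the pod
      containers:
        - name: nginx
          image: nginx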

Deploying a DaemonSet Using Rancher 2.0 UI

If you are a Rancher 1.6 user, to migrate your global service to Rancher 2.0 using the UI, navigate to your Cluster > Project > Workloads view. While deploying a workload, you can choose the following workload types:

Imgur

This is what the corresponding Kubernetes YAML specs look like for the above DaemonSet workload:

apiVersion: apps/v1beta2
kind: DaemonSet
metadata:
  labels:
    workload.user.cattle.io/workloadselector: daemonSet-default-globalapp
  name: globalapp
  namespace: default
spec:
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: daemonSet-default-globalapp
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: daemonSet-default-globalapp
    spec:
      affinity: {}
      containers:
        - image: nginx
          imagePullPolicy: Always
          name: globalapp
          resources: {}
          stdin: true
          tty: true
      restartPolicy: Always

Docker Compose to Kubernetes YAML

To migrate a Rancher 1.6 global service to Rancher 2.0 using its Compose config, follow these steps.

You can convert the docker-compose.yml file from Rancher 1.6 to Kubernetes YAML using the Kompose tool, and then deploy the application using either the Kubectl client tool or Rancher CLI in the Kubernetes cluster.

Consider the docker-compose.yml specs mentioned above where the Nginx service is a global service. This is how it can be converted to Kubernetes YAML using Kompose:

Imgur

Now configure the Rancher CLI against your Kubernetes Cluster and deploy the generated *-daemonset.yaml file.

Imgur

As shown above, my Kubernetes cluster has two worker nodes where workloads can be scheduled, and deploying global-daemonset.yaml started two pods for the DaemonSet, one on each node.

Conclusion

In this article, we reviewed how the various scheduling functionalities of Rancher 1.6 can be migrated to Rancher 2.0. Most of the scheduling techniques have equivalent options available in Rancher 2.0, or they can be achieved via native Kubernetes constructs.

In the upcoming article, I will explore a bit about how service discovery support in Cattle can be replicated in a Rancher 2.0 setup – stay tuned!

Prachi Damle

Prachi Damle

Principal Software Engineer

Source

2018 Steering Committee Election Cycle Kicks Off

Author: Paris Pittman (Google), Jorge Castro (Heptio), Ihor Dvoretskyi (CNCF)

Having a clear, definable governance model is crucial for the health of open source projects. Governance is especially critical for a project as large and active as Kubernetes, one of the highest-velocity projects in the open source world. A clear structure helps users trust that the project will be nurtured and progress forward. Initially, this structure was laid down by the former 7-member bootstrap committee, composed of founders and senior contributors, with the goal of creating the foundational governance building blocks.

The initial charter and establishment of an election process to seat a full Steering Committee was a part of those first building blocks. Last year, the bootstrap committee kicked off the first Kubernetes Steering Committee election which brought forth 6 new members from the community as voted on by contributors. These new members plus the bootstrap committee formed the Steering Committee that we know today. This yearly election cycle will continue to ensure that new representatives get cycled through to add different voices and thoughts on the Kubernetes project strategy.

The committee has worked hard on topics that will streamline the project and how we operate. SIG (Special Interest Group) governance was an overarching recurring theme this year: the Kubernetes community is not a monolithic organization but a huge, distributed community, where Special Interest Groups (SIGs) and Working Groups (WGs) are the atomic community units that make Kubernetes so successful from the ground up.

Contributors – this is where you come in.

There are three seats up for election this year. The voters guide will get you up to speed on the specifics of this year’s election, including candidate bios as they are updated in real time. The elections process doc will steer you through eligibility, operations, and the fine print.

1) Nominate yourself, someone else, and/or put your support to others.

Want to help chart our course? Interested in governance and community topics? Add your name! The nomination process is optional.

2) Vote.

On September 19th, eligible voters will receive an email poll invite conducted by CIVS. The newly elected will be announced at the weekly community meeting on Thursday, October 4th at 5pm UTC.

To those who are running:

Helpful resources

  • Steering Committee – who sits on the committee and terms, their projects and meetings info
  • Steering Committee Charter – this is a great read if you’re interested in running (or assessing for the best candidates!)
  • Election Process
  • Voters Guide! – Updated on a rolling basis. This guide will always have the latest information throughout the election cycle. The complete schedule of events and candidate bios will be housed here.

Source

The Kubernetes Market in 2018 // Jetstack Blog

25/Jan 2018

By Matt Barker

Not long ago, I overheard the Jetstack team chatting about recent changes in the market and the increasingly widespread adoption of Kubernetes. Only when I reflected to write this did I realise that we have been saying the same thing every few months for the past year.

group

Indeed, the Kubernetes market shows no sign of slowing down. Jetstack alone has tripled in size as we scale to cater to demand, KubeCon has gone from a couple of hundred in a small room to 4000 in a vast conference centre, and recent announcements have seen millions of dollars pour into the space as companies like Cisco and VMWare announce strategic investments.

And all of this is for good reason. We regularly see customers make huge reductions in cloud spend (75% in some cases), and vast improvements in project delivery times (up to 10x faster). I’m personally looking forward to hearing more of these stories as Kubernetes permeates the market in 2018.

As we are in the early stages of what promises to be another exciting year for Kubernetes, I thought I would take a moment to reflect on some of the major themes I have seen develop whilst running a professional service devoted to the project.


1.) Kubernetes has won the container management war

Defensive internal battles have been fought and lost. Amazon announced EKS, Docker now supports Kubernetes natively, and an ecosystem has been forged that will only grow stronger as ISVs and vendors start to build for 'Kubernetes first'.

2.) Amongst early adopters, the conversation is changing from build to operations

In recent months, queries for Production Readiness Reviews and Advanced Operations Training have exploded. It may only be for smaller services, or as a pilot project, but teams are fast gearing up for what it takes to run Kubernetes in production.

Jetstack has worked towards updating our services to cater to this demand, and we have been developing training and operational playbooks as part of a subscription. This is the first step towards a suite of products and services that will help teams trying to get up to speed on Kubernetes.

We thank companies like Monzo for their openness in sharing what can go wrong in production, and how you can try to avoid situations like this.

3.) Multi-cloud is the ultimate aim for Jetstack customers

There’s no doubt that for certain customers the ease of GKE is a no-brainer, and for others, buying into a ready-made container platform like OpenShift is the best way to unlock Kubernetes value. However, for the vast majority of Jetstack customers, the ultimate goal is working with upstream Kubernetes in the most consistent way they can across multiple environments. The drivers for this are varied, but the major reasons we see include:

  • Ensuring their teams properly understand the components of Kubernetes rather than having them hidden behind a managed service. The big concern is whether their operators can cope with production issues effectively.
  • Making the most of existing on-prem environments.
  • Regulatory reasons (mainly seen in banks, and used as a way to reduce risk of reliance on one cloud environment).
  • Fear of being locked into a cloud service.
  • OEM deployments for target customers with varying requirements.

Whilst working with a number of clients, notably CompareTheMarket, Jetstack was able to open source Tarmak, which is a suite of tools developed to give customers a cloud-agnostic way of delivering best-practice Kubernetes clusters. Read our Introducing Tarmak blog post for an in-depth discussion of its current features and design.


4.) Stateful services are still a thorny issue

In a perfect world, customers would run their stateful apps alongside stateless apps in Kubernetes, for deployment and management consistency. Sadly, for many people the complexities of running distributed systems within Kubernetes mean that stateful workloads are often kept outside the cluster.

Jetstack is working closely with a number of companies to build on Kubernetes and its machinery to provide databases in-cluster. We’ve integrated Elasticsearch for a European telco, Couchbase with Amadeus, and we are now actively working on Cassandra. If you’re interested in containerised database-as-a-service, take a look at Navigator. State will certainly be a part of the Kubernetes conversation this year.

5.) IT decision makers look further up the stack towards applications

Jetstack has started to receive its first queries around service mesh and serverless functionality on Kubernetes. We are delighted to be kicking off our first large-scale Istio project this month, and are closely analysing platforms like OpenFaaS and Kubeless.

Conclusion

Whether you’re part of the Kubernetes community or not, one thing is for certain: Kubernetes is now impossible to ignore.

So no matter if you’re a technical architect, a CIO, or a software vendor, it’s time to get involved, and become a part of the movement to automate your IT infrastructure.


Matt Barker and the Jetstack team.

This year, Jetstack is looking to expand into Europe: If you want to work on a variety of exciting Kubernetes projects, get in touch: hello@jetstack.io.

Source

Gartner Dubs Heptio a “Cool Vendor” – Heptio

Some exciting news to share ‒ Gartner has named us a Cool Vendor in Cloud Infrastructure 2018*. The designation for 2018 went to four companies behind a key innovation enabling cloud native infrastructure. We’re honored to be recognized in the report — you can read the report to see what Gartner has to say about Heptio, Kubernetes and more.

When Joe and I started Heptio 19 months ago, we were committed to keeping Kubernetes open. We have always believed it is the best way to help companies move from legacy to cloud native, and to avoid vendor lock-in to any one platform or distribution. We certainly thought open by design was the "cool" path to take, and each customer win, each collaboration with AWS and Azure, and now this newest accolade has validated our thesis.

As we continue growing, we intend to join the ranks of companies that have risen from coolness to greatness by leveraging our unique strengths:

Community focus. We are committed to helping customers unlock the full potential of upstream Kubernetes and avoid the lock-in of legacy distributions. Our engineers and evangelists use projects, webinars, conference sessions and Heptio branded workshops to bring their industry leading knowledge to the larger open source community.

Business model innovation. Many companies have tried to monetize open source and most have struggled. We are innovating the business model itself with HKS, a combination of products, services and support that make it easier for organizations to deploy and manage Kubernetes so that they can develop software far more efficiently.

Culture. We understand that to build innovative products for the long-term, we have to build an exceptional culture. That’s why we prioritize diversity in hiring and equity in compensation. As our headcount ticks up, we use weekly all-company emails, bi-weekly all-hands, semi-annual team off-sites and team building events to create an uncommon degree of transparency.

We think that’s pretty cool. To the next milestone!

*Gartner, Cool Vendors in Cloud Infrastructure, Philip Dawson, Arun Chandrasekaran, Julia Palmer, 9 May 2018.

Disclaimer: Gartner does not endorse any vendor, product or service depicted in research publications, and does not advise technology users to select only those vendors with the highest ratings. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.

Source

Deep Dive Into Kubernetes Networking in Azure

We started building our azure-operator in the fall of 2017. One of the challenges that we faced was the networking architecture. We evaluated multiple possible architectures and finally chose the one that was best by many parameters. We hope this post will help people setting up their own Azure clusters with decent networking. First let’s look at the available options for Kubernetes networking in Azure.

Calico with BGP

The first option was to use default Calico with BGP. We are using this option in our on-prem and AWS clusters. On Azure we faced some known limitations: IPIP tunnel traffic and unknown IP traffic are not allowed. Some additional information can be found here.

Azure Container Network

The second option was to use the Azure Container Networking (ACN) CNI plugin. ACN uses Azure network resources (network interfaces, routing tables, etc.) to provide networking connectivity for containers. Native Azure networking support was really promising, but after spending some time we were not able to run it successfully (issue). We did our evaluation in November 2017; at the time of writing (May 2018) many things have changed, and ACN is now used by the official Azure installer.

Calico Policy-Only + Flannel

The third and most trivial option was to use overlay networking. Canal (Calico policy-only + Flannel) provides us with the benefits of network policies, and overlay networking is universal, meaning it can run anywhere. This is a very good option. The only drawback is the VXLAN overlay, which, as with any other overlay, carries a performance penalty.

Finally the best option was to use a mix of Calico in policy-only mode and Kubernetes native (Azure cloud provider) support of Azure user-defined routes.

This approach has multiple benefits:

  • Native Kubernetes network policies backed by Calico (both ingress and egress).
  • Network performance (no overlay networking).
  • Easy configuration and no operational overhead.

The only difference from default Calico is that BGP networking is disabled. Instead, we enable node CIDR allocation in Kubernetes. This is briefly described in the official Calico guide for Azure. Responsibility for routing pod traffic lies with the Azure route table instead of BGP (used by default in Calico). This is shown in the diagram below.

To better understand the services configuration, let’s look at the next diagram below.

  • k8s-controller-manager allocates IP blocks for nodes and makes sure the corresponding routes are created in the Azure route table.
  • calico-node is responsible for applying network policies. Under the hood it does this by creating iptables rules locally on every node.
  • kubelet is responsible for calling the Calico CNI scripts. The CNI scripts set up the container virtual network interfaces (a veth pair in this case) and the local IP configuration.

The following checklist can help with configuring your cluster to use Azure route tables with Calico network policies.

  1. Azure route table created and mapped to subnet(s) with VMs. See how to create an Azure route table here.
  2. The Kubernetes controller manager has --allocate-node-cidrs set to true and a proper subnet in the --cluster-cidr parameter. The subnet should be part of the Azure virtual network and will be used by pods. For example, for virtual network 10.0.0.0/16 and VM subnet 10.0.0.0/24 we can pick the second half of the virtual network, which is 10.0.128.0/17. By default Kubernetes allocates a /24 subnet per node, which equals 256 pods per node. This is controlled by the --node-cidr-mask-size flag (see the configuration sketch after this list).
  3. The Azure cloud provider config has the routeTableName parameter set to the name of the routing table. In general, having a properly configured cloud provider config in Azure is very important, but that is out of the scope of this post. All available options for the Azure cloud provider can be found here.
  4. Calico is installed in policy-only mode with the Kubernetes datastore backend. This means Calico doesn't need etcd access; all data is stored in Kubernetes. See the official manual for details.
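
To make the checklist more concrete, here is a minimal, illustrative sketch of the relevant kube-controller-manager flags and Azure cloud provider config, using the example subnets above. Names such as my-resource-group, my-vnet, my-subnet and my-route-table are placeholders, and a real azure.json also needs credentials, location and the other fields documented for the Azure cloud provider; treat this as a starting point rather than a complete configuration.

# kube-controller-manager flags relevant to node CIDR allocation and Azure routes
# (added alongside your existing controller manager flags)
kube-controller-manager \
  --cloud-provider=azure \
  --cloud-config=/etc/kubernetes/azure.json \
  --allocate-node-cidrs=true \
  --cluster-cidr=10.0.128.0/17 \
  --node-cidr-mask-size=24

# Illustrative fragment of /etc/kubernetes/azure.json (placeholder values, incomplete)
cat <<'EOF' > /etc/kubernetes/azure.json
{
  "cloud": "AzurePublicCloud",
  "resourceGroup": "my-resource-group",
  "vnetName": "my-vnet",
  "subnetName": "my-subnet",
  "routeTableName": "my-route-table"
}
EOF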

This post has shown some of the Kubernetes networking options in Azure and why we chose the solution we’re using. If you’re running Kubernetes on Azure or thinking about it you can read more about our production ready Kubernetes on Azure. If you’re interested in learning more about how Giant Swarm can run your Kubernetes on Azure, get in touch.

Source

Introduction to Vitess: Using Vitess on Kubernetes


A Vitess overview

Vitess is a database solution for deploying, scaling and managing large clusters of MySQL instances. It’s architected to run as effectively in a public or private cloud architecture as it does on dedicated hardware. It combines and extends many important MySQL features with the scalability of a NoSQL database.

In a simple scenario with relational databases, and MySQL in particular, you would have a master instance, then a couple of replicas. You would then direct read-only queries to the replicas and write queries (and possibly critical read queries) to the master instance, similar to the setup in the image below:

MySQL default Setup

This all works fine, until you hit a limit:
– your hardware can't cope with the load and you need more expensive hardware;
– you need to manually shard the database;
– cluster management becomes a growing pain;
– query monitoring is increasingly difficult.

These are the same problems engineers at YouTube dealt with, and they created Vitess to solve them. Fortunately for us, they made it open source, so everyone can benefit from it.

The following list of Vitess features may read like a collection of buzzwords, but in this case they are backed by real functionality:

  • Performance
    • Connection pooling – Multiplex front-end application queries onto a pool of MySQL connections to optimize performance.
    • Query de-duping – Reuse results of an in-flight query for any identical requests received while the in-flight query was still executing.
    • Transaction manager – Limit number of concurrent transactions and manage deadlines to optimize overall throughput.
  • Protection
    • Query rewriting and sanitization – Add limits and avoid non-deterministic updates.
    • Query killer – Terminate queries that take too long to return data.
    • Table ACLs – Specify access control lists (ACLs) for tables based on the connected user.
  • Monitoring
    • Performance analysis: Tools let you monitor, diagnose, and analyze your database performance.
    • Query streaming – Use a list of incoming queries to serve OLAP workloads.
    • Update stream – A server streams the list of rows changing in the database, which can be used as a mechanism to propagate changes to other data stores.
  • Topology Management Tools
    • Master management tools (handles reparenting)
    • Web-based management GUI
    • Designed to work in multiple data centers / regions
  • Sharding
    • Virtually seamless dynamic re-sharding
    • Vertical and Horizontal sharding support
    • Multiple sharding schemes, with the ability to plug-in custom ones

It can be clearly seen that, at scale, the above features become incredibly valuable and necessary.

Now, let’s try it out 🙂

Vitess has come a long way and now it is relatively easy to deploy a Vitess cluster with Kubernetes.

The plan is as follows:

  • deploy a Rancher 2.0 instance and use it for every deploy step in this article;
  • using Rancher 2.0, create a Kubernetes cluster;
  • deploy Vitess on the above Kubernetes cluster;
  • create a database on Vitess and query it;
  • deploy a test application in the same Kubernetes cluster, that will use Vitess;

You should be able to create all the resources needed for the outlined steps with a free trial from Google Cloud Platform. A GCP trial has a limit of 8 cores per project, so keep that in mind when deploying your resources.

Deploy a Rancher 2.0 instance and start a Kubernetes Cluster

There's already a good guide on how to deploy a Rancher 2.0 instance here, so let's not get into too many details. One note – if you want to run this guide on the Google Cloud Platform trial, create a separate project for the Rancher instance, thereby saving yourself a CPU core for the Kubernetes cluster (remember the 8 core per-project limit for the GCP trial?).

Now on to starting a Kubernetes cluster. For this, there's again a pretty good guide here. One note – the Vitess documentation recommends a cluster of 5 n1-standard-4 instances, which will put you over the GCP trial limit. But if you create 4 nodes of the n1-standard-2 type, this gets you exactly to the 8 core per-project limit for the GCP trial and it's enough for the outlined tasks. There will be some changes to the Vitess scripts to deploy fewer resources, so everything fits into the smaller cluster.

The Create Cluster part in Rancher should look like the image below:

Rancher Kubernetes Cluster

Start a Vitess control pod

Note the name of the cluster, vitess-demo; it will be referenced later. Once your cluster is created and active, the first thing to do is to create a control instance, where you will have all the necessary tools for this Vitess demo. To do that, go to your newly created vitess-demo cluster, then to the Default project, and click on Deploy. Fill in the details:

  • set the Name to vitess-control;
  • set the docker image to golang;
  • leave everything else as it is.

It should look like the image below:
Vitess Control Deploy

Click confidently on Launch!

In a moment you should have your vitess-control pod running. It will be used as a control plane for your Vitess cluster and, thanks to Rancher, you will be able to access it from anywhere you have internet access and a browser.

Ok, let’s get to installing the control part. To do that, click on the right hand side menu of vitess-control pod and select Execute shell, just like in the screenshot below:

Vitess Control Execute Shell

Build, install vtctlclient and configure settings

This will bring up a shell inside the golang container, which is exactly what we need. Let's start typing commands there. First, build and install vtctlclient by running:

go get vitess.io/vitess/go/cmd/vtctlclient

When successful, this command doesn't produce any output. It downloads and builds the Vitess source code at $GOPATH/src/vitess.io/vitess/ and also copies the built vtctlclient binary to $GOPATH/bin.
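
If you want a quick sanity check that the build worked, you can look for the binary (this assumes the default GOPATH layout of the golang image; adjust the path if yours differs):

# the built client should be here
ls -l $GOPATH/bin/vtctlclient
# and, if $GOPATH/bin is on your PATH, this should also resolve
which vtctlclient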

Now let’s navigate to the Vitess source code directory and configure site-local settings with the following commands:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./configure.sh

It should look as below:
Vitess begin steps

At the moment Vitess has out-of-the-box support for backups in Google Cloud Storage. If you want, you can create a bucket in Google Cloud to test backups in this demo, but it's not essential for the purposes of this demo.
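
If you do want to try backups, creating a bucket is a one-liner from any machine with the Google Cloud SDK installed (gsutil is not part of the plain golang image); the bucket name below is just an example:

# create a GCS bucket to hold Vitess backups (example name, pick your own)
gsutil mb gs://my-vitess-demo-backups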

You will also need kubectl configured on the vitess-control instance to interact directly with the cluster. To do this, run the following commands on the vitess-control command line:

curl -LO https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod +x ./kubectl
mv ./kubectl /usr/local/bin/kubectl

Now you need the kubeconfig file, which you can get from Rancher's vitess-demo default dashboard. In the upper right corner you should see Launch kubectl and Kubeconfig File. Click on Kubeconfig File and you'll be presented with the contents of the kubeconfig file; at the bottom there is a Copy to Clipboard option – click on that. Now you need to paste it into the ~/.kube/config file inside the vitess-control instance. To do that, execute a shell in the vitess-control instance and issue the commands below:

apt-get update
apt-get install -y vim
vim ~/.kube/config

Once you've opened the file with vim, paste the kubeconfig contents into it (don't forget to enter insert mode in vim by pressing the I key before pasting).
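
If you'd rather not install and use vim, a heredoc works just as well; this is only a sketch – paste the copied kubeconfig contents in place of the comment line before finishing with EOF:

mkdir -p ~/.kube
cat > ~/.kube/config <<'EOF'
# paste the kubeconfig contents copied from Rancher here
EOF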

Confirm that you have kubectl working properly by issuing the following command:

kubectl get pod

You should see only one pod running – the exact one from which you're running the commands, vitess-control.

Start Vitess etcd cluster

The Vitess topology service stores coordination data for all the servers in a Vitess cluster. It can store this data in one of several consistent storage systems. In this example, we’ll use etcd. This will require a separate etcd cluster, from the Kubernetes one.

To easily create an etcd cluster, let's install etcd-operator using Rancher's Catalog Apps. Go to Catalog Apps, click on Launch and search for etcd-operator. Once you've found etcd-operator, click on View Details and, from the provided options, change only the namespace where the etcd cluster will be deployed, because Vitess' etcd clusters will be deployed in the default namespace and etcd-operator has to be in the same namespace. It should look like this:

etcd-operator default namespace

Click on Launch and wait a couple of moments for the etcd-operator to deploy. Verify that etcd-operator is running by checking the Workloads tab and, inside it, the Service Discovery tab. If you see etcd-operator there as a workload and etcd-cluster with etcd-cluster-client as services, then everything is fine.
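
If you prefer to verify from the command line instead of the Rancher UI, a rough check from the vitess-control shell could look like this (pod and service names will differ in your cluster):

# the etcd-operator pod and the etcd-cluster services should show up here
kubectl get pods | grep etcd
kubectl get services | grep etcd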

Now let's start the etcd clusters for Vitess. If you're running this demo from a Google Cloud Platform trial account, you need to change the number of replicas started by each etcd cluster at this step. To do that, go to the Vitess control pod, Execute Shell, and run the following commands:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
sed -i 's/ETCD_REPLICAS:-3/ETCD_REPLICAS:-1/g' etcd-up.sh
./etcd-up.sh

### example output
# Creating etcd service for 'global' cell...
# etcdcluster.etcd.database.coreos.com/etcd-global created
# Creating etcd service for 'test' cell...
# etcdcluster.etcd.database.coreos.com/etcd-test created

Check that pods are running:

# kubectl get pod
NAME                                                         READY   STATUS            RESTARTS   AGE
etcd-cluster-4ftlwdxbrp                                      1/1     Running           0          3m
etcd-cluster-bjjsrfkm4x                                      1/1     Running           0          3m
etcd-cluster-cxqkghhjt4                                      1/1     Running           0          2m
etcd-global-qbnvtfd66b                                       0/1     PodInitializing   0          10s
etcd-operator-etcd-operator-etcd-operator-7d467bd7fb-kpkvw   1/1     Running           0          3m
etcd-test-x4pn5jw99s                                         0/1     Running           0          10s
vitess-control-fdb84cbc4-7l2mp                               1/1     Running           0          1h
#

Start vtctld

vtctld will be used to accept RPC commands from vtctlclient to modify the Vitess cluster. To install it, you'll again need the vitess-control command line; issue the following commands there:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./vtctld-up.sh

### example output
# Creating vtctld ClusterIP service…
# service/vtctld created
# Creating vtctld replicationcontroller…
# replicationcontroller/vtctld created

# To access vtctld web UI, start kubectl proxy in another terminal:
# kubectl proxy --port=8001
# Then visit http://localhost:8001/api/v1/proxy/namespaces/default/services/vtctld:web/

Try vtctlclient to send commands to vtctld

At this point you can run vtctlclient from the vitess-control pod, to issue commands to the vtctld service on your Kubernetes cluster.

To enable RPC access into the Kubernetes cluster, you’ll need to use kubectl to set up an authenticated tunnel.

Since the tunnel needs to target a particular vtctld pod name, the folks at Vitess provided the kvtctl.sh script, which uses kubectl to discover the pod name and set up the tunnel before running vtctlclient. To test this, let's run ./kvtctl.sh help from where we left off; this will test your connection to vtctld and list the vtctlclient commands that you can use to administer the Vitess cluster.

./kvtctl.sh help

### example output
# Starting port forwarding to vtctld…
# Available commands:
#
# Tablets:
# InitTablet [-allow_update] [-allow_different_shard] [-allow_master_override] [-parent] [-db_name_override=<db name>] [-hostname=<hostname>] [-mysql_port=<port>] [-port=<port>] [-grpc_port=<port>] -keyspace=<keyspace> -shard=<shard> <tablet alias> <tablet type>

Set up a Vitess cell in the topology (Vitess's etcd cluster)

The global etcd cluster is configured from command-line parameters, specified in the Kubernetes configuration files. The per-cell etcd cluster however needs to be configured, so it is reachable by Vitess. Run the following commands from vitess-control pod to set it up:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./kvtctl.sh AddCellInfo --root /test -server_address http://etcd-test-client:2379 test

If successful, you should only see an INFO message telling you that your command has been proxied to vtctld.
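
As a further sanity check – assuming your Vitess build includes the cell inspection commands – you can list the cells known to the topology and dump the one you just added:

# list the cells registered in the topology
./kvtctl.sh GetCellInfoNames
# show the configuration of the 'test' cell created above
./kvtctl.sh GetCellInfo test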

Start a Vitess tablet

Now we're getting to the interesting parts. A Vitess tablet is the unit of scaling for the database. A tablet consists of the vttablet and mysqld processes running on the same host. This coupling is enforced in Kubernetes by putting the respective containers for vttablet and mysqld inside a single pod.

To start the vttablets, the folks from Vitess provided a script that launches 5 vttablet pods. This can be configured, and if you want to fit into the CPU limit of the GCP trial, you need to change those values; how to do that is shown below.

Now go to vitess-control command line and run the following commands:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
### run the commands below only if you want to change the vttablet count from 5 to 3
### to fit into the GCP trial CPU limit
sed -i 's/TABLETS_PER_SHARD:-5/TABLETS_PER_SHARD:-3/g' vttablet-up.sh
sed -i 's/RDONLY_COUNT:-2/RDONLY_COUNT:-1/g' vttablet-up.sh
### end of vttablet count change
./vttablet-up.sh

### example output
# Creating test_keyspace.shard-0 pods in cell test…
# Creating pod for tablet test-0000000100…
# pod/vttablet-100 created
# Creating pod for tablet test-0000000101…
# pod/vttablet-101 created
# Creating pod for tablet test-0000000102…
# pod/vttablet-102 created

At this point you should see tablet pods appearing in Rancher’s Workloads tab, just like shown below:

vttablets pods

You can also check the status of the tablets from the vitess-control command line:

./kvtctl.sh ListAllTablets test
### example output
# Starting port forwarding to vtctld…
# test-0000000100 test_keyspace 0 replica 10.12.0.8:15002 10.12.0.8:3306 []
# test-0000000101 test_keyspace 0 replica 10.12.3.11:15002 10.12.3.11:3306 []
# test-0000000102 test_keyspace 0 rdonly 10.12.4.8:15002 10.12.4.8:3306 []

Initialize a MySQL Database

Once all tablets are up and running, it's a good time to initialise the underlying MySQL database.

Take note that many vtctlclient commands produce no output on success, so the saying "no news is good news" applies here too.

First, designate one of the tablets to be the initial master. Vitess will automatically connect the other slaves' mysqld instances so that they start replicating from the master's mysqld. This is also when the default database is created. Since our keyspace is named test_keyspace, the MySQL database will be named vt_test_keyspace. As you can see from the ListAllTablets output, there are two replica tablets and one rdonly tablet. Let's designate the first tablet, test-0000000100, as the master (again, running from the vitess-control command line):

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./kvtctl.sh InitShardMaster -force test_keyspace/0 test-0000000100

### example output
# Starting port forwarding to vtctld…
# W0817 22:29:32.961277 2530 main.go:60] W0817 22:29:32.958629 reparent.go:181] master-elect tablet test-0000000100 is not the shard master, proceeding anyway as -force was used
# W0817 22:29:32.961988 2530 main.go:60] W0817 22:29:32.959145 reparent.go:187] master-elect tablet test-0000000100 is not a master in the shard, proceeding anyway as -force was used

Since this is the first time the shard has been started, the tablets are not doing any replication yet, and there is no existing master. That is why the InitShardMaster command above uses the -force flag to bypass the usual sanity checks that would apply if this wasn’t a brand new shard.

Now you should be able to see one master, one replica and one rdonly (if you didn’t change the counts in the scripts, you should see several replicas and rdonly pods):

./kvtctl.sh ListAllTablets test

### example output
# Starting port forwarding to vtctld…
# test-0000000100 test_keyspace 0 master 10.12.0.8:15002 10.12.0.8:3306 []
# test-0000000101 test_keyspace 0 replica 10.12.3.11:15002 10.12.3.11:3306 []
# test-0000000102 test_keyspace 0 rdonly 10.12.4.8:15002 10.12.4.8:3306 []

The replica tablets are used for serving live web traffic, while the rdonly tablets are used for offline processing, such as batch jobs and backups.

Create a table inside Vitess Cluster

Now it's starting to take shape and we can apply a schema to the databases. For this we can use the vtctlclient tool, which can apply a database schema across all tablets in a keyspace. There's already a create_test_table.sql file provided by Vitess on your vitess-control pod; let's apply it:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./kvtctl.sh ApplySchema -sql "$(cat create_test_table.sql)" test_keyspace

The SQL that was applied is the following:

CREATE TABLE messages (
page BIGINT(20) UNSIGNED,
time_created_ns BIGINT(20) UNSIGNED,
message VARCHAR(10000),
PRIMARY KEY (page, time_created_ns)
) ENGINE=InnoDB

And it can be viewed like this:

./kvtctl.sh GetSchema test-0000000100

### example output
# Starting port forwarding to vtctld…
# {
#   "database_schema": "CREATE DATABASE /*!32312 IF NOT EXISTS*/ {{.DatabaseName}} /*!40100 DEFAULT CHARACTER SET utf8 */",
#   "table_definitions": [
#     {
#       "name": "messages",
#       "schema": "CREATE TABLE `messages` (\n  `page` bigint(20) unsigned NOT NULL,\n  `time_created_ns` bigint(20) unsigned NOT NULL,\n  `message` varchar(10000) DEFAULT NULL,\n  PRIMARY KEY (`page`,`time_created_ns`)\n) ENGINE=InnoDB DEFAULT CHARSET=utf8",
#       "columns": [
#         "page",
#         "time_created_ns",
#         "message"
#       ],
#       "primary_key_columns": [
#         "page",
#         "time_created_ns"
#       ],
#       "type": "BASE TABLE",
#       "data_length": "16384",
#       "row_count": "0"
#     }
#   ],
#   "version": "5b2e5dbcb5766b6c69fe55c81b6ea805"
# }

Initialise Vitess routing schema

In the example above we have just a single database with no specific configuration, and we need to make that (empty) configuration visible for serving with the following command:

./kvtctl.sh RebuildVSchemaGraph

As this command doesn’t produce any output on success, you’re good if you don’t receive any messages.
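If you want to confirm what is being served, you can dump the (currently empty) routing schema for the keyspace. This assumes your Vitess version provides the GetVSchema command; for an unsharded keyspace with no explicit vschema it may simply return an empty document.

# show the vschema currently stored for the keyspace
./kvtctl.sh GetVSchema test_keyspace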

Start Vitess routing proxy – vtgate

Vitess uses vtgate to route each client query to the correct vttablet. In Kubernetes, a vtgate service distributes connections to a pool of vtgate pods. There’s again a script ready to start vtgate and it can also be changed to start a lower number of pods, to fit the GCP trial limit. Now go to vitess-control command line and run the following commands:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
### run the command below only if you want to change the vtgate count from 3 to 1
### to fit into the GCP trial CPU limit
sed -i 's/VTGATE_REPLICAS:-3/VTGATE_REPLICAS:-1/g' vtgate-up.sh
### end of vtgate count change

./vtgate-up.sh

### example output
# Creating vtgate service in cell test…
# service/vtgate-test created
# Creating vtgate replicationcontroller in cell test…
# replicationcontroller/vtgate-test created

After a couple of moments, you should see vtgate-test Workload in Rancher’s dashboard.
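
You can also confirm the result from the vitess-control command line; the service name below matches the one created by the script output above:

# the vtgate service and its pod(s) should be listed
kubectl get service vtgate-test
kubectl get pods | grep vtgate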

Test your cluster with an App

Now comes the most awaited moment – testing your Vitess cluster with an app. The folks from Vitess provided that too, and it is already present on your vitess-control pod. It is a GuestBook example that uses Vitess as a backend database. Apply it from the vitess-control pod:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./guestbook-up.sh

### example output
# Creating guestbook service…
# service/guestbook created
# Creating guestbook replicationcontroller…
# replicationcontroller/guestbook created

At this moment you should be able to access the GuestBook app in your browser. You need to get its public IP address. There are at least two ways to do that:
– the first is to view it in Rancher: go to Rancher's Workloads, then select Load Balancer; there you should see the GuestBook load balancer. From its right-hand side menu, select View in API, and in the newly opened page search for PublicEndpoints -> Addresses, where you should find the public IP address of your GuestBook app.
– the second is to get it from the vitess-control command line, by running the following command:

kubectl get service guestbook -o=jsonpath='{.status.loadBalancer.ingress[].ip}'

### example output
# 35.197.201.4

Now you can access the GuestBook by pointing your browser to that IP address. It should look like this:

Guestbook demo

If you click on random page, it will generate a random page and insert it into the schema created above inside Vitess. Do that a couple of times and also post some messages on those pages, so we can view them afterwards in Vitess.

You can see Vitess’ replication capabilities by opening the app in multiple browser windows, with the same Guestbook page number. Each new entry is committed to the master database. In the meantime, JavaScript on the page continuously polls the app server to retrieve a list of GuestBook entries. The app serves read-only requests by querying Vitess in ‘replica’ mode, confirming that replication is working.

Now let's see how it looks inside Vitess; go to the vitess-control command line and issue the following commands:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./kvtctl.sh ExecuteFetchAsDba test-0000000100 "SELECT * FROM messages"

### example output
# Starting port forwarding to vtctld…
# +------+---------------------+---------+
# | page | time_created_ns     | message |
# +------+---------------------+---------+
# |    0 | 1534547955692433152 | Hello   |
# |    0 | 1534547959825954816 | World   |
# |   65 | 1534547943325885184 | And     |
# |   65 | 1534547947058206208 | again   |
# +------+---------------------+---------+
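
To convince yourself that replication really is happening, you can run the same query against one of the replica tablets (test-0000000101 in the ListAllTablets output above); you should see the same rows:

# the replica should return the rows written via the master
./kvtctl.sh ExecuteFetchAsDba test-0000000101 "SELECT * FROM messages"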

Next steps

Now that you've seen what Vitess can do on the surface, the next step would be to try Vitess resharding or dynamic resharding; just note that for these you'll need more resources than the Google Cloud Platform trial limits provide.

Tear down the app and the clusters

To clean up all the demo pods and services you've created, there are also scripts provided for that, which you can run from the vitess-control pod:

cd $GOPATH/src/vitess.io/vitess/examples/kubernetes
./guestbook-down.sh
./vtgate-down.sh
./vttablet-down.sh
./vtctld-down.sh
./etcd-down.sh
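
Once the scripts have finished, a quick check should show that only the vitess-control pod (plus the etcd-operator resources deployed from the Rancher catalog, which these scripts don't touch) is left:

# the Vitess pods should be gone at this point
kubectl get pods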

The last cleaning steps would be to delete the cluster from Rancher and then terminate the Rancher instance from the Google Cloud Platform console.

Some final thoughts

Vitess is a very powerful and very useful MySQL backend. Although it requires a bit of effort to understand and set up properly, it offers a huge advantage for the future management of massive MySQL databases, and it scales easily. If Vitess is the default MySQL backend for giants like YouTube, it certainly has its merits.

Rancher helped a lot in this demo by keeping everything in one place: I only needed one Rancher instance to do all the work, just by accessing it in a browser. The control instance was there, and all the cluster details were there, just a click away.

Roman Doroschevici


Source