A Journey from Cattle to Kubernetes!


For the past few years, I have been developing and enhancing Cattle, the default container orchestration and scheduling framework for Rancher 1.6.

Cattle is used extensively by Rancher users to create and manage applications based on Docker containers. One of the key reasons for its extensive adoption is its compatibility with standard Docker Compose syntax.

With the release of Rancher 2.0, we shifted the base orchestration platform from Cattle to Kubernetes. Kubernetes introduces its own terminology and YAML specs for deploying services and pods, which differ from the Docker Compose syntax.

I must say, it is a steep learning curve for Cattle developers like me, and for our users, to find a way to migrate apps to the Kubernetes-based 2.0 platform.

In this blog series, we will explore how various features supported using Cattle in Rancher 1.6 can be mapped to their Kubernetes equivalents in Rancher 2.0.

Who Moved My Stack? 🙂

In Rancher 1.6, you could easily deploy services running Docker images in one of two ways: using either the Rancher UI or the Rancher Compose Tool, which extends the popular Docker Compose.

With Rancher 2.0, we’ve introduced new grouping boundaries and terminologies to align with Kubernetes. So what happens to your Cattle-based environments and stacks in a 2.0 environment? How can a Cattle user transition their stacks and services to Rancher 2.0?

To solve this problem, let's identify the parallels between the two versions.

Some of the key terms around application deployment in 1.6 are:

  • Container: The smallest deployment unit. A container is a lightweight, standalone, executable package of software that includes everything required to run it. (https://www.docker.com/what-container)
  • Service: A group of one or more containers running an identical Docker image.
  • Stack: Services that belong to an application can be grouped together under a stack, which bundles your applications into logical groups.
  • Compose config: Rancher allows users to view/export config files for the entire stack. These files, named docker_compose.yml and rancher_compose.yml, include all services and can be used to replicate the same application stack from a different Rancher setup.

Equivalent key terms for Rancher 2.0 are below. You can find more information about them in the Rancher 2.0 Documentation.

  • Pod: In Kubernetes, a pod is the smallest unit of deployment. A pod consists of one or more containers running a specific image. Pods are roughly equivalent to containers in 1.6. An application service consists of one or more running pods. If a Rancher 1.6 service has sidekicks, the equivalent pod would have more than one container, with one container launched per sidekick.
  • Workload: The term service used in 1.6 maps to the term workload in 2.0. A workload object defines the specs and deployment rules for a set of pods that comprise the application. However, unlike services in 1.6, workloads are divided into different categories. The workload category most similar to a stateless service from 1.6 is the deployment category.
  • Namespace: The term stack from 1.6 maps to the Kubernetes concept of a namespace in 2.0. After launching a Kubernetes cluster in Rancher 2.0, workloads are deployed to the default namespace, unless you explicitly define a namespace yourself. This functionality is similar to the default stack in 1.6.
  • Kubernetes YAML: This file type is similar to a Docker Compose file. It specifies Kubernetes objects in YAML format. Just as the Docker Compose tool can digest Compose files to deploy specific container services, kubectl is the CLI tool that processes Kubernetes YAML as input, which is then used to provision Kubernetes objects. For more information, see the Kubernetes Documentation.

How Do I Move a Simple Application from Rancher 1.6 to 2.0?

After learning the parallels between Cattle and Kubernetes, I began investigating options for transitioning a simple application from Rancher 1.6 to 2.0.

For this exercise, I used the LetsChat app, which consists of a couple of services. I deployed these services to a stack in 1.6 using Cattle. Here is the docker-compose.yml file for the services in my stack:

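The original post showed this file as a screenshot. As a rough sketch of what such a Compose file could look like (the image tags, ports, and links here are assumptions, not the original file):

version: '2'
services:
  chat:
    image: sdelements/lets-chat    # assumed LetsChat image
    ports:
      - "9890:8080"                # public port 9890 mapped to container port 8080
    links:
      - mongo
  mongo:
    image: mongo:3                 # assumed Mongo image/tag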

Along with provisioning the service containers, Cattle facilitates service discovery between the services in my stack. This service discovery allows the LetsChat service to talk to the Mongo service.

Is provisioning and configuring service discovery in Rancher 2.0 as easy as it was in 1.6?

A Cluster and Project on Rancher 2.0

First, I needed to create a Rancher 2.0 Kubernetes cluster. You can find instructions for this process in our Quick Start Guide.

In Rancher 1.6, I’m used to deploying my stacks within a Cattle Environment that has some compute resources assigned.

After inspecting the UI in Rancher 2.0, I recognized that workloads are deployed in a project within the Kubernetes Cluster that I created. It seems that a 2.0 Cluster and a Project together are equivalent to a Cattle environment from 1.6!


However, there are some important differences to note:

  • In 1.6, Cattle environments have a set of compute nodes assigned to them, and the Rancher Server is the global control plane, backed by a MySQL database that provides storage for each environment. In 2.0, each Kubernetes cluster has its own set of compute nodes, nodes running the cluster control plane, and nodes running etcd for storage.
  • In 1.6, all Cattle environment users could access any host in the environment. In Rancher 2.0, this access model has changed. You can now restrict users to specific projects. This model allows for multi-tenancy since hosts are owned by the cluster, and the cluster can be further divided into multiple projects where users can manage their apps.

Deploying Workloads from Rancher 2.0 UI

With my new Kubernetes cluster in place, I was set to launch my applications the 2.0 way!

I navigated to the Default project under my cluster. From the Workloads tab, I launched a deployment for the LetsChat and Mongo Docker images.


For my LetsChat deployment, I exposed container port 8080 by selecting the HostPort option for port mapping. Then I entered my public port 9890 as the listening port.

I selected HostPort because Kubernetes exposes the specified port for each host that the workload (and its pods) are deployed to. This behavior is similar to exposing a public port on Cattle.

While Rancher provisioned the deployments, I monitored the status from the Workloads view. I could drill down to the deployed Kubernetes pods and monitor the logs. This experience was very similar to launching services using Cattle and drilling down to the service containers!

Once the workloads were provisioned, Rancher provided a convenient link to the public endpoint of my LetsChat app. Upon clicking the link, voilà!


Docker Compose to Kubernetes YAML

If you’re migrating multiple application stacks from Rancher 1.6 to 2.0, migrating manually through the UI is not ideal. Instead, use your Docker Compose config files to speed things up.

If you are a Rancher 1.6 user, you’re probably familiar with launching services by calling a Compose file from Rancher CLI. Similarly, Rancher 2.0 provides a CLI to launch the Kubernetes resources.

So our next step is to convert our docker-compose.yml file to Kubernetes YAML specs and use the CLI.

Converting my Compose file to Kubernetes YAML specs manually didn't inspire confidence. I'm unfamiliar with Kubernetes YAML, and it's confusing compared to the simplicity of Docker Compose. A quick Google search led me to a conversion tool: Kompose.

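The conversion itself is a single command (a sketch, assuming Kompose is installed and the Compose file is in the current directory):

kompose convert -f docker-compose.yml
# expected to emit chat-deployment.yaml, chat-service.yaml,
# mongo-deployment.yaml and mongo-service.yaml for this stack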

Kompose generated two files per service in the docker-compose.yml:

  • a deployment YAML
  • a service YAML

Why is a separate service spec required?

A Kubernetes service is a REST object that abstracts access to the pods in the workload. A service provides a static endpoint to the pods. Therefore, even if the pods change IP address, the public endpoint remains unchanged. A service object points to its corresponding deployment (workload) by using selector labels.

When a service in Docker Compose exposes public ports, Kompose translates that to a service YAML spec for Kubernetes, along with a deployment YAML spec.

Let's see how the Compose and Kubernetes YAML specs compare:

[Image: docker-compose.yml shown side by side with the generated chat-deployment.yaml]

As highlighted above, everything under the chat service in docker-compose.yml is mapped to spec.containers in the Kubernetes chat-deployment.yaml file.

  • The service name in docker-compose.yml is placed under spec.containers.name
  • image in docker-compose.yml maps to spec.containers.image
  • ports in docker-compose.yml maps to spec.containers.ports.containerPort
  • Any labels present in docker-compose.yml are placed under metadata.annotations

Note that the separate chat-service.yaml file contains the public port mapping of the deployment, and it points to the deployment using a selector (io.kompose.service: chat), which is a label on the chat-deployment object.
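For reference, the generated chat-service.yaml looks roughly like the sketch below; the field values approximate typical Kompose output rather than the exact generated file:

apiVersion: v1
kind: Service
metadata:
  name: chat
  labels:
    io.kompose.service: chat
spec:
  ports:
  - name: "8080"
    port: 8080
    targetPort: 8080
  selector:
    io.kompose.service: chat    # matches the label on the chat-deployment pods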

To deploy these files to my cluster namespace, I downloaded and configured the Rancher CLI tool.
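The exact invocation depends on how the CLI is configured, but importing the generated files boils down to something like this (a sketch, assuming kubectl or the Rancher CLI is already pointed at the target project's namespace):

kubectl apply -f chat-deployment.yaml -f chat-service.yaml
kubectl apply -f mongo-deployment.yaml -f mongo-service.yaml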

The workloads launched fine, but…


There was no public endpoint for the chat workload. After some troubleshooting, I noticed that the file generated by Kompose was missing the HostPort spec in chat-deployment.yaml! I manually added the missing spec and re-imported the YAML to publicly expose the LetsChat workload.
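The fix amounted to adding a hostPort alongside the container port in chat-deployment.yaml, roughly like this fragment (a sketch of the relevant part only; the image name is an assumption):

    spec:
      containers:
      - name: chat
        image: sdelements/lets-chat   # assumed image
        ports:
        - containerPort: 8080
          hostPort: 9890              # publishes the port on every host running the pod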


Troubleshooting successful! I could access the application at HostIP:HostPort.


Finished

There you have it! Rancher users can successfully port their application stacks from 1.6 to 2.0 using either the UI or Compose-to-Kubernetes YAML conversion.

Although the complexity of Kubernetes is still apparent, with the help of Rancher 2.0, I found the provisioning flow just as simple and intuitive as Cattle.

This article looked at the bare minimum flow for transitioning simple services from Cattle to Rancher 2.0. However, there are more challenges you'll face when migrating to Rancher 2.0: you'll need to understand the changes around scheduling, load balancing, service discovery, and service monitoring. Let's dig deeper in upcoming articles!

In the next article, we will explore various options for exposing a workload publicly via port mapping options on Kubernetes.

Prachi Damle


Principal Software Engineer

Source

The History of Kubernetes, the Community Behind It

Authors: Brendan Burns (Distinguished Engineer, Microsoft)


It is remarkable to me to return to Portland and OSCON to stand on stage with members of the Kubernetes community and accept this award for Most Impactful Open Source Project. It was scarcely three years ago that, on this very same stage, we declared Kubernetes 1.0 and the project was added to the newly formed Cloud Native Computing Foundation.

To think about how far we have come in that short period of time, and to see the ways in which this project has shaped the cloud computing landscape, is nothing short of amazing. The success is a testament to the power and contributions of this amazing open source community. And the daily passion and quality contributions of our endlessly engaged, worldwide community are nothing short of humbling.

Congratulations @kubernetesio for winning the “most impact” award at #OSCON I’m so proud to be a part of this amazing community! @CloudNativeFdn pic.twitter.com/5sRUYyefAK

— Jaice Singer DuMars (@jaydumars) July 19, 2018

👏 congrats @kubernetesio community on winning the #oscon Most Impact Award, we are proud of you! pic.twitter.com/5ezDphi6J6

— CNCF (@CloudNativeFdn) July 19, 2018

At a meetup in Portland this week, I had a chance to tell the story of Kubernetes’ past, its present and some thoughts about its future, so I thought I would write down some pieces of what I said for those of you who couldn’t be there in person.

It all began in the fall of 2013, with the three of us: Craig McLuckie, Joe Beda and I were working on public cloud infrastructure. If you cast your mind back to the world of cloud in 2013, it was a vastly different place than it is today. Imperative bash scripts were only just starting to give way to declarative configuration of IaaS. Netflix was popularizing the idea of immutable infrastructure, but doing it with heavy-weight full VM images. The notion of orchestration, and certainly container orchestration, existed in a few internet-scale companies, but not in cloud and certainly not in the enterprise.

Docker changed all of that. By popularizing a lightweight container runtime and providing a simple way to package, distribute, and deploy applications onto a machine, the Docker tooling and experience popularized a brand-new cloud native approach to application packaging and maintenance. Were it not for Docker’s shifting of the cloud developer’s perspective, Kubernetes simply would not exist.

I think that it was Joe who first suggested that we look at Docker in the summer of 2013, when Craig, Joe and I were all thinking about how we could bring a cloud native application experience to a broader audience. And for all three of us, the implications of this new tool were immediately obvious. We knew it was a critical component in the development of cloud native infrastructure.

But as we thought about it, it was equally obvious that Docker, with its focus on a single machine, was not the complete solution. While Docker was great at building and packaging individual containers and running them on individual machines, there was a clear need for an orchestrator that could deploy and manage large numbers of containers across a fleet of machines.

As we thought about it some more, it became increasingly obvious to Joe, Craig and me that not only was such an orchestrator necessary, it was also inevitable, and it was equally inevitable that this orchestrator would be open source. This realization crystallized for us in the late fall of 2013, and thus began the rapid development of first a prototype, and then the system that would eventually become known as Kubernetes. As 2013 turned into 2014 we were lucky to be joined by some incredibly talented developers including Ville Aikas, Tim Hockin, Dawn Chen, Brian Grant and Daniel Smith.

Happy to see k8s team members winning the “most impact” award. #oscon pic.twitter.com/D6mSIiDvsU

— Bridget Kromhout (@bridgetkromhout) July 19, 2018

Kubernetes won the O’Reilly Most Impact Award. Thanks to our contributors and users! pic.twitter.com/T6Co1wpsAh

— Brian Grant (@bgrant0607) July 19, 2018

The initial goal of this small team was to develop a “minimally viable orchestrator.” From experience we knew that the basic feature set for such an orchestrator was:

  • Replication to deploy multiple instances of an application
  • Load balancing and service discovery to route traffic to these replicated containers
  • Basic health checking and repair to ensure a self-healing system
  • Scheduling to group many machines into a single pool and distribute work to them

Along the way, we also spent a significant chunk of our time convincing executive leadership that open sourcing this project was a good idea. I’m endlessly grateful to Craig for writing numerous whitepapers and to Eric Brewer, for the early and vocal support that he lent us to ensure that Kubernetes could see the light of day.

In June of 2014 when Kubernetes was released to the world, the list above was the sum total of its basic feature set. As an early stage open source community, we then spent a year building, expanding, polishing and fixing this initial minimally viable orchestrator into the product that we released as a 1.0 in OSCON in 2015. We were very lucky to be joined early on by the very capable OpenShift team which lent significant engineering and real world enterprise expertise to the project. Without their perspective and contributions, I don’t think we would be standing here today.

Three years later, the Kubernetes community has grown exponentially, and Kubernetes has become synonymous with cloud native container orchestration. There are more than 1700 people who have contributed to Kubernetes, there are more than 500 Kubernetes meetups worldwide, and more than 42000 users have joined the #kubernetes-dev channel. What’s more, the community that we have built works successfully across geographic, language and corporate boundaries. It is a truly open, engaged and collaborative community, and in and of itself an amazing achievement. Many thanks to everyone who has helped make it what it is today. Kubernetes is a commodity in the public cloud because of you.

But if Kubernetes is a commodity, then what is the future? Certainly, there are an endless array of tweaks, adjustments and improvements to the core codebase that will occupy us for years to come, but the true future of Kubernetes are the applications and experiences that are being built on top of this new, ubiquitous platform.

Kubernetes has dramatically reduced the complexity to build new developer experiences, and a myriad of new experiences have been developed or are in the works that provide simplified or targeted developer experiences like Functions-as-a-Service, on top of core Kubernetes-as-a-Service.

The Kubernetes cluster itself is being extended with custom resource definitions (which I first described to Kelsey Hightower on a walk from OSCON to a nearby restaurant in 2015). These new resources allow cluster operators to enable new plugin functionality that extends and enhances the APIs that their users have access to.

By embedding core functionality like logging and monitoring in the cluster itself and enabling developers to take advantage of such services simply by deploying their application into the cluster, Kubernetes has reduced the learning necessary for developers to build scalable reliable applications.

Finally, Kubernetes has provided a new, common vocabulary for expressing the patterns and paradigms of distributed system development. This common vocabulary means that we can more easily describe and discuss the common ways in which our distributed systems are built, and furthermore we can build standardized, re-usable implementations of such systems. The net effect of this is the development of higher quality, reliable distributed systems, more quickly.

It’s truly amazing to see how far Kubernetes has come, from a rough idea in the minds of three people in Seattle to a phenomenon that has redirected the way we think about cloud native development across the world. It has been an amazing journey, but what’s truly amazing to me, is that I think we’re only just now scratching the surface of the impact that Kubernetes will have. Thank you to everyone who has enabled us to get this far, and thanks to everyone who will take us further.

Brendan

Source

Configuring Horizontal Pod Autoscaling on Running Services on Kubernetes.


Introduction

One of the nicer features of Kubernetes is the ability to configure autoscaling for your running services. Without autoscaling, it’s difficult to adapt deployment scale to demand and meet SLAs. On Kubernetes clusters, this feature is called the Horizontal Pod Autoscaler (HPA).

Why use HPA

Using HPA, you can automatically scale your deployments up and down based on resource use and/or custom metrics, so that the scale of your deployments matches the real-time load on your services.

HPA produces two direct improvements to your services:

  1. Use compute and memory resources only when needed, releasing them when they are no longer required.
  2. Increase or decrease performance as needed to meet SLAs.

How HPA works

HPA automatically scales the number of pods (between a defined minimum and maximum number of pods) in a replication controller, deployment or replica set, based on observed CPU/memory utilization (resource metrics) or on custom metrics provided by a third-party metrics application such as Prometheus or Datadog. HPA is implemented as a control loop, with a period controlled by the Kubernetes controller manager's --horizontal-pod-autoscaler-sync-period flag (default value 30s).
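Roughly, on each iteration of that loop the controller derives the desired replica count from the ratio of the observed metric value to the target value (a simplification of the documented algorithm, ignoring tolerances and pod readiness):

desiredReplicas = ceil( currentReplicas * currentMetricValue / desiredMetricValue )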

[Image: HPA schema]

HPA definition

HPA is an API resource in the Kubernetes autoscaling API group. The current stable version is autoscaling/v1, which only includes support for CPU autoscaling. To get additional support for scaling on memory and custom metrics, the beta version, autoscaling/v2beta1, should be used.

Read more info about the HPA API object.

HPA is supported in a standard way by kubectl. It can be created, managed and deleted using kubectl:

  • Creating an HPA
    • With a manifest: kubectl create -f <HPA_MANIFEST>
    • Without a manifest (CPU only): kubectl autoscale deployment hello-world --min=2 --max=5 --cpu-percent=50
  • Getting HPA info
    • Basic: kubectl get hpa hello-world
    • Detailed description: kubectl describe hpa hello-world
  • Deleting an HPA
    • kubectl delete hpa hello-world

Here’s an example HPA manifest definition:

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hello-world
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: hello-world
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 100Mi

  • Uses the autoscaling/v2beta1 version in order to use both cpu and memory metrics
  • Controls autoscaling of the hello-world deployment
  • Defines a minimum number of replicas of 1
  • Defines a maximum number of replicas of 10
  • Scales up when:
    • cpu use is more than 50%
    • memory use is more than 100Mi

Installation

Before HPA can be used in your Kubernetes cluster, some elements have to be installed and configured in your system.

Requirements

Be sure that your Kubernetes cluster services are running with at least these flags:

  • kube-api: requestheader-client-ca-file
  • kubelet: read-only-port at 10255
  • kube-controller: optional, only needed if values other than the defaults are required
    • horizontal-pod-autoscaler-downscale-delay: "5m0s"
    • horizontal-pod-autoscaler-upscale-delay: "3m0s"
    • horizontal-pod-autoscaler-sync-period: "30s"

For an RKE Kubernetes cluster definition, be sure to add these lines in the services section. To do it in the Rancher v2.0.x UI, open "Cluster options", choose "Edit as YAML", and add these definitions:

services:
  kube-api:
    extra_args:
      requestheader-client-ca-file: "/etc/kubernetes/ssl/kube-ca.pem"
  kube-controller:
    extra_args:
      horizontal-pod-autoscaler-downscale-delay: "5m0s"
      horizontal-pod-autoscaler-upscale-delay: "1m0s"
      horizontal-pod-autoscaler-sync-period: "30s"
  kubelet:
    extra_args:
      read-only-port: 10255

In order to deploy metrics services, you must have your Kubernetes cluster configured and deployed properly.

Note: For deploy and test examples, Rancher v2.0.6 and k8s v1.10.1 cluster are being used.

Resource metrics

For HPA to use resource metrics, the metrics-server package must be installed in the kube-system namespace of the Kubernetes cluster.

To accomplish this, follow these steps:

  1. Configure kubectl to connect to the proper Kubernetes cluster.
  2. Clone the GitHub metrics-server repo: git clone https://github.com/kubernetes-incubator/metrics-server
  3. Install the metrics-server package (assuming that Kubernetes is at least version 1.8): kubectl create -f metrics-server/deploy/1.8+/
  4. Check that metrics-server is running properly. Check the service pod and logs in the kube-system namespace:

    # kubectl get pods -n kube-system
    NAME READY STATUS RESTARTS AGE

    metrics-server-6fbfb84cdd-t2fk9 1/1 Running 0 8h

    # kubectl -n kube-system logs metrics-server-6fbfb84cdd-t2fk9
    I0723 08:09:56.193136 1 heapster.go:71] /metrics-server --source=kubernetes.summary_api:"
    I0723 08:09:56.193574 1 heapster.go:72] Metrics Server version v0.2.1
    I0723 08:09:56.194480 1 configs.go:61] Using Kubernetes client with master "https://10.43.0.1:443" and version
    I0723 08:09:56.194501 1 configs.go:62] Using kubelet port 10255
    I0723 08:09:56.198612 1 heapster.go:128] Starting with Metric Sink
    I0723 08:09:56.780114 1 serving.go:308] Generated self-signed cert (apiserver.local.config/certificates/apiserver.crt, apiserver.local.config/certificates/apiserver.key)
    I0723 08:09:57.391518 1 heapster.go:101] Starting Heapster API server…
    [restful] 2018/07/23 08:09:57 log.go:33: [restful/swagger] listing is available at https:///swaggerapi
    [restful] 2018/07/23 08:09:57 log.go:33: [restful/swagger] https:///swaggerui/ is mapped to folder /swagger-ui/
    I0723 08:09:57.394080 1 serve.go:85] Serving securely on 0.0.0.0:443

  5. Check that the metrics API is accessible from kubectl:
    • If you are accessing the Kubernetes cluster directly, use the server URL in your kubectl config, such as https://<K8s_URL>:6443

      # kubectl get --raw /apis/metrics.k8s.io/v1beta1
      {"kind":"APIResourceList","apiVersion":"v1","groupVersion":"metrics.k8s.io/v1beta1","resources":[{"name":"nodes","singularName":"","namespaced":false,"kind":"NodeMetrics","verbs":["get","list"]},{"name":"pods","singularName":"","namespaced":true,"kind":"PodMetrics","verbs":["get","list"]}]}

    • If you are accessing the Kubernetes cluster through Rancher, the server URL in your kubectl config looks like this: https://<RANCHER_URL>/k8s/clusters/<CLUSTER_ID>. You also need to add the prefix /k8s/clusters/<CLUSTER_ID> to the API path.

      # kubectl get --raw /k8s/clusters/<CLUSTER_ID>/apis/metrics.k8s.io/v1beta1
      {"kind":"APIResourceList","apiVersion":"v1","groupVersion":"metrics.k8s.io/v1beta1","resources":[{"name":"nodes","singularName":"","namespaced":false,"kind":"NodeMetrics","verbs":["get","list"]},{"name":"pods","singularName":"","namespaced":true,"kind":"PodMetrics","verbs":["get","list"]}]}

Custom Metrics (Prometheus)

Custom metrics can be provided by many third-party applications as the source. We are going to use Prometheus for our demonstration. We assume that Prometheus is deployed on your Kubernetes cluster and is collecting proper metrics from pods, nodes, namespaces, and so on. We'll use the Prometheus URL http://prometheus.mycompany.io, exposed at port 80.

Prometheus is available for deployment in the Rancher v2.0 catalog. Deploy it from the Rancher catalog if it isn't already running on your Kubernetes cluster.

For HPA to use custom metrics from Prometheus, the k8s-prometheus-adapter package is needed in the kube-system namespace of the Kubernetes cluster. To facilitate the k8s-prometheus-adapter installation, we are going to use the Helm chart available at banzai-charts.

To use this chart, follow these steps:

  1. Init Helm on the k8s cluster:

    kubectl -n kube-system create serviceaccount tiller
    kubectl create clusterrolebinding tiller --clusterrole cluster-admin --serviceaccount=kube-system:tiller
    helm init --service-account tiller

  2. Clone the Github banzai-charts repo:

    git clone https://github.com/banzaicloud/banzai-charts

  3. Install the prometheus-adapter chart, specifying the Prometheus URL and port:

    helm install --name prometheus-adapter banzai-charts/prometheus-adapter --set prometheus.url="http://prometheus.mycompany.io",prometheus.port="80" --namespace kube-system

  4. Check that prometheus-adapter is running properly. Check the service pod and logs in the kube-system namespace:

    # kubectl get pods -n kube-system
    NAME READY STATUS RESTARTS AGE

    prometheus-adapter-prometheus-adapter-568674d97f-hbzfx 1/1 Running 0 7h

    # kubectl logs prometheus-adapter-prometheus-adapter-568674d97f-hbzfx -n kube-system

    I0724 10:18:45.696679 1 round_trippers.go:436] GET https://10.43.0.1:443/api/v1/namespaces/default/pods?labelSelector=app%3Dhello-world 200 OK in 2 milliseconds
    I0724 10:18:45.696695 1 round_trippers.go:442] Response Headers:
    I0724 10:18:45.696699 1 round_trippers.go:445] Date: Tue, 24 Jul 2018 10:18:45 GMT
    I0724 10:18:45.696703 1 round_trippers.go:445] Content-Type: application/json
    I0724 10:18:45.696706 1 round_trippers.go:445] Content-Length: 2581
    I0724 10:18:45.696766 1 request.go:836] Response Body: {“kind”:”PodList”,”apiVersion”:”v1″,”metadata”:{“selfLink”:”/api/v1/namespaces/default/pods”,”resourceVersion”:”6237″},”items”:[{“metadata”:{“name”:”hello-world-54764dfbf8-q6l82″,”generateName”:”hello-world-54764dfbf8-“,”namespace”:”default”,”selfLink”:”/api/v1/namespaces/default/pods/hello-world-54764dfbf8-q6l82″,”uid”:”484cb929-8f29-11e8-99d2-067cac34e79c”,”resourceVersion”:”4066″,”creationTimestamp”:”2018-07-24T10:06:50Z”,”labels”:{“app”:”hello-world”,”pod-template-hash”:”1032089694″},”annotations”:{“cni.projectcalico.org/podIP”:”10.42.0.7/32″},”ownerReferences”:[{“apiVersion”:”extensions/v1beta1″,”kind”:”ReplicaSet”,”name”:”hello-world-54764dfbf8″,”uid”:”4849b9b1-8f29-11e8-99d2-067cac34e79c”,”controller”:true,”blockOwnerDeletion”:true}]},”spec”:{“volumes”:[{“name”:”default-token-ncvts”,”secret”:{“secretName”:”default-token-ncvts”,”defaultMode”:420}}],”containers”:[{“name”:”hello-world”,”image”:”rancher/hello-world”,”ports”:[{“containerPort”:80,”protocol”:”TCP”}],”resources”:{“requests”:{“cpu”:”500m”,”memory”:”64Mi”}},”volumeMounts”:[{“name”:”default-token-ncvts”,”readOnly”:true,”mountPath”:”/var/run/secrets/kubernetes.io/serviceaccount”}],”terminationMessagePath”:”/dev/termination-log”,”terminationMessagePolicy”:”File”,”imagePullPolicy”:”Always”}],”restartPolicy”:”Always”,”terminationGracePeriodSeconds”:30,”dnsPolicy”:”ClusterFirst”,”serviceAccountName”:”default”,”serviceAccount”:”default”,”nodeName”:”34.220.18.140″,”securityContext”:{},”schedulerName”:”default-scheduler”,”tolerations”:[{“key”:”node.kubernetes.io/not-ready”,”operator”:”Exists”,”effect”:”NoExecute”,”tolerationSeconds”:300},{“key”:”node.kubernetes.io/unreachable”,”operator”:”Exists”,”effect”:”NoExecute”,”tolerationSeconds”:300}]},”status”:{“phase”:”Running”,”conditions”:[{“type”:”Initialized”,”status”:”True”,”lastProbeTime”:null,”lastTransitionTime”:”2018-07-24T10:06:50Z”},{“type”:”Ready”,”status”:”True”,”lastProbeTime”:null,”lastTransitionTime”:”2018-07-24T10:06:54Z”},{“type”:”PodScheduled”,”status”:”True”,”lastProbeTime”:null,”lastTransitionTime”:”2018-07-24T10:06:50Z”}],”hostIP”:”34.220.18.140″,”podIP”:”10.42.0.7″,”startTime”:”2018-07-24T10:06:50Z”,”containerStatuses”:[{“name”:”hello-world”,”state”:{“running”:{“startedAt”:”2018-07-24T10:06:54Z”}},”lastState”:{},”ready”:true,”restartCount”:0,”image”:”rancher/hello-world:latest”,”imageID”:”docker-pullable://rancher/[email protected]:4b1559cb4b57ca36fa2b313a3c7dde774801aa3a2047930d94e11a45168bc053″,”containerID”:”docker://cce4df5fc0408f03d4adf82c90de222f64c302bf7a04be1c82d584ec31530773″}],”qosClass”:”Burstable”}}]}
    I0724 10:18:45.699525 1 api.go:74] GET http://prometheus-server.prometheus.34.220.18.140.xip.io/api/v1/query?query=sum%28rate%28container_fs_read_seconds_total%7Bpod_name%3D%22hello-world-54764dfbf8-q6l82%22%2Ccontainer_name%21%3D%22POD%22%2Cnamespace%3D%22default%22%7D%5B5m%5D%29%29+by+%28pod_name%29&time=1532427525.697 200 OK
    I0724 10:18:45.699620 1 api.go:93] Response Body: {“status”:”success”,”data”:{“resultType”:”vector”,”result”:[{“metric”:{“pod_name”:”hello-world-54764dfbf8-q6l82″},”value”:[1532427525.697,”0″]}]}}
    I0724 10:18:45.699939 1 wrap.go:42] GET /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/fs_read?labelSelector=app%3Dhello-world: (12.431262ms) 200 [[kube-controller-manager/v1.10.1 (linux/amd64) kubernetes/d4ab475/system:serviceaccount:kube-system:horizontal-pod-autoscaler] 10.42.0.0:24268]
    I0724 10:18:51.727845 1 request.go:836] Request Body: {“kind”:”SubjectAccessReview”,”apiVersion”:”authorization.k8s.io/v1beta1″,”metadata”:{“creationTimestamp”:null},”spec”:{“nonResourceAttributes”:{“path”:”/”,”verb”:”get”},”user”:”system:anonymous”,”group”:[“system:unauthenticated”]},”status”:{“allowed”:false}}

  5. Check that the metrics API is accessible from kubectl:
    • If you are accessing the Kubernetes cluster directly, use the server URL in your kubectl config, such as https://<K8s_URL>:6443

      # kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1
      {“kind”:”APIResourceList”,”apiVersion”:”v1″,”groupVersion”:”custom.metrics.k8s.io/v1beta1″,”resources”:[{“name”:”pods/fs_usage_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_rss”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_cpu_period”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_cfs_throttled”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_io_time”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_read”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_sector_writes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_user”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/last_seen”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/tasks_state”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_cpu_quota”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/start_time_seconds”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_write”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_cache”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_usage_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_cfs_periods”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_cfs_throttled_periods”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_reads_merged”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_working_set_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/network_udp_usage”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_inodes_free”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_inodes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_io_time_weighted”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_failures”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_swap”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_cpu_shares”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_memory_swap_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_usage”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_io_current”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_writes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_failcnt”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”
verbs”:[“get”]},{“name”:”pods/fs_reads”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_writes_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_writes_merged”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/network_tcp_usage”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_max_usage_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_memory_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_memory_reservation_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_load_average_10s”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_system”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_reads_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_sector_reads”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]}]}

  • If you are accessing the Kubernetes cluster through Rancher, the server URL in your kubectl config looks like https://<RANCHER_URL>/k8s/clusters/<CLUSTER_ID>. You need to add the prefix /k8s/clusters/<CLUSTER_ID> to the API path:

# kubectl get --raw /k8s/clusters/<CLUSTER_ID>/apis/custom.metrics.k8s.io/v1beta1
{“kind”:”APIResourceList”,”apiVersion”:”v1″,”groupVersion”:”custom.metrics.k8s.io/v1beta1″,”resources”:[{“name”:”pods/fs_usage_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_rss”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_cpu_period”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_cfs_throttled”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_io_time”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_read”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_sector_writes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_user”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/last_seen”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/tasks_state”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_cpu_quota”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/start_time_seconds”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_write”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_cache”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_usage_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_cfs_periods”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_cfs_throttled_periods”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_reads_merged”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_working_set_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/network_udp_usage”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_inodes_free”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_inodes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_io_time_weighted”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_failures”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_swap”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_cpu_shares”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_memory_swap_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_usage”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_io_current”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_writes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_failcnt”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”
:[“get”]},{“name”:”pods/fs_reads”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_writes_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_writes_merged”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/network_tcp_usage”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/memory_max_usage_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_memory_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/spec_memory_reservation_limit_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_load_average_10s”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/cpu_system”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_reads_bytes”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]},{“name”:”pods/fs_sector_reads”,”singularName”:””,”namespaced”:true,”kind”:”MetricValueList”,”verbs”:[“get”]}]}

ClusterRole and ClusterRoleBinding

By default, HPA will try to read metrics (resource and custom) as the user system:anonymous. You therefore need to define view-resource-metrics and view-custom-metrics ClusterRoles and ClusterRoleBindings, assigning them to system:anonymous, to open read access to the metrics.

To accomplish this, follow these steps:

  1. Configure kubectl to connect to the proper k8s cluster.
  2. Copy the ClusterRole and ClusterRoleBinding manifests for:
    • resource metrics: ApiGroups metrics.k8s.io
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: view-resource-metrics
      rules:
      - apiGroups:
        - metrics.k8s.io
        resources:
        - pods
        - nodes
        verbs:
        - get
        - list
        - watch
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: view-resource-metrics
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: view-resource-metrics
      subjects:
      - apiGroup: rbac.authorization.k8s.io
        kind: User
        name: system:anonymous
    • custom metrics: ApiGroups custom.metrics.k8s.io

      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: view-custom-metrics
      rules:
      - apiGroups:
        - custom.metrics.k8s.io
        resources:
        - "*"
        verbs:
        - get
        - list
        - watch
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRoleBinding
      metadata:
        name: view-custom-metrics
      roleRef:
        apiGroup: rbac.authorization.k8s.io
        kind: ClusterRole
        name: view-custom-metrics
      subjects:
      - apiGroup: rbac.authorization.k8s.io
        kind: User
        name: system:anonymous

  3. Create them on your Kubernetes cluster (the custom metrics one only if you want to use custom metrics):

    # kubectl create -f <RESOURCE_METRICS_MANIFEST>
    # kubectl create -f <CUSTOM_METRICS_MANIFEST>

Service deployment

For HPA to work properly, service deployments must define resource requests for their containers.

Let's walk through a hello-world example to test whether HPA is working well.

To do this, follow these steps:
1. Configure kubectl to connect to the proper k8s cluster.
2. Copy the hello-world deployment manifest:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  labels:
    app: hello-world
  name: hello-world
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello-world
  strategy:
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: hello-world
    spec:
      containers:
      - image: rancher/hello-world
        imagePullPolicy: Always
        name: hello-world
        resources:
          requests:
            cpu: 500m
            memory: 64Mi
        ports:
        - containerPort: 80
          protocol: TCP
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: hello-world
  namespace: default
spec:
  ports:
  - port: 80
    protocol: TCP
    targetPort: 80
  selector:
    app: hello-world

  3. Deploy it to the k8s cluster:

    # kubectl create -f <HELLO_WORLD_MANIFEST>

  4. Copy the HPA manifest for resource or custom metrics:
    • resource metrics
    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: hello-world
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: extensions/v1beta1
        kind: Deployment
        name: hello-world
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          targetAverageUtilization: 50
      - type: Resource
        resource:
          name: memory
          targetAverageValue: 1000Mi
    • custom metrics (same as resource but adding custom cpu_system metric)

    apiVersion: autoscaling/v2beta1
    kind: HorizontalPodAutoscaler
    metadata:
      name: hello-world
      namespace: default
    spec:
      scaleTargetRef:
        apiVersion: extensions/v1beta1
        kind: Deployment
        name: hello-world
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          targetAverageUtilization: 50
      - type: Resource
        resource:
          name: memory
          targetAverageValue: 100Mi
      - type: Pods
        pods:
          metricName: cpu_system
          targetAverageValue: 20m

  5. Get the HPA info and description, and check that the resource metrics data are shown:
    • resource metrics

    # kubectl get hpa
    NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
    hello-world Deployment/hello-world 1253376 / 100Mi, 0% / 50% 1 10 1 6m
    # kubectl describe hpa
    Name: hello-world
    Namespace: default
    Labels: <none>
    Annotations: <none>
    CreationTimestamp: Mon, 23 Jul 2018 20:21:16 +0200
    Reference: Deployment/hello-world
    Metrics: ( current / target )
    resource memory on pods: 1253376 / 100Mi
    resource cpu on pods (as a percentage of request): 0% (0) / 50%
    Min replicas: 1
    Max replicas: 10
    Conditions:
    Type Status Reason Message
    —- —— —— ——-
    AbleToScale True ReadyForNewScale the last scale time was sufficiently old as to warrant a new scale
    ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from memory resource
    ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
    Events: <none>

    • custom metrics

    kubectl describe hpa
    Name: hello-world
    Namespace: default
    Labels: <none>
    Annotations: <none>
    CreationTimestamp: Tue, 24 Jul 2018 18:36:28 +0200
    Reference: Deployment/hello-world
    Metrics: ( current / target )
    resource memory on pods: 3514368 / 100Mi
    “cpu_system” on pods: 0 / 20m
    resource cpu on pods (as a percentage of request): 0% (0) / 50%
    Min replicas: 1
    Max replicas: 10
    Conditions:
    Type Status Reason Message
    —- —— —— ——-
    AbleToScale True ReadyForNewScale the last scale time was sufficiently old as to warrant a new scale
    ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from memory resource
    ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
    Events: <none>

  6. Generate load on the service to test autoscaling up and down. Any tool could be used at this point, but we've used https://github.com/rakyll/hey to generate HTTP requests to our hello-world service and observe whether autoscaling is working properly.
  7. Observe autoscaling up and down
    • Resource metrics
    • Autoscale up to 2 pods when cpu usage is above target:

      # kubectl describe hpa
      Name: hello-world
      Namespace: default
      Labels: <none>
      Annotations: <none>
      CreationTimestamp: Mon, 23 Jul 2018 22:22:04 +0200
      Reference: Deployment/hello-world
      Metrics: ( current / target )
      resource memory on pods: 10928128 / 100Mi
      resource cpu on pods (as a percentage of request): 56% (280m) / 50%
      Min replicas: 1
      Max replicas: 10
      Conditions:
      Type Status Reason Message
      —- —— —— ——-
      AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 2
      ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
      ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
      Events:
      Type Reason Age From Message
      —- —— —- —- ——-
      Normal SuccessfulRescale 13s horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target

# kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-world-54764dfbf8-k8ph2 1/1 Running 0 1m
hello-world-54764dfbf8-q6l4v 1/1 Running 0 3h

Autoscale up to 3 pods when cpu usage is still above target after the horizontal-pod-autoscaler-upscale-delay (3 minutes by default):

# kubectl describe hpa
Name: hello-world
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 23 Jul 2018 22:22:04 +0200
Reference: Deployment/hello-world
Metrics: ( current / target )
resource memory on pods: 9424896 / 100Mi
resource cpu on pods (as a percentage of request): 66% (333m) / 50%
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message
—- —— —— ——-
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 3
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
—- —— —- —- ——-
Normal SuccessfulRescale 4m horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 16s horizontal-pod-autoscaler New size: 3; reason: cpu resource utilization (percentage of request) above target # kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-world-54764dfbf8-f46kh 0/1 Running 0 1m
hello-world-54764dfbf8-k8ph2 1/1 Running 0 5m
hello-world-54764dfbf8-q6l4v 1/1 Running 0 3h

Autoscale down to 1 pod when all metrics are below target for the horizontal-pod-autoscaler-downscale-delay (5 minutes by default):

kubectl describe hpa
Name: hello-world
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Mon, 23 Jul 2018 22:22:04 +0200
Reference: Deployment/hello-world
Metrics: ( current / target )
resource memory on pods: 10070016 / 100Mi
resource cpu on pods (as a percentage of request): 0% (0) / 50%
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message
—- —— —— ——-
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 1
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from memory resource
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
—- —— —- —- ——-
Normal SuccessfulRescale 10m horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 6m horizontal-pod-autoscaler New size: 3; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 1s horizontal-pod-autoscaler New size: 1; reason: All metrics below target kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-world-54764dfbf8-q6l4v 1/1 Running 0 3h

  • custom metrics
    • Autoscale up to 2 pods when cpu usage is above target:

kubectl describe hpa
Name: hello-world
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 24 Jul 2018 18:01:11 +0200
Reference: Deployment/hello-world
Metrics: ( current / target )
resource memory on pods: 8159232 / 100Mi
“cpu_system” on pods: 7m / 20m
resource cpu on pods (as a percentage of request): 64% (321m) / 50%
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message
—- —— —— ——-
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 2
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
—- —— —- —- ——-
Normal SuccessfulRescale 16s horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-world-54764dfbf8-5pfdr 1/1 Running 0 3s
hello-world-54764dfbf8-q6l82 1/1 Running 0 6h

Autoscale up to 3 pods when cpu_system usage is above target:

kubectl describe hpa
Name: hello-world
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 24 Jul 2018 18:01:11 +0200
Reference: Deployment/hello-world
Metrics: ( current / target )
resource memory on pods: 8374272 / 100Mi
“cpu_system” on pods: 27m / 20m
resource cpu on pods (as a percentage of request): 71% (357m) / 50%
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message
—- —— —— ——-
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 3
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
—- —— —- —- ——-
Normal SuccessfulRescale 3m horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 3s horizontal-pod-autoscaler New size: 3; reason: pods metric cpu_system above target kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-world-54764dfbf8-5pfdr 1/1 Running 0 3m
hello-world-54764dfbf8-m2hrl 1/1 Running 0 1s
hello-world-54764dfbf8-q6l82 1/1 Running 0 6h

Autoscale up to 4 pods when cpu usage is still above target after the horizontal-pod-autoscaler-upscale-delay (3 minutes by default):

kubectl describe hpa
Name: hello-world
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 24 Jul 2018 18:01:11 +0200
Reference: Deployment/hello-world
Metrics: ( current / target )
resource memory on pods: 8374272 / 100Mi
“cpu_system” on pods: 27m / 20m
resource cpu on pods (as a percentage of request): 71% (357m) / 50%
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message
—- —— —— ——-
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 3
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from cpu resource utilization (percentage of request)
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
—- —— —- —- ——-
Normal SuccessfulRescale 5m horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 3m horizontal-pod-autoscaler New size: 3; reason: pods metric cpu_system above target
Normal SuccessfulRescale 4s horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-world-54764dfbf8-2p9xb 1/1 Running 0 5m
hello-world-54764dfbf8-5pfdr 1/1 Running 0 2m
hello-world-54764dfbf8-m2hrl 1/1 Running 0 1s
hello-world-54764dfbf8-q6l82 1/1 Running 0 6h

Autoscale down to 1 pod when all metrics are below target for horizontal-pod-autoscaler-downscale-delay (5 minutes by default):

kubectl describe hpa
Name: hello-world
Namespace: default
Labels: <none>
Annotations: <none>
CreationTimestamp: Tue, 24 Jul 2018 18:01:11 +0200
Reference: Deployment/hello-world
Metrics: ( current / target )
resource memory on pods: 8101888 / 100Mi
“cpu_system” on pods: 8m / 20m
resource cpu on pods (as a percentage of request): 0% (0) / 50%
Min replicas: 1
Max replicas: 10
Conditions:
Type Status Reason Message
---- ------ ------ -------
AbleToScale True SucceededRescale the HPA controller was able to update the target scale to 1
ScalingActive True ValidMetricFound the HPA was able to successfully calculate a replica count from memory resource
ScalingLimited False DesiredWithinRange the desired count is within the acceptable range
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulRescale 10m horizontal-pod-autoscaler New size: 2; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 8m horizontal-pod-autoscaler New size: 3; reason: pods metric cpu_system above target
Normal SuccessfulRescale 5m horizontal-pod-autoscaler New size: 4; reason: cpu resource utilization (percentage of request) above target
Normal SuccessfulRescale 13s horizontal-pod-autoscaler New size: 1; reason: All metrics below target

kubectl get pods
NAME READY STATUS RESTARTS AGE
hello-world-54764dfbf8-q6l82 1/1 Running 0 6h

Conclusion

We’ve seen how Kubernetes HPA can be used on Rancher to autoscale your deployments up and down. It’s a very useful feature for matching deployment scale to real service load and for meeting service SLAs.

We’ve also seen how horizontal-pod-autoscaler-downscale-delay (5m by default) and horizontal-pod-autoscaler-upscale-delay (3m by default) can be tuned on the kube-controller-manager to adjust how quickly the deployment scales up and down.
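
For reference, these windows are plain kube-controller-manager flags. On an RKE-provisioned cluster they would typically be passed as extra_args on the kube-controller service in cluster.yml; the snippet below is only a sketch and the values are examples, not recommendations:

services:
  kube-controller:
    extra_args:
      horizontal-pod-autoscaler-upscale-delay: "1m0s"
      horizontal-pod-autoscaler-downscale-delay: "2m0s"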

For our custom metric we used cpu_system as the example, but you could use any metric that is exported to Prometheus and is meaningful for your service’s performance, such as http_request_number or http_response_time.

To facilitate HPA use, we are working to integrate metrics-server as an addon to RKE cluster deployments. It’s already included in RKE v0.1.9-rc2 for testing, but it isn’t officially supported yet. It will be supported in RKE v0.1.9.

Raul Sanchez

DevOps Lead

Source

Feature Highlight: CPU Manager

Authors: Balaji Subramaniam (Intel), Connor Doyle (Intel)

This blog post describes the CPU Manager, a beta feature in Kubernetes. The CPU manager feature enables better placement of workloads in the Kubelet, the Kubernetes node agent, by allocating exclusive CPUs to certain pod containers.

cpu manager

Sounds Good! But Does the CPU Manager Help Me?

It depends on your workload. A single compute node in a Kubernetes cluster can run many pods and some of these pods could be running CPU-intensive workloads. In such a scenario, the pods might contend for the CPU resources available in that compute node. When this contention intensifies, the workload can move to different CPUs depending on whether the pod is throttled and the availability of CPUs at scheduling time. There might also be cases where the workload could be sensitive to context switches. In all the above scenarios, the performance of the workload might be affected.

If your workload is sensitive to such scenarios, then CPU Manager can be enabled to provide better performance isolation by allocating exclusive CPUs for your workload.

CPU manager might help workloads with the following characteristics:

  • Sensitive to CPU throttling effects.
  • Sensitive to context switches.
  • Sensitive to processor cache misses.
  • Benefits from sharing processor resources (e.g., data and instruction caches).
  • Sensitive to cross-socket memory traffic.
  • Sensitive to or requires hyperthreads from the same physical CPU core.

Ok! How Do I use it?

Using the CPU manager is simple. First, enable the CPU manager with the Static policy in the Kubelet running on the compute nodes of your cluster. Then configure your pod to be in the Guaranteed Quality of Service (QoS) class. Request whole numbers of CPU cores (e.g., 1000m, 4000m) for containers that need exclusive cores. Create your pod in the same way as before (e.g., kubectl create -f pod.yaml). And voilà, the CPU manager will assign exclusive CPUs to each container in the pod according to their CPU requests.

apiVersion: v1
kind: Pod
metadata:
  name: exclusive-2
spec:
  containers:
  - image: quay.io/connordoyle/cpuset-visualizer
    name: exclusive-2
    resources:
      # Pod is in the Guaranteed QoS class because requests == limits
      requests:
        # CPU request is an integer
        cpu: 2
        memory: "256M"
      limits:
        cpu: 2
        memory: "256M"

Pod specification requesting two exclusive CPUs.
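
For completeness, enabling the Static policy happens on the Kubelet rather than in the pod spec. A minimal sketch of the relevant Kubelet flags is shown below; the reservation values are placeholders, and the static policy expects some CPU to be reserved via --kube-reserved and/or --system-reserved:

# illustrative Kubelet flags; all other flags omitted
kubelet \
  --cpu-manager-policy=static \
  --kube-reserved=cpu=500m,memory=1Gi \
  --system-reserved=cpu=500m,memory=1Gi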

Hmm … How Does the CPU Manager Work?

For Kubernetes, and the purposes of this blog post, we will discuss three kinds of CPU resource controls available in most Linux distributions. The first two are CFS shares (what’s my weighted fair share of CPU time on this system) and CFS quota (what’s my hard cap of CPU time over a period). The CPU manager uses a third control called CPU affinity (on what logical CPUs am I allowed to execute).

By default, all the pods and the containers running on a compute node of your Kubernetes cluster can execute on any available cores in the system. The total amount of allocatable shares and quota is limited by the CPU resources explicitly reserved for Kubernetes and system daemons. However, limits on the CPU time being used can be specified using CPU limits in the pod spec. Kubernetes uses CFS quota to enforce CPU limits on pod containers.

When CPU manager is enabled with the “static” policy, it manages a shared pool of CPUs. Initially this shared pool contains all the CPUs in the compute node. When a container with integer CPU request in a Guaranteed pod is created by the Kubelet, CPUs for that container are removed from the shared pool and assigned exclusively for the lifetime of the container. Other containers are migrated off these exclusively allocated CPUs.

All non-exclusive-CPU containers (Burstable, BestEffort and Guaranteed with non-integer CPU) run on the CPUs remaining in the shared pool. When a container with exclusive CPUs terminates, its CPUs are added back to the shared CPU pool.

More Details Please …

cpu manager

The figure above shows the anatomy of the CPU manager. The CPU Manager uses the Container Runtime Interface’s UpdateContainerResources method to modify the CPUs on which containers can run. The Manager periodically reconciles the current state of the CPU resources of each running container with cgroupfs.

The CPU Manager uses Policies to decide the allocation of CPUs. There are two policies implemented: None and Static. By default, the CPU manager is enabled with the None policy from Kubernetes version 1.10.

The Static policy allocates exclusive CPUs to pod containers in the Guaranteed QoS class which request integer CPUs. On a best-effort basis, the Static policy tries to allocate CPUs topologically in the following order:

  1. Allocate all the CPUs in the same processor socket if available and the container requests at least an entire socket worth of CPUs.
  2. Allocate all the logical CPUs (hyperthreads) from the same physical CPU core if available and the container requests an entire core worth of CPUs.
  3. Allocate any available logical CPU, preferring to acquire CPUs from the same socket.

How is Performance Isolation Improved by CPU Manager?

With CPU manager static policy enabled, the workloads might perform better due to one of the following reasons:

  1. Exclusive CPUs can be allocated for the workload container but not the other containers. These containers do not share the CPU resources. As a result, we expect better performance due to isolation when an aggressor or a co-located workload is involved.
  2. There is a reduction in interference between the resources used by the workload since we can partition the CPUs among workloads. These resources might also include the cache hierarchies and memory bandwidth and not just the CPUs. This helps improve the performance of workloads in general.
  3. CPU Manager allocates CPUs in a topological order on a best-effort basis. If a whole socket is free, the CPU Manager will exclusively allocate the CPUs from the free socket to the workload. This boosts the performance of the workload by avoiding any cross-socket traffic.
  4. Containers in Guaranteed QoS pods are subject to CFS quota. Very bursty workloads may get scheduled, burn through their quota before the end of the period, and get throttled. During this time, there may or may not be meaningful work to do with those CPUs. Because of how the resource math lines up between CPU quota and number of exclusive CPUs allocated by the static policy, these containers are not subject to CFS throttling (quota is equal to the maximum possible cpu-time over the quota period).

Ok! Ok! Do You Have Any Results?

Glad you asked! To understand the performance improvement and isolation provided by enabling the CPU Manager feature in the Kubelet, we ran experiments on a dual-socket compute node (Intel Xeon CPU E5-2680 v3) with hyperthreading enabled. The node consists of 48 logical CPUs (24 physical cores each with 2-way hyperthreading). Here we demonstrate the performance benefits and isolation provided by the CPU Manager feature using benchmarks and real-world workloads for three different scenarios.

How Do I Interpret the Plots?

For each scenario, we show box plots that illustrate the normalized execution time and its variability when running a benchmark or real-world workload with and without the CPU Manager enabled. The execution time of the runs is normalized to the best-performing run (1.00 on the y-axis represents the best-performing run, and lower is better). The height of the box plot shows the variation in performance. For example, if the box plot is a line, then there is no variation in performance across runs. In the box, the middle line is the median, the upper line is the 75th percentile and the lower line is the 25th percentile. The height of the box (i.e., the difference between the 75th and 25th percentiles) is defined as the interquartile range (IQR). Whiskers show data outside that range and the points show outliers. The outliers are defined as any data 1.5x IQR below or above the lower or upper quartile respectively. Every experiment is run ten times.

Protection from Aggressor Workloads

We ran six benchmarks from the PARSEC benchmark suite (the victim workloads) co-located with a CPU stress container (the aggressor workload), with and without the CPU Manager feature enabled. The CPU stress container is run as a pod in the Burstable QoS class requesting 23 CPUs with the --cpus 48 flag. The benchmarks are run as pods in the Guaranteed QoS class requesting a full socket worth of CPUs (24 CPUs on this system). The figure below plots the normalized execution time of running a benchmark pod co-located with the stress pod, with and without the CPU Manager static policy enabled. We see improved performance and reduced performance variability when the static policy is enabled for all test cases.

execution time

Performance Isolation for Co-located Workloads

In this section, we demonstrate how CPU manager can be beneficial to multiple workloads in a co-located workload scenario. In the box plots below we show the performance of two benchmarks (Blackscholes and Canneal) from the PARSEC benchmark suite run in the Guaranteed (Gu) and Burstable (Bu) QoS classes co-located with each other, with and without the CPU manager static policy enabled.

Starting from the top left and proceeding clockwise, we show the performance of Blackscholes in the Bu QoS class (top left), Canneal in the Bu QoS class (top right), Canneal in the Gu QoS class (bottom right) and Blackscholes in the Gu QoS class (bottom left). In each case, they are co-located with Canneal in the Gu QoS class (top left), Blackscholes in the Gu QoS class (top right), Blackscholes in the Bu QoS class (bottom right) and Canneal in the Bu QoS class (bottom left), respectively. For example, the Bu-blackscholes-Gu-canneal plot (top left) shows the performance of Blackscholes running in the Bu QoS class when co-located with Canneal running in the Gu QoS class. In each case, the pod in the Gu QoS class requests cores worth a whole socket (i.e., 24 CPUs) and the pod in the Bu QoS class requests 23 CPUs.

There is better performance and less performance variation for both of the co-located workloads in all the tests. For example, consider the case of Bu-blackscholes-Gu-canneal (top left) and Gu-canneal-Bu-blackscholes (bottom right). They show the performance of Blackscholes and Canneal run simultaneously with and without the CPU manager enabled. In this particular case, Canneal gets exclusive cores due to the CPU manager since it is in the Gu QoS class and requests an integer number of CPU cores. But Blackscholes also gets an exclusive set of CPUs as it is the only workload in the shared pool. As a result, both Blackscholes and Canneal get some performance isolation benefits due to the CPU manager.

performance comparison

Performance Isolation for Stand-Alone Workloads

This section shows the performance improvement and isolation provided by the CPU manager for stand-alone real-world workloads. We use two workloads from the TensorFlow official models: wide and deep, and ResNet. We use the census and CIFAR10 datasets for the wide and deep and ResNet models, respectively. In each case the pods (wide and deep, ResNet) request 24 CPUs, which corresponds to a whole socket worth of cores. As shown in the plots, the CPU manager enables better performance isolation in both cases.

performance comparison

Limitations

  • Users might want to get CPUs allocated on the socket near to the bus which connects to an external device, such as an accelerator or high-performance network card, in order to avoid cross-socket traffic. This type of alignment is not yet supported by the CPU manager.
  • Since the CPU manager provides a best-effort allocation of CPUs belonging to a socket and physical core, it is susceptible to corner cases and might lead to fragmentation.
  • The CPU manager does not take the isolcpus Linux kernel boot parameter into account, although this is reportedly common practice for some low-jitter use cases.

Acknowledgements

We thank the members of the community who have contributed to this feature or given feedback including members of WG-Resource-Management and SIG-Node.
cmx.io (for the fun drawing tool).

Notices and Disclaimers

Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to www.intel.com/benchmarks.

Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance varies depending on system configuration. No computer system can be absolutely secure. Check with your system manufacturer or retailer or learn more at intel.com.

Workload Configuration:
https://gist.github.com/balajismaniam/fac7923f6ee44f1f36969c29354e3902
https://gist.github.com/balajismaniam/7c2d57b2f526a56bb79cf870c122a34c
https://gist.github.com/balajismaniam/941db0d0ec14e2bc93b7dfe04d1f6c58
https://gist.github.com/balajismaniam/a1919010fe9081ca37a6e1e7b01f02e3
https://gist.github.com/balajismaniam/9953b54dd240ecf085b35ab1bc283f3c

System Configuration:
CPU
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 48
On-line CPU(s) list: 0-47
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
Model name: Intel® Xeon® CPU E5-2680 v3
Memory
256 GB
OS/Kernel
Linux 3.10.0-693.21.1.el7.x86_64

Intel, the Intel logo, Xeon are trademarks of Intel Corporation or its subsidiaries in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
© Intel Corporation.

Source

Roundup – @JetstackHQ’s Tuesday Twitter Tips for Kubernetes

By Matt Bates

Last year we ran a successful series of Kubernetes tips shared via Twitter, called Tuesday Tips. Following a bit of a hiatus, we’d like to bring it back. We’re starting with a roundup of our previous tips (those that are still valid, anyway!)

This blog post compiles a summary of them, and ranks them according to popularity. Looking back it’s amazing how much the project has changed, so we’re exploring new features and running another series.

First time around the top tip was:

#1 Software engineers love shell auto-completion because it saves time and keystrokes – this tweet shows how to enable it for the kubectl command.

Add kubectl shell auto-completion for bash/zsh in 1.3+ by sourcing kubectl completion . #tuesdaytip #k8s https://t.co/bb5s6J9NZN

— Jetstack (@JetstackHQ) July 26, 2016
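
For reference, enabling completion in a current bash session and persisting it for future ones typically looks like this (swap bash for zsh as appropriate):

# load completion into the current shell
source <(kubectl completion bash)
# persist it for new sessions
echo "source <(kubectl completion bash)" >> ~/.bashrc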

#2 You don’t have to do anything special to get your service distributed across nodes.

Create a service prior to a RC/RS and pods will spread across nodes. The default scheduler has service anti-affinity. #tuesdaytip

— Jetstack (@JetstackHQ) June 21, 2016

#3 We showed you a new and easy way to spin up a Job.

Use kubectl run with ‘--restart=Never’ and it’ll spin up a Job (vs Deployment+RS with default restart policy of Always) #tuesdaytip

— Jetstack (@JetstackHQ) June 7, 2016

#4 You need to be able to access certain types of pod using a predictable network identity – you can declare the DNS entry using PodSpec. This was a two-part tip – we gave you the annotations to achieve this in the previous version too.

As of #k8s v1.3, you can modify a pod’s hostname and subdomain via new field in the PodSpec: https://t.co/Gw8br7Y1Dg #tuesdaytip

— Jetstack (@JetstackHQ) July 12, 2016

To achieve the same behaviour in 1.2 you can use the https://t.co/TbFWey17Ha + https://t.co/V1zOJ2SA4S annotations #tuesdaytip

— Jetstack (@JetstackHQ) July 12, 2016

#5 A bash one-liner for copying resources from one namespace to another. This deserved to place higher.

Use kubectl’s standard streams to easily copy resources across namespaces #K8s #TuesdayTip https://t.co/zDZCPUjkeG pic.twitter.com/kKr3VRN4t2

— Jetstack (@JetstackHQ) August 2, 2016

#6 DaemonSets run on all nodes – even those where scheduling is disabled for maintenance.

Nodes that are marked with “SchedulingDisabled” will still run pods from DaemonSets #TuesdayTip #Kubernetes

— Jetstack (@JetstackHQ) August 9, 2016

#7 Add a record of what has been done to your resource as an annotation.

kubectl has a --record flag to store create/update commands as a resource annotation, useful for introspection #tuesdaytip #kubernetes

— Jetstack (@JetstackHQ) August 23, 2016

#8 This is an important one – use kubectl drain to decommission nodes prior to maintenance (but see #6!).

Use kubectl drain to decommission a #k8s node prior to upgrade/maintenance; cordons the node (unschedulable) + deletes all pods #tuesdaytip

— Jetstack (@JetstackHQ) June 14, 2016

#9 A guest tweet and a valuable one.

This week’s #Kubernetes #TuesdayTip, courtesy of @asynchio!! https://t.co/jwnGItvf74

— Jetstack (@JetstackHQ) August 16, 2016

#10 Last, but certainly not least, as it’s still really useful for keeping track of your infrastructure.

Add your own custom labels to nodes using kubelet param --node-labels= Eg use a node-label for role (master/worker) #tuesdaytip #kubernetes

— Jetstack (@JetstackHQ) July 5, 2016

Source

Building a CI/CD Pipeline with Kubernetes using Auto Devops, Rancher, and Gitlab

Build a CI/CD Pipeline with Kubernetes and Rancher 2.0

Recorded Online Meetup of best practices and tools for building pipelines with containers and kubernetes.

Watch the training

This blog coincides with the Rancher Labs Online Meetup August 2018 covering Continuous Integration/Continuous Delivery (CI/CD). This topic is becoming increasingly important in a world where services are getting smaller and are updated more frequently. CI/CD allows companies to completely automate the building, testing and deployment of code to ensure that it is done in a consistent, repeatable way.

There are several different CI/CD tools available, and a number of them integrate with Kubernetes natively.

This blog is going to cover CI/CD using the hosted GitLab.com solution, but the Kubernetes integrations that are covered are generic and should work with any CI/CD provider that interfaces directly with Kubernetes using a service account.

Prerequisites

  1. Rancher 2.0 cluster for deploying workloads to.
  2. gitlab.com login.

For the purposes of this blog we are going to use one of the templates that GitLab provides, so start by logging in to gitlab.com via https://gitlab.com/users/sign_in

Create a Project

The first step is to create a project:

  • Click New project.
  • Choose the Create from template tab.
  • Click Use template under Ruby on Rails.
  • Set Project name.
  • Click Create Project.
  • Wait for the project to finish importing.

Add your Kubernetes endpoint to your project

Under Operations, choose Kubernetes.

Click Kubernetes cluster, then choose Add existing cluster.
add existing cluster

All of the above fields will need to be filled in; the following sections detail how to do so.

API URL

The API URL is the URL that GitLab will use to talk to the Kubernetes API in your cluster that is going to be used to deploy your workloads. Depending on where your Kubernetes cluster is running you will need to ensure that the port is open to allow for communication from gitlab.com to the <address>:<port> of the Kubernetes cluster.

In order to retrieve the API URL, we are going to run a script on the Rancher Server that controls your Kubernetes cluster. This will generate a kubeconfig file that contains the information we need to configure the Kubernetes settings in GitLab.

  • Log onto the server that is running your Rancher Server container.
  • Download the content of get_kubeconfig_custom_cluster_rancher2.sh from https://gist.github.com/superseb/f6cd637a7ad556124132ca39961789a4
  • Create a file on the server and paste the contents into it.
  • Make the file executable chmod +x <filename>.
  • Run the script as ./<filename> <name_of_cluster_to_deploy_to>
  • This will generate a kubeconfig file in the local directory.
  • Run cat kubeconfig | grep server:
  • The https value is what needs to be populated into the API URL field in the Add existing cluster form.

CA Certificate

The CA certificate is required because these are often custom certificates that aren’t in the certificate store of the GitLab server; supplying it allows the connection to be secured.

From the folder containing the kubeconfig generated in the API URL instructions:

  • Run cat kubeconfig | grep certificate-authority-data.

This gives you a base64-encoded certificate string, but the field in GitLab requires it in PEM format (a one-line shortcut is sketched after the steps below).

  • Save the contents of the encoded string to a file, e.g., cert.base64.
  • Run base64 -d cert.base64 > cert.pem.
  • This will give you the certificate in PEM format, which you can then copy and paste into the CA Certificate field in GitLab.
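
If you prefer a single command, something along these lines should produce the same PEM file directly from the kubeconfig generated earlier (assuming the file is named kubeconfig and sits in the current directory):

grep 'certificate-authority-data' kubeconfig | awk '{print $2}' | base64 -d > cert.pem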

Token

For the gitlab.com instance to be able to talk to the cluster we are going to create a service account that it will use. We are also going to create a namespace for GitLab to deploy the apps to.

To simplify this I’ve put all of this into a file, which can be viewed at http://x.co/rm082018
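
The authoritative contents are at the link above; as a rough sketch, such a manifest bundles a namespace, a service account, a role binding and a token secret along the following lines (the names match those used later in this post, but the exact objects and permissions here are illustrative rather than a copy of the linked file):

apiVersion: v1
kind: Namespace
metadata:
  name: gitlab-managed-apps
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitlab
  namespace: gitlab-managed-apps
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gitlab-cluster-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: gitlab
  namespace: gitlab-managed-apps
---
# token secret for the service account, referenced below as gitlab-secret
apiVersion: v1
kind: Secret
metadata:
  name: gitlab-secret
  namespace: gitlab-managed-apps
  annotations:
    kubernetes.io/service-account.name: gitlab
type: kubernetes.io/service-account-token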

To create the necessary prerequisites, you will need to run the following command:

kubectl apply -f http://x.co/rm082018 (optionally add --kubeconfig <kubeconfig> if you want to use a cluster other than the default one specified in your .kube/config file)

This will create a service account and a token for it, which is the token we need to specify in the GitLab Kubernetes configuration pane.

To obtain the token, execute:

kubectl describe secrets/gitlab-secret -n gitlab-managed-apps | grep token:

Copy the token and paste it into the GitLab configuration.

Project Namespace

If you have followed this blog so far and have applied my Kubernetes manifest file, then you will need to set the Project Namespace to gitlab-managed-apps. If you have modified the manifest, then set this to reflect the namespace that you have chosen.

Rancher Server 2.0 setup

As part of the GitLab template projects, a PostgreSQL pod is deployed. This means that you will need a dynamic storage provisioner; if you don’t have one set up, go into the catalog on the cluster that you are going to deploy to and launch the Library NFS provisioner. This isn’t recommended for production use, but it will get the Auto DevOps function working.

Enabling Auto DevOps

In the GitLab UI, go to Settings – CI/CD and expand Auto DevOps.

Click the Enable Auto DevOps radio button.

In the Domain section you need to specify the DNS name that will be used to reach the service that is going to be deployed. The DNS name should point to the ingress on the cluster that the service will be deployed to. For testing you can just use <host-ip>.nip.io, which will resolve to the host IP specified.

This will not give resilience and will only allow HTTP, not HTTPS, but again this is enough to show it working; if you want to use HTTPS then you would need to add a wildcard certificate to the ingress controller.

Click Save changes, and this should automatically launch a pipeline and start a job running.

You can go into CI/CD – Pipelines to see the progress.

At the end of the production phase you should see the http address that you can access your application on.

Hopefully this has allowed you to successfully deploy a nice demonstration CI/CD pipeline. As I stated at the start the Kubernetes piece should work for the majority of CI/CD Kubernetes integrations.

Chris Urwin

UK Technical Lead

Source

KubeVirt: Extending Kubernetes with CRDs for Virtualized Workloads

Author: David Vossel (Red Hat)

What is KubeVirt?

KubeVirt is a Kubernetes addon that provides users the ability to schedule traditional virtual machine workloads side by side with container workloads. Through the use of Custom Resource Definitions (CRDs) and other Kubernetes features, KubeVirt seamlessly extends existing Kubernetes clusters to provide a set of virtualization APIs that can be used to manage virtual machines.

Why Use CRDs Over an Aggregated API Server?

Back in the middle of 2017, those of us working on KubeVirt were at a crossroads. We had to make a decision whether or not to extend Kubernetes using an aggregated API server or to make use of the new Custom Resource Definitions (CRDs) feature.

At the time, CRDs lacked much of the functionality we needed to deliver our feature set. The ability to create our own aggregated API server gave us all the flexibility we needed, but it had one major flaw. An aggregated API server significantly increased the complexity involved with installing and operating KubeVirt.

The crux of the issue for us was that aggregated API servers required access to etcd for object persistence. This meant that cluster admins would have to either accept that KubeVirt needs a separate etcd deployment which increases complexity, or provide KubeVirt with shared access to the Kubernetes etcd store which introduces risk.

We weren’t okay with this tradeoff. Our goal wasn’t to just extend Kubernetes to run virtualization workloads, it was to do it in the most seamless and effortless way possible. We felt that the added complexity involved with an aggregated API server sacrificed the part of the user experience involved with installing and operating KubeVirt.

Ultimately we chose to go with CRDs and trust that the Kubernetes ecosystem would grow with us to meet the needs of our use case. Our bets were well placed. At this point there are either solutions in place or solutions under discussion that solve every feature gap we encountered back in 2017 when we were evaluating CRDs vs an aggregated API server.

Building Layered “Kubernetes like” APIs with CRDs

We designed KubeVirt’s API to follow the same patterns users are already familiar with in the Kubernetes core API.

For example, in Kubernetes the lowest level unit that users create to perform work is a Pod. Yes, Pods do have multiple containers but logically the Pod is the unit at the bottom of the stack. A Pod represents a mortal workload. The Pod gets scheduled, eventually the Pod’s workload terminates, and that’s the end of the Pod’s lifecycle.

Workload controllers such as the ReplicaSet and StatefulSet are layered on top of the Pod abstraction to help manage scale out and stateful applications. From there we have an even higher level controller called a Deployment, which is layered on top of ReplicaSets to help manage things like rolling updates.

In KubeVirt, this concept of layering controllers is at the very center of our design. The KubeVirt VirtualMachineInstance (VMI) object is the lowest level unit at the very bottom of the KubeVirt stack. Similar in concept to a Pod, a VMI represents a single mortal virtualized workload that executes once until completion (powered off).

Layered on top of VMIs we have a workload controller called a VirtualMachine (VM). The VM controller is where we really begin to see the differences between how users manage virtualized workloads vs containerized workloads. Within the context of existing Kubernetes functionality, the best way to describe the VM controller’s behavior is to compare it to a StatefulSet of size one. This is because the VM controller represents a single stateful (immortal) virtual machine capable of persisting state across both node failures and multiple restarts of its underlying VMI. This object behaves in the way that is familiar to users who have managed virtual machines in AWS, GCE, OpenStack or any other similar IaaS cloud platform. The user can shutdown a VM, then choose to start that exact same VM up again at a later time.

In addition to VMs, we also have a VirtualMachineInstanceReplicaSet (VMIRS) workload controller which manages scale out of identical VMI objects. This controller behaves nearly identically to the Kubernetes ReplicaSet controller. The primary difference is that the VMIRS manages VMI objects and the ReplicaSet manages Pods. Wouldn’t it be nice if we could come up with a way to use the Kubernetes ReplicaSet controller to scale out CRDs?

Each one of these KubeVirt objects (VMI, VM, VMIRS) is registered with Kubernetes as a CRD when the KubeVirt install manifest is posted to the cluster. By registering our APIs as CRDs with Kubernetes, all the tooling involved with managing Kubernetes clusters (like kubectl) has access to the KubeVirt APIs just as if they were native Kubernetes objects.
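
As an illustration of what that registration looks like, a CRD declaration for the VMI type would be roughly as follows (this is a simplified sketch, not the actual KubeVirt install manifest):

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: virtualmachineinstances.kubevirt.io
spec:
  group: kubevirt.io
  version: v1alpha2
  scope: Namespaced
  names:
    plural: virtualmachineinstances
    singular: virtualmachineinstance
    kind: VirtualMachineInstance
    shortNames:
    - vmi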

Dynamic Webhooks for API Validation

One of the responsibilities of the Kubernetes API server is to intercept and validate requests prior to allowing objects to be persisted into etcd. For example, if someone tries to create a Pod using a malformed Pod specification, the Kubernetes API server immediately catches the error and rejects the POST request. This all occurs before the object is persisted into etcd, preventing the malformed Pod specification from making its way into the cluster.

This validation occurs during a process called admission control. Until recently, it was not possible to extend the default Kubernetes admission controllers without altering code and compiling/deploying an entirely new Kubernetes API server. This meant that if we wanted to perform admission control on KubeVirt’s CRD objects while they are posted to the cluster, we’d have to build our own version of the Kubernetes API server and convince our users to use that instead. That was not a viable solution for us.

Using the new Dynamic Admission Control feature that first landed in Kubernetes 1.9, we now have a path for performing custom validation on the KubeVirt API through the use of a ValidatingAdmissionWebhook. This feature allows KubeVirt to dynamically register an HTTPS webhook with Kubernetes at KubeVirt install time. After registering the custom webhook, all requests related to KubeVirt API objects are forwarded from the Kubernetes API server to our HTTPS endpoint for validation. If our endpoint rejects a request for any reason, the object will not be persisted into etcd and the client receives our response outlining the reason for the rejection.
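
The registration itself uses an ordinary ValidatingWebhookConfiguration object. A trimmed-down sketch is shown below; the service name, namespace and path are illustrative, not KubeVirt’s actual values:

apiVersion: admissionregistration.k8s.io/v1beta1
kind: ValidatingWebhookConfiguration
metadata:
  name: virtualmachine-validator.kubevirt.io
webhooks:
- name: virtualmachine-validator.kubevirt.io
  rules:
  - apiGroups: ["kubevirt.io"]
    apiVersions: ["v1alpha2"]
    operations: ["CREATE", "UPDATE"]
    resources: ["virtualmachines"]
  clientConfig:
    service:
      namespace: kube-system
      name: virt-api
      path: /virtualmachines-validate
    caBundle: <base64-encoded CA bundle>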

For example, if someone posts a malformed VirtualMachine object, they’ll receive an error indicating what the problem is.

$ kubectl create -f my-vm.yaml
Error from server: error when creating "my-vm.yaml": admission webhook "virtualmachine-validator.kubevirt.io" denied the request: spec.template.spec.domain.devices.disks[0].volumeName 'registryvolume' not found.

In the example output above, that error response is coming directly from KubeVirt’s admission control webhook.

CRD OpenAPIv3 Validation

In addition to the validating webhook, KubeVirt also uses the ability to provide an OpenAPIv3 validation schema when registering a CRD with the cluster. While the OpenAPIv3 schema does not let us express some of the more advanced validation checks that the validation webhook provides, it does offer the ability to enforce simple validation checks involving things like required fields, max/min value lengths, and verifying that values are formatted in a way that matches a regular expression string.
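
Such a schema is declared inline on the CRD itself. A contrived excerpt is shown below; the fields are illustrative and not KubeVirt’s real schema:

# excerpt from a CRD manifest
spec:
  validation:
    openAPIV3Schema:
      properties:
        spec:
          required:
          - domain
          properties:
            domain:
              type: object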

Dynamic Webhooks for “PodPreset Like” Behavior

The Kubernetes Dynamic Admission Control feature is not only limited to validation logic, it also provides the ability for applications like KubeVirt to both intercept and mutate requests as they enter the cluster. This is achieved through the use of a MutatingAdmissionWebhook object. In KubeVirt, we are looking to use a mutating webhook to support our VirtualMachinePreset (VMPreset) feature.

A VMPreset acts in a similar way to a PodPreset. Just like a PodPreset allows users to define values that should automatically be injected into pods at creation time, a VMPreset allows users to define values that should be injected into VMs at creation time. Through the use of a mutating webhook, KubeVirt can intercept a request to create a VM, apply VMPresets to the VM spec, and then validate the resulting VM object. This all occurs before the VM object is persisted into etcd, which allows KubeVirt to immediately notify the user of any conflicts at the time the request is made.

Subresources for CRDs

When comparing the use of CRDs to an aggregated API server, one of the features CRDs lack is the ability to support subresources. Subresources are used to provide additional resource functionality. For example, the pod/logs and pod/exec subresource endpoints are used behind the scenes to provide the kubectl logs and kubectl exec command functionality.

Just like Kubernetes uses the pod/exec subresource to provide access to a pod’s environment, in KubeVirt we want subresources to provide serial-console, VNC, and SPICE access to a virtual machine. By adding virtual machine guest access through subresources, we can leverage RBAC to provide access control for these features.

So, given that the KubeVirt team decided to use CRDs instead of an aggregated API server for custom resource support, how can we have subresources for CRDs when the CRD feature explicitly does not support subresources?

We created a workaround for this limitation by implementing a stateless aggregated API server that exists only to serve subresource requests. With no state, we don’t have to worry about any of the issues we identified earlier with regards to access to etcd. This means the KubeVirt API is actually supported through a combination of both CRDs for resources and an aggregated API server for stateless subresources.

This isn’t a perfect solution for us. Both aggregated API servers and CRDs require us to register an API GroupName with Kubernetes. This API GroupName field essentially namespaces the API’s REST path in a way that prevents API naming conflicts between other third party applications. Because CRDs and aggregated API servers can’t share the same GroupName, we have to register two separate GroupNames. One is used by our CRDs and the other is used by the aggregated API server for subresource requests.

Having two GroupNames in our API is slightly inconvenient because it means the REST path for the endpoints that serve the KubeVirt subresource requests have a slightly different base path than the resources.

For example, the endpoint to create a VMI object is as follows.

/apis/kubevirt.io/v1alpha2/namespaces/my-namespace/virtualmachineinstances/my-vm

However, the subresource endpoint to access graphical VNC looks like this.

/apis/subresources.kubevirt.io/v1alpha2/namespaces/my-namespace/virtualmachineinstances/my-vm/vnc

Notice that the first request uses kubevirt.io and the second request uses subresources.kubevirt.io. We don’t like that, but that’s how we’ve managed to combine CRDs with a stateless aggregated API server for subresources.

One thing worth noting is that in Kubernetes 1.10 a very basic form of CRD subresource support was added in the form of the /status and /scale subresources. This support does not help us deliver the virtualization features we want subresources for. However, there have been discussions about exposing custom CRD subresources as webhooks in a future Kubernetes version. If this functionality lands, we will gladly transition away from our stateless aggregated API server workaround to use a subresource webhook feature.

CRD Finalizers

A CRD finalizer is a feature that lets us provide a pre-delete hook in order to perform actions before allowing a CRD object to be removed from persistent storage. In KubeVirt, we use finalizers to guarantee a virtual machine has completely terminated before we allow the corresponding VMI object to be removed from etcd.
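
Finalizers are just strings in an object’s metadata. While the VMI’s finalizer is waiting to be cleared, a deleted-but-not-yet-removed object looks something like this (the finalizer name here is illustrative):

apiVersion: kubevirt.io/v1alpha2
kind: VirtualMachineInstance
metadata:
  name: my-vm
  deletionTimestamp: "2018-08-01T10:00:00Z"
  finalizers:
  - foregroundDeleteVirtualMachine.kubevirt.io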

API Versioning for CRDs

The Kubernetes core APIs have the ability to support multiple versions for a single object type and perform conversions between those versions. This gives the Kubernetes core APIs a path for advancing the v1alpha1 version of an object to a v1beta1 version and so forth.

Prior to Kubernetes 1.11, CRDs did not have support for multiple versions. This meant that when we wanted to progress a CRD from kubevirt.io/v1alpha1 to kubevirt.io/v1beta1, the only path available to us was to back up our CRD objects, delete the registered CRD from Kubernetes, register a new CRD with the updated version, convert the backed-up CRD objects to the new version, and finally post the migrated CRD objects back to the cluster.

That strategy was not exactly a viable option for us.

Fortunately, thanks to some recent work to rectify this issue in Kubernetes, the latest Kubernetes v1.11 now supports CRDs with multiple versions. Note however that this initial multi-version support is limited. While a CRD can now have multiple versions, the feature does not currently contain a path for performing conversions between versions. In KubeVirt, the lack of conversion makes it difficult for us to evolve our API as we progress versions. Luckily, support for conversions between versions is underway, and we look forward to taking advantage of that feature once it lands in a future Kubernetes release.

Source

Kubernetes 1.8: Hidden Gems – The Resource Metrics API, the Custom Metrics API and HPA v2

By Luke Addison

In the coming weeks we will be releasing a series of blog posts called Kubernetes 1.8: Hidden Gems, accenting some of the less obvious but wonderful features in the latest Kubernetes release. In this week’s gem, Luke looks at some of the main components in the core metrics and monitoring pipelines and in particular how they can be used to scale Kubernetes workloads.

One of the features that makes Kubernetes so powerful is its extensibility. In particular, Kubernetes allows developers to easily extend the core API Server with their own API servers, which we will refer to as ‘add-on’ API servers. The resource metrics API (also known as the master metrics API or just the metrics API) introduced in 1.8 and the custom metrics API, introduced in 1.6, are implemented in exactly this way. The resource metrics API is designed to be consumed by core system components, such as the scheduler and kubectl top, whilst the custom metrics API has a wider use case.

One Kubernetes component that makes use of both the resource metrics API and the custom metrics API is the HorizontalPodAutoscaler (HPA) controller which manages HPA resources. HPA resources are used to automatically scale the number of Pods in a ReplicationController, Deployment or ReplicaSet based on observed metrics (note that StatefulSet is not supported).

The first version of HPA (v1) was only able to scale based on observed CPU utilisation. Although useful for some cases, CPU is not always the most suitable or relevant metric to autoscale an application. HPA v2, introduced in 1.6, is able to scale based on custom metrics and has been moved from alpha to beta in 1.8. This allows users to scale on any number of application-specific metrics; for example, metrics might include the length of a queue and ingress requests per second.

The purpose of the resource metrics API is to provide a stable, versioned API that core Kubernetes components can rely on. Implementations of the API provide resource usage metrics for pods and nodes through the API Server and form part of the core metrics pipeline.

In order to get a resource metrics add-on API server up and running we first need to configure the aggregation layer. The aggregation layer is a new feature in Kubernetes 1.7 that allows add-on API servers to register themselves with kube-aggregator. The aggregator will then proxy relevant requests to these add-on API servers so that they can serve custom API resources.

kube-aggregator architecture

Configuring the aggregation layer involves setting a number of flags on the API Server. The exact flags can be found here and more information about these flags can be found in the kube-apiserver reference documentation. In order to set these flags you will need to obtain a CA certificate if your cluster provider has not taken care of that already. For more details on the various CAs used by the API Server, take a look at the excellent apiserver-builder documentation.
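
For reference, the kube-apiserver flags involved are roughly the following; the certificate paths are placeholders and the linked documentation remains the authoritative list:

--requestheader-client-ca-file=<path to aggregator CA cert>
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=<path to aggregator proxy cert>
--proxy-client-key-file=<path to aggregator proxy key>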

We now need to deploy the add-on API Server itself to serve these metrics. We can use Heapster’s implementation of the resource metrics API by running it with the --api-server flag set to true, however the recommended way is to deploy metrics-server, which is a slimmed-down version of Heapster specifically designed to serve resource usage metrics. You can do this using the deployment manifests provided in the metrics-server repository. Pay special attention to the APIService resource included with the manifests. This resource claims a URL path in the Kubernetes API (/apis/metrics.k8s.io/v1beta1 in this case) and tells the aggregator to proxy anything sent to that path to the registered service. For more information about metrics-server check out the metrics-server repository.
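
The APIService object in question looks roughly like the following, assuming metrics-server is deployed as a Service named metrics-server in the kube-system namespace as the upstream manifests do:

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100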

To test our resource metrics API we can use kubectl get --raw. The following command should return a list of resource usage metrics for all the nodes in our cluster.

$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "node1.lukeaddison.co.uk",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node1.lukeaddison.co.uk",
        "creationTimestamp": "2017-10-09T14:21:06Z"
      },
      "timestamp": "2017-10-09T14:21:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "247m",
        "memory": "1846432Ki"
      }
    },
    {
      "metadata": {
        "name": "node2.lukeaddison.co.uk",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node2.lukeaddison.co.uk",
        "creationTimestamp": "2017-10-09T14:21:06Z"
      },
      "timestamp": "2017-10-09T14:21:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "511m",
        "memory": "3589560Ki"
      }
    },
    {
      "metadata": {
        "name": "node3.lukeaddison.co.uk",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node3.lukeaddison.co.uk",
        "creationTimestamp": "2017-10-09T14:21:06Z"
      },
      "timestamp": "2017-10-09T14:21:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "301m",
        "memory": "2453620Ki"
      }
    }
  ]
}

The resource metrics API allows HPA v2 to scale on resource metrics such as CPU and memory usage; however, this API does not allow us to consume application-specific metrics – we need something extra.

The purpose of the custom metrics API is to provide a stable, versioned API that end-users and Kubernetes components can rely on. Implementations of the custom metrics API provide custom metrics to the HPA controller and form part of the monitoring pipeline.

The steps required to configure your cluster to use a custom metrics API implementation can be found in the Kubernetes HPA docs. There are not a huge number of implementations yet and none that are officially part of Kubernetes, but a good one to try at the moment is the Prometheus adapter. This adapter translates queries for custom metrics into PromQL, the Prometheus query language, in order to query Prometheus itself and pass the results back to the caller.

There is a nice walk-through by DirectXMan12 that covers cluster prerequisites, how to deploy Prometheus and how to inject a Prometheus adapter container into your Prometheus deployment. This adapter can then be registered with the API Server using an APIService resource to tell the aggregator where to forward requests for custom metrics. Note that the walk-through uses the API Server path /apis/custom-metrics.metrics.k8s.io but for 1.8 a decision was made to use /apis/custom.metrics.k8s.io so you will need to change your APIService resource appropriately. luxas has a nice example of everything – thanks!
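
Registering the adapter uses the same APIService mechanism as metrics-server, just for the custom metrics group. A sketch is below; the service name and namespace depend entirely on how you deployed the adapter:

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: custom-metrics
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100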

As before, we can test our new add-on API server using kubectl get --raw. The following command should return a list of all custom metrics from the Prometheus adapter.

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/tasks_state",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/memory_failcnt",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },

    {
      "name": "pods/memory_swap",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

With the Prometheus adapter deployed, we can use the power of Prometheus to scale our workloads on custom metrics specifically tailored to our applications.

The following example shows an HPA that scales an nginx deployment using a single resource metric (CPU) and two custom metrics (packets-per-second and requests-per-second):

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  - type: Pods
    pods:
      metricName: packets-per-second
      targetAverageValue: 1k
  - type: Object
    object:
      metricName: requests-per-second
      target:
        apiVersion: extensions/v1beta1
        kind: Ingress
        name: main-route
      targetValue: 2k

  • To get the value for the resource metric, the HPA controller uses the resource metrics API by querying the API Server path /apis/metrics.k8s.io/v1beta1/pods.
  • For custom metrics, the HPA controller uses the custom metrics API and will query the API server paths /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/packets-per-second for type Pods and /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/ingress.extensions/main-route/requests-per-second for type Object.

Note that these APIs are still in a beta status, and so may still be subject to changes. If you are using these APIs, please check the Kubernetes release notes before performing cluster upgrades to ensure your old resources are still valid.

Source

Migrating from Rancher 1.6 to Rancher 2.0 – A Short Checklist

How to Migrate from Rancher 1.6 to Rancher 2.1 Online Meetup

Key terminology differences, implementing key elements, and transforming Compose to YAML

Watch the video

Rancher 1.6 is a widely used container orchestration platform that runs and manages Docker and Kubernetes in production. Cattle is the base orchestration engine for Rancher 1.6 and is used by many open source and enterprise setups.

With the release of Rancher 2.0, we shifted from Cattle as the base orchestration platform to Kubernetes. As aptly noted here, Rancher 2.0 aims at helping users align with the Kubernetes Everywhere reality of the infrastructure and cloud domain.

However Rancher 2.0 differs from Rancher 1.6 because it brings in a new orchestration technology, Kubernetes. Currently, there is no straightforward upgrade path available between these versions.

So as a Rancher 1.6 user who’s interested in moving to 2.0, what steps should you take?

Checklist for Migration

In this article, I will provide a short checklist you can follow to do the migration:

Installation

Instructions for a Rancher 2.0 installation can be found in the Rancher Documentation.

Note that the Docker versions currently supported by Rancher 2.0 are:

  • 1.12.6
  • 1.13.1
  • 17.03.2

This list is per the validated Docker versions found under External Dependencies in the Kubernetes upstream release.

For Kubernetes, Rancher currently uses the 1.10 version and is planning to go with the 1.11 version in the upcoming 2.0.7 release. Rancher will keep updating to the most-recent upstream Kubernetes releases.

For a basic single node dev installation, you would provision a single Linux host with Docker and install Rancher within a Docker container using the familiar docker run command, just like the Rancher 1.6 installation.

docker run -d --restart=unless-stopped \
  -p 80:80 -p 443:443 \
  rancher/rancher:latest

For development environments, we recommend installing Rancher by running a single Docker container.

One difference from the 1.6 version is that Rancher now requires you to use SSL for security, so you need to provide a certificate during installation. If you don’t provide a certificate, Rancher will generate a self-signed certificate for you. For all the certificate options, see here.

Installing a High Availability Rancher setup is a lot different compared to 1.6. The steps are outlined here.

Note that any existing automation that you have around a 1.6 Rancher Server in HA mode will need to be redone for 2.0.

Configure Authentication

If you have set up authentication with Rancher 1.6, you will be familiar with the array of auth providers supported.

Imgur

With Rancher 2.0, all the above auth providers are still supported, and there are a few new ones added to the set as well! Here is what the upcoming v2.0.7 authentication provider line-up looks like:

Imgur

The basic configuration requirements for setting up auth are still the same as 1.6. Look here for detailed documentation about configuring every auth provider.

One difference to highlight is that in the Rancher 2.0 setup, local auth is always turned on, even if you configure another auth mechanism. This guarantees that Rancher admins always have access to the setup!

Add a Cluster and a Project

After you have a running Rancher installation with your choice of authentication enabled, the next thing to do is create a Cluster and a Project under which workloads can be deployed.

In the Rancher 1.6 setup, after installation, you would create an environment to which compute nodes could be added. In Rancher 2.0, you need to create a cluster and add the compute nodes to the cluster.

This is what the Cluster view looks like after it is set up. Within each cluster a Default project will be available.

Imgur

The equivalent of a 1.6 Cattle environment is a 2.0 cluster with a project, for the following reasons:

  • In 1.6 compute resources were assigned to an environment. Here in 2.0, you assign them to a cluster.
  • In 1.6 users were added to the environment, where they could deploy services and share access to the hosts that belonged to that environment. In 2.0, users are added to projects and the workloads they deploy have access to the resources granted to the project.

This model of a cluster and a project allows for multi-tenancy because hosts are owned by the cluster, and the cluster can be further divided into multiple projects where users can manage their apps.

Create Namespaces

After adding a cluster and a project, the next step is to define any namespaces used to group your application workloads. In Rancher 1.6, stacks were created to group together the services belonging to your application. In 2.0, you need to create namespaces for the same purpose. A default namespace will be available within the project, which is the namespace where workloads are deployed if one is not explicitly provided.

Just like 1.6, Rancher 2.0 supports service discovery within and across namespaces.

Imgur

Migrate Apps

Once your Kubernetes cluster and a project are in place, the next step is to migrate the workloads.

  • If you are a Rancher 1.6 user using Cattle environments, then you’ll need to understand the changes in Rancher 2.0 around scheduling, load balancing, service discovery, service monitoring, and more while you migrate your workloads.

    You can follow the upcoming blog series that talks about how you can do this migration using either the Rancher 2.0 UI or a conversion from Docker Compose config to Kubernetes YAML. It aims to explore various areas around workload deployment and their equivalent options available in Rancher 2.0.

  • If you have been running your workloads in a Rancher 1.6 Kubernetes environment already, you can import the Kubernetes YAML specs directly into a Rancher 2.0 cluster using either the UI or Rancher CLI.

Hope this migration checklist helps you get started!

Prachi Damle

Principal Software Engineer

Source

Dynamically Expand Volume with CSI and Kubernetes

There is a very powerful storage subsystem within Kubernetes itself, covering a fairly broad spectrum of use cases. However, when planning to build a production-grade relational database platform with Kubernetes, we face a big challenge: storage. This article describes how to extend the latest Container Storage Interface 0.2.0 and integrate it with Kubernetes, and demonstrates the essential facet of dynamically expanding volume capacity.

Introduction

Among our customers, especially in the financial space, there is a huge upswell in the adoption of container orchestration technology.

They are looking to open source solutions to redesign existing monolithic applications, which have been running for several years on virtualization infrastructure or bare metal.

Considering extensibility and technical maturity, Kubernetes and Docker are at the very top of the list. But migrating monolithic applications to a distributed orchestration platform like Kubernetes is challenging, and the relational database is critical to the migration.

With respect to the relational database, we should pay attention to storage. There is a very powerful storage subsystem within Kubernetes itself. It is very useful and covers a fairly broad spectrum of use cases. When planning to run a relational database with Kubernetes in production, however, we face a big challenge: storage. There are still some fundamental functionalities left unimplemented, specifically dynamically expanding volumes. It sounds boring, but it is highly necessary beyond basic actions like create/delete and mount/unmount.

Currently, expanding volumes is only available with these storage provisioners:

  • gcePersistentDisk
  • awsElasticBlockStore
  • OpenStack Cinder
  • glusterfs
  • rbd

To enable this feature, we need to set the ExpandPersistentVolumes feature gate to true and turn on the PersistentVolumeClaimResize admission plugin. Once PersistentVolumeClaimResize is enabled, resizing is allowed only for PVCs whose Storage Class has the allowVolumeExpansion field set to true.
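As a minimal sketch of what that looks like for one of the in-tree provisioners above (gcePersistentDisk in this example; the name and parameters are placeholders), a resizable Storage Class is simply:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: resizable-ssd            # placeholder name
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
allowVolumeExpansion: true       # resizing is only allowed when this is true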

Unfortunately, dynamically expanding volume through the Container Storage Interface (CSI) and Kubernetes is not yet available, even though the underlying storage providers support it.

This article will give a simplified view of CSI, followed by a walkthrough of how to introduce a new expanding volume feature on the existing CSI and Kubernetes. Finally, the article will demonstrate how to dynamically expand volume capacity.

Container Storage Interface (CSI)

To understand what we are going to do, we first need to know what the Container Storage Interface is. The existing storage subsystem within Kubernetes still has some problems: storage driver code is maintained in the Kubernetes core repository, which makes it difficult to test, and Kubernetes has to grant storage vendors permission to check code into that core repository. Ideally, driver code should be implemented externally.

CSI defines an industry standard so that a storage provider that implements CSI is available across all container orchestration systems that support CSI.

The diagram below depicts the high-level Kubernetes architecture integrated with CSI:

[Diagram: Kubernetes integrated with CSI through external components]

  • Three new external components are introduced to decouple Kubernetes and storage provider logic
  • Blue arrows represent conventional calls against the API Server
  • Red arrows represent gRPC calls against the Volume Driver

For more details, please visit: https://github.com/container-storage-interface/spec/blob/master/spec.md

Extend CSI and Kubernetes

To enable volume expansion atop Kubernetes, we need to extend several components: the CSI specification, the “in-tree” volume plugin, the external-provisioner, and the external-attacher.

Extend CSI spec

Volume expansion is still undefined in the latest CSI 0.2.0 specification, so three new RPCs need to be introduced: RequiresFSResize, ControllerResizeVolume, and NodeResizeVolume.

service Controller {
  rpc CreateVolume (CreateVolumeRequest)
    returns (CreateVolumeResponse) {}
  ……
  rpc RequiresFSResize (RequiresFSResizeRequest)
    returns (RequiresFSResizeResponse) {}
  rpc ControllerResizeVolume (ControllerResizeVolumeRequest)
    returns (ControllerResizeVolumeResponse) {}
}

service Node {
  rpc NodeStageVolume (NodeStageVolumeRequest)
    returns (NodeStageVolumeResponse) {}
  ……
  rpc NodeResizeVolume (NodeResizeVolumeRequest)
    returns (NodeResizeVolumeResponse) {}
}

Extend “In-Tree” Volume Plugin

In addition to extending the CSI specification, the csiPlugin interface within Kubernetes should also implement expandablePlugin, so that it can expand a PersistentVolumeClaim on behalf of the expand controller.

type ExpandableVolumePlugin interface {
    VolumePlugin
    ExpandVolumeDevice(spec Spec, newSize resource.Quantity, oldSize resource.Quantity) (resource.Quantity, error)
    RequiresFSResize() bool
}

Implement Volume Driver

Finally, to abstract away implementation complexity, the storage-provider-specific management logic should be implemented behind the following functions, which are well defined in the CSI specification:

  • CreateVolume
  • DeleteVolume
  • ControllerPublishVolume
  • ControllerUnpublishVolume
  • ValidateVolumeCapabilities
  • ListVolumes
  • GetCapacity
  • ControllerGetCapabilities
  • RequiresFSResize
  • ControllerResizeVolume

Demonstration

Let’s demonstrate this feature with a concrete use case.

  • Create storage class for CSI storage provisioner

allowVolumeExpansion: true
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-qcfs
parameters:
  csiProvisionerSecretName: orain-test
  csiProvisionerSecretNamespace: default
provisioner: csi-qcfsplugin
reclaimPolicy: Delete
volumeBindingMode: Immediate

  • Deploy the CSI volume driver, including the storage provisioner csi-qcfsplugin, across the Kubernetes cluster
  • Create the PVC qcfs-pvc, which will be dynamically provisioned by the storage class csi-qcfs

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: qcfs-pvc
  namespace: default
  ….
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 300Gi
  storageClassName: csi-qcfs

  • Create a MySQL 5.7 instance that uses the PVC qcfs-pvc
  • To mirror a realistic production-level scenario, two different types of workload are run:
    • Batch inserts, to make MySQL consume more file system capacity
    • A surge of query requests
  • Dynamically expand volume capacity by editing the qcfs-pvc PVC configuration, as sketched below
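A minimal sketch of that edit: only the requested storage size changes (the 400Gi value below is just the first expansion step from this demo), and bumping it is what triggers the resize.

# kubectl edit pvc qcfs-pvc -n default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 400Gi        # previously 300Gi; increasing this triggers expansion
  storageClassName: csi-qcfs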

The Prometheus and Grafana integration allows us to visualize corresponding critical metrics.

[Grafana dashboard: MySQL QPS (top), datafile size (middle), and file system capacity (bottom)]

The middle panel shows the MySQL datafile size increasing slowly during the bulk inserts. The bottom panel shows the file system expanding twice in about 20 minutes, from 300 GiB to 400 GiB and then to 500 GiB. Meanwhile, the upper panel shows that each expansion completes almost immediately and hardly impacts MySQL QPS.

Conclusion

Regardless of the infrastructure applications run on, the database is always a critical resource. Having a more advanced storage subsystem that fully supports database requirements is essential, and it will help drive broader adoption of cloud native technology.

Source