By Luke Addison
In the coming weeks we will be releasing a series of blog posts called Kubernetes 1.8: Hidden Gems, highlighting some of the less obvious but wonderful features in the latest Kubernetes release. In this week’s gem, Luke looks at some of the main components of the core metrics and monitoring pipelines and, in particular, how they can be used to scale Kubernetes workloads.
One of the features that makes Kubernetes so powerful is its extensibility. In particular, Kubernetes allows developers to easily extend the core API Server with their own API servers, which we will refer to as ‘add-on’ API servers. The resource metrics API (also known as the master metrics API, or just the metrics API), introduced in 1.8, and the custom metrics API, introduced in 1.6, are implemented in exactly this way. The resource metrics API is designed to be consumed by core system components, such as the scheduler and kubectl top, whilst the custom metrics API has a wider range of use cases.
One Kubernetes component that makes use of both the resource metrics API and the custom metrics API is the HorizontalPodAutoscaler (HPA) controller, which manages HPA resources. HPA resources are used to automatically scale the number of Pods in a ReplicationController, Deployment or ReplicaSet based on observed metrics (note that StatefulSet is not supported).
The first version of HPA (v1) was only able to scale based on observed CPU utilisation. Although useful in some cases, CPU is not always the most suitable or relevant metric for autoscaling an application. HPA v2, introduced in 1.6, is able to scale based on custom metrics and moved from alpha to beta in 1.8. This allows users to scale on any number of application-specific metrics, such as the length of a queue or ingress requests per second.
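To see the v1 behaviour in action, a CPU-based HPA for a hypothetical nginx Deployment can be created imperatively with kubectl:

$ kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10

This creates an HPA resource that tries to keep average CPU utilisation across the Pods at around 50%, scaling between 1 and 10 replicas.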
The purpose of the resource metrics API is to provide a stable, versioned API that core Kubernetes components can rely on. Implementations of the API provide resource usage metrics for pods and nodes through the API Server and form part of the core metrics pipeline.
In order to get a resource metrics add-on API server up and running we first need to configure the aggregation layer. The aggregation layer is a new feature in Kubernetes 1.7 that allows add-on API servers to register themselves with kube-aggregator. The aggregator will then proxy relevant requests to these add-on API servers so that they can serve custom API resources.
Configuring the aggregation layer involves setting a number of flags on the API Server. The exact flags are listed in the Kubernetes documentation on configuring the aggregation layer, and more information about them can be found in the kube-apiserver reference documentation. In order to set these flags you will need to obtain a CA certificate, if your cluster provider has not taken care of that already. For more details on the various CAs used by the API Server, take a look at the excellent apiserver-builder documentation.
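For reference, the flags in question look something like the following. The certificate paths shown here are only an example (they match a typical kubeadm layout) and will depend on where your cluster stores its front proxy CA and client credentials:

--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-allowed-names=front-proxy-client
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key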
We now need to deploy the add-on API server itself to serve these metrics. We can use Heapster’s implementation of the resource metrics API by running it with the --api-server flag set to true; however, the recommended way is to deploy metrics-server, a slimmed-down version of Heapster designed specifically to serve resource usage metrics. You can do this using the deployment manifests provided in the metrics-server repository. Pay special attention to the APIService resource included with the manifests. This resource claims a URL path in the Kubernetes API (/apis/metrics.k8s.io/v1beta1 in this case) and tells the aggregator to proxy anything sent to that path to the registered service. For more information, check out the metrics-server repository.
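As a sketch, the APIService resource looks roughly like this, assuming metrics-server is deployed as a Service named metrics-server in the kube-system namespace (check the repository for the canonical manifest):

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.metrics.k8s.io
spec:
  service:
    name: metrics-server
    namespace: kube-system
  group: metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100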
To test our resource metrics API we can use kubectl get --raw. The following command should return a list of resource usage metrics for all the nodes in our cluster.
$ kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes" | jq
{
  "kind": "NodeMetricsList",
  "apiVersion": "metrics.k8s.io/v1beta1",
  "metadata": {
    "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes"
  },
  "items": [
    {
      "metadata": {
        "name": "node1.lukeaddison.co.uk",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node1.lukeaddison.co.uk",
        "creationTimestamp": "2017-10-09T14:21:06Z"
      },
      "timestamp": "2017-10-09T14:21:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "247m",
        "memory": "1846432Ki"
      }
    },
    {
      "metadata": {
        "name": "node2.lukeaddison.co.uk",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node2.lukeaddison.co.uk",
        "creationTimestamp": "2017-10-09T14:21:06Z"
      },
      "timestamp": "2017-10-09T14:21:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "511m",
        "memory": "3589560Ki"
      }
    },
    {
      "metadata": {
        "name": "node3.lukeaddison.co.uk",
        "selfLink": "/apis/metrics.k8s.io/v1beta1/nodes/node3.lukeaddison.co.uk",
        "creationTimestamp": "2017-10-09T14:21:06Z"
      },
      "timestamp": "2017-10-09T14:21:00Z",
      "window": "1m0s",
      "usage": {
        "cpu": "301m",
        "memory": "2453620Ki"
      }
    }
  ]
}
The resource metrics API allows HPA v2 to scale on resource metrics such as CPU and memory usage; however, this API does not allow us to consume application-specific metrics, so we need something extra.
The purpose of the custom metrics API is to provide a stable, versioned API that end-users and Kubernetes components can rely on. Implementations of the custom metrics API provide custom metrics to the HPA controller and form part of the monitoring pipeline.
The steps required to configure your cluster to use a custom metrics API implementation can be found in the Kubernetes HPA docs. There are not a huge number of implementations yet and none that are officially part of Kubernetes, but a good one to try at the moment is the Prometheus adapter. This adapter translates queries for custom metrics into PromQL, the Prometheus query language, in order to query Prometheus itself and pass the results back to the caller.
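As a rough illustration of that translation, a query for a hypothetical http_requests pod metric might be handled along these lines. The metric and label names here are assumptions and depend entirely on your Prometheus configuration and relabelling rules:

# Custom metrics API request made by a caller such as the HPA controller:
#   /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests
# PromQL the adapter might generate to answer it:
sum(rate(http_requests_total{namespace="default",pod_name=~"nginx-.*"}[2m])) by (pod_name)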
There is a nice walk-through by DirectXMan12 that covers cluster prerequisites, how to deploy Prometheus and how to inject a Prometheus adapter container into your Prometheus deployment. The adapter can then be registered with the API Server using an APIService resource that tells the aggregator where to forward requests for custom metrics. Note that the walk-through uses the API Server path /apis/custom-metrics.metrics.k8s.io, but for 1.8 a decision was made to use /apis/custom.metrics.k8s.io, so you will need to change your APIService resource accordingly. luxas has a nice example of everything. Thanks!
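The registration is analogous to the metrics-server APIService above, just for the new group. It might look something like this, assuming the adapter is exposed by a Service called custom-metrics-apiserver in a custom-metrics namespace (both names are illustrative):

apiVersion: apiregistration.k8s.io/v1beta1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: custom-metrics
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100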
As before, we can test our new add-on API server using kubectl get --raw. The following command should return a list of all the custom metrics exposed by the Prometheus adapter.
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "pods/tasks_state",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/memory_failcnt",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    …
    {
      "name": "pods/memory_swap",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}
With the Prometheus adapter deployed, we can use the power of Prometheus to scale our workloads on custom metrics specifically tailored to our applications.
The following example shows an HPA that scales an nginx Deployment using a single resource metric (CPU) and two custom metrics (packets-per-second and requests-per-second):
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 50
  - type: Pods
    pods:
      metricName: packets-per-second
      targetAverageValue: 1k
  - type: Object
    object:
      metricName: requests-per-second
      target:
        apiVersion: extensions/v1beta1
        kind: Ingress
        name: main-route
      targetValue: 2k
- To get the value for the resource metric, the HPA controller uses the resource metrics API by querying the API Server path /apis/metrics.k8s.io/v1beta1/pods.
- For custom metrics, the HPA controller uses the custom metrics API and will query the API Server paths /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/packets-per-second for type Pods and /apis/custom.metrics.k8s.io/v1beta1/namespaces/default/ingress.extensions/main-route/requests-per-second for type Object, as shown below.
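You can hit these paths yourself to see exactly what the HPA controller sees; for example, assuming the HPA above lives in the default namespace:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/packets-per-second" | jq
$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/ingress.extensions/main-route/requests-per-second" | jq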
Note that these APIs are still in beta, and so may be subject to change. If you are using them, check the Kubernetes release notes before performing cluster upgrades to ensure your old resources are still valid.