The Kubernetes Cluster API

I’ve been working with Kubernetes since filing my first commit in October 2016. I’ve had the chance to collaborate with the community on Kops, Kubicorn, and kubeadm, but there’s one gap that has been nagging me for years: how to create the right abstraction for bringing up a Kubernetes cluster and managing it once it’s online. As it turned out, I wasn’t alone. So begins the story of Cluster API.

In 2017 I spent an afternoon over lunch at the Google office in Seattle’s Fremont neighborhood, meeting with Robert Bailey and Weston Hutchins. We had connected through open source and shared similar ideas about declarative infrastructure built on new primitives in Kubernetes. Robert Bailey and Jacob Beacham began to spearhead the charge from Google, and we slowly began formalizing an effort to create a system for bootstrapping and managing a Kubernetes cluster in a declarative way. I remember the grassroots nature of the project as Google began evangelizing these ideas within the Kubernetes community.

Following an email to the Kubernetes SIG Cluster Lifecycle mailing list, the Cluster API working group was born. The group quickly discovered other projects, such as Archon and work from Loodse, that had similar ideas. It was clear we were all imagining a brighter future built on declarative infrastructure.

We started brainstorming what a declarative Kubernetes cluster might look like, and we each consulted Kubernetes “elders” at our respective companies. The engineers at Google consulted Tim Hockin, while I talked it over with Joe Beda, co-founder of Kubernetes and my long-time colleague. Tim suggested we start building tooling and playing with abstractions to get a feel for what would and wouldn’t work. During this “sandbox” stage we started prototyping in the kube-deploy repository. We needed a place to start hacking on the code, and because this feature had originally and intentionally been called out as out of scope, finding a home was challenging. Later we moved out of the kube-deploy repository and into cluster-api, which is where the code lives today.

Now “Cluster API,” which is short for “Cluster Management API,” is a great example of a bad name. As Tim St. Clair (Heptio) suggested, a better name for this layer of software is probably “cluster framework.” The community is still figuring out how to deal with this naming conundrum.

One of the first decisions the working group made was in regard to the scope of the API itself. In other words, what would our new abstraction be responsible for representing, and what would it intentionally ignore? We landed on two primary new resources: Cluster and Machine. The Cluster resource was intended to map cleanly to the official Kubernetes bootstrap tool, kubeadm, while the Machine resource was intended to be a simple representation of some unit of compute in a cloud (an EC2 instance, a Google Compute Engine VM, etc.) or a physical machine. We explicitly decided to keep the new Machine resource separate from the existing Node resource, since we could always merge the two later if necessary. According to Jacob Beacham, “the biggest motivation being that this was the only way to build the functionality outside of core, since the Node API is already defined in core.” Each Cluster resource would map to a set of Machine resources, and ultimately all of these combined would represent a single Kubernetes cluster.
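
To make that split concrete, here is a minimal sketch in Go of what these two resource types might look like. This is illustrative only; the field names are simplified stand-ins, not the actual Cluster API schema.

```go
// Package clusterapi is a hypothetical, simplified sketch of the new
// resource types; it is not the real Cluster API code.
package clusterapi

// ClusterSpec declares "what" a single Kubernetes cluster should look
// like. Anything provider-specific lives in an opaque blob that only the
// matching provider controller knows how to interpret.
type ClusterSpec struct {
	ServiceCIDR    string // CIDR range for cluster services
	PodCIDR        string // CIDR range for pods
	ProviderConfig string // opaque, provider-specific configuration
}

// MachineSpec is a simple representation of one unit of compute: an EC2
// instance, a Google Compute Engine VM, or a physical machine. It is
// deliberately separate from the core Node resource.
type MachineSpec struct {
	Roles          []string // e.g. "master", "node"
	KubeletVersion string   // the Kubernetes version this machine should run
	ProviderConfig string   // opaque, provider-specific configuration
}
```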

We later looked at implementing higher-level resources to manage the Machine resource, in the same way Deployments and ReplicaSets manage Pods in Kubernetes today. Following this logic, we modeled MachineDeployment after Deployment and MachineSet after ReplicaSet. These would allow unique strategy implementations for how various controllers manage scaling and mutating the underlying Machines.
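
Continuing the hypothetical sketch from above (and reusing its MachineSpec type), the layering mirrors the familiar Deployment → ReplicaSet → Pod hierarchy:

```go
// MachineSetSpec mirrors ReplicaSetSpec: it asks a controller to keep a
// fixed number of identical Machines running at all times.
type MachineSetSpec struct {
	Replicas int32       // desired number of Machines
	Template MachineSpec // template stamped out for each new Machine
}

// MachineDeploymentSpec mirrors DeploymentSpec: it manages MachineSets so
// that Machines can be replaced according to a declared rollout strategy.
type MachineDeploymentSpec struct {
	Replicas int32       // desired number of Machines across MachineSets
	Strategy string      // e.g. "RollingUpdate": replace Machines gradually
	Template MachineSpec // template for the MachineSets it creates
}
```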

We also decided early on that “how” a controller reconciles one of these resources is unique to that controller. In other words, the API should, by design, never include bash commands or any logic that suggests “how” to bring a cluster up, only “what” the cluster should look like after it’s been stood up. How a controller reasons about what it needs to do is in scope for the controller and out of scope for the API. For example, Cluster API would define what version of Kubernetes to install, but would never define how to install that version.
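
A toy reconcile loop, again building on the sketch above, shows where that line falls. The Actuator interface here is hypothetical; the point is that the provider-specific “how” hides behind it, while the API carries only the desired state:

```go
// Actuator is a hypothetical provider interface: each cloud supplies its
// own answer to "how", while the Machine resource only states "what".
type Actuator interface {
	Exists(m MachineSpec) (bool, error)
	Create(m MachineSpec) error
	Upgrade(m MachineSpec) error // converge the machine on m.KubeletVersion
}

// reconcileMachine drives observed state toward desired state. Note that
// nothing here encodes install commands; those live behind the Actuator.
func reconcileMachine(desired MachineSpec, observedVersion string, a Actuator) error {
	exists, err := a.Exists(desired)
	if err != nil {
		return err
	}
	if !exists {
		return a.Create(desired) // machine missing: create it
	}
	if observedVersion != desired.KubeletVersion {
		return a.Upgrade(desired) // wrong version: upgrade it
	}
	return nil // already converged; nothing to do
}
```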

With Cluster API, we hope to solve many of the technical concerns in managing Kubernetes clusters by drawing on the lessons we’ve learned from kubeadm, Kops, Kubicorn, kube-up, Gardener, and others. So we set out to build Cluster API with the following goals in mind:

Facilitate atomic transactions

In the spirit of planning for failure and mitigating hazards in our software, we wanted to build software that could guarantee either a successful infrastructure mutation or no mutation at all. We learned this lesson from Kops, where a cluster upgrade or create could fail partway through, orphaning costly infrastructure in a cloud account.

Enable cluster automation

With Cluster API, cluster-level configuration is declared through a common API, making it easy to automate and to build tooling that interfaces with the new API. Tools like the cluster autoscaler are liberated from having to concern themselves with how a node is created or destroyed. This simplifies the tooling and enables new tooling to be crafted around updating the cluster definition based on arbitrary business needs. It will change how operators think about managing a cluster.
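
For instance, in terms of the earlier sketch, a scaler never has to call a cloud API at all; it simply updates the declared replica count and lets the controller do the rest (hypothetical helper, not actual autoscaler code):

```go
// scaleUp illustrates what tooling like the cluster autoscaler gains:
// adding a node becomes a record update in Kubernetes, with no cloud
// provider calls in sight.
func scaleUp(ms *MachineSetSpec, delta int32) {
	ms.Replicas += delta
	// In real tooling, this mutation would be written back to the API
	// server (an Update or Patch call), and the MachineSet controller
	// would notice the change and create the new Machines.
}
```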

Keep infrastructure resilient

Kops, Kubicorn, and kube-up all share a fatal flaw: they run only for a finite amount of time. They each have some concept of accomplishing a task and then terminating the program. This was a good starting point, but it didn’t offer the goal-seeking, resilient behavior users are used to with Kubernetes. We needed a controller to reconcile state over time. If a machine goes down, we don’t want to have to worry about bringing it back up.
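
In sketch form, the difference is simply a loop that never exits. Instead of running once and terminating, a controller (illustrative code, continuing the example above) periodically compares the declared Machines against what actually exists and repairs any drift:

```go
import "time"

// runMachineSetController is an illustrative goal-seeking loop. Unlike a
// one-shot installer, it never terminates: on every tick it counts the
// Machines that still exist and recreates any that have gone missing.
func runMachineSetController(spec MachineSetSpec, countLive func() int32, create func(MachineSpec)) {
	for range time.Tick(30 * time.Second) {
		for missing := spec.Replicas - countLive(); missing > 0; missing-- {
			create(spec.Template) // bring a lost machine back up
		}
	}
}
```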

Create a better user experience

Standing up a Kubernetes cluster is hard. Period. We hoped to build tooling that would go from zero to Kubernetes in a friendly way that made sense to operators. Furthermore, we hoped the API abstractions we created would also resonate with engineers, so we could encourage them to build tooling around these new abstractions. For example, if the abstraction was user-friendly, we could port the upstream autoscaler over to the new abstraction so that it no longer had to concern itself with implementation details and could simply update a record in Kubernetes.

Provide a solution for cluster upgrades

We wanted a turnkey solution to upgrades. Upgrading a Kubernetes cluster is tedious and risky, and having a controller that remains in place not only solved the implementation question of how to upgrade a Kubernetes cluster, but also gave us visibility into the state of the current upgrade.
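
That visibility falls out of the declarative model: because the controller is always running, it can continuously publish observed state alongside the desired spec. A hypothetical status type, in the spirit of the earlier sketch, might look like:

```go
// MachineStatus records what the controller has actually observed, so an
// operator can watch an upgrade converge instead of guessing. Field names
// are illustrative, not the real Cluster API schema.
type MachineStatus struct {
	ObservedKubeletVersion string // version currently running on the machine
	Phase                  string // e.g. "Provisioning", "Upgrading", "Ready"
}
```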

Bring the community together

As it stands, every Kubernetes installer to date represents a cluster in a different way, and the user experience is fragmented. This hurts Kubernetes adoption and, frankly, pisses users off. We hoped to reduce this fragmentation by providing a solution for defining “what” a cluster looks like, along with tooling to jumpstart implementations that solve “how” to bring a cluster to life.

All of these lessons and more are starting to sing in the Cluster API repositories. We are on the verge of alpha and beta releases for clouds like AWS and GCP. We hope this community-driven API becomes a standard that teams can count on, and that the community starts to offer an arsenal of controller implementations that bring the many variants of clusters to life.

Going Further

Learn more about the Cluster API from Kris Nova (Heptio) and Loc Nguyen (VMware) live at KubeCon 2018 during their presentation on the topic. The talk will be recorded in case you can’t make it, and we will upload the video to our advocacy site as soon as we can.
