Setting up the Kubernetes AWS Cloud Provider

The AWS cloud provider for Kubernetes enables a couple of key integration points for Kubernetes running on AWS; namely, dynamic provisioning of Elastic Block Store (EBS) volumes, and dynamic provisioning/configuration of Elastic Load Balancers (ELBs) for exposing Kubernetes Service objects. Unfortunately, the documentation surrounding how to set up the AWS cloud provider with Kubernetes is woefully inadequate. This article is an attempt to help address that shortcoming.

More details are provided below, but at a high-level here’s what you’ll need to make the AWS cloud provider in Kubernetes work:

  • You’ll need the hostname of each node to match EC2’s private DNS entry for that node
  • You’ll need an IAM role and policy that EC2 instances can assume as an instance profile
  • You’ll need some Kubernetes-specific tags applied to the AWS resources used by the cluster
  • You’ll add some particular command-line flags to the Kubernetes API server, Kubernetes controller manager, and the Kubelet

Let’s dig into these requirements in a bit more detail.

Node Hostname

It’s important that the name of the Node object in Kubernetes matches the private DNS entry for the instance in EC2. You can use hostnamectl or a configuration management tool (take your pick) to set the instance’s hostname to the FQDN that matches the EC2 private DNS entry. This typically looks something like ip-10-15-30-45.us-west-1.compute.internal, where 10-15-30-45 is the private IP address and us-west-1 is the region where the instance was launched.

If you’re unsure what it is, or if you’re looking for a programmatic way to retrieve the FQDN, just curl the AWS metadata server:

curl http://169.254.169.254/latest/meta-data/local-hostname

Make sure you set the hostname before attempting to bootstrap the Kubernetes cluster, or you’ll end up with nodes whose name in Kubernetes doesn’t match their EC2 private DNS entry, and you’ll see various “permission denied”/“unable to enumerate” errors in the logs. For what it’s worth, preliminary testing indicates that this step — setting the hostname to the FQDN — is necessary for Ubuntu but may not be needed for CentOS/RHEL.
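
Putting those pieces together, here’s a minimal sketch of setting the hostname from instance metadata before any bootstrap tooling runs (assuming a systemd-based distribution where hostnamectl is available):

# Look up the EC2 private DNS name from the instance metadata service
FQDN=$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)

# Set the hostname to match EC2's private DNS entry
sudo hostnamectl set-hostname "$FQDN"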

IAM Role and Policy

Because the AWS cloud provider performs some tasks on behalf of the operator — like creating an ELB or an EBS volume — the instances need IAM permissions to perform these tasks. Thus, you need to have an IAM instance profile assigned to the instances that gives them those permissions.

The exact permissions needed are best documented in the kubernetes/cloud-provider-aws GitHub repository, home of the future out-of-tree AWS cloud provider. Separate permissions are needed for the control plane nodes versus the worker nodes; the control plane nodes need more permissions than the worker nodes. This means you’ll end up with two IAM instance profiles: one for the control plane nodes with a broader set of permissions, and one for the worker nodes with a more restrictive set of permissions.
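
As a rough sketch of how the pieces fit together, the AWS CLI commands below create a role that EC2 instances can assume and wrap it in an instance profile. The role and profile names are placeholders, and the actual permissions policy should be taken from the repository mentioned above before being attached to the role:

# Trust policy that allows EC2 instances to assume the role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role, an instance profile, and tie the two together
# ("k8s-control-plane" is a placeholder name; repeat for the worker role)
aws iam create-role --role-name k8s-control-plane \
  --assume-role-policy-document file://trust-policy.json
aws iam create-instance-profile --instance-profile-name k8s-control-plane
aws iam add-role-to-instance-profile --instance-profile-name k8s-control-plane \
  --role-name k8s-control-plane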

AWS Tags

The AWS cloud provider needs a specific tag to be present on almost all the AWS resources that a Kubernetes cluster uses. The tag key is kubernetes.io/cluster/cluster-name, where cluster-name is an arbitrary name for the cluster; the value of the tag is immaterial (this tag replaces an older KubernetesCluster tag you may see referenced). Note that Kubernetes itself will also apply this tag to resources it creates, using a value of “owned”; you don’t need to use that particular value on resources that Kubernetes itself did not create.

Most of the documentation I’ve seen indicates that the tag is needed on all instances and on exactly one security group (this is the security group that will be modified to allow ELBs to access the nodes, so the worker nodes should be a part of this security group). However, I’ve also found it necessary to make sure the kubernetes.io/cluster/cluster-name tag is present on subnets and route tables in order for the integration to work as expected.
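
If you need to apply the tag after the fact, the AWS CLI can tag instances, security groups, subnets, and route tables in one shot; in this sketch, “my-cluster” and the resource IDs are placeholders:

# Tag the relevant AWS resources with the cluster tag
# (the value "owned" is arbitrary here; any value will do for resources you created)
aws ec2 create-tags \
  --resources i-0123456789abcdef0 sg-0123456789abcdef0 subnet-0123456789abcdef0 rtb-0123456789abcdef0 \
  --tags Key=kubernetes.io/cluster/my-cluster,Value=owned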

Kubernetes Configuration

On the Kubernetes side of the house, you’ll need to make sure that the --cloud-provider=aws command-line flag is present for the API server, controller manager, and every Kubelet in the cluster.

If you’re using kubeadm to set up your cluster, you can have kubeadm add the flags to the API server and controller manager by using the apiServerExtraArgs and controllerManagerExtraArgs sections in a configuration file, like this:

apiServerExtraArgs:
  cloud-provider: aws
controllerManagerExtraArgs:
  cloud-provider: aws

Likewise, you can use the nodeRegistration section of a kubeadm configuration file to pass extra arguments to the Kubelet, like this:

nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: aws

I’d probably also recommend setting the name of the Kubelet to the node’s private DNS entry in EC2 (this ensures it matches the hostname, as described earlier in this article). Thus, the full nodeRegistration section might look like this:

nodeRegistration:
  name: ip-10-15-30-45.us-west-1.compute.internal
  kubeletExtraArgs:
    cloud-provider: aws

You would need to substitute the correct fully-qualified domain name for each instance, of course.
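
For reference, here’s what a complete (if minimal) kubeadm configuration file might look like with all of these snippets combined. This sketch assumes kubeadm’s v1alpha2 MasterConfiguration format; newer kubeadm releases split these settings across different kinds, so adjust the apiVersion and kind for your version, and substitute the correct node name:

apiVersion: kubeadm.k8s.io/v1alpha2
kind: MasterConfiguration
apiServerExtraArgs:
  cloud-provider: aws
controllerManagerExtraArgs:
  cloud-provider: aws
nodeRegistration:
  name: ip-10-15-30-45.us-west-1.compute.internal
  kubeletExtraArgs:
    cloud-provider: aws

You’d then pass this file to kubeadm init using the --config flag.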

Finally, for dynamic provisioning of Persistent Volumes you’ll need to create a default Storage Class (see the Kubernetes documentation on Storage Classes). A suitable definition for AWS ships as a Kubernetes addon, but it doesn’t get created automatically. Use this command to define the default Storage Class:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/storage-class/aws/default.yaml

This will create a Storage Class named “gp2” that carries the annotation marking it as the default Storage Class. Once this Storage Class is defined, dynamic provisioning of Persistent Volumes should work as expected.
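
To confirm that dynamic provisioning works, you can create a small test PersistentVolumeClaim (the name and size here are arbitrary) and verify that an EBS-backed Persistent Volume is created and bound:

# A minimal PVC to exercise dynamic EBS provisioning; because the "gp2"
# Storage Class is the default, no storageClassName needs to be specified
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-ebs-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 4Gi

A kubectl get pvc test-ebs-claim should show the claim as Bound shortly after creation.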

Troubleshooting

Troubleshooting is notoriously difficult, as most errors seem to be “transparently swallowed” instead of exposed to the user/operator. Here are a few notes that may be helpful:

  • You _must_ have the --cloud-provider=aws flag added to the Kubelet before adding the node to the cluster. Key to the AWS integration is a particular field on the Node object — the .spec.providerID field — and that field will only get populated if the flag is present when the node is first added to the cluster. If you add a node to the cluster and then add the command-line flag afterward, this field/value won’t get populated and the integration won’t work as expected. No error is surfaced in this situation (at least, not that I’ve been able to find).
  • If you do find yourself with a missing .spec.providerID field on the Node object, you can add it with a kubectl edit node command (see the sketch after this list). The format of the value for this field is aws:///<az-of-instance>/<instance-id>.
  • Missing AWS tags on resources will cause odd behaviors, like failing to create an ELB for a LoadBalancer-type Service. I haven’t had time to test all the different failure scenarios, but if the cloud provider integration isn’t working as expected I’d double-check that the Kubernetes-specific tags are present on all the AWS resources.

Hopefully, the information in this article helps remove some of the confusion and lack of clarity around getting the AWS cloud provider working with your Kubernetes cluster. I’ll do my best to keep this document updated as I discover additional failure scenarios or find more detailed documentation. If you have questions, feel free to hit me on Twitter or find me in the Kubernetes Slack community. (If you’re an expert in the AWS cloud provider code and can help flesh out the details of this post, please contact me as well!) Have fun out there, fellow Kubernauts!
