Setting up the Kubernetes AWS Cloud Provider – Heptio

The AWS cloud provider for Kubernetes enables a couple of key integration points for Kubernetes running on AWS; namely, dynamic provisioning of Elastic Block Store (EBS) volumes, and dynamic provisioning/configuration of Elastic Load Balancers (ELBs) for exposing Kubernetes Service objects. Unfortunately, the documentation surrounding how to set up the AWS cloud provider with Kubernetes is woefully inadequate. This article is an attempt to help address that shortcoming.

More details are provided below, but at a high-level here’s what you’ll need to make the AWS cloud provider in Kubernetes work:

  • You’ll need the hostname of each node to match EC2’s private DNS entry for that node
  • You’ll need an IAM role and policy that EC2 instances can assume as an instance profile
  • You’ll need some Kubernetes-specific tags applied to the AWS resources used by the cluster
  • You’ll add some particular command-line flags to the Kubernetes API server, Kubernetes controller manager, and the Kubelet

Let’s dig into these requirements in a bit more detail.

Node Hostname

It’s important that the name of the Node object in Kubernetes matches the private DNS entry for the instance in EC2. You can use hostnamectl or a configuration management tool (take your pick) to set the instance’s hostname to the FQDN that matches the EC2 private DNS entry. This typically looks something like ip-10-15-30-45.us-west-1.compute.internal, where 10-15-30-45 is the private IP address and us-west-1 is the region where the instance was launched.

If you’re unsure what it is, or if you’re looking for a programmatic way to retrieve the FQDN, just curl the AWS metadata server:

curl http://169.254.169.254/latest/meta-data/local-hostname

Make sure you set the hostname before attempting to bootstrap the Kubernetes cluster, or you’ll end up with nodes whose name in Kubernetes doesn’t match up, and you’ll see various “permission denied”/”unable to enumerate” errors in the logs. For what it’s worth, preliminary testing indicates that this step — setting the hostname to the FQDN — is necessary for Ubuntu but may not be needed for CentOS/RHEL.
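
If you provision instances with cloud-init, one way to automate this step is a short user-data snippet. This is only a sketch, assuming cloud-init’s runcmd module runs at first boot on your image:

#cloud-config
# Sketch: set the hostname to the EC2 private DNS name at first boot.
runcmd:
  - hostnamectl set-hostname "$(curl -s http://169.254.169.254/latest/meta-data/local-hostname)"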

IAM Role and Policy

Because the AWS cloud provider performs some tasks on behalf of the operator — like creating an ELB or an EBS volume — the instances need IAM permissions to perform these tasks. Thus, you need an IAM instance profile assigned to the instances that grants them those permissions.

The exact permissions that are needed are best documented in this GitHub repository for the future out-of-tree AWS cloud provider. Separate permissions are needed for the control plane nodes versus the worker nodes; the control plane nodes need more permissions than the worker nodes. This means you’ll end up with two IAM instance profiles: one for the control plane nodes with a broader set of permissions, and one for the worker nodes with a more restrictive set of permissions.

AWS Tags

The AWS cloud provider needs a specific tag to be present on almost all the AWS resources that a Kubernetes cluster uses. The tag key is kubernetes.io/cluster/cluster-name, where cluster-name is an arbitrary name for the cluster; the value of the tag is immaterial (this tag replaces an older KubernetesCluster tag you may see referenced). Note that Kubernetes itself will also apply this tag to things that it creates, using a value of “owned”. This value does not need to be used on resources that Kubernetes itself did not create. Most of the documentation I’ve seen indicates that the tag is needed on all instances and on exactly one security group (this is the security group that will be modified to allow ELBs to access the nodes, so the worker nodes should be a part of this security group). However, I’ve also found it necessary to make sure the kubernetes.io/cluster/cluster-name tag is present on subnets and route tables in order for the integration to work as expected.

Kubernetes Configuration

On the Kubernetes side of the house, you’ll need to make sure that the --cloud-provider=aws command-line flag is present for the API server, controller manager, and every Kubelet in the cluster.

If you’re using kubeadm to set up your cluster, you can have kubeadm add the flags to the API server and controller manager by using the apiServerExtraArgs and controllerManagerExtraArgs sections in a configuration file, like this:

apiServerExtraArgs:
  cloud-provider: aws
controllerManagerExtraArgs:
  cloud-provider: aws

Likewise, you can use the nodeRegistration section of a kubeadm configuration file to pass extra arguments to the Kubelet, like this:

nodeRegistration:
  kubeletExtraArgs:
    cloud-provider: aws

I’d probably also recommend setting the name of the Kubelet to the node’s private DNS entry in EC2 (this ensures it matches the hostname, as described earlier in this article). Thus, the full nodeRegistration section might look like this:

nodeRegistration:
  name: ip-10-15-30-45.us-west-1.compute.internal
  kubeletExtraArgs:
    cloud-provider: aws

You would need to substitute the correct fully-qualified domain name for each instance, of course.

Finally, for dynamic provisioning of Persistent Volumes you’ll need to create a default Storage Class (read about Storage Classes here). The AWS cloud provider has one, but it doesn’t get created automatically. Use this command to define the default Storage Class:

kubectl apply -f https://raw.githubusercontent.com/kubernetes/kubernetes/master/cluster/addons/storage-class/aws/default.yaml

This will create a Storage Class named “gp2” that has the necessary annotation to make it the default Storage Class (see here). Once this Storage Class is defined, dynamic provisioning of Persistent Volumes should work as expected.
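
For reference, the manifest applied by that command defines something close to the following sketch (the upstream file is the authoritative version); the storageclass.kubernetes.io/is-default-class annotation is what marks it as the default:

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: gp2
  annotations:
    # This annotation is what makes the class the cluster default.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2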

Troubleshooting

Troubleshooting is notoriously difficult, as most errors seem to be “transparently swallowed” instead of exposed to the user/operator. Here are a few notes that may be helpful:

  • You _must_ have the --cloud-provider=aws flag added to the Kubelet before adding the node to the cluster. Key to the AWS integration is a particular field on the Node object — the .spec.providerID field — and that field will only get populated if the flag is present when the node is first added to the cluster. If you add a node to the cluster and then add the command-line flag afterward, this field/value won’t get populated and the integration won’t work as expected. No error is surfaced in this situation (at least, not that I’ve been able to find).
  • If you do find yourself with a missing .spec.providerID field on the Node object, you can add it with a kubectl edit node command. The format of the value for this field is aws:///<az-of-instance>/<instance-id> (see the example after this list).
  • Missing AWS tags on resources will cause odd behaviors, like failing to create an ELB for a LoadBalancer-type Service. I haven’t had time to test all the different failure scenarios, but if the cloud provider integration isn’t working as expected I’d double-check that the Kubernetes-specific tags are present on all the AWS resources.
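
As an illustration of that format, here’s what the field looks like in a Node manifest; the availability zone and instance ID below are hypothetical placeholders:

# Excerpt of a Node object; the values are hypothetical placeholders
spec:
  providerID: aws:///us-west-1a/i-0abc123def456789a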

Hopefully, the information in this article helps remove some of the confusion and lack of clarity around getting the AWS cloud provider working with your Kubernetes cluster. I’ll do my best to keep this document updated as I discover additional failure scenarios or find more detailed documentation. If you have questions, feel free to hit me on Twitter or find me in the Kubernetes Slack community. (If you’re an expert in the AWS cloud provider code and can help flesh out the details of this post, please contact me as well!) Have fun out there, fellow Kubernauts!

Source

Containers vs. Serverless Computing | Rancher Labs

Serverless computing is a hot topic right now—perhaps even hotter than
Docker containers. Is that because serverless computing is a replacement
for containers? Or is it just another popular technology that can be
used alongside containers? In this post, I take a look at what you need
to know about serverless computing, and how it should figure into your
IT strategy.

Serverless Is Not Server-less

But first, let’s clear up one point: As you may already know,
serverless computing does not mean that there are no servers involved.
It’s a cloud-based service, and just like everything else in the cloud,
it runs on servers. That said, serverless is called serverless because
the service provider handles all of the server-side IT. All you need to
do is write code and deploy it. The serverless computing provider takes
care of just about everything else. So your experience is serverless,
even if the underlying infrastructure is not.

How Serverless Works

How does it work? One of the most popular serverless platforms is AWS
Lambda. To use it, you write code (in C#, Java, Node.js, or Python),
set a few simple configuration parameters, and upload everything (along
with required dependencies) to Lambda. In Lambda terminology, the
package that you’ve uploaded is called a function. You can run the
function by calling it from an application running on an AWS service
such as S3 or EC2. Lambda then deploys your function in a container,
which persists until your function has done its job, then disappears.
The key point to keep in mind is that Lambda takes care of provisioning,
deploying, and managing the container. All you do is provide the code
that runs in the container. Everything else goes on behind the scenes.
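
For a sense of what a packaged function looks like on paper, here is a
minimal sketch of an AWS SAM template that declares a function and wires it
to an S3 upload event; the resource names, runtime, and handler are
hypothetical placeholders you would adapt to your own application:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  ThumbnailFunction:                 # hypothetical function name
    Type: AWS::Serverless::Function
    Properties:
      Handler: index.handler         # entry point inside the uploaded package
      Runtime: nodejs8.10
      CodeUri: ./src                 # local path to the code and its dependencies
      Events:
        OnUpload:                    # run the function whenever an object is created
          Type: S3
          Properties:
            Bucket: !Ref UploadBucket
            Events: s3:ObjectCreated:*
  UploadBucket:                      # hypothetical bucket that triggers the function
    Type: AWS::S3::Bucket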

A Serverless World?

Does this mean that we now live in a world where software developers and
IT teams no longer need to deal directly with containers, or with
nuts-and-bolts backend IT at all? Will you be able to just write code,
toss it to Lambda, and let AWS take care of everything else? If that
sounds too good to be true, it’s for a very good reason—It is too
good to be true. Serverless computing of the type represented by AWS
Lambda can be an extremely valuable resource, and if it isn’t already
part of your DevOps delivery chain, it probably should be. The key word,
however, is “part.” Serverless computing is very well suited to a
variety of tasks, but it is far from being an all-around substitute for
deploying and managing your own containers. Serverless computing is
really designed to work with containers, rather than replacing them.

What Serverless Computing Does Well

What, then, are the advantages of serverless computing? When used for
the kinds of services which it was designed to host, serverless
computing can be:

Inexpensive

With serverless, you typically pay only for the actual time and volume
of traffic used. Lambda, for example, breaks its time-based pricing down
into increments of 100 milliseconds. The actual cost is generally quite
low as well, in part because serverless functions are small, perform
relatively simple tasks, and run in generic containers with very little
overhead.

Low maintenance

The list of things that you don’t need to do when you deploy a function
on a serverless platform is much longer than the list of things that you
do need to do. Among other things, you don’t need to provision
containers, set system policies and availability levels, or handle any
backend server tasks, for that matter. You can use automatic scaling, or
manually scale use by means of some simple capacity-based settings, if
you want to.

Simple

The standardized programming environment and the lack of server and
container-deployment overhead means that you can focus on writing code.
From the point of view of your main application, the serverless function
is basically an external service which doesn’t need to be closely
integrated into the application’s container ecosystem.

Serverless Use Cases

When would you use serverless computing? Consider these possibilities:

  • Handling backend tasks for a website or mobile application. A
    serverless function can take a request (for information from a user
    database or an external source, for example) from the site or
    application frontend, retrieve the information, and hand it back to
    the frontend. It’s a quick and relatively simple task that can be
    performed as needed, with very little use of frontend time or
    resources—billing only for the actual duration of the backend
    task.
  • Processing real-time data streams and uploads. A serverless function
    can clean up, parse, and filter incoming data streams, process
    uploaded files, manage input from real-time devices, and take care
    of other workhorse tasks associated with intermittent or
    high-throughput data streams. Using serverless functions moves
    resource-intensive real-time processes out of the main application.
  • Taking care of high-volume background processes. You can use
    serverless functions to move data to long-term storage, and to
    convert, process, and analyze data, and forward metrics to an
    analytics service. In a point-of-sale system, for example,
    serverless functions could coordinate inventory, customer, order,
    and transaction databases, as well as intermittent tasks such as
    restocking and flagging variances.

The Limits of Serverless Computing

But serverless computing has some very definite limits. Lambda, for
example, has built-in restrictions on size, memory use, and time
available for a function to run. These, along with the limited list of
natively supported programming languages, are not necessarily intrinsic
to serverless computing at a fundamental level, but they reflect the
practical constraints of the system. It is important, for example, to
keep functions small and prevent them from taking up too much of the
system’s resources in order to prevent a relatively small number of
high-demand users from locking everyone else out, or overloading the
system. There are also some built-in limits that arise out of the basic
nature of serverless computing. For instance, it may be difficult or
impossible to use most monitoring tools with serverless functions, since
you typically have no access to the function’s container or
container-management system. Debugging and performance analysis may thus
be restricted to fairly primitive or indirect methods. Speed and
response time can also be uneven; these limits, along with the
constraints on size, memory, and duration, are likely to limit its use
in situations where performance is important.

What Containers Can Do Better

The list of things that containers can do better than serverless
functions is probably too long and detailed to present in a single
article. What we’ll do here is simply point out some of the main areas
where serverless functions cannot and should not be expected to replace
container-based applications.

You Can Go Big

A container-based application can be as large and as complex as you need
it to be. You can, for example, refactor a very large and complicated
monolithic application into container-based microservices, tailoring the
new architecture entirely to the requirements of the redesigned system.
If you tried to refactor the same application to run on a serverless
platform, you would encounter multiple bottlenecks based on size and
memory constraints. The resulting application would probably be composed
of extremely fragmented microservices, with a high degree of uncertainty
about availability and latency time for each fragment.

You Have Full Control

Container-based deployment gives you full control over both the
individual containers and the overall container system, as well as the
virtualized infrastructure on which it runs. This allows you to set
policies, allocate and manage resources, have fine-grained control over
security, and make full use of container-management and migration
services. With serverless computing, on the other hand, you have no
choice but to rely on the kindness of strangers.

You Have the Power to Debug, Test, and Monitor

With full control over the container environment comes full power to
look at what goes on both inside and outside of containers. This allows
effective, comprehensive debugging and testing using a full range of
resources, as well as in-depth performance monitoring at all levels. You
can identify and analyze performance problems, and fine-tune performance
on a microservice-by-microservice basis to meet the specific performance
needs of your system. Monitoring access at the system,
container-management, and container levels also makes it possible to
implement full analytics at all of these levels, with drill-down.

Working Together

The truth is that serverless computing and containers work best when
they work together, with each platform doing what it does well. A
container-based application, combined with a full-featured system for
managing and deploying containers, is the best choice by far for
large-scale and complex applications and application suites,
particularly in an enterprise or Internet environment. Serverless
computing, on the other hand, is often best for individual tasks that
can easily be run in the background or accessed as outside services.
Container-based systems can hand off such tasks to serverless
applications without tying up the resources of the main program.
Serverless applications, for their part, can provide services to
multiple clients, and can be updated, upgraded, or switched out with
other serverless applications entirely independently of the container
systems that use their services.

Conclusion

Are serverless computing services and containers competing platforms?
Hardly. Container-based and serverless computing are mutually supporting
parts of the ever-evolving world of contemporary cloud- and continuous
delivery-based software.

Source

Introducing Docker’s Windows Server Application Migration Program

Last week, we announced the Docker Windows Server Application Migration Program, designed to help companies quickly and easily migrate and modernize legacy Windows Server 2008 applications while driving continuous innovation across any application, anywhere.

We recognize that Windows Server 2008 is one of the most widely used operating systems today and the coming end-of-support in January 2020 leaves IT organizations with few viable options to cost-effectively secure their legacy applications and data. The Docker Windows Server Application Migration Program represents the best and only way to containerize and secure legacy Windows Server applications while enabling software-driven business transformation. With this new program, customers get:

  • Docker Enterprise: Leading Container Platform and only one for Windows Server applications.
    Docker Enterprise is the leading container platform in the industry, familiar to millions of developers and IT professionals. It’s also the only one that runs Windows Server applications, with support for Windows Server 2016, 1709, 1803 and, soon, 2019 (in addition to multiple Linux distributions). Organizations routinely save 50% or more through higher server consolidation and reduced hardware and licensing costs when they containerize their existing applications with Docker Enterprise.
  • Industry-proven tools & services: Easily discover, containerize, and migrate with immediate results. Only Docker delivers immediate results with industry-proven services that leverage purpose-built tools for the successful containerization of Windows Server applications in the enterprise. This new service offering is based on proven methodologies from Docker’s extensive experience working with hundreds of enterprises to modernize traditional applications. To accelerate migration of legacy applications, Docker leverages a purpose-built tool, called Docker Application Converter, to automatically scan systems for specific applications and speed up the containerization process by automatically creating Docker artifacts. Similarly, Docker Certified Infrastructure accelerates customers’ ability to operationalize the Docker Enterprise container platform. Docker Certified Infrastructure includes configuration best practices, automation tools and validated solution guides for integrating containers into enterprise IT infrastructure like VMware vSphere, Microsoft Azure and Amazon Web Services.
  • Foundation for continuous innovation: Software-driven transformation that enables continuous innovation across any application, anywhere. Docker’s platform and methodologies enable organizations to both modernize existing applications and adopt new technologies to meet business needs. This enables transformation to be driven by business and not technical dependencies. Customers can easily integrate new technology stacks and architecture without friction, including: cloud-native apps, microservices, data science, edge computing, and AI.

But don’t just take our word for it. Tele2, a European telecommunications company, is rolling out Docker Enterprise to containerize their legacy Windows Applications at scale.

“We have a vision to ‘cloudify’ everything and transform how we do business as a telecom provider. Docker Enterprise is a key part of that vision. With Docker Enterprise, we have already containerized over half of our application portfolio and accelerated deployment cycles. We are looking forward to getting the advanced Windows Server support features in Docker Enterprise 2.1 into production.”

— Gregory Bohcke, Technical Architect, Tele2

By containerizing legacy applications and their dependencies with the Docker Enterprise container platform, businesses can be moved to Windows Server 2016 (and later OS) without code changes, saving millions in development costs. And because containerized applications run independently of the underlying operating system, they break the cycle of extensive, dependency-ridden upgrades, creating a future-proof architecture that makes it easy to always stay current on the latest OS.

Source

gRPC Load Balancing on Kubernetes without Tears

Many new gRPC users are surprised to find that Kubernetes’s default load
balancing often doesn’t work out of the box with gRPC. For example, here’s what
happens when you take a simple gRPC Node.js microservices app and deploy it on
Kubernetes:

While the voting service displayed here has several pods, it’s clear from
Kubernetes’s CPU graphs that only one of the pods is actually doing any
work—because only one of the pods is receiving any traffic. Why?

In this blog post, we describe why this happens, and how you can easily fix it
by adding gRPC load balancing to any Kubernetes app with
Linkerd, a CNCF service mesh and service sidecar.

First, let’s understand why we need to do something special for gRPC.

gRPC is an increasingly common choice for application developers. Compared to
alternative protocols such as JSON-over-HTTP, gRPC can provide some significant
benefits, including dramatically lower (de)serialization costs, automatic type
checking, formalized APIs, and less TCP management overhead.

However, gRPC also breaks the standard connection-level load balancing,
including what’s provided by Kubernetes. This is because gRPC is built on
HTTP/2, and HTTP/2 is designed to have a single long-lived TCP connection,
across which all requests are multiplexed—meaning multiple requests can be
active on the same connection at any point in time. Normally, this is great, as
it reduces the overhead of connection management. However, it also means that
(as you might imagine) connection-level balancing isn’t very useful. Once the
connection is established, there’s no more balancing to be done. All requests
will get pinned to a single destination pod, as shown below:

The reason this problem doesn’t occur in HTTP/1.1, which also has the
concept of long-lived connections, is that HTTP/1.1 has several features
that naturally result in cycling of TCP connections. Because of this,
connection-level balancing is “good enough”, and for most HTTP/1.1 apps we
don’t need to do anything more.

To understand why, let’s take a deeper look at HTTP/1.1. In contrast to HTTP/2,
HTTP/1.1 cannot multiplex requests. Only one HTTP request can be active at a
time per TCP connection. The client makes a request, e.g. GET /foo, and then
waits until the server responds. While that request-response cycle is
happening, no other requests can be issued on that connection.

Usually, we want lots of requests happening in parallel. Therefore, to have
concurrent HTTP/1.1 requests, we need to make multiple HTTP/1.1 connections,
and issue our requests across all of them. Additionally, long-lived HTTP/1.1
connections typically expire after some time, and are torn down by the client
(or server). These two factors combined mean that HTTP/1.1 requests typically
cycle across multiple TCP connections, and so connection-level balancing works.

Now back to gRPC. Since we can’t balance at the connection level, in order to
do gRPC load balancing, we need to shift from connection balancing to request
balancing. In other words, we need to open an HTTP/2 connection to each
destination, and balance requests across these connections, as shown below:

In network terms, this means we need to make decisions at L5/L7 rather than
L3/L4, i.e. we need to understand the protocol sent over the TCP connections.

How do we accomplish this? There are a couple options. First, our application
code could manually maintain its own load balancing pool of destinations, and
we could configure our gRPC client to use this load balancing pool. This
approach gives
us the most control, but it can be very complex in environments like Kubernetes
where the pool changes over time as Kubernetes reschedules pods. Our
application would have to watch the Kubernetes API and keep itself up to date
with the pods.

Alternatively, in Kubernetes, we could deploy our app as headless services.
In this case, Kubernetes will create multiple A records in the DNS entry for
the service. If our gRPC client is sufficiently advanced, it can automatically
maintain the load balancing pool from those DNS entries. But this approach
restricts us to certain gRPC clients, and it’s rarely possible to only use
headless services.
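
For illustration, a headless Service is simply one whose clusterIP is set to
None; the names and port below are hypothetical:

apiVersion: v1
kind: Service
metadata:
  name: voting                 # hypothetical service name
spec:
  clusterIP: None              # headless: DNS returns one A record per ready pod
  selector:
    app: voting
  ports:
    - name: grpc
      port: 8080
      targetPort: 8080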

Finally, we can take a third approach: use a lightweight proxy.

Linkerd is a CNCF-hosted service
mesh
for Kubernetes. Most relevant to our purposes, Linkerd also functions as
a service sidecar, where it can be applied to a single service—even without
cluster-wide permissions. What this means is that when we add Linkerd to our
service, it adds a tiny, ultra-fast proxy to each pod, and these proxies watch
the Kubernetes API and do gRPC load balancing automatically. Our deployment
then looks like this:

Using Linkerd has a couple advantages. First, it works with services written in
any language, with any gRPC client, and any deployment model (headless or not).
Because Linkerd’s proxies are completely transparent, they auto-detect HTTP/2
and HTTP/1.x and do L7 load balancing, and they pass through all other traffic
as pure TCP. This means that everything will just work.

Second, Linkerd’s load balancing is very sophisticated. Not only does Linkerd
maintain a watch on the Kubernetes API and automatically update the load
balancing pool as pods get rescheduled, Linkerd uses an exponentially-weighted
moving average
of response latencies to automatically send requests to the
fastest pods. If one pod is slowing down, even momentarily, Linkerd will shift
traffic away from it. This can reduce end-to-end tail latencies.

Finally, Linkerd’s Rust-based proxies are incredibly fast and small. They
introduce <1ms of p99 latency and require <10mb of RSS per pod, meaning that
the impact on system performance will be negligible.

Linkerd is very easy to try. Just follow the steps in the Linkerd Getting
Started Instructions
—install the
CLI on your laptop, install the control plane on your cluster, and “mesh” your
service (inject the proxies into each pod). You’ll have Linkerd running on your
service in no time, and should see proper gRPC balancing immediately.

Let’s take a look at our sample voting service again, this time after
installing Linkerd:

As we can see, the CPU graphs for all pods are active, indicating that all pods
are now taking traffic—without having to change a line of code. Voila,
gRPC load balancing as if by magic!

Linkerd also gives us built-in traffic-level dashboards, so we don’t even need
to guess what’s happening from CPU charts any more. Here’s a Linkerd graph
that’s showing the success rate, request volume, and latency percentiles of
each pod:

We can see that each pod is getting around 5 RPS. We can also see that, while
we’ve solved our load balancing problem, we still have some work to do on our
success rate for this service. (The demo app is built with an intentional
failure—as an exercise to the reader, see if you can figure it out by
using the Linkerd dashboard!)

If you’re interested in a dead simple way to add gRPC load balancing to your
Kubernetes services, regardless of what language they’re written in, what gRPC
client you’re using, or how they’re deployed, you can use Linkerd to add gRPC load
balancing in a few commands.

There’s a lot more to Linkerd, including security, reliability, and debugging
and diagnostics features, but those are topics for future blog posts.

Want to learn more? We’d love to have you join our rapidly-growing community!
Linkerd is a CNCF project, hosted on GitHub, and has a thriving community
on Slack, Twitter, and the mailing lists. Come and join the fun!

Source

Kubernetes Docs Updates, International Edition

Author: Zach Corleissen (Linux Foundation)

As a co-chair of SIG Docs, I’m excited to share that Kubernetes docs have a fully mature workflow for localization (l10n).

Abbreviations galore

L10n is an abbreviation for localization.

I18n is an abbreviation for internationalization.

I18n is what you do to make l10n easier. L10n is a fuller, more comprehensive process than translation (t9n).

Why localization matters

The goal of SIG Docs is to make Kubernetes easier to use for as many people as possible.

One year ago, we looked at whether it was possible to host the output of a Chinese team working independently to translate the Kubernetes docs. After many conversations (including with experts on OpenStack l10n), much transformation, and renewed commitment to easier localization, we realized that open source documentation is, like open source software, an ongoing exercise at the edges of what’s possible.

Consolidating workflows, language labels, and team-level ownership may seem like simple improvements, but these features make l10n scalable for increasing numbers of l10n teams. While SIG Docs continues to iterate improvements, we’ve paid off a significant amount of technical debt and streamlined l10n in a single workflow. That’s great for the future as well as the present.

Consolidated workflow

Localization is now consolidated in the kubernetes/website repository. We’ve configured the Kubernetes CI/CD system, Prow, to handle automatic language label assignment as well as team-level PR review and approval.

Language labels

Prow automatically applies language labels based on file path. Thanks to SIG Docs contributor June Yi, folks can also manually assign language labels in pull request (PR) comments. For example, when left as a comment on an issue or PR, this command assigns the label language/ko (Korean).

/language ko

These repo labels let reviewers filter for PRs and issues by language. For example, you can now filter the k/website dashboard for PRs with Chinese content.

Team review

L10n teams can now review and approve their own PRs. For example, review and approval permissions for English are assigned in an OWNERS file in the top subfolder for English content.

Adding OWNERS files to subdirectories lets localization teams review and approve changes without requiring a rubber stamp approval from reviewers who may lack fluency.
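
As a sketch of what such a file can contain, an OWNERS file in a localization subfolder might look like this (the usernames are hypothetical placeholders):

# OWNERS file for a localization subfolder; usernames are hypothetical
reviewers:
  - ko-reviewer-1
  - ko-reviewer-2
approvers:
  - ko-approver-1
labels:
  - language/ko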

What’s next

We’re looking forward to the doc sprint in Shanghai to serve as a resource for the Chinese l10n team.

We’re excited to continue supporting the Japanese and Korean l10n teams, who are making excellent progress.

If you’re interested in localizing Kubernetes for your own language or region, check out our guide to localizing Kubernetes docs and reach out to a SIG Docs chair for support.

Source

Introducing Docker Enterprise 2.1 – Advancing Our Container Platform Leadership

Operational Insights with Docker Enterprise

Today, we’re excited to announce Docker Enterprise 2.1 – the leading enterprise container platform in the market and the only one designed for both Windows and Linux applications. When Docker Enterprise 2.1 is combined with our industry-proven tools and services in the new Windows Server application migration program, organizations get the best platform for securing and modernizing Windows Server applications, while building a foundation for continuous innovation across any application, anywhere.

In addition to expanded support for Windows Server, this latest release further extends our leadership position by introducing advancements across key enterprise requirements of choice, agility and security.

Choice: Expanding Support for Windows Server and New Kubernetes Features

Docker Enterprise 2.1 adds support for Windows Server 1709, 1803 and Windows Server 2019* in addition to Windows Server 2016. This means organizations can take advantage of the latest developments for Docker Enterprise for Windows Server Containers while supporting a broad set of Windows Server applications.

  • Smaller image sizes: The latest releases of Windows Server support much smaller image sizes which means improved performance downloading base images and building applications, contributing to faster application delivery and lower storage costs.
  • Improved compatibility requirements: With Windows Server 1709 and beyond, the host operating system and container images can deploy using different Windows Server versions, making it more flexible and easier to run containers on a shared operating system.
  • Networking enhancements: Windows Server 1709 also introduced expanded support for Swarm-based routing mesh capabilities, including service publishing using ingress mode and VIP-based service discovery when using overlay networks.

*Pending Microsoft’s release of Windows Server 2019

In addition to Windows Server updates, Docker Enterprise also gets updated with Kubernetes 1.11 and support for pod autoscaling among other new Kubernetes features.

Agility: Greater Insights & Serviceability

As many containerized applications are considered business critical, organizations want to be able to manage and secure these applications over their entire lifecycle. Docker Enterprise 2.1 includes several enhancements to help administrators better streamline Day 2 cluster and application operations:

  • New out-of-the-box dashboards: Enhanced health status dashboards provide greater insight into node and container metrics and allow for faster troubleshooting of issues as well as early identification of emerging issues. (Screenshot: Node Health Check in Docker Enterprise 2.1.)
  • Visibility to known vulnerabilities at runtime: Administrators can now identify running containers with known vulnerabilities to better triage and remediate issues.
  • Task activity streams: Administrators can view and track tasks and background activities in the registry including things like vulnerability scans in progress or database updates.
  • Manage images at scale: Online garbage collection and policy-based image pruning help to reduce container image sprawl and reduce storage costs in the registry.

Security: Enhanced Security & Compliance

For many organizations in highly regulated industries, Docker Enterprise 2.1 adds several new important enhancements:

  • SAML 2.0 authentication: Integrate with your preferred Identity Provider through SAML to enable Single Sign-On (SSO) or multi-factor authentication (MFA).
  • FIPS 140-2 validated Docker Engine: The cryptographic modules in Docker Engine – Enterprise have been validated against FIPS 140-2 standards which also impacts industries that follow FISMA, HIPAA and HITECH and others.
  • Detailed audit logs: Docker Enterprise 2.1 now includes detailed logs across both the cluster and registry to capture users, actions, and timestamps for a full audit trail. These are required for forensic analysis after a security incident and to meet certain compliance regulations.
  • Kubernetes network encryption: Protect all host-to-host communications with the optional IPSec encryption module that includes key management and key rotation.

How to Get Started

We’re excited to share Docker Enterprise 2.1 – the first and only enterprise container platform for both Windows and Linux applications – and it’s available today!

Source

Securing the Base Infrastructure of a Kubernetes Cluster

The first article in this series, Securing Kubernetes for Cloud Native Applications, provided a discussion on why it’s difficult to secure Kubernetes, along with an overview of the various layers that require our attention when we set about the task of securing that platform.

The very first layer in the stack is the base infrastructure layer. We could define this in many different ways, but for the purposes of our discussion, it’s the sum of the infrastructure components on top of which Kubernetes is deployed. It’s the physical or abstracted hardware layer for compute, storage, and networking purposes, and the environment in which these resources exist. It also includes the operating system, most probably Linux, and a container runtime environment, such as Docker.

Much of what we’ll discuss applies equally well to infrastructure components that underpin systems other than Kubernetes, but we’ll pay special attention to those factors that will enhance the security of Kubernetes.

Machines, Data Centers, and the Public Cloud

The adoption of the cloud as the vehicle for workload deployment, whether it’s public, private, or a hybrid mix, continues apace. And whilst the need for specialist bare-metal server provisioning hasn’t entirely gone away, the infrastructure that underpins the majority of today’s compute resource is the virtual machine. It doesn’t really matter, however, whether the machines we deploy are virtual (cloud-based or otherwise) or physical; either way, they are going to reside in a data center, hosted by our own organisation or by a chosen third party, such as a public cloud provider.

Data centers are complex, and there is a huge amount to think about when it comes to security. A data center is a general resource for hosting the data processing requirements of an entire organisation, or even co-tenanted workloads from a multitude of independent organisations from different industries and geographies. For this reason, applying security to the many different facets of infrastructure at this level tends to be a full-blown corporate or supplier responsibility. It is governed according to factors such as national or international regulation (HIPAA, GDPR) and industry compliance requirements (PCI DSS), and often results in the pursuit of certified standards accreditation (ISO 27001, FIPS).

In the case of a public cloud environment, a supplier can and will provide the necessary adherence to regulatory and compliance standards at the infrastructure layer, but at some point it comes down to the service consumer (you and me) to build further on this secure foundation. It’s a shared responsibility. For the public cloud service consumer, this raises the question, “what should I secure, and how should I go about it?” There are a lot of people with a lot of views on the topic, but one credible entity is the Center for Internet Security (CIS), a non-profit organisation dedicated to safeguarding public and private entities from the threat of malign cyber activity.

CIS Benchmarks

The CIS provides a range of tools, techniques, and information for combating the potential threat to the systems and data we rely on. CIS Benchmarks, for example, are per-platform best-practice configuration guidelines for security, compiled by consensus among security professionals and subject matter experts. In recognition of the ever-increasing number of organisations embarking on transformation programmes that involve migration to public and/or hybrid cloud infrastructure, the CIS have made it their business to provide benchmarks for the major public cloud providers. The CIS Amazon Web Services Foundations Benchmark is an example, and there are similar benchmarks for the other major public cloud providers.

These benchmarks provide foundational security configuration advice, covering identity and access management (IAM), ingress and egress, and logging and monitoring best practice, amongst other things. Implementing these benchmark recommendations is a great start, but it shouldn’t be the end of the journey. Each public cloud provider will have their own set of detailed recommended best practices [1, 2, 3], and a lot of benefit can be taken from other expert voices in the domain, such as the Cloud Security Alliance.

Let’s take a moment to look at a typical cloud-based scenario that requires some careful planning from a security perspective.

Cloud Scenario: Private vs. Public Networks

How can we balance the need to keep a Kubernetes cluster secure by limiting access, whilst enabling the required access for external clients via the Internet, and also from within our own organisation?

  • Use a private network for the machines that host Kubernetes – ensure that the host machines that represent the cluster’s nodes don’t have public IP addresses. Removing the ability to make a direct connection with any of the host machines significantly reduces the available options for attack. This simple precaution provides significant benefits, and would prevent the kind of compromises that result in the exploitation of compute resource for cryptocurrency mining, for example.
  • Use a bastion host to access the private network – external access to the host’s private network, which will be required to administer the cluster, should be provided via a suitably configured bastion host. The Kubernetes API will often also be exposed in a private network behind the bastion host. It may also be exposed publicly, but it is recommended to at least restrict access by whitelisting IP addresses from an organization’s internal network and/or its VPN server.
  • Use VPC peering with internal load balancers/DNS – where workloads running in a Kubernetes cluster with a private network need to be accessed by other private, off-cluster clients, the workloads can be exposed with a service that invokes an internal load balancer. For example, to have an internal load balancer created in an AWS environment, the service would need the following annotation: service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0 (see the example manifest after this list). If clients reside in another VPC, then the VPCs will need to be peered.
  • Use an external load balancer with ingress – workloads are often designed to be consumed by anonymous, external clients originating from the Internet; how is it possible to allow traffic to find the workloads in the cluster, when it’s deployed to a private network? We can achieve this in a couple of different ways, depending on the requirement at hand. The first option would be to expose workloads using a Kubernetes service object, which would result in the creation of an external cloud load balancer service (e.g. AWS ELB) on a public subnet. This approach can be quite costly, as each service exposed invokes a dedicated load balancer, but may be the preferred solution for non-HTTP services. For HTTP-based services, a more cost effective approach would be to deploy an ingress controller to the cluster, fronted by a Kubernetes service object, which in turn creates the load balancer. Traffic addressed to the load balancer’s DNS name is routed to the ingress controller endpoint(s), which evaluates the rules associated with any defined ingress objects, before further routing to the endpoints of the services in the matched rules.
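
To make the internal load balancer case above concrete, a Service manifest might look like the following sketch; the names and ports are hypothetical placeholders, and the annotation is the one mentioned in the list:

apiVersion: v1
kind: Service
metadata:
  name: internal-api                   # hypothetical service name
  annotations:
    # Ask the AWS cloud provider for an internal (private) load balancer
    service.beta.kubernetes.io/aws-load-balancer-internal: 0.0.0.0/0
spec:
  type: LoadBalancer
  selector:
    app: internal-api
  ports:
    - port: 443
      targetPort: 8443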

This scenario demonstrates the need to carefully consider how to configure the infrastructure to be secure, whilst providing the capabilities required for delivering services to their intended audience. It’s not a unique scenario, and there will be other situations that will require similar treatment.

Locking Down the Operating System and Container Runtime

Assuming we’ve investigated and applied the necessary security configuration to make the machine-level infrastructure and its environment secure, the next task is to lock down the host operating system (OS) of each machine, and the container runtime that’s responsible for managing the lifecycle of containers.

Linux OS

Whilst it’s possible to run Microsoft Windows Server as the OS for Kubernetes worker nodes, more often than not, the control plane and worker nodes will run a variant of the Linux operating system. There might be many factors that govern the choice of Linux distribution to use (commercials, in-house skills, OS maturity), but if it’s possible, use a minimal distribution that has been designed just for the purpose of running containers. Examples include CoreOS Container Linux, Ubuntu Core, and the Atomic Host variants. These operating systems have been stripped down to the bare minimum to facilitate running containers at scale, and as a consequence, have a significantly reduced attack surface.

Again, the CIS have a number of different benchmarks for different flavours of Linux, providing best practice recommendations for securing the OS. These benchmarks cover what might be considered the mainstream distributions of Linux, such as RHEL, Ubuntu, SLES, Oracle Linux and Debian. If your preferred distribution isn’t covered, there is a distribution independent CIS benchmark, and there are often distribution-specific guidelines, such as the CoreOS Container Linux Hardening Guide.

Docker Engine

The final component in the infrastructure layer is the container runtime. In the early days of Kubernetes, there was no choice available; the container runtime was necessarily the Docker engine. With the advent of the Kubernetes Container Runtime Interface, however, it’s possible to remove the Docker engine dependency in favour of a runtime such as CRI-O, containerd, or Frakti [4]. In fact, as of Kubernetes version 1.12, an alpha feature (Runtime Class) allows for running multiple container runtimes side-by-side in a cluster. Whichever container runtimes are deployed, they need securing.

Despite the varied choice, the Docker engine remains the default container runtime for Kubernetes (although this may change to containerd in the near future), and we’ll consider its security implications here. It’s built with a large number of configurable security settings, some of which are turned on by default, but which can be bypassed on a per-container basis. One such example is the whitelist of Linux kernel capabilities applied to each container on creation, which helps to diminish the privileges available inside a running container.

Once again, the CIS maintain a benchmark for the Docker platform, the CIS Docker Benchmark. It provides best practice recommendations for configuring the Docker daemon for optimal security. There’s even a handy open source tool (script) called Docker Bench for Security, that can be run against a Docker engine, which evaluates the system for conformance to the CIS Docker Benchmark. The tool can be run periodically to expose any drift from the desired configuration.

Some care needs to be taken when considering and measuring the security configuration of the Docker engine when it’s used as the container runtime for Kubernetes. Kubernetes ignores many of the Docker daemon’s functions in preference to its own security controls. For example, the Docker daemon is configured to apply a default whitelist of available Linux kernel system calls to every created container, using a seccomp profile. Unless a profile is specified, Kubernetes will instruct Docker to create pod containers ‘unconfined’ from a seccomp perspective, giving containers access to each and every syscall available. In other words, what gets configured at the lower ‘Docker layer’ may get undone at a higher level in the platform stack. We’ll cover how to mitigate these discrepancies with security contexts in a future article.
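
As a preview of that mitigation, at the time of writing the seccomp profile is controlled with a pod-level annotation; the sketch below opts a pod back into the runtime’s default profile (the pod name and image are hypothetical placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: seccomp-demo                   # hypothetical pod name
  annotations:
    # Apply the container runtime's default seccomp profile
    # instead of running the pod's containers unconfined.
    seccomp.security.alpha.kubernetes.io/pod: runtime/default
spec:
  containers:
    - name: app
      image: nginx:1.15                # hypothetical image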

Summary

It might be tempting to focus all our attention on the secure configuration of the Kubernetes components of a platform. But as we’ve seen in this article, the lower-layer infrastructure components are equally important, and are ignored at our peril. In fact, providing a secure infrastructure layer can even mitigate problems we might introduce in the cluster layer itself. Keeping our nodes private, for example, will prevent an inadequately secured kubelet from being exploited for nefarious purposes. Infrastructure components deserve the same level of attention as the Kubernetes components themselves.

In the next article, we’ll move on to discuss the implications of securing the next layer in the stack, the Kubernetes cluster components.

Source

Heptio will be joining forces with VMware on a shared cloud native mission

Today we are incredibly excited to announce that Heptio will be acquired by VMware. It is a watershed day for our company and, we hope, for the industry as a whole. The inevitable question is … why have we decided to join forces (now)?

Life at Heptio has been pretty exceptional since we founded the company two years ago. In a short period, we have made strong contributions in the Kubernetes and cloud native ecosystem, assembled a remarkable team and onboarded some of the most prestigious enterprises as customers. We were incredibly well capitalized and supported by our investors. So what gives?

Shared vision.

Heptio’s mission is to build a platform that accelerates IT in a multi-cloud world. We are on the precipice of a major transformation—the de-coupling of applications from the environments where they are run. And we feel a responsibility to help organizations navigate this transformation to true cloud native architecture. To realize the greatest possible impact, Heptio would need access to an entirely different level of resources and execution capabilities than we have today.

Who is best positioned to lead this transformation? The company that led a parallel transformation—the software defined data center. VMware. They have experience, execution muscle, customer trust and full leadership commitment.

When we first started conversations with VMware, the alignment of our respective visions was uncanny. With virtualization, VMware helped enterprises change the way their infrastructure operates. VMware values our products and services—together we can apply these technologies to change the way business operates, and where they run their applications.

Customer Value.

We live in a really interesting time. Enterprise companies are dealing with waves of disruption in the software space, and increasingly fragmented and complicated hosting environments. Kubernetes has an important role to play as a ubiquitous, uniform framework—to be as available and invisible as a utility, like electricity. We believe that an enterprise should pick their infrastructure hosting environment based solely on pragmatic attributes: cost economics, data locality and sovereignty, and connectivity to the consumers and workloads they support.

The value for enterprises is not the electricity, nor the vehicle through which it is delivered; value is created when applications are plugged in. The missing piece is a control plane that shapes the experience in deploying and accessing cloud native technologies. It must address day 2 challenges, integrating technologies into a practical enterprise environment, and instituting policies and management capabilities. It is going to take a hard push from an engaged, enterprise-friendly company to make it real. We are convinced that VMware possesses the ability and commitment to create a platform that works everywhere and meets the unique needs of enterprises. Together we can change the game.

Community Connection

From the start, Heptio has maintained a healthy relationship with the open source community. We’re tied into the Kubernetes steering committee and a number of SIGs, plus our team has shepherded five open source projects (Sonobuoy, Contour, Gimbal, Ark and ksonnet). We feel like the community trusts us. That trust continues to be well placed. The team at VMware have a parallel appreciation for the community; they fully understand the importance of being closely connected to foster more innovation. They have so much energy and resources already focused on this area; the time is right to join forces and accelerate the value delivered to the open source community.

Culture First.

I’ve left culture to the final topic for this post, but the fact that VMware puts its culture first is central to our decision to join their fold. We think a lot about the culture of a company not only as an expression of its values, but as a blueprint for how it creates value in the world. Even before we started conversations with VMware, we were aware of similarities in our culture and core values. We have some great people working at Heptio that ‘grew up’ at VMware—they enjoyed their work and had tremendous respect for their colleagues. This made us feel good about joining them, and instilled confidence that our teams would gel and we could focus our energy on our shared mission.

In Closing.

At Heptio, we’ve often (internally) lamented that we’re not great at celebrating our achievements. But today, we can’t avoid a proper celebration. I’m so proud of our team and what they’ve built in such a compressed time frame, and so grateful to our community for their incredible support. I’m immensely excited to join forces with an organization that shares our mission and that has proved they know how to deliver transformative technology. We’re fired up to have an even bigger impact.

Source

What is a CaaS? Containers as a Service, Defined

When public clouds first began gaining popularity, it seemed that
providers were quick to append the phrase “as a service” to everything
imaginable, as a way of indicating that a given application, service, or
infrastructure component was designed to run in the cloud. It should
therefore come as no surprise that Container as a Service, or CaaS,
refers to a cloud-based container environment. But there is a bit more
to the CaaS story than this. CaaS is not just a marketing fad. Below, I
explain what CaaS means, and why it’s valuable.

CaaS Differentiators

Container as a Service offerings are often about more than just giving
IT pros a way of running their containerized applications in the cloud.
Each cloud provider is free to create its own flavor of CaaS—and some
CaaS platforms don’t run in a major public cloud. There are two main
areas in which CaaS providers try to differentiate their offerings.
First, there is the user interface. On-premises container environments
tend to be managed through the Docker command line. However, some IT
pros prefer a GUI-based management interface over the command line. As
such, some cloud providers allow subscribers to point-and-click their
way through container creation and management. A second key
differentiator between CaaS providers is orchestration, and the
supplementary services that are attached to the orchestration engine. A
provider may use an orchestration engine, for example, to achieve
automatic scaling for containerized workloads, based on the parameters
that the administrator has established. Similarly, a cloud provider’s
orchestration engine may be used to handle container lifecycle
management tasks, or the creation of container-related reports. It is
worth noting that public cloud container services can be somewhat rigid
with regard to orchestrator selection. Microsoft Azure, for example,
allows you to choose between DC/OS, Kubernetes, and Swarm. No other
selections are available. In contrast, other container management
platforms, such as Rancher, are designed to be modular rather than
limiting you to using a few pre-defined choices.

The Advantages of Using CaaS

Many of the advantages that CaaS brings to the table are similar to the
benefits of general container usage. However, there are at least two
extra benefits that CaaS provides. The first of these benefits is that
CaaS makes it much easier to run applications in the cloud than it might
otherwise be. Applications that were designed for on-premises use do not
always behave as expected when installed on a cloud-based virtual
machine. Because containers allow for true application portability,
however, it is possible to create an application container, test the
newly containerized application on-premises, and then upload the
application to the public cloud. The containerized application should
work in the same way in the cloud as it does on-premises. Another
advantage to using CaaS is that doing so allows organizations to achieve
a greater degree of agility. Agility is one of those overused IT
buzzwords that has kind of lost its meaning. However, I tend to think of
agility as the ability to roll out a new production workload as quickly
as possible. Given this definition, CaaS definitely delivers. Imagine
for a moment that an organization’s development staff is building a new
application, and that there is a pressing need for the application to be
rolled out quickly. The developers could containerize the application,
but what if the organization is not yet using containers in production?
Better still, what happens if the organization’s container environment
lacks the capacity to host the application? This is where CaaS really
shines. Public cloud providers usually let you deploy a container
environment with just a few mouse clicks. This eliminates time-consuming
tasks such as deploying container hosts, building clusters, or testing
the container infrastructure. Cloud providers use automation to
provision for their subscribers’ container environments that have been
proven to be configured correctly. This automation eliminates the
time-consuming setup and testing process, and therefore allows the
organization to begin rolling out containerized applications almost
immediately.

Multi-Cloud Solutions

Although it is tempting to think of CaaS as solely being a service that
a cloud provider offers to its customers, it is becoming increasingly
common for organizations to host containers on multiple clouds. Doing so
can help with resilience to failure, and with load balancing. Yet
hosting containers on multiple clouds also introduces significant
challenges related to cross-cloud container management, and cross-cloud
workload scaling. These challenges can be addressed by using management
tools such as those from Rancher, which can
manage containers both on-premises and in the cloud.

Source

The Push to Modernize at William & Mary

At William & Mary, our IT infrastructure team needs to be nimble enough to support a leading-edge research university — and deliver the stability expected of a 325-year-old institution. We’re not a large school, but we have a long history. We’re a public university located in Williamsburg, Virginia, and founded in 1693, making us the second-oldest institution of higher education in America. Our alumni range from three U.S. presidents to Jon Stewart.

The Linux team in the university’s central IT department is made up of 5 engineers. We run web servers, DNS, LDAP, the backend for our ERP system, components of the content management system, applications for administrative computing, some academic computing, plus a long list of niche applications and middleware. In a university environment with limited IT resources, legacy applications and infrastructure are expensive and time-consuming to keep going.

Some niche applications are tools built by developers in university departments outside of IT. Others are academic projects. We provide infrastructure for all of them, and sometimes demand can ramp up quickly. For instance, an experimental online course catalog was discovered by our students during a registration period. Many students decided they liked the experimental version better and told their friends. The unexpected demand at 7am sent developers and engineers scrambling.

More recently, IT was about to start on a major upgrade of our ERP system that would traditionally require at least 100 new virtual machines to be provisioned and maintained. The number of other applications was also set to double. This put a strain on our network and compute infrastructure. Even with a largely automated provisioning process, we didn’t have much time to spare.

We wanted to tackle both the day-to-day infrastructure management challenges and the scalability concerns. That’s what led us to look at Docker. After successfully running several production applications on Docker Engine – Community, we deployed the updated ERP application in containers on Docker Enterprise. We’re currently running on five bare metal Dell servers that support 47 active services and over 100 containers with room to grow.

Docker Enterprise also dramatically simplifies our application release cycles. Now most applications, including the ERP deployment, are being containerized. Individual departments can handle their own application upgrades, rollbacks and other changes without waiting for us to provision new infrastructure. We can also scale resources quickly, taking advantage of the public cloud as needed.

Just like our researchers have done for years, Docker has also enabled deeper collaboration with our counterparts at other universities. As we all work on completing the same major ERP upgrade, we’re able to easily share and adopt enhancements much faster than with traditional architectures.

Today, Docker Enterprise is our application platform of choice. Going forward, it opens up all kinds of possibilities. We are already exploring public cloud for bursting compute resources and large-scale storage. In a year or two, we expect to operate 50 percent to 80 percent in the cloud.

Source