Containers – The Journey to Production

8/May 2015

By Matt Barker

Tuesday the 21st of April was the inaugural [ Contain ] meetup.

Hosted at the Hoxton Hotel, Shoreditch, we were fortunate to have representation from:

The theme chosen for the event was:

“Containers – The Journey to Production”

A quick straw poll of the 70+ members of the audience showed that over 80% were using containers – but just 5 people were using containers in production. The theme of the evening seemed appropriate as many consider how to get to production successfully.

Here is a selection of questions and answers from the panel and audience discussion.

There is a general misconception that Docker and containerisation can solve everyone’s problems today – but this is not true, and panellists highlighted several major themes for improvement, including:

  • Security: Containers do not provide ‘perfect isolation’, especially compared to that provided by a hypervisor. This weakness is widely understood but not yet properly solved. The panel highlighted that enhanced security is actively being addressed, from using SELinux to encapsulating a container in a VM (or even VMs in containers – à la Google). The approach is very much ‘combine something we know and trust with containers’ to get the best of both worlds. Notable recent developments include Project Atomic from Red Hat, VMware with Photon and Lightwave, and Rancher Labs, another exciting startup in the container space, with Rancher VM.
  • Persistence: Containers are a good fit for stateless applications, but as soon as persistence of state is required (e.g. to a database) it soon becomes complicated. Bristol-born ClusterHQ are leading efforts to introduce container data management services and tools with their open source project Flocker.
  • Ecosystem maturity: There are many tools out there in very active development, but for many right now it is difficult to choose the right selection for a pure container infrastructure that will have the longer-term development and support necessary for production systems. This is especially true if starting out from scratch. Container management was highlighted as a key component for managing containers at any reasonable scale in production. Current open source projects are not quite there yet – e.g. Kubernetes is not yet 1.0 – but the ecosystem is growing and maturing fast. It was pointed out that smart container schedulers remain a very rare breed, however, and this gap will show up in production systems.

The panel was largely unanimous on the benefits of containers, including:

  • Isolation: Using Linux namespaces, a container has its own isolated environment, including its own file system, processes and so on. Add the fact that containers are lightweight, sharing the host kernel, and containerisation becomes attractive for multi-tenancy.
  • Infrastructure resource efficiency: Containers help to drive improved infrastructure resource efficiency by pushing CPU and memory utilisation towards 100% and increasing server density.
  • Portability: With common formats (e.g. Docker, appc) evolving to describe container images, and runtime environments built on standard Linux kernel features, it becomes really easy to share and ship software between environments (laptop to server, cloud to cloud), solving dependency hell and dev/prod parity in one big hit.
  • Ease of use: Free and open source Docker tooling has made it easy and accessible to build, run, share and collaborate with containers, especially in existing environments that make good use of DevOps tools and methodology.

Adoption of containers is growing, but many are only using them in place of virtual machines, missing the many wider benefits they can provide. The general consensus was that containers are a great stepping-stone to ultimately building better software. As they’re so lightweight, it becomes possible to think very differently about software; for example, containers can wrap a process with its own filesystem and spin up and down in an instant. Containers offer the potential to embrace the principles of micro-service architecture and provide the foundations to make software rapidly iterable, highly scalable and resilient.

The panel was split on this, with a couple saying that rkt from CoreOS would be worth a look, but it is probably not production-ready yet.

There was an argument against looking in-depth at other container technologies initially, asking why you would go with anything other than the clear market leader, Docker, which has a mature and rapidly growing ecosystem.

There was panel consensus that competition is undoubtedly good for the growth and evolution of the market. This will likely lead to other viable container options in the future.

Opinions were also split on this question, with some panel members asking what the point would be. Their argument was that you should initially be focusing on improving areas that are ripe for change, not trying to tackle huge, complex beasts that have been in place for a very long time. The benefits of containers also arguably do not apply.

The audience and other panel members were strongly in favour of tackling these apps, arguing that any improvement is better than nothing and you can still get the benefits of portability and ease of development.

Dave from Crane likened the process to going on a diet:

“I might still look fat, but I’ve lost at least 3 stone in the past year. Splitting Websphere up, and containerising will lose you that 3 stone of ‘fat’ even if it’s not totally obvious to someone looking at you for the first time that benefits have been made.”

There was also audience interest in using containers for archiving legacy applications.

There was a strong feeling from the audience that you shouldn’t even be trying for private-public hybrid cloud, as it’s too difficult and complex. It’s also arguably a minority case requirement.

Some audience members posited that hybrid public-to-public would be useful, and there was agreement that this is an exciting possibility.

The panel said that containers might not directly lead to hybrid clouds in the short term. But an interesting point made was that just knowing that you can move containers across clouds is appealing to executives and IT leaders, and a benefit in itself – however unlikely such a move may be in practice.

Portability is key and this was reiterated by many on the panel and audience.

Notes:

The next [ Contain ] event will be the 9th of June and will focus on Container Management. Sign up here.

Jonatan Bjork has also written a blog post on the event, see here.

Source

The Metrics that Matter: Horizontal Pod Autoscaling with Metrics Server


Sometimes I feel that those of us with a bent toward distributed systems engineering like pain. Building distributed systems is hard. Every organization, regardless of industry, is not only looking to solve its business problems, but to do so at potentially massive scale. On top of the challenges that come with scale, they are also concerned with creating new features and avoiding regression. And even if they achieve all of those objectives with excellence, there are still concerns about information security, regulatory compliance, and building value from all of the investment the business has made.

If that picture sounds like your team and your system is now in production – congratulations! You’ve survived round 1.

Regardless of your best attempts to build a great system, sometimes life happens. There are lots of examples of this. A great product, or viral adoption, may bring unprecedented success, and bring with it an end to your assumptions about how your system would handle scale.

Pokémon GO Cloud Datastore Transactions Per Second Expected vs. Actual

Source: Bringing Pokémon GO to life on Google Cloud, pulled 30 May 2018

You know this may happen, and you should be prepared. That’s what this series of posts is about. Over the course of this series we’re going to cover the things you should be tracking, why you should track them, and possible mitigations to handle likely root causes.

We’ll walk through each metric, methods for tracking it and things you can do about it. We’ll be using different tools for gathering and analyzing this data. We won’t be diving into too many details, but we’ll have links so you can learn more. Without further ado, let’s get started.

Metrics are for Monitoring, and More

These posts are focused on monitoring and running Kubernetes clusters. Logs are great, but at scale they are more useful for post-mortem analysis than for alerting operators that there’s a growing problem. Metrics Server allows for the monitoring of container CPU and memory usage, as well as of the nodes they’re running on.

This allows operators to set and monitor KPIs (Key Performance Indicators). These operator-defined levels give operations teams a way to determine when an application or node is unhealthy. This gives them all the data they need to see problems as they manifest.

In addition, Metrics Server allows Kubernetes to enable Horizontal Pod Autoscaling. This capability allows Kubernetes to scale the pod instance count for a number of API objects based upon metrics exposed through the Kubernetes Metrics API, which is served by Metrics Server.
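To make that concrete, a minimal HorizontalPodAutoscaler of the kind Metrics Server enables might look like the sketch below; the Deployment name and thresholds are hypothetical:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web                        # hypothetical Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  # scale out when average CPU utilisation (from the Metrics API) exceeds 70%
  targetCPUUtilizationPercentage: 70

Without a functioning Metrics Server, the HPA controller has no CPU figures to act on, which is why getting it running is worth the effort.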

Setting up Metrics Server in Rancher-Managed Kubernetes Clusters

Metrics Server became the standard for pulling container metrics starting with Kubernetes 1.8 by plugging into the Kubernetes Monitoring Architecture. Prior to this standardization, the default was Heapster, which has been deprecated in favor of Metrics Server.

Today, under normal circumstances, Metrics Server won’t run on a Kubernetes Cluster provisioned by Rancher 2.0.2. This will be fixed in a later version of Rancher 2.0. Check our Github repo for the latest version of Rancher.

In order to make this work, you’ll have to modify the cluster definition via the Rancher Server API. Doing so will allow the Rancher Server to modify the Kubelet and KubeAPI arguments to include the flags required for Metrics Server to function properly.

Instructions for doing this on a Rancher-provisioned cluster, as well as instructions for modifying other hyperkube-based clusters, are available on GitHub here.

Jason Van Brackel

Senior Solutions Architect

Jason van Brackel is a Senior Solutions Architect for Rancher. He is also the organizer of the Kubernetes Philly Meetup and loves teaching at code camps, user groups and other meetups. Having worked professionally with everything from COBOL to Go, Jason loves learning, and solving challenging problems.

Source

IPVS-Based In-Cluster Load Balancing Deep Dive

Author: Jun Du (Huawei), Haibin Xie (Huawei), Wei Liang (Huawei)

Editor’s note: this post is part of a series of in-depth articles on what’s new in Kubernetes 1.11

Introduction

Per the Kubernetes 1.11 release blog post, we announced that IPVS-Based In-Cluster Service Load Balancing graduates to General Availability. In this blog, we will take you through a deep dive of the feature.

What Is IPVS?

IPVS (IP Virtual Server) is built on top of Netfilter and implements transport-layer load balancing as part of the Linux kernel.

IPVS is incorporated into the LVS (Linux Virtual Server), where it runs on a host and acts as a load balancer in front of a cluster of real servers. IPVS can direct requests for TCP- and UDP-based services to the real servers, and make services of the real servers appear as virtual services on a single IP address. Therefore, IPVS naturally supports Kubernetes Service.

Why IPVS for Kubernetes?

As Kubernetes grows in usage, the scalability of its resources becomes more and more important. In particular, the scalability of services is paramount to the adoption of Kubernetes by developers/companies running large workloads.

Kube-proxy, the building block of service routing, has relied on the battle-hardened iptables to implement the core supported Service types such as ClusterIP and NodePort. However, iptables struggles to scale to tens of thousands of Services because it is designed purely for firewalling purposes and is based on in-kernel rule lists.

Even though Kubernetes already supports 5000 nodes as of release v1.6, kube-proxy with iptables is actually a bottleneck to scaling a cluster to 5000 nodes. For example, with NodePort Services in a 5000-node cluster, if we have 2000 Services and each Service has 10 pods, this results in at least 20000 iptables records on each worker node, which can keep the kernel pretty busy.

On the other hand, using IPVS-based in-cluster service load balancing can help a lot for such cases. IPVS is specifically designed for load balancing and uses more efficient data structures (hash tables) allowing for almost unlimited scale under the hood.

IPVS-based Kube-proxy

Parameter Changes

Parameter: --proxy-mode. In addition to the existing userspace and iptables modes, IPVS mode is configured via --proxy-mode=ipvs. It implicitly uses IPVS NAT mode for service port mapping.

Parameter: --ipvs-scheduler

A new kube-proxy parameter has been added to specify the IPVS load balancing algorithm: --ipvs-scheduler. If it is not configured, then round-robin (rr) is the default value.

  • rr: round-robin
  • lc: least connection
  • dh: destination hashing
  • sh: source hashing
  • sed: shortest expected delay
  • nq: never queue

In the future, we can implement a Service-specific scheduler (potentially via an annotation), which would have higher priority and override this value.

Parameter: --cleanup-ipvs. Similar to the --cleanup-iptables parameter: if true, clean up the IPVS configuration and iptables rules that were created in IPVS mode.

Parameter: --ipvs-sync-period. The maximum interval at which IPVS rules are refreshed (e.g. ‘5s’, ‘1m’). Must be greater than 0.

Parameter: --ipvs-min-sync-period. The minimum interval at which IPVS rules are refreshed (e.g. ‘5s’, ‘1m’). Must be greater than 0.

Parameter: --ipvs-exclude-cidrs. A comma-separated list of CIDRs which the IPVS proxier should not touch when cleaning up IPVS rules, because the IPVS proxier cannot distinguish IPVS rules created by kube-proxy from the user’s original IPVS rules. If you are using the IPVS proxier alongside your own IPVS rules in the environment, this parameter should be specified, otherwise your original rules will be cleaned up.
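For reference, the same settings can also be expressed in a kube-proxy configuration file rather than as flags. A minimal sketch, assuming the KubeProxyConfiguration schema as of Kubernetes 1.11 (the scheduler and CIDR values are illustrative, and a real file would also set fields such as clientConnection):

apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"            # equivalent of --proxy-mode=ipvs
ipvs:
  scheduler: "lc"       # least connection; rr is the default
  syncPeriod: 30s       # equivalent of --ipvs-sync-period
  minSyncPeriod: 5s     # equivalent of --ipvs-min-sync-period
  excludeCIDRs:         # equivalent of --ipvs-exclude-cidrs
    - 10.0.0.0/24       # illustrative CIDR containing user-managed IPVS rules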

Design Considerations

IPVS Service Network Topology

When creating a ClusterIP type Service, IPVS proxier will do the following three things:

  • Make sure a dummy interface exists in the node, defaults to kube-ipvs0
  • Bind Service IP addresses to the dummy interface
  • Create IPVS virtual servers for each Service IP address respectively

Here is an example:

# kubectl describe svc nginx-service
Name: nginx-service

Type: ClusterIP
IP: 10.102.128.4
Port: http 3080/TCP
Endpoints: 10.244.0.235:8080,10.244.1.237:8080
Session Affinity: None

# ip addr

73: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN qlen 1000
link/ether 1a:ce:f5:5f:c1:4d brd ff:ff:ff:ff:ff:ff
inet 10.102.128.4/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever

# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.102.128.4:3080 rr
-> 10.244.0.235:8080 Masq 1 0 0
-> 10.244.1.237:8080 Masq 1 0 0

Please note that the relationship between a Kubernetes Service and IPVS virtual servers is 1:N. For example, consider a Kubernetes Service that has more than one IP address. An External IP type Service has two IP addresses – ClusterIP and External IP. Then the IPVS proxier will create 2 IPVS virtual servers – one for Cluster IP and another one for External IP. The relationship between a Kubernetes Endpoint (each IP+Port pair) and an IPVS virtual server is 1:1.
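As an illustration, a Service declared with an external IP might look like the sketch below (the selector and addresses are hypothetical); the IPVS proxier would create one virtual server for the cluster IP and another for the external IP:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx           # hypothetical pod label
  ports:
    - name: http
      port: 3080
      targetPort: 8080
  externalIPs:
    - 192.0.2.10         # hypothetical external IP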

Deleting a Kubernetes Service will trigger deletion of the corresponding IPVS virtual servers, the IPVS real servers, and the IP addresses bound to the dummy interface.

Port Mapping

There are three proxy modes in IPVS: NAT (masq), IPIP and DR. Only NAT mode supports port mapping. Kube-proxy leverages NAT mode for port mapping. The following example shows IPVS mapping Service port 3080 to Pod port 8080.

TCP 10.102.128.4:3080 rr
-> 10.244.0.235:8080 Masq 1 0 0
-> 10.244.1.237:8080 Masq 1 0 0

Session Affinity

IPVS supports client IP session affinity (persistent connection). When a Service specifies session affinity, the IPVS proxier will set a timeout value (180min=10800s by default) in the IPVS virtual server. For example:

# kubectl describe svc nginx-service
Name: nginx-service

IP: 10.102.128.4
Port: http 3080/TCP
Session Affinity: ClientIP

# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.102.128.4:3080 rr persistent 10800
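For reference, a Service requesting this behaviour might be declared roughly as follows; the selector is hypothetical, and sessionAffinityConfig.clientIP.timeoutSeconds (10800 seconds by default) controls the persistence timeout shown above:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx               # hypothetical pod label
  ports:
    - name: http
      port: 3080
      targetPort: 8080
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800  # the default, shown here for clarity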

Iptables & Ipset in IPVS Proxier

IPVS is designed for load balancing, and it can’t handle the other workarounds in kube-proxy, e.g. packet filtering, hairpin masquerading tricks, SNAT, etc.

The IPVS proxier leverages iptables in the above scenarios. Specifically, the IPVS proxier will fall back on iptables in the following four scenarios:

  • kube-proxy is started with --masquerade-all=true
  • A cluster CIDR is specified at kube-proxy startup
  • Support for LoadBalancer type Services
  • Support for NodePort type Services

However, we don’t want to create too many iptables rules, so we adopt ipset to keep the number of iptables rules down. The following table lists the ipset sets that the IPVS proxier maintains:

  • KUBE-CLUSTER-IP – members: all Service IP + port; usage: masquerade for cases where masquerade-all=true or a clusterCIDR is specified
  • KUBE-LOOP-BACK – members: all Service IP + port + IP; usage: masquerade for resolving the hairpin issue
  • KUBE-EXTERNAL-IP – members: Service external IP + port; usage: masquerade for packets to external IPs
  • KUBE-LOAD-BALANCER – members: load balancer ingress IP + port; usage: masquerade for packets to LoadBalancer type Services
  • KUBE-LOAD-BALANCER-LOCAL – members: load balancer ingress IP + port with externalTrafficPolicy=local; usage: accept packets to load balancers with externalTrafficPolicy=local
  • KUBE-LOAD-BALANCER-FW – members: load balancer ingress IP + port with loadBalancerSourceRanges; usage: drop packets for LoadBalancer type Services with loadBalancerSourceRanges specified
  • KUBE-LOAD-BALANCER-SOURCE-CIDR – members: load balancer ingress IP + port + source CIDR; usage: accept packets for LoadBalancer type Services with loadBalancerSourceRanges specified
  • KUBE-NODE-PORT-TCP – members: NodePort type Service TCP port; usage: masquerade for packets to NodePort (TCP)
  • KUBE-NODE-PORT-LOCAL-TCP – members: NodePort type Service TCP port with externalTrafficPolicy=local; usage: accept packets to NodePort Services with externalTrafficPolicy=local
  • KUBE-NODE-PORT-UDP – members: NodePort type Service UDP port; usage: masquerade for packets to NodePort (UDP)
  • KUBE-NODE-PORT-LOCAL-UDP – members: NodePort type Service UDP port with externalTrafficPolicy=local; usage: accept packets to NodePort Services with externalTrafficPolicy=local

In general, for IPVS proxier, the number of iptables rules is static, no matter how many Services/Pods we have.

Run kube-proxy in IPVS Mode

Currently, local-up scripts, GCE scripts, and kubeadm support switching to IPVS proxy mode by exporting an environment variable (KUBE_PROXY_MODE=ipvs) or specifying a flag (--proxy-mode=ipvs). Before running the IPVS proxier, please ensure the required IPVS kernel modules are installed:

ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack_ipv4

Finally, for Kubernetes v1.10, the feature gate SupportIPVSProxyMode is set to true by default. For Kubernetes v1.11, the feature gate is removed entirely. However, you need to enable --feature-gates=SupportIPVSProxyMode=true explicitly for Kubernetes versions before v1.10.

Get Involved

The simplest way to get involved with Kubernetes is by joining one of the many Special Interest Groups (SIGs) that align with your interests. Have something you’d like to broadcast to the Kubernetes community? Share your voice at our weekly community meeting, and through the channels below.

Thank you for your continued feedback and support.
Post questions (or answer questions) on Stack Overflow
Join the community portal for advocates on K8sPort
Follow us on Twitter @Kubernetesio for latest updates
Chat with the community on Slack
Share your Kubernetes story

Source

Are you Ready to Manage your Infrastructure like Google? // Jetstack Blog

19/Jun 2015

By Matt Bates

Google’s Kubernetes open source project for container management has just recently celebrated its first birthday. In its first year, it has attracted massive community and enterprise interest. The numbers speak for themselves: almost 400 contributors from across the industry; over 8000 stars and 12000+ commits on GitHub. And many will have heard it mentioned in almost every other conversation at recent container meetups and industry conferences – no doubt with various different pronunciations!

In a series of blog posts in the run-up to the eagerly anticipated 1.0 release of Kubernetes this summer, container specialists Jetstack will be taking a close look at how it works and how to get started, featuring insight based on our experiences to date. Future posts will walk through deployment of a modern-stack micro-service application on Kubernetes locally and in the cloud. We’ll be using a variety of technology along the way, including Weave, Flocker and MongoDB.


Over a month ago, Google lifted the lid on its internal Borg system in a research paper. This once-secret sauce runs Google’s entire infrastructure, managing vast server clusters across the globe – a crown jewel that until not long ago was never mentioned, even as a secret code name.

Unlike with previous Google papers, such as MapReduce, Google went one step further and kicked off an open source implementation of the container management system in advance of the paper. Although Kubernetes is not strictly a straight, like-for-like implementation, it is heavily inspired by Borg and its predecessor. Importantly, it implements lessons learned from using these systems at massive scale in production.

Arguably, Kubernetes is even better than Borg – and it’s free and available to us all. Pretty awesome, right?


Containerising a single application, running it elsewhere and then collaborating with others is relatively straightforward, and this is testament to the great, albeit imperfect, Docker toolset and image format. It’s a wild success for good reason.

But today’s applications are increasingly complex software systems with many moving parts. They need to be deployed and updated at a rapid pace to keep up with our ability to iterate and innovate. With lots of containers, it soon becomes hard work to coordinate and deploy the sprawl, and importantly, keep them running in production.

Just consider a simple web application deployed using containers. There will be a web server (or many), a reverse proxy, load balancers and a backend datastore – already a handful of containers to deploy and manage. And as we now head into a world of micro services, this web application might feasibly be further decomposed into many loosely coupled services. These might use dedicated and perhaps different datastores, and will be developed and managed entirely by separate teams in the organisation. Let’s not forget that each of these containers will also require replicas for scale-out and high availability. This means tens of containers, and this is just for a simple web app.

It’s not just the number of containers that becomes challenging: services may need to be deployed together, to certain regions and zones for availability. These services need to understand how to find each other in this containerised world.


The container technology underlying Docker has actually been baked into the Linux kernel for some time. It is these capabilities that Google have used in Borg for over a decade, helping them to innovate rapidly and develop some of the Internet’s best-loved services. At an estimated cost of $200M per data centre, squeezing every last drop of performance is a big incentive for Google and its balance sheet.

Lightweight and rapid to start and stop, containers are used for everything at Google – literally everything, and that includes VMs. Google report that they start a colossal two billion containers every week, running everything from Gmail to Maps, App Engine to Docs.

Kubernetes has elegant abstractions that enable developers to think about applications as services, rather than the nuts and bolts of individual containers on ‘Pet’ servers – specific servers, specific IPs and hostnames.

Pods, replication controllers and services are the fundamental units of Kubernetes used to describe the desired state of a system – including, for example, the number of instances of an application, the container images to deploy and the services to expose. In the next blog, we’ll dig into the detail of these concepts and see them in action.

Kubernetes handles the deployment according to these rules, and goes a step further by proactively monitoring, scaling and auto-healing these services in order to maintain the desired state. In effect, Kubernetes herds the server ‘Cattle’ and chooses appropriate resources from a cluster to schedule and expose services, including IPs and DNS – automatically and transparently.

One of the great benefits of Kubernetes is a whole lot less deployment complexity. As it is application-centric, the configuration is simple to grasp and use by developers and ops alike. The friction to rapidly deploy services is diminished. And with smart scheduling, these services can be positioned in the right place at the right time to maximise cluster resource efficiency.


Kubernetes isn’t just for the Google cloud. It runs almost everywhere. Google of course supports Kubernetes on its cloud platform, on top of GCE (Google Compute Engine) with VMs, but also with a more dedicated, hosted Kubernetes-as-a-Service called GKE (Google Container Engine). Written in Go and completely open source, Kubernetes can also be deployed in public or private cloud, on VMs or on bare metal.

Kubernetes offers a real promise of cloud-native portability. Kubernetes configuration artefacts, which describe services and all their components, can be moved from cloud to cloud with ease. Applications are packaged as container images based on Docker (and more recently rkt). This openness means no lock-in and complete flexibility to move services and workloads, for reasons of performance, cost efficiency and more.

Kubernetes is an exciting project that brings Google’s infrastructure technology to us all. It changes the way we think about modern application stack deployment, management and monitoring and has the potential to bring huge efficiencies to resource utilization and portability in cloud environments, as well as lower the friction to innovate.

Stay tuned for the next part where we’ll be detailing Kubernetes core concepts and putting them to practice with a local deployment of a simple web application.

To keep up-to-date and find out more, comment or feedback, please follow us @JetstackHQ.

Source

CRDs and Custom Controllers in Rancher 2.0


Rancher 2.0 is often said to be an enterprise-grade management platform for Kubernetes, 100% built on Kubernetes. But what does that really mean?

This article will answer that question by giving an overview of the Rancher management plane architecture, and explain how API resources are stored, represented and controlled using Kubernetes primitives like CustomResourceDefinitions (CRDs) and custom controllers.

Rancher API Server

While building Rancher 2.0, we went through several iterations until we settled on the current architecture, in which the Rancher API server is built on top of an embedded Kubernetes API server and etcd database. The fact that Kubernetes is highly extensible as a development platform – the API can be extended by defining a new object as a CustomResourceDefinition (CRD) – made adoption easy.


All Rancher-specific resources created using the Rancher API get translated to CRD objects, with their lifecycle managed by one or several controllers built following the Kubernetes controller pattern.

The diagram below illustrates the high-level architecture of Rancher 2.0. The figure depicts a Rancher server installation that manages two Kubernetes clusters: one cluster created by RKE (Rancher’s open source Kubernetes installer that can run anywhere), and another cluster created by a GKE driver.

[Diagram: Rancher 2.0 high-level architecture – a Rancher server managing an RKE cluster and a GKE cluster]

As you can see, the Rancher server stores all its resources in an etcd database, similar to how Kubernetes native resources like Pod, Namespace, etc. get stored in a user cluster. To see all the resources Rancher creates in the Rancher server etcd, simply ssh into the Rancher container and run kubectl get crd.


To see the items of a particular type, run kubectl get <crd name>, and run kubectl describe <crd name> <resource name> to get the resource fields and their respective values.

Note that this is the internal representation of the resource, and not necessarily what the end user would see. Kubernetes is a development platform, and the structure of the resource is rich and nested in order to provide a great level of flexibility to the controllers managing the resource. But when it comes to user experience and APIs, users prefer a flatter and more concise representation, and certain fields should be dropped as they only carry value for the underlying controllers. The Rancher API layer takes care of all that by transforming Kubernetes native resources into user API objects:

[Example: the Cluster resource as represented in the Rancher API]

The example above shows how the fields of Cluster – a CRD Rancher uses to represent the clusters it provisions – get transformed at the API level. Finalizers/initializers/resourceVersion are dropped, as this information is mostly used by controllers; name/namespace are moved up to become top-level fields to flatten the resource representation.

There are some other nice capabilities of the Rancher API framework. It adds features like sorting, object links, filtering of fields based on user permissions, and pluggable validators and formatters for CRD fields.

What makes a CRD?

When it comes to defining the structure of a custom resource, there are some recommended best practices to follow. Let’s first look at the high-level structure of the object:

[Diagram: high-level structure of a custom resource – metadata, spec and status]

Let’s start with the metadata – a field that can be updated both by end users and by the system. Name (and the namespace, if the object is namespace scoped) is used to uniquely identify a resource of a certain type in the etcd database. Labels are used to organize and select a subset of objects. For example, you can label clusters based on the region they are located in.


Later on, you can then easily select the subset of clusters based on their location: kubectl get cluster --selector Region=NorthAmerica

Annotations are another way of attaching non-identifying metadata to an object. Widely used by custom controllers and tools, annotations are a way of storing controller-specific details on the object. For example, Rancher stores creator ID information as an annotation on the cluster, so custom controllers relying on cluster ownership can easily extract this information.

OwnerReferences and Finalizers are internal fields not available via the user APIs, but heavily used by Kubernetes controllers. OwnerReference is used to link parent and child objects together, and it enables cascading deletion of dependents. Finalizer defines a pre-deletion hook, postponing object removal from the etcd database until the underlying controller is done with its cleanup job.

Now on to the Spec – a field defining the desired resource state. For a cluster, that would be: how many nodes you want in the cluster, what roles these nodes have to play (worker, etcd, control), the Kubernetes version, addon information, etc. Think of it as user-defined cluster requirements. This field is visible via the API, and it is advisable to modify it only via the API – controllers should avoid updating it. (The explanation for why is in the next section.)

Status, in turn, is the field set and modified by controllers to reflect the actual state of the object. In the cluster example, it carries information about the applied spec, CPU/memory statistics and cluster conditions. Each condition describes the status of the object as reported by a corresponding controller. Since Cluster is an essential object with more than one controller acting on it, it ends up with more than one condition attached to it. Here are a couple of self-descriptive ones with Value=True, indicating that the condition is met:

[Screenshots: example cluster conditions with Value=True]

Such fine-grained control is great from the internal controllers’ standpoint, as each controller can operate based on its own condition. But as a user you might not care about each particular condition value. At the Rancher API level, we have a State field that aggregates the condition values, and it only goes to Active when all conditions are met.
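Pulling the pieces above together, here is a heavily simplified, illustrative sketch of how such a cluster object might look; the spec and status fields are invented for illustration and are not Rancher’s exact schema:

apiVersion: management.cattle.io/v3   # Rancher's management API group; fields below are illustrative
kind: Cluster
metadata:
  name: my-cluster                    # hypothetical cluster name
  labels:
    Region: NorthAmerica              # label used by the selector example above
  annotations:
    creatorId: user-abc123            # hypothetical controller-specific annotation
  finalizers:
    - controller.cattle.io/cluster-provisioner   # hypothetical pre-deletion hook
spec:                                 # desired state, set via the API
  kubernetesVersion: v1.10.5          # illustrative field
  nodeCount: 3                        # illustrative field
status:                               # actual state, set by controllers
  conditions:
    - type: Provisioned               # illustrative condition names
      status: "True"
    - type: Ready
      status: "True"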

Controller example

We’ve mentioned controllers several times, so what is the definition? The most common one is: “code that brings the current state of the system to the desired state”. You can write custom controllers that handle default Kubernetes objects, such as Deployments or Pods, or write a controller managing a CRD resource. Each CRD in Rancher has one or more controllers operating on it, each running as a separate goroutine. Let’s look at this diagram, where cluster is the resource, and the provisioner and cluster health monitor are two controllers operating on it:

[Diagram: the cluster resource with the provisioner and cluster health monitor controllers operating on it]

In a few words, a controller:

  • Watches for resource changes
  • Executes some custom logic based on the resource spec and/or status
  • Updates the resource status with the result

When it comes to Kubernetes resource management, there are several code patterns followed by the Kubernetes open-source community when developing controllers in the Go programming language, most of them involving use of the client-go library (https://github.com/kubernetes/client-go). The library has nice utilities like Informer/SharedInformer/TaskQueue, making it easy to watch and react to resource changes, as well as maintaining an in-memory cache to minimize the number of direct calls to the API server.

The Rancher framework extends client-go functionality to save the user from writing custom code for managing generic things like finalizers and condition updates on objects, by introducing Object Lifecycle and Conditions management frameworks; it also adds a better abstraction over the SharedInformer/TaskQueue bundle with GenericController.

Controller scope – Management vs User context

So far we’ve been giving examples using the cluster resource – something that represents a user-created cluster. Once a cluster is provisioned, the user can start creating resources like Deployments, Pods and Services. Of course, this assumes the user is allowed to operate in the cluster – permissions are enforced by the user project’s RBAC. As we’ve mentioned earlier, the majority of Rancher logic runs as Kubernetes controllers. Some controllers monitor and manage Rancher management CRDs residing in the management plane etcd, and others do the same for the user clusters and their respective etcds. This brings up another interesting point about the Rancher architecture – the Management vs User context:

[Diagram: Rancher management context vs user cluster context]

As soon as a user cluster gets provisioned, Rancher generates the context allowing access to the user cluster API, and launches several controllers monitoring the resources of that cluster. The underlying mechanism for user cluster controllers is the same as for management controllers (the same third-party libraries are used, the same watch->process->update pattern is applied, etc.); the only difference is the API endpoint the controllers are talking to. In the management controllers’ case it’s the Rancher API/etcd; in the user controllers’ case it’s the user cluster API/etcd.

The similarity of the approach taken when working with resources in user clusters vs resources on the management side is the best justification for Rancher being 100% built on Kubernetes. As a developer, I highly appreciate the model, as I don’t have to change context drastically when switching from developing a feature for the management plane to one for the user clusters. Fully embracing Kubernetes not only as a container orchestrator, but also as a development platform, helped us to understand the project better, develop features faster and innovate more.

If you want to learn more about Rancher architecture, stay tuned

This article gives a very high level overview of Rancher architecture with the main focus on CRDs. In the next set of articles we will be talking more about custom controllers and best practices based on our experience building Rancher 2.0.

Alena Prokharchyk

Software Engineer

Source

CoreDNS GA for Kubernetes Cluster DNS

Author: John Belamaric (Infoblox)

Editor’s note: this post is part of a series of in-depth articles on what’s new in Kubernetes 1.11

Introduction

In Kubernetes 1.11, CoreDNS has reached General Availability (GA) for DNS-based service discovery, as an alternative to the kube-dns addon. This means that CoreDNS will be offered as an option in upcoming versions of the various installation tools. In fact, the kubeadm team chose to make it the default option starting with Kubernetes 1.11.

DNS-based service discovery has been part of Kubernetes for a long time with the kube-dns cluster addon. This has generally worked pretty well, but there have been some concerns around the reliability, flexibility and security of the implementation.

CoreDNS is a general-purpose, authoritative DNS server that provides a backwards-compatible, but extensible, integration with Kubernetes. It resolves the issues seen with kube-dns, and offers a number of unique features that solve a wider variety of use cases.

In this article, you will learn about the differences in the implementations of kube-dns and CoreDNS, and some of the helpful extensions offered by CoreDNS.

We appreciate your feedback

We are conducting a survey to evaluate the adoption of CoreDNS as the cluster DNS for Kubernetes.
If you are currently using CoreDNS inside a Kubernetes cluster, please take 5 minutes to give us some feedback by filling in this survey.

Thank you, we appreciate your collaboration here.

Implementation differences

In kube-dns, several containers are used within a single pod: kubedns, dnsmasq, and sidecar. The kubedns
container watches the Kubernetes API and serves DNS records based on the Kubernetes DNS specification, dnsmasq provides caching and stub domain support, and sidecar provides metrics and health checks.

This setup leads to a few issues that have been seen over time. For one, security vulnerabilities in dnsmasq have led to the need
for a security-patch release of Kubernetes in the past. Additionally, because dnsmasq handles the stub domains,
but kubedns handles the External Services, you cannot use a stub domain in an external service, which is very
limiting to that functionality (see dns#131).

All of these functions are done in a single container in CoreDNS, which is running a process written in Go. The
different plugins that are enabled replicate (and enhance) the functionality found in kube-dns.

Configuring CoreDNS

In kube-dns, you can modify a ConfigMap to change the behavior of your service discovery. This allows the addition of
features such as serving stub domains, modifying upstream nameservers, and enabling federation.

In CoreDNS, you similarly can modify the ConfigMap for the CoreDNS Corefile to change how service discovery
works. This Corefile configuration offers many more options than you will find in kube-dns, since it is the
primary configuration file that CoreDNS uses for configuration of all of its features, even those that are not
Kubernetes related.

When upgrading from kube-dns to CoreDNS using kubeadm, your existing ConfigMap will be used to generate the
customized Corefile for you, including all of the configuration for stub domains, federation, and upstream nameservers. See Using CoreDNS for Service Discovery for more details.
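For reference, the generated Corefile ends up looking roughly like the sketch below (modelled on the defaults from around the Kubernetes 1.11 timeframe; the example.internal stub domain and its nameserver address are hypothetical):

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            upstream
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf
        cache 30
        reload
    }
    example.internal:53 {
        errors
        cache 30
        proxy . 10.150.0.1
    }

The second server block is how a kube-dns stub domain translates into the Corefile: queries for example.internal are proxied to the hypothetical nameserver at 10.150.0.1, while everything else follows the cluster DNS configuration above it.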

Bug fixes and enhancements

There are several open issues with kube-dns that are resolved in CoreDNS, either in default configuration or with some customized configurations.

Metrics

The functional behavior of the default CoreDNS configuration is the same as kube-dns. However,
one difference you need to be aware of is that the published metrics are not the same. In kube-dns,
you get separate metrics for dnsmasq and kubedns (skydns). In CoreDNS there is a completely
different set of metrics, since it is all a single process. You can find more details on these
metrics on the CoreDNS Prometheus plugin page.

Some special features

The standard CoreDNS Kubernetes configuration is designed to be backwards compatible with the prior
kube-dns behavior. But with some configuration changes, CoreDNS can allow you to modify how the
DNS service discovery works in your cluster. A number of these features are intended to still be
compliant with the Kubernetes DNS specification;
they enhance functionality but remain backward compatible. Since CoreDNS is not
only made for Kubernetes, but is instead a general-purpose DNS server, there are many things you
can do beyond that specification.

Pods verified mode

In kube-dns, pod name records are “fake”. That is, any “a-b-c-d.namespace.pod.cluster.local” query will
return the IP address “a.b.c.d”. In some cases, this can weaken the identity guarantees offered by TLS. So,
CoreDNS offers a “pods verified” mode, which will only return the IP address if there is a pod in the
specified namespace with that IP address.

Endpoint names based on pod names

In kube-dns, when using a headless service, you can use an SRV request to get a list of
all endpoints for the service:

dnstools# host -t srv headless
headless.default.svc.cluster.local has SRV record 10 33 0 6234396237313665.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 10 33 0 6662363165353239.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 10 33 0 6338633437303230.headless.default.svc.cluster.local.
dnstools#

However, the endpoint DNS names are (for practical purposes) random. In CoreDNS, by default, you get endpoint
DNS names based upon the endpoint IP address:

dnstools# host -t srv headless
headless.default.svc.cluster.local has SRV record 0 25 443 172-17-0-14.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 0 25 443 172-17-0-18.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 0 25 443 172-17-0-4.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 0 25 443 172-17-0-9.headless.default.svc.cluster.local.

For some applications, it is desirable to have the pod name for this, rather than the pod IP
address (see for example kubernetes#47992 and coredns#1190). To enable this in CoreDNS, you specify the “endpoint_pod_names” option in your Corefile, which results in this:

dnstools# host -t srv headless
headless.default.svc.cluster.local has SRV record 0 25 443 headless-65bb4c479f-qv84p.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 0 25 443 headless-65bb4c479f-zc8lx.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 0 25 443 headless-65bb4c479f-q7lf2.headless.default.svc.cluster.local.
headless.default.svc.cluster.local has SRV record 0 25 443 headless-65bb4c479f-566rt.headless.default.svc.cluster.local.

Autopath

CoreDNS also has a special feature to improve latency in DNS requests for external names. In Kubernetes, the
DNS search path for pods specifies a long list of suffixes. This enables the use of short names when requesting
services in the cluster – for example, “headless” above, rather than “headless.default.svc.cluster.local”. However,
when requesting an external name – “infoblox.com”, for example – several invalid DNS queries are made by the client,
requiring a roundtrip from the client to kube-dns each time (actually to dnsmasq and then to kubedns, since negative caching is disabled):

  • infoblox.com.default.svc.cluster.local -> NXDOMAIN
  • infoblox.com.svc.cluster.local -> NXDOMAIN
  • infoblox.com.cluster.local -> NXDOMAIN
  • infoblox.com.your-internal-domain.com -> NXDOMAIN
  • infoblox.com -> returns a valid record

In CoreDNS, an optional feature called autopath can be enabled that will cause this search path to be followed
in the server. That is, CoreDNS will figure out from the source IP address which namespace the client pod is in,
and it will walk this search list until it gets a valid answer. Since the first 3 of these are resolved internally
within CoreDNS itself, it cuts out all of the back and forth between the client and server, reducing latency.

A few other Kubernetes specific features

In CoreDNS, you can use standard DNS zone transfer to export the entire DNS record set. This is useful for
debugging your services as well as importing the cluster zone into other DNS servers.

You can also filter by namespaces or by a label selector. This allows you to run specific CoreDNS instances that will only serve records that match the filters, exposing only a limited set of your services via DNS.

Extensibility

In addition to the features described above, CoreDNS is easily extended. It is possible to build custom versions
of CoreDNS that include your own features. For example, this ability has been used to extend CoreDNS to do recursive resolution
with the unbound plugin, to serve records directly from a database with the pdsql plugin, and to allow multiple CoreDNS instances to share a common level 2 cache with the redisc plugin.

Many other interesting extensions have been added, which you will find on the External Plugins page of the CoreDNS site. One that is really interesting for Kubernetes and Istio users is the kubernetai plugin, which allows a single CoreDNS instance to connect to multiple Kubernetes clusters and provide service discovery across all of them.

What’s Next?

CoreDNS is an independent project, and as such is developing many features that are not directly
related to Kubernetes. However, a number of these will have applications within Kubernetes. For example,
the upcoming integration with policy engines will allow CoreDNS to make intelligent choices about which endpoint
to return when a headless service is requested. This could be used to route traffic to a local pod, or
to a more responsive pod. Many other features are in development, and of course as an open source project, we welcome you to suggest and contribute your own features!

The features and differences described above are a few examples. There is much more you can do with CoreDNS.
You can find out more on the CoreDNS Blog.

Get involved with CoreDNS

CoreDNS is an incubated CNCF project.

We’re most active on Slack (and Github):

More resources can be found:

Source

Helm Tips and Tricks: Updating an App that uses ConfigMap


This February I got a chance to attend the Helm Summit in Portland. It was filled with amazing talks covering various topics about Helm. I attended almost all of them, and also got the opportunity to present a lightning talk myself. The talk was about a Helm trick/tip that I came across and found very useful, so I thought I’d share it with you all as an article.

Helm lets us manage Kubernetes applications effortlessly. Helm charts make installation and upgrading of Kubernetes applications easier. In this article, I want to focus on just one of the many benefits of using Helm: how Helm makes it easy to update an app that uses a ConfigMap.

Updating a deployment

[Image: sample Deployment manifest]

This is a sample manifest of a deployment. Let’s assume this is being used to run an app. You specify the pod template under the spec.template section. Now, if you want to update the app, something in this spec.template section must change. That means a change to the container image will update the deployment, but a change to spec.replicas will not.

Updating a deployment that uses a ConfigMap

Some apps require certain configuration files and values. It’s not recommended to have these configuration values/files baked into the container image, because otherwise every time your configuration file changes you’d have to rebuild the container image. Kubernetes provides a great way of managing configuration files/values with the ConfigMap resource.

There are two ways to expose ConfigMap data to a pod:

  1. Env vars
  2. Volume mounts

We’re going to focus on the volume mounts way of exposing ConfigMap.

I’ve created a very simple chart to use as an example to go over this. Within that chart I have the following manifest for a ConfigMap:

[Image: manifest for the ConfigMap]

As you can see, the name of the ConfigMap is nginx-cm, and its data is read from a file called default.conf. This default.conf is an nginx configuration file.

[Image: nginx configuration file (default.conf)]
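Since the screenshots are not reproduced here, the following is a minimal sketch of what that ConfigMap might look like; the server block is a stand-in for the real contents of default.conf:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-cm
data:
  default.conf: |
    # illustrative nginx server block; the real file may differ
    server {
        listen 80;
        server_name localhost;
        location / {
            root /usr/share/nginx/html;
            index index.html;
        }
    }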

Now I want to use this ConfigMap nginx-cm for my app. So I’m going to expose it via Volume Mounts in the deployment manifest for my app.

[Image: Deployment manifest mounting the ConfigMap via a volume]

As shown in the manifest above, we need to add the ConfigMap under the volumes section and give it a unique name (config-volume in the example manifest). Then we add this volume under volumeMounts in the containers section. The volumeMounts.mountPath field is the exact location in the container where the configuration file will be made available.
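Again as a sketch, the relevant parts of such a deployment manifest might look like this, assuming the stock nginx image (which reads default.conf from /etc/nginx/conf.d):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          volumeMounts:
            - name: config-volume            # must match the volume name below
              mountPath: /etc/nginx/conf.d   # where the container expects default.conf
      volumes:
        - name: config-volume
          configMap:
            name: nginx-cm                   # the ConfigMap defined earlier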

So using these manifests, we can have an app running that uses the content of nginx configuration file made available by the ConfigMap.

Now let’s say it’s time to change the nginx configuration file. Changes to this config file should be followed by an update to the ConfigMap too, otherwise our app that uses the ConfigMap won’t use the new content.

We can for sure use the kubectl update command for updating the ConfigMap. This should be followed by an update to the deployment as well. Will kubectl update work for the deployment too?

I got this message when I tried.

[Image: kubectl reporting the deployment as unchanged]

This is because the deployment’s spec.template portion has not changed, even after updating the ConfigMap resource. Even though the ConfigMap’s data section changed, that didn’t cause any changes in the deployment’s spec.template.

One workaround for this is to delete all the pods being managed by the deployment, so that the deployment creates new pods that use the updated ConfigMap. But I didn’t quite like this approach, as you have to delete all the pods manually. So I started looking for better solutions. That’s when I came across this Helm trick:
https://github.com/kubernetes/helm/blob/master/docs/charts_tips_and_tricks.md#automatically-roll-deployments-when-configmaps-or-secrets-change

As you can see, under annotations you can provide the path to your ConfigMap template file and pass it to the sha256sum function. This updates the annotations section every time the ConfigMap file changes, in turn updating the spec.template portion of the deployment. I found this very helpful, because your configuration file contents can change quite frequently. Thanks to this trick, Helm ensures that your app will keep reflecting those changes promptly.
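The snippet from the linked documentation looks roughly like this inside the deployment template; the "/configmap.yaml" path should match the actual file name of your ConfigMap template:

kind: Deployment
spec:
  template:
    metadata:
      annotations:
        # re-computed on every render; changes whenever the ConfigMap template changes
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}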

I have created this sample Helm chart to show how this trick can be implemented.

Do check it out and hopefully you’ll find it useful for your apps too! Also feel free to add more to it, and maybe share any such tips/tricks that you have come across as well!

Rajashree Mandaogane

Software Engineer

Source

Dynamic Kubelet Configuration

Author: Michael Taufen (Google)

Editor’s note: this post is part of a series of in-depth articles on what’s new in Kubernetes 1.11

Why Dynamic Kubelet Configuration?

Kubernetes provides API-centric tooling that significantly improves workflows for managing applications and infrastructure. Most Kubernetes installations, however, run the Kubelet as a native process on each host, outside the scope of standard Kubernetes APIs.

In the past, this meant that cluster administrators and service providers could not rely on Kubernetes APIs to reconfigure Kubelets in a live cluster. In practice, this required operators to either ssh into machines to perform manual reconfigurations, use third-party configuration management automation tools, or create new VMs with the desired configuration already installed, then migrate work to the new machines. These approaches are environment-specific and can be expensive.

Dynamic Kubelet configuration gives cluster administrators and service providers the ability to reconfigure Kubelets in a live cluster via Kubernetes APIs.

What is Dynamic Kubelet Configuration?

Kubernetes v1.10 made it possible to configure the Kubelet via a beta config file API. Kubernetes already provides the ConfigMap abstraction for storing arbitrary file data in the API server.

Dynamic Kubelet configuration extends the Node object so that a Node can refer to a ConfigMap that contains the same type of config file. When a Node is updated to refer to a new ConfigMap, the associated Kubelet will attempt to use the new configuration.

How does it work?

Dynamic Kubelet configuration provides the following core features:

  • Kubelet attempts to use the dynamically assigned configuration.
  • Kubelet “checkpoints” configuration to local disk, enabling restarts without API server access.
  • Kubelet reports assigned, active, and last-known-good configuration sources in the Node status.
  • When invalid configuration is dynamically assigned, Kubelet automatically falls back to a last-known-good configuration and reports errors in the Node status.

To use the dynamic Kubelet configuration feature, a cluster administrator or service provider will first post a ConfigMap containing the desired configuration, then set each Node.Spec.ConfigSource.ConfigMap reference to refer to the new ConfigMap. Operators can update these references at their preferred rate, giving them the ability to perform controlled rollouts of new configurations.
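Concretely, after this step the relevant part of a Node object looks roughly like the following; the ConfigMap name and key are hypothetical, and the field names reflect the Kubernetes 1.11 API:

apiVersion: v1
kind: Node
metadata:
  name: node-1                     # hypothetical node name
spec:
  configSource:
    configMap:
      name: my-kubelet-config      # hypothetical ConfigMap holding a KubeletConfiguration
      namespace: kube-system
      kubeletConfigKey: kubelet    # data key within the ConfigMap that holds the config file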

Each Kubelet watches its associated Node object for changes. When the Node.Spec.ConfigSource.ConfigMap reference is updated, the Kubelet will “checkpoint” the new ConfigMap by writing the files it contains to local disk. The Kubelet will then exit, and the OS-level process manager will restart it. Note that if the Node.Spec.ConfigSource.ConfigMap reference is not set, the Kubelet uses the set of flags and config files local to the machine it is running on.

Once restarted, the Kubelet will attempt to use the configuration from the new checkpoint. If the new configuration passes the Kubelet’s internal validation, the Kubelet will update Node.Status.Config to reflect that it is using the new configuration. If the new configuration is invalid, the Kubelet will fall back to its last-known-good configuration and report an error in Node.Status.Config.

Note that the default last-known-good configuration is the combination of Kubelet command-line flags with the Kubelet’s local configuration file. Command-line flags that overlap with the config file always take precedence over both the local configuration file and dynamic configurations, for backwards-compatibility.

See the following diagram for a high-level overview of a configuration update for a single Node:

[Diagram: dynamic Kubelet configuration update for a single Node]

How can I learn more?

Please see the official tutorial at https://kubernetes.io/docs/tasks/administer-cluster/reconfigure-kubelet/, which contains more in-depth details on user workflow, how a configuration becomes “last-known-good,” how the Kubelet “checkpoints” config, and possible failure modes.

Source

High Availability and Services with Kubernetes // Jetstack Blog

By Matt Bates

In our previous blog, Getting Started with a Local Deployment, we deployed an Nginx pod to a standalone (single-node) Kubernetes cluster. This pod was bound to a specified node. If the pod were to fail unexpectedly, Kubernetes (specifically, the Kubelet service) would restart the pod. By default, pods have an ‘Always’ restart policy, but this applies only on the node to which the pod was first bound; it will not be rebound to another node. This means, of course, that if the node fails, its pods will not be rescheduled elsewhere.

It is for this very reason that Kubernetes has a higher-level controller – a replication controller – responsible for maintaining pod health across nodes. A replication controller takes a desired state, in this case a number of pod ‘replicas’, and ensures this state exists in the cluster at all times. So if a pod fails (e.g. a node fails or is being maintained), the controller kicks in and starts a pod elsewhere in the cluster. If too many replicas exist for some reason, it kills the surplus pods.

Just like pods, the desired state is specified using a declarative YAML (or JSON) configuration. The replication controller is often described as a ‘cookie cutter’: it takes a pod template, as part of this configuration, which includes the type and specification of a pod, and cuts cookies (pods) as needed. But it differs from a pod in that this configuration is higher level; it does not deal with the specific semantics of an individual pod.

Now that we’re dealing with HA of pods, we need to move to a Kubernetes cluster with multiple nodes. The standalone Kubernetes cluster in Part II was a single node – lightweight and ideal for kicking the tyres – but we now need more nodes, as would be the case in a production deployment, to see replication controllers and services more typically in action. Virtual machines are ideal for this.

There are many ways to deploy Kubernetes to virtual machines, on a variety of OSes with different provisioning methods and networking plug-ins. For now, we will use the ready-made Kubernetes Vagrant deployment – it’s just a few lines at the shell to get started:

export KUBERNETES_PROVIDER=vagrant
export NUM_MINIONS=2
curl -sS https://get.k8s.io | bash

Alternatively, a pre-built Kubernetes release may be downloaded and started using a script:

cd kubernetes
export KUBERNETES_PROVIDER=vagrant
export NUM_MINIONS=2
./cluster/kube-up.sh

Note that with both methods, two Kubernetes nodes (previously called minions) are provisioned, with a default of 1GB of memory per node, so ensure you have adequate RAM. There are more detailed instructions in the documentation, including how to use a different virtualisation provider (e.g. VMWare) and tweak a number of settings.

Assuming the VMs started successfully, we can now use kubectl to see the status of the nodes:

kubectl get nodes
NAME         LABELS                              STATUS
10.245.1.3   kubernetes.io/hostname=10.245.1.3   Ready
10.245.1.4   kubernetes.io/hostname=10.245.1.4   Ready

So with this small multi-node cluster, let’s add a replication controller that will ensure that two Nginx pod replicas are maintained, even in case of node failure.

apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx-controller
spec:
  replicas: 2
  selector:
    name: nginx
  template:
    metadata:
      labels:
        name: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

kubectl create -f nginx-rc.yml

In this replication controller specification, the number of replicas (2) is specified, as well as a selector that tells Kubernetes which pods it should manage. In this example, any pod with the label ‘name: nginx’ can be used to fulfil the desired state. The selector effectively acts as a query on resource labels across the cluster. The controller then ‘cookie-cuts’ pods with this label as needed, as per the pod template.

Use Kubectl to check the replication controller was created and its status:

kubectl get replicationcontrollers
# (or the shorter kubectl get rc)

CONTROLLER         CONTAINER(S)   IMAGE(S)   SELECTOR     REPLICAS
nginx-controller   nginx          nginx      name=nginx   2

Kubectl should also show the running pods under management of the controller (edited to just show these pods for brevity).

kubectl get pods
# (or the shorter kubectl get po)

NAME                     READY   REASON    RESTARTS   AGE
nginx-controller-3qyv2   1/1     Running   0          51s
nginx-controller-y4nsj   1/1     Running   0          51s

As described in the previous blog, each pod is assigned an IP address. The controller in this example maintains two pods, each with its own unique IP address. These addresses are routable within the cluster (that is, from one of the VMs).

With the controller now in operation, pods will be rescheduled on failure – but there is no guarantee that they will have the same IP addresses as before. Things are more dynamic: pods can come and go, and they can be scaled up and down as more or fewer replicas are requested.
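
As a brief aside (not part of the original walkthrough), scaling the replica count is a single kubectl command against the controller we created above:

# Ask the replication controller to maintain three replicas instead of two
kubectl scale rc nginx-controller --replicas=3

Each new pod is assigned its own IP address, which brings us straight back to the addressing problem above.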

Kubernetes to the rescue again! It solves this problem with another neat abstraction: Services. A service exposes a set of pods to a single network address that remains fixed and stable for the lifetime of the service. The group of pods targeted is specified using a label selector.

In our deployment of Nginx pods, as they are all replicas managed by the replication controller, a frontend application may not care which pod it accesses. The service acts as a proxy and can be load balanced (as we’ll see), so the frontend and backend are neatly decoupled – adding or removing pods, whether manual or automated, is completely transparent to the service consumer. Let’s see this in action and create an Nginx service to direct HTTP traffic to the pods:

kind: Service
apiVersion: v1
metadata:
  name: web-service
spec:
  selector:
    name: nginx
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80

kubectl create -f nginx-service.yml

Next, list the services using kubectl and inspect the output:

kubectl get services
# (or the shorter kubectl get se)

NAME          LABELS   SELECTOR     IP(S)            PORT(S)
web-service   <none>   name=nginx   10.247.122.216   80/TCP

The ‘web-service’ has been successfully created; it fronts all pods that match the selector ‘name=nginx’ and exposes them at the IP 10.247.122.216. This IP is virtual, and requests are transparently proxied and load balanced across the matching Nginx pods. This single network interface enables non-Kubernetes services to consume services managed by Kubernetes in a standard way, without being tightly coupled to the API.

How does the virtual IP (VIP) work? The Kubernetes proxy that runs on each node watches for changes to Services and Endpoints via the API. On creation of the Nginx service, the proxy sets up iptables rules to redirect requests made to the VIP and port (HTTP 80) to a randomly chosen local port, from which it forwards them to the right backend pod(s).
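
If you are curious, these rules can be inspected directly. After SSHing to a node (as below), something like the following should show the NAT rules the proxy installed for our example VIP (exact chain names vary by Kubernetes version):

# List the iptables NAT rules kube-proxy created for the service VIP
sudo iptables-save -t nat | grep 10.247.122.216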

As the VIP is only routable within the cluster, use Vagrant to SSH to one of the nodes (or minions, as they were previously known).

vagrant ssh minion-1
curl -qa http://10.247.122.216

Voila – a load balanced, self-healing Nginx service, backed by two replica pods, accessible by IP.

It is not necessary to use only a VIP. The IP could also be real (i.e. routable), provided by the network or an overlay. In future posts, we’ll take a closer look at Kubernetes’ approach to networking and explain how to plug in overlay networks, such as Weave, Flannel and Project Calico, for inter- and intra-pod networking. The Vagrant Kubernetes cluster in this post uses Open vSwitch.

In this blog, we have introduced Kubernetes replication controllers and services. These powerful abstractions enable us to describe to Kubernetes the desired state of containerised applications, and for Kubernetes to actively maintain this state, auto-healing and scaling on request. Services provide network accessible interfaces to sets of pods, all transparently proxied and load balanced to service consumers inside and outside the cluster.

Stay tuned for the next blog in which we look at how Kubernetes supports data volumes, with a variety of volume types.

To keep up with Jetstack news, and to get early access to products, subscribe using the sign-up on our home page.

Source

Rancher v2.0.6 Released — the Latest Version of Rancher


We are proud to announce that we have released a new version of Rancher, Rancher v2.0.6.

This version provides some nice enhancements and new features over the previous stable release, v2.0.4. These are some of the new features:

  • Nested group membership authentication option for Active Directory and OpenLDAP directory services, which grants permissions to users who belong to groups of groups. This feature is disabled by default; to activate it, review the “Customize Schema” section and select the “Search direct and nested group memberships” option. Enabling nested group membership can slow down searches and logins, due to the more extensive user search.
  • Support for the open source directory services OpenLDAP and FreeIPA as authentication providers. You can now integrate Rancher with OpenLDAP or FreeIPA to manage and control user authentication and authorization on K8s clusters and projects.
  • Cordon/Uncordon K8s cluster nodes. At some point in a K8s cluster’s lifecycle, users may need to carry out maintenance tasks (such as patching, updating, rebooting, …) on K8s nodes. With this new feature, you can temporarily cordon nodes so that no new K8s pods can be scheduled on them. Once the nodes are ready again, you can uncordon them to return them to normal operation (the kubectl equivalent is sketched after this list).
  • Azure AKS advanced networking options. Users now have the ability to select a different resource group for Azure AKS advanced networking.
  • Support for Infrastructure as Code (IaC) in the Rancher CLI. Two new commands have been added to facilitate automation, deployment and reproducibility of K8s clusters running on Rancher.
    • rancher cluster export – allows the export of a K8s cluster definition as a YAML file.
    • rancher up – allows the import of a YAML file for cluster creation within Rancher.
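
For reference, cordoning and uncordoning mirror the standard kubectl workflow; a minimal sketch (node-1 is a placeholder node name):

# Mark the node unschedulable so no new pods are placed on it
kubectl cordon node-1

# ...perform maintenance (patch, update, reboot)...

# Make the node schedulable again
kubectl uncordon node-1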

New Docker images have been released, tagged and published on Docker Hub. The Rancher server can be pulled as rancher/rancher with one of the following tags: v2.0.6, latest or stable.

Note: If you are upgrading from a previous Rancher v2.0.x version, please review our GitHub Rancher v2.0.6 release notes to check for known issues and for upgrade and rollback notes.

Raul Sanchez, DevOps Lead

Source