{"id":777,"date":"2018-11-12T06:15:09","date_gmt":"2018-11-12T06:15:09","guid":{"rendered":"https:\/\/www.appservgrid.com\/paw93\/?p=777"},"modified":"2018-11-18T01:19:54","modified_gmt":"2018-11-18T01:19:54","slug":"grpc-load-balancing-on-kubernetes-without-tears","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw93\/index.php\/2018\/11\/12\/grpc-load-balancing-on-kubernetes-without-tears\/","title":{"rendered":"gRPC Load Balancing on Kubernetes without Tears"},"content":{"rendered":"<h3><a href=\"https:\/\/kubernetes.io\/blog\/2018\/11\/07\/grpc-load-balancing-on-kubernetes-without-tears\/\">gRPC Load Balancing on Kubernetes without Tears<\/a><\/h3>\n<p>Many new gRPC users are surprised to find that Kubernetes\u2019s default load<br \/>\nbalancing often doesn\u2019t work out of the box with gRPC. For example, here\u2019s what<br \/>\nhappens when you take a <a href=\"https:\/\/github.com\/sourishkrout\/nodevoto\" target=\"_blank\" rel=\"noopener\">simple gRPC Node.js microservices<br \/>\napp<\/a> and deploy it on Kubernetes:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/d33wubrfki0l68.cloudfront.net\/8ce21facff302ed07c286ea5608b6a6e04ae01b5\/931f0\/images\/blog\/grpc-load-balancing-with-linkerd\/screenshot2018-11-0116-c4d86100-afc1-4a08-a01c-16da391756dd.34.36.png\" alt=\"\" \/><\/p>\n<p>While the voting service displayed here has several pods, it\u2019s clear from<br \/>\nKubernetes\u2019s CPU graphs that only one of the pods is actually doing any<br \/>\nwork\u2014because only one of the pods is receiving any traffic. 
Why?<\/p>\n<p>In this blog post, we describe why this happens, and how you can easily fix it<br \/>\nby adding gRPC load balancing to any Kubernetes app with<br \/>\n<a href=\"https:\/\/linkerd.io\" target=\"_blank\" rel=\"noopener\">Linkerd<\/a>, a <a href=\"https:\/\/cncf.io\" target=\"_blank\" rel=\"noopener\">CNCF<\/a> service mesh and service sidecar.<\/p>\n<p>First, let\u2019s understand why we need to do something special for gRPC.<\/p>\n<p>gRPC is an increasingly common choice for application developers. Compared to<br \/>\nalternative protocols such as JSON-over-HTTP, gRPC can provide some significant<br \/>\nbenefits, including dramatically lower (de)serialization costs, automatic type<br \/>\nchecking, formalized APIs, and less TCP management overhead.<\/p>\n<p>However, gRPC also breaks the standard connection-level load balancing,<br \/>\nincluding what\u2019s provided by Kubernetes. This is because gRPC is built on<br \/>\nHTTP\/2, and HTTP\/2 is designed to have a single long-lived TCP connection,<br \/>\nacross which all requests are <em>multiplexed<\/em>\u2014meaning multiple requests can be<br \/>\nactive on the same connection at any point in time. Normally, this is great, as<br \/>\nit reduces the overhead of connection management. However, it also means that<br \/>\n(as you might imagine) connection-level balancing isn\u2019t very useful. Once the<br \/>\nconnection is established, there\u2019s no more balancing to be done. 
All requests<br \/>\nwill get pinned to a single destination pod, as shown below:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/d33wubrfki0l68.cloudfront.net\/b05eb6c0d5c4672ed795cc1c44ca476987057436\/a6cda\/images\/blog\/grpc-load-balancing-with-linkerd\/mono-8d2e53ef-b133-4aa0-9551-7e36a880c553.png\" alt=\"\" \/><\/p>\n<p>This problem doesn\u2019t occur in HTTP\/1.1, which also has the<br \/>\nconcept of long-lived connections, because HTTP\/1.1 has several features<br \/>\nthat naturally result in cycling of TCP connections. Because of this,<br \/>\nconnection-level balancing is \u201cgood enough\u201d, and for most HTTP\/1.1 apps we<br \/>\ndon\u2019t need to do anything more.<\/p>\n<p>To understand why, let\u2019s take a deeper look at HTTP\/1.1. In contrast to HTTP\/2,<br \/>\nHTTP\/1.1 cannot multiplex requests. Only one HTTP request can be active at a<br \/>\ntime per TCP connection. The client makes a request, e.g. GET \/foo, and then<br \/>\nwaits until the server responds. While that request-response cycle is<br \/>\nhappening, no other requests can be issued on that connection.<\/p>\n<p>Usually, we want lots of requests happening in parallel. Therefore, to have<br \/>\nconcurrent HTTP\/1.1 requests, we need to make multiple HTTP\/1.1 connections,<br \/>\nand issue our requests across all of them. Additionally, long-lived HTTP\/1.1<br \/>\nconnections typically expire after some time, and are torn down by the client<br \/>\n(or server). These two factors combined mean that HTTP\/1.1 requests typically<br \/>\ncycle across multiple TCP connections, and so connection-level balancing works.<\/p>\n<p>Now back to gRPC. Since we can\u2019t balance at the connection level, in order to<br \/>\ndo gRPC load balancing, we need to shift from connection balancing to <em>request<\/em><br \/>\nbalancing. 
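<\/p>\n<p>The difference is easy to see in a toy model. Below is a minimal sketch (hypothetical pod names, not code from any real balancer): with connection-level balancing, a pod is chosen once, when the connection is opened, so every request multiplexed onto that connection lands on the same pod; with request-level balancing, the choice is made for each request.<\/p>

```python
from collections import Counter
from itertools import cycle

PODS = ["pod-a", "pod-b", "pod-c"]

def connection_level(n_requests):
    """A pod is picked once per connection; every multiplexed request follows it."""
    pod = PODS[0]  # whichever pod the single long-lived connection landed on
    return Counter({pod: n_requests})

def request_level(n_requests):
    """A pod is picked for each individual request (simple round-robin)."""
    pool = cycle(PODS)
    counts = Counter()
    for _ in range(n_requests):
        counts[next(pool)] += 1
    return counts

print(connection_level(90))  # all 90 requests pinned to one pod
print(request_level(90))     # 30 requests to each of the three pods
```

<p>Closing that gap is precisely what a smarter client or a proxy has to do for gRPC.<\/p>\n<p>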
In other words, we need to open an HTTP\/2 connection to each<br \/>\ndestination, and balance <em>requests<\/em> across these connections, as shown below:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/d33wubrfki0l68.cloudfront.net\/b1968b73e82d0d8af5b7d34bd8d3b027ddad9bf4\/5ec68\/images\/blog\/grpc-load-balancing-with-linkerd\/stereo-09aff9d7-1c98-4a0a-9184-9998ed83a531.png\" alt=\"\" \/><\/p>\n<p>In network terms, this means we need to make decisions at L5\/L7 rather than<br \/>\nL3\/L4, i.e. we need to understand the protocol sent over the TCP connections.<\/p>\n<p>How do we accomplish this? There are a couple options. First, our application<br \/>\ncode could manually maintain its own load balancing pool of destinations, and<br \/>\nwe could configure our gRPC client to <a href=\"https:\/\/godoc.org\/google.golang.org\/grpc\/balancer\" target=\"_blank\" rel=\"noopener\">use this load balancing<br \/>\npool<\/a>. This approach gives<br \/>\nus the most control, but it can be very complex in environments like Kubernetes<br \/>\nwhere the pool changes over time as Kubernetes reschedules pods. Our<br \/>\napplication would have to watch the Kubernetes API and keep itself up to date<br \/>\nwith the pods.<\/p>\n<p>Alternatively, in Kubernetes, we could deploy our app as <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/services-networking\/service\/#headless-services\" target=\"_blank\" rel=\"noopener\">headless<br \/>\nservices<\/a>.<br \/>\nIn this case, Kubernetes <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/services-networking\/service\/#headless-services\" target=\"_blank\" rel=\"noopener\">will create multiple A<br \/>\nrecords<\/a><br \/>\nin the DNS entry for the service. 
If our gRPC client is sufficiently advanced,<br \/>\nit can automatically maintain the load balancing pool from those DNS entries.<br \/>\nBut this approach restricts us to certain gRPC clients, and it\u2019s rarely<br \/>\npossible to only use headless services.<\/p>\n<p>Finally, we can take a third approach: use a lightweight proxy.<\/p>\n<p><a href=\"https:\/\/linkerd.io\" target=\"_blank\" rel=\"noopener\">Linkerd<\/a> is a <a href=\"https:\/\/cncf.io\" target=\"_blank\" rel=\"noopener\">CNCF<\/a>-hosted <em>service<br \/>\nmesh<\/em> for Kubernetes. Most relevant to our purposes, Linkerd also functions as<br \/>\na <em>service sidecar<\/em>, where it can be applied to a single service\u2014even without<br \/>\ncluster-wide permissions. What this means is that when we add Linkerd to our<br \/>\nservice, it adds a tiny, ultra-fast proxy to each pod, and these proxies watch<br \/>\nthe Kubernetes API and do gRPC load balancing automatically. Our deployment<br \/>\nthen looks like this:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/d33wubrfki0l68.cloudfront.net\/c53a0b6d4b3ab1889d2bedb8c4f59a3767da7510\/26152\/images\/blog\/grpc-load-balancing-with-linkerd\/linkerd-8df1031c-cdd1-4164-8e91-00f2d941e93f.io.png\" alt=\"\" \/><\/p>\n<p>Using Linkerd has a couple advantages. First, it works with services written in<br \/>\nany language, with any gRPC client, and any deployment model (headless or not).<br \/>\nBecause Linkerd\u2019s proxies are completely transparent, they auto-detect HTTP\/2<br \/>\nand HTTP\/1.x and do L7 load balancing, and they pass through all other traffic<br \/>\nas pure TCP. This means that everything will <em>just work.<\/em><\/p>\n<p>Second, Linkerd\u2019s load balancing is very sophisticated. 
Not only does Linkerd<br \/>\nmaintain a watch on the Kubernetes API and automatically update the load<br \/>\nbalancing pool as pods get rescheduled, it also uses an <em>exponentially-weighted<br \/>\nmoving average<\/em> of response latencies to automatically send requests to the<br \/>\nfastest pods. If one pod is slowing down, even momentarily, Linkerd will shift<br \/>\ntraffic away from it. This can reduce end-to-end tail latencies.<\/p>\n<p>Finally, Linkerd\u2019s Rust-based proxies are incredibly fast and small. They<br \/>\nintroduce &lt;1ms of p99 latency and require &lt;10MB of RSS per pod, meaning that<br \/>\nthe impact on system performance will be negligible.<\/p>\n<p>Linkerd is very easy to try. Just follow the steps in the <a href=\"https:\/\/linkerd.io\/2\/getting-started\/\" target=\"_blank\" rel=\"noopener\">Linkerd Getting<br \/>\nStarted Instructions<\/a>\u2014install the<br \/>\nCLI on your laptop, install the control plane on your cluster, and \u201cmesh\u201d your<br \/>\nservice (inject the proxies into each pod). You\u2019ll have Linkerd running on your<br \/>\nservice in no time, and should see proper gRPC balancing immediately.<\/p>\n<p>Let\u2019s take a look at our sample voting service again, this time after<br \/>\ninstalling Linkerd:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/d33wubrfki0l68.cloudfront.net\/f34bb586a2c1ebdff4dff99d4d1e30a55668f5ae\/50a61\/images\/blog\/grpc-load-balancing-with-linkerd\/screenshot2018-11-0116-24b8ee81-144c-4eac-b73d-871bbf0ea22e.57.42.png\" alt=\"\" \/><\/p>\n<p>As we can see, the CPU graphs for all pods are active, indicating that all pods<br \/>\nare now taking traffic\u2014without having to change a line of code. Voil\u00e0,<br \/>\ngRPC load balancing as if by magic!<\/p>\n<p>Linkerd also gives us built-in traffic-level dashboards, so we don\u2019t even need<br \/>\nto guess what\u2019s happening from CPU charts any more. 
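<\/p>\n<p>The latency-aware balancing described above is worth a closer look. Here is the core of the exponentially-weighted moving average idea in miniature. This is a simplified sketch with hypothetical names, not Linkerd\u2019s actual implementation: each endpoint keeps a smoothed estimate of its response latencies, and the balancer sends each request to the endpoint with the lowest current estimate.<\/p>

```python
class EwmaEndpoint:
    """Tracks a smoothed latency estimate for one pod."""
    def __init__(self, name, alpha=0.3):
        self.name = name
        self.alpha = alpha   # smoothing factor: higher reacts faster to change
        self.ewma = None     # estimate starts on the first observation

    def observe(self, latency_ms):
        # Standard exponential smoothing of observed response latencies.
        if self.ewma is None:
            self.ewma = float(latency_ms)
        else:
            self.ewma = self.alpha * latency_ms + (1 - self.alpha) * self.ewma

def pick(endpoints):
    # Send the request to the pod with the lowest smoothed latency.
    return min(endpoints, key=lambda e: e.ewma)

fast = EwmaEndpoint("pod-fast")
slow = EwmaEndpoint("pod-slow")
for ms in (10, 11, 10):
    fast.observe(ms)
for ms in (10, 80, 90):   # pod-slow starts degrading
    slow.observe(ms)

print(pick([fast, slow]).name)  # pod-fast
```

<p>If one endpoint\u2019s latencies spike, its estimate rises and it stops being picked until it recovers, which is what shifts traffic away from momentarily slow pods.<\/p>\n<p>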
Here\u2019s a Linkerd graph<br \/>\nshowing the success rate, request volume, and latency percentiles of<br \/>\neach pod:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/d33wubrfki0l68.cloudfront.net\/33bcb31355721dc181f7810475e421e9a38b68eb\/1356b\/images\/blog\/grpc-load-balancing-with-linkerd\/screenshot2018-11-0212-15ed0448-5424-4e47-9828-20032de868b5.08.38.png\" alt=\"\" \/><\/p>\n<p>We can see that each pod is getting around 5 RPS. We can also see that, while<br \/>\nwe\u2019ve solved our load balancing problem, we still have some work to do on our<br \/>\nsuccess rate for this service. (The demo app is built with an intentional<br \/>\nfailure\u2014as an exercise for the reader, see if you can figure it out by<br \/>\nusing the Linkerd dashboard!)<\/p>\n<p>If you\u2019re interested in a dead simple way to add gRPC load balancing to your<br \/>\nKubernetes services, regardless of what language they\u2019re written in, what gRPC<br \/>\nclient you\u2019re using, or how they\u2019re deployed, you can use Linkerd to do it in<br \/>\njust a few commands.<\/p>\n<p>There\u2019s a lot more to Linkerd, including security, reliability, and debugging<br \/>\nand diagnostics features, but those are topics for future blog posts.<\/p>\n<p>Want to learn more? We\u2019d love to have you join our rapidly-growing community!<br \/>\nLinkerd is a <a href=\"https:\/\/cncf.io\" target=\"_blank\" rel=\"noopener\">CNCF<\/a> project, <a href=\"https:\/\/github.com\/linkerd\/linkerd2\" target=\"_blank\" rel=\"noopener\">hosted on GitHub<\/a>, and has a thriving community<br \/>\non <a href=\"https:\/\/slack.linkerd.io\" target=\"_blank\" rel=\"noopener\">Slack<\/a>, <a href=\"https:\/\/twitter.com\/linkerd\" target=\"_blank\" rel=\"noopener\">Twitter<\/a>, and the <a href=\"https:\/\/lists.cncf.io\/g\/cncf-linkerd-users\" target=\"_blank\" rel=\"noopener\">mailing lists<\/a>. 
Come and join the fun!<\/p>\n<p><a href=\"https:\/\/kubernetes.io\/blog\/2018\/11\/07\/grpc-load-balancing-on-kubernetes-without-tears\/\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>gRPC Load Balancing on Kubernetes without Tears Many new gRPC users are surprised to find that Kubernetes\u2019s default load balancing often doesn\u2019t work out of the box with gRPC. For example, here\u2019s what happens when you take a simple gRPC Node.js microservices app and deploy it on Kubernetes: While the voting service displayed here has &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw93\/index.php\/2018\/11\/12\/grpc-load-balancing-on-kubernetes-without-tears\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;gRPC Load Balancing on Kubernetes without Tears&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-777","post","type-post","status-publish","format-standard","hentry","category-kubernetes"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts\/777","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/comments?post=777"}],"version-history":[{"count":1,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts\/777\/revisions"}],"predecessor-version":[{"id":786,"href":"https:\/\/www.appservgrid.com\/paw93\/in
dex.php\/wp-json\/wp\/v2\/posts\/777\/revisions\/786"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/media?parent=777"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/categories?post=777"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/tags?post=777"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}