{"id":3487,"date":"2018-11-16T14:46:58","date_gmt":"2018-11-16T14:46:58","guid":{"rendered":"https:\/\/www.appservgrid.com\/paw92\/?p=3487"},"modified":"2018-11-17T15:38:13","modified_gmt":"2018-11-17T15:38:13","slug":"foss-project-spotlight-bluek8s-linux-journal","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/11\/16\/foss-project-spotlight-bluek8s-linux-journal\/","title":{"rendered":"FOSS Project Spotlight: BlueK8s | Linux Journal"},"content":{"rendered":"<p><em>Deploying and managing complex stateful applications on Kubernetes.<\/em><\/p>\n<p><a href=\"https:\/\/kubernetes.io\">Kubernetes<\/a> (aka K8s) is now the de facto container orchestration<br \/>\nframework. Like other popular open-source technologies, Kubernetes has<br \/>\namassed a considerable ecosystem of complementary tools to address<br \/>\neverything from storage to security. And although it was first created for<br \/>\nrunning <a href=\"https:\/\/whatis.techtarget.com\/definition\/stateless-app\">stateless applications<\/a>, more and more organizations are<br \/>\ninterested in using Kubernetes for <a href=\"https:\/\/whatis.techtarget.com\/definition\/stateful-app\">stateful applications<\/a>.<\/p>\n<p>However, while Kubernetes has advanced significantly in many areas during the past couple of years, there are still considerable gaps when it comes to<br \/>\nrunning complex stateful applications. It remains challenging to deploy<br \/>\nand manage distributed stateful applications consisting of a multitude of<br \/>\ncooperating services (such as those used for large-scale analytics and<br \/>\nmachine learning) with Kubernetes.<\/p>\n<p>I&#8217;ve been focused on this space for the past several years as a<br \/>\nco-founder of <a href=\"https:\/\/www.bluedata.com\">BlueData<\/a>. 
During that time, I&#8217;ve worked with many teams<br \/>\nat Global 2000 enterprises in several industries to successfully deploy<br \/>\ndistributed stateful services such as Hadoop, Spark, Kafka, Cassandra, TensorFlow and other analytics, data science, machine learning (ML) and deep learning (DL) tools in containerized environments.<\/p>\n<p>Along the way, I&#8217;ve learned what it takes to deploy complex stateful<br \/>\napplications like these with containers while ensuring enterprise-grade<br \/>\nsecurity, reliability and performance. My colleagues at<br \/>\nBlueData and I have broken new ground in using Docker containers for big<br \/>\ndata analytics, data science and ML\/DL in highly distributed<br \/>\nenvironments. We&#8217;ve developed innovations to address<br \/>\nrequirements in areas like storage, security, networking, performance and<br \/>\nlifecycle management.<\/p>\n<p>Now we want to bring those innovations to the open-source community to ensure that these stateful services are supported in the Kubernetes<br \/>\necosystem. BlueData&#8217;s engineering team has been busy working with<br \/>\nKubernetes, <a href=\"https:\/\/www.bluedata.com\/blog\/2017\/12\/big-data-container-orchestration-kubernetes-k8s\">developing prototypes<\/a> in our labs and<br \/>\ncollaborating with multiple enterprise organizations to evaluate the<br \/>\nopportunities (and challenges) in using Kubernetes for complex stateful<br \/>\napplications.<\/p>\n<p>To that end, we recently <a href=\"https:\/\/www.bluedata.com\/article\/bluek8s-and-kubernetes-director-for-stateful-applications\">introduced<\/a> a new Kubernetes open-source<br \/>\ninitiative: BlueK8s. 
The BlueK8s initiative will be composed of several<br \/>\nopen-source projects, each of which will bring enterprise-level capabilities for<br \/>\nstateful applications to Kubernetes.<\/p>\n<p>Kubernetes Director (or KubeDirector for short) is the first open-source project in this initiative. KubeDirector is a custom controller<br \/>\ndesigned to simplify and streamline the packaging, deployment and<br \/>\nmanagement of complex distributed stateful applications for big data<br \/>\nanalytics and AI\/ML\/DL use cases.<\/p>\n<p>Of course, other existing open-source projects address<br \/>\nvarious requirements for both stateful and stateless applications. The<br \/>\nKubernetes <a href=\"https:\/\/coreos.com\/operators\">Operator<\/a> framework, for instance, manages the lifecycle of a<br \/>\nparticular application, providing a useful resource for building and<br \/>\ndeploying application-specific Operators. This is achieved through the<br \/>\ncreation of a simple finite state machine, commonly known as a<br \/>\nreconciliation loop:<\/p>\n<ul>\n<li><em>Observe<\/em>: determine the current state of the application.<\/li>\n<li><em>Analyze<\/em>: compare the current state of the application with its expected<br \/>\nstate.<\/li>\n<li><em>Act<\/em>: take the necessary steps to make the running state of the<br \/>\napplication match its expected state.<\/li>\n<\/ul>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.linuxjournal.com\/sites\/default\/files\/styles\/max_650x650\/public\/u%5Buid%5D\/12581f1.png\" alt=\"&quot;&quot;\" width=\"300\" height=\"278\" \/><\/p>\n<p><em>Figure 1. Reconciliation Loop<\/em><\/p>\n<p>It&#8217;s pretty straightforward to use a Kubernetes Operator to manage a<br \/>\ncloud-native stateless application, but that&#8217;s not the case for all<br \/>\napplications. Most applications for big data analytics, data science and<br \/>\nAI\/ML\/DL are not implemented in a cloud-native architecture. 
These<br \/>\napplications are also often stateful. In addition, a distributed data pipeline<br \/>\ngenerally consists of a variety of services, each with<br \/>\ndifferent characteristics and configuration requirements.<\/p>\n<p>As a result, you can&#8217;t easily decompose these applications into<br \/>\nself-sufficient and containerizable microservices. They<br \/>\nare often a mishmash of tightly integrated processes with complex<br \/>\ninterdependencies, whose state is distributed across multiple configuration<br \/>\nfiles. So it&#8217;d be challenging to create, deploy and integrate an<br \/>\napplication-specific Operator for each possible configuration.<\/p>\n<p>The KubeDirector project is aimed at solving this very problem. Built upon<br \/>\nthe Kubernetes <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/extend-kubernetes\/api-extension\/custom-resources\">custom resource definition<\/a> (CRD) framework, KubeDirector<br \/>\ndoes the following:<\/p>\n<ul>\n<li>It employs the native Kubernetes API extensions, design philosophy and<br \/>\nauthentication.<\/li>\n<li>It requires only a minimal learning curve for developers who have experience<br \/>\nwith Kubernetes.<\/li>\n<li>It does not require decomposing an existing application to fit<br \/>\nmicroservices patterns.<\/li>\n<li>It provides native support for preserving application configuration and<br \/>\nstate.<\/li>\n<li>It follows an application-agnostic deployment pattern, reducing the time to<br \/>\nonboard stateful applications to Kubernetes.<\/li>\n<li>It is application-neutral, supporting many applications simultaneously via<br \/>\napplication-specific instructions specified in YAML-format configuration<br \/>\nfiles.<\/li>\n<li>It supports the management of distributed data pipelines consisting of<br \/>\nmultiple applications, such as Spark, Kafka, Hadoop, Cassandra, TensorFlow<br \/>\nand so on, including a variety of related tools for data science,<br 
\/>\nML\/DL, business intelligence, ETL, analytics and visualization.<\/li>\n<\/ul>\n<p>KubeDirector makes it unnecessary to create and implement multiple<br \/>\nKubernetes Operators in order to manage a cluster composed of multiple<br \/>\ncomplex stateful applications. You can simply use KubeDirector to manage<br \/>\nthe entire cluster. All communication with KubeDirector is performed via<br \/>\nkubectl commands. The expected state of a cluster is submitted as a<br \/>\nrequest to the API server and stored in the Kubernetes etcd database.<br \/>\nKubeDirector then applies the necessary application-specific workflows to<br \/>\nchange the current state of the cluster into the expected state of the<br \/>\ncluster. Different workflows can be specified for each application type, as<br \/>\nillustrated in Figure 2, which shows a simple<br \/>\nexample (using KubeDirector to deploy and manage containerized Hadoop and<br \/>\nSpark application clusters).<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"https:\/\/www.linuxjournal.com\/sites\/default\/files\/styles\/max_650x650\/public\/u%5Buid%5D\/12581f2.png\" alt=\"&quot;&quot;\" width=\"650\" height=\"328\" \/><\/p>\n<p><em>Figure 2. Using KubeDirector to Deploy and Manage Containerized<br \/>\nHadoop and Spark Application Clusters<\/em><\/p>\n<p>If you&#8217;re interested, we&#8217;d love for you to join the growing<br \/>\ncommunity of KubeDirector developers, contributors and adopters. The<br \/>\ninitial pre-alpha version of KubeDirector was recently released<br \/>\nat <a href=\"https:\/\/github.com\/bluek8s\/kubedirector\">https:\/\/github.com\/bluek8s\/kubedirector<\/a>. For an architecture overview,<br \/>\nrefer to the <a href=\"https:\/\/github.com\/bluek8s\/kubedirector\/wiki\">GitHub project wiki<\/a>. 
You can also read more about how it<br \/>\nworks in this <a href=\"https:\/\/kubernetes.io\/blog\/2018\/10\/03\/kubedirector-the-easy-way-to-run-complex-stateful-applications-on-kubernetes\">technical blog post on the Kubernetes site<\/a>.<\/p>\n<p><a href=\"https:\/\/www.linuxjournal.com\/content\/foss-project-spotlight-bluek8s\" target=\"_blank\" rel=\"noopener\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Deploying and managing complex stateful applications on Kubernetes. Kubernetes (aka K8s) is now the de facto container orchestration framework. Like other popular open-source technologies, Kubernetes has amassed a considerable ecosystem of complementary tools to address everything from storage to security. And although it was first created for running stateless applications, more and more organizations are &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw92\/index.php\/2018\/11\/16\/foss-project-spotlight-bluek8s-linux-journal\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;FOSS Project Spotlight: BlueK8s | Linux 
Journal&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"class_list":["post-3487","post","type-post","status-publish","format-standard","hentry","category-linux"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/3487","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/comments?post=3487"}],"version-history":[{"count":2,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/3487\/revisions"}],"predecessor-version":[{"id":3764,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/posts\/3487\/revisions\/3764"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/media?parent=3487"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/categories?post=3487"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw92\/index.php\/wp-json\/wp\/v2\/tags?post=3487"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}