{"id":215,"date":"2018-10-16T03:43:52","date_gmt":"2018-10-16T03:43:52","guid":{"rendered":"http:\/\/www.appservgrid.com\/paw93\/index.php\/2018\/10\/16\/the-unexpected-kubernetes-part-2-volume-and-many-ways-of-persisting-data\/"},"modified":"2018-10-16T03:43:52","modified_gmt":"2018-10-16T03:43:52","slug":"the-unexpected-kubernetes-part-2-volume-and-many-ways-of-persisting-data","status":"publish","type":"post","link":"https:\/\/www.appservgrid.com\/paw93\/index.php\/2018\/10\/16\/the-unexpected-kubernetes-part-2-volume-and-many-ways-of-persisting-data\/","title":{"rendered":"The Unexpected Kubernetes: Part 2: Volume and Many Ways of Persisting Data"},"content":{"rendered":"<h2>Recap<\/h2>\n<p><a href=\"https:\/\/rancher.com\/blog\/2018\/2018-09-20-unexpected-kubernetes-part-1\/\">Last time we talked about PV, PVC, Storage Class and Provisioner.<\/a><\/p>\n<p>To quickly recap:<\/p>\n<ol>\n<li>\n<p>Originally PV was designed to be a piece of storage pre-allocated by administrator. Though after the introduction of Storage Class and Provisioner, users are able to dynamically provision PVs now.<\/p>\n<\/li>\n<li>\n<p>PVC is a request for a PV. When used with Storage Class, it will trigger the dynamic provisioning of a matching PV.<\/p>\n<\/li>\n<li>\n<p>PV and PVC are always one to one mapping.<\/p>\n<\/li>\n<li>\n<p>Provisioner is a plugin used to provision PV for users. It helps to remove the administrator from the critical path of creating a workload that needs persistent storage.<\/p>\n<\/li>\n<li>\n<p>Storage Class is a classification of PVs. The PV in the same Storage Class can share some properties. In most cases, while being used with a Provisioner, it can be seen as the Provisioner with predefined properties. 
So when users request it, it can dynamically provision PVs with those predefined properties.<\/p>\n<\/li>\n<\/ol>\n<p>But those are not the only ways to use persistent storage in Kubernetes.<\/p>\n<h2>Volume<\/h2>\n<p>In the previous article, I mentioned that there is also a concept of Volume in Kubernetes. In order to differentiate Volume from Persistent Volume, people sometimes call it In-line Volume, or Ephemeral Volume.<\/p>\n<p>Let me quote <a href=\"https:\/\/kubernetes.io\/docs\/concepts\/storage\/volumes\/#background\">the definition of Volume<\/a> here:<\/p>\n<blockquote>\n<p>A Kubernetes volume \u2026 has an explicit lifetime &#8211; the same as the Pod that encloses it. Consequently, a volume outlives any Containers that run within the Pod, and data is preserved across Container restarts. Of course, when a Pod ceases to exist, the volume will cease to exist, too. Perhaps more importantly than this, Kubernetes supports many types of volumes, and a Pod can use any number of them simultaneously.<\/p>\n<p>At its core, a volume is just a directory, possibly with some data in it, which is accessible to the Containers in a Pod. How that directory comes to be, the medium that backs it, and the contents of it are determined by the particular volume type used.<\/p>\n<\/blockquote>\n<p>One important property of Volume is that it has the same lifecycle as the Pod it belongs to. It will be gone if the Pod is gone. 
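To make the lifecycle concrete, here is a minimal sketch of a Pod using an ephemeral emptyDir Volume (the Pod, container and volume names here are placeholders of my own, not from any real workload):

```yaml
# Sketch: an ephemeral Volume scoped to one Pod. The emptyDir
# directory is created when the Pod is scheduled to a node and
# deleted, together with its contents, when the Pod goes away.
apiVersion: v1
kind: Pod
metadata:
  name: cache-pod        # placeholder name
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: scratch
      mountPath: /cache  # the directory the container sees
  volumes:
  - name: scratch
    emptyDir: {}
```

Note that the Volume is declared as a property of the Pod spec, not as a separate API object.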
That\u2019s different from Persistent Volume, which will continue to exist in the system until users delete it. Volume can also be used to share data between containers inside the same Pod, but this isn\u2019t the primary use case, since users normally only have one container per Pod.<\/p>\n<p>So it\u2019s easier to treat Volume as a property of Pod, instead of as a standalone object. As the definition says, it represents a directory inside the Pod, and the Volume type defines what\u2019s in the directory. For example, the Config Map Volume type populates the Volume directory with configuration files from the API server; the PVC Volume type mounts the filesystem from the corresponding PV in the directory; etc. In fact, Volume is almost the only way to use storage natively inside a Pod.<\/p>\n<p>It\u2019s easy to get confused between Volume, Persistent Volume and Persistent Volume Claim. So if you imagine a data flow, it looks like this: PV -&gt; PVC -&gt; Volume. The PV contains the real data and is bound to a PVC, which is finally used as a Volume in a Pod.<\/p>\n<p>However, Volume is also confusing in the sense that, besides PVC, it can be backed directly by pretty much any type of storage supported by Kubernetes.<\/p>\n<p>Remember we already have Persistent Volume, which supports different kinds of storage solutions. We also have Provisioner, which supports a similar (but not identical) set of solutions. And we have different types of Volume as well.<\/p>\n<p>So how are they different? And how do you choose between them?<\/p>\n<h2>Many ways of persisting data<\/h2>\n<p>Take AWS EBS for example. 
Let\u2019s start counting the ways of persisting data in Kubernetes.<\/p>\n<h3>Volume Way<\/h3>\n<p>awsElasticBlockStore is a Volume type.<\/p>\n<p>You can create a Pod, specify a volume as awsElasticBlockStore, specify the volumeID, then use your existing EBS volume in the Pod.<\/p>\n<p>The EBS volume must exist before you use it with Volume directly.<\/p>\n<h3>PV Way<\/h3>\n<p>AWSElasticBlockStore is also a PV type.<\/p>\n<p>So you can create a PV that represents an EBS volume (assuming you have the privilege to do that), then create a PVC bound to it. Finally, use it in your Pod by specifying the PVC as a volume.<\/p>\n<p>As with the Volume Way, the EBS volume must exist before you create the PV.<\/p>\n<h3>Provisioner Way<\/h3>\n<p>kubernetes.io\/aws-ebs is also a Kubernetes built-in Provisioner for EBS.<\/p>\n<p>You can create a Storage Class with Provisioner kubernetes.io\/aws-ebs, then create a PVC using the Storage Class. Kubernetes will automatically create the matching PV for you. Then you can use it in your Pod by specifying the PVC as a volume.<\/p>\n<p>In this case, you don\u2019t need to create the EBS volume before you use it. The EBS Provisioner will create it for you.<\/p>\n<h3>Third-Party Way<\/h3>\n<p>All the options listed above are the built-in options of Kubernetes. There are also some third-party implementations of EBS in the form of Flexvolume drivers, to help you hook EBS up to Kubernetes if you\u2019re not satisfied with any of the options above.<\/p>\n<p>And there are CSI drivers for the same purpose if Flexvolume doesn\u2019t work for you. (Why? More on this later.)<\/p>\n<h3>VolumeClaimTemplate Way<\/h3>\n<p>If you\u2019re using StatefulSet, congratulations! You now have one more way to use EBS volumes with your workload \u2013 VolumeClaimTemplate.<\/p>\n<p>VolumeClaimTemplate is a StatefulSet spec property. It provides a way to create matching PVs and PVCs for the Pods that the StatefulSet creates. 
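A minimal sketch of a StatefulSet with a VolumeClaimTemplate might look like this (the Storage Class name standard is my own assumption and must exist in your cluster; the image is just a placeholder):

```yaml
# Sketch: a StatefulSet whose volumeClaimTemplates entry makes
# Kubernetes create one PVC per replica, provisioned through the
# referenced Storage Class.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: www
spec:
  serviceName: www
  replicas: 3
  selector:
    matchLabels:
      app: www
  template:
    metadata:
      labels:
        app: www
    spec:
      containers:
      - name: web
        image: nginx          # placeholder workload
        volumeMounts:
        - name: data
          mountPath: /data    # each replica mounts its own PVC here
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard   # assumption: adjust to your cluster
      resources:
        requests:
          storage: 1Gi
```

Each replica gets its own PVC from the template, so block storage that is ReadWriteOnce can still back a multi-replica workload.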
Those PVCs are created using a Storage Class, so they can be provisioned automatically when the StatefulSet scales up. When a StatefulSet is scaled down, the extra PVs\/PVCs are kept in the system, so when the StatefulSet scales up again, they are reused by the new Pods Kubernetes creates. We will talk more about StatefulSet later.<\/p>\n<p>As an example, let\u2019s say you created a StatefulSet named www with replica 3, and a VolumeClaimTemplate named data with it. Kubernetes will create 3 Pods, named www-0, www-1 and www-2 accordingly. Kubernetes will also create PVC data-www-0 for Pod www-0, data-www-1 for www-1, and data-www-2 for www-2 (a PVC created from a VolumeClaimTemplate is named after the template and the Pod). If you scale the StatefulSet to 5, Kubernetes will create www-3, data-www-3, www-4 and data-www-4 accordingly. If you then scale the StatefulSet down to 1, Pods www-1 to www-4 will be deleted, but data-www-1 to data-www-4 will remain in the system. So when you decide to scale up to 5 again, Pods www-1 to www-4 will be created, and PVC data-www-1 will still serve Pod www-1, data-www-2 www-2, etc. That\u2019s because Pod identities are stable in a StatefulSet: the names and relationships are predictable.<\/p>\n<p>VolumeClaimTemplate is important for block storage solutions like EBS and Longhorn. Because those solutions are inherently ReadWriteOnce, you cannot share them between Pods. Deployment won\u2019t work well with them if you have more than one Pod running with persistent data. So VolumeClaimTemplate provides a way for block storage solutions to scale horizontally for a Kubernetes workload.<\/p>\n<h2>How to choose between Volume, Persistent Volume and Provisioner<\/h2>\n<p>As you see, there are built-in Volume types, PV types, Provisioner types, plus external plugins using Flexvolume and\/or CSI. 
The most confusing part is that they provide largely the same, yet subtly different, functionality.<\/p>\n<p>I thought, at least, there should be a guideline somewhere on how to choose between them.<\/p>\n<p>But I couldn\u2019t find one anywhere.<\/p>\n<p>So I\u2019ve plowed through code and documentation to bring you the comparison matrix, and the guideline that makes the most sense to me.<br \/>\nComparison of Volume, Persistent Volume and Provisioner<\/p>\n<table>\n<thead>\n<tr>\n<th>Name<\/th>\n<th>Volume<\/th>\n<th>Persistent Volume<\/th>\n<th>Provisioner<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>AWS EBS<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Azure Disk<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Azure File<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>CephFS<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Cinder<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Fibre Channel<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Flexvolume<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Flocker<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>GCE Persistent Disk<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Glusterfs<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>HostPath<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>iSCSI<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>NFS<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Photon PersistentDisk<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Portworx<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Quobyte<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Ceph RBD<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>ScaleIO<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>StorageOS<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>vsphereVolume<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>ConfigMap<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>DownwardAPI<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>EmptyDir<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Projected<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Secret<\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>Container Storage Interface (CSI)<\/td>\n<td><\/td>\n<td>\u2713<\/td>\n<td>\u2713<\/td>\n<\/tr>\n<tr>\n<td>Local<\/td>\n<td><\/td>\n<td>\u2713<\/td>\n<td><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Here I only covered the in-tree support from Kubernetes. There are some <a href=\"https:\/\/github.com\/kubernetes-incubator\/external-storage\">official out-of-tree Provisioners<\/a> you can use as well.<\/p>\n<p>As you see here, Volume, Persistent Volume and Provisioner differ in some nuanced ways.<\/p>\n<ol>\n<li>Volume supports most of the volume plugins.\n<ol>\n<li>It\u2019s the only way to connect a PVC to a Pod.<\/li>\n<li>It\u2019s also the only one that supports Config Map, Secret, Downward API, and Projected. 
All of those are closely related to the Kubernetes API server.<\/li>\n<li>And it\u2019s the only one that supports EmptyDir, which will automatically allocate and clean up a temporary volume for a Pod*.<\/li>\n<\/ol>\n<\/li>\n<li>The plugins PV supports are a superset of what Provisioner supports, because a Provisioner needs to create a PV before workloads can use it. However, a few plugins are supported by PV but not by Provisioner, e.g. Local Volume (which is a work in progress).<\/li>\n<li>There are two types that Volume doesn\u2019t support: the two most recent features, CSI and Local Volume. There is work in progress to bring them to Volume.<\/li>\n<\/ol>\n<p><em>* A side note about EmptyDir with PV:<\/em><\/p>\n<p><em>Back in 2015, there was <a href=\"https:\/\/github.com\/kubernetes\/kubernetes\/issues\/17354\">an issue<\/a> raised by Clayton Coleman to support EmptyDir with PV. It could be very helpful for workloads that need persistent storage but only have local volumes available. But it didn\u2019t get much traction. Without scheduler support, it was too hard to do at the time. Now, in 2018, scheduler and PV node affinity support have been added for Local Volume in Kubernetes v1.11. But there is still no EmptyDir PV. And the <a href=\"https:\/\/kubernetes.io\/blog\/2018\/04\/13\/local-persistent-volumes-beta\/\">Local Volume<\/a> feature is not exactly what I expected, since it doesn\u2019t have the ability to create new volumes with new directories on the node. 
So I\u2019ve written <a href=\"https:\/\/github.com\/rancher\/local-path-provisioner\">Local Path Provisioner<\/a>, which utilizes the scheduler and PV node affinity changes to dynamically provision Host Path type PVs for workloads.<\/em><\/p>\n<h2>Guideline for choosing between Volume, Persistent Volume and Provisioner<\/h2>\n<p>So which way should users choose?<\/p>\n<p>In my opinion, users should stick to one principle:<\/p>\n<p>Choose Provisioner over Persistent Volume, and Persistent Volume over Volume, whenever possible.<\/p>\n<p>To elaborate:<\/p>\n<ol>\n<li>For Config Map, Downward API, Secret or Projected, use Volume, since PV doesn\u2019t support those.<\/li>\n<li>For EmptyDir, use Volume directly. Or use Host Path instead.<\/li>\n<li>For Host Path, use Volume directly in general, since it\u2019s bound to a specific node and normally homogeneous across nodes.\n<ol>\n<li>Heterogeneous Host Path volumes didn\u2019t work before Kubernetes v1.11, due to the lack of node affinity support for PV. With v1.11+, you can create Host Path PVs with node affinity using my <a href=\"https:\/\/github.com\/rancher\/local-path-provisioner\">Local Path Provisioner<\/a>.<\/li>\n<\/ol>\n<\/li>\n<li>For all other cases, unless you need to hook up existing volumes (in which case you should use PV), use Provisioner instead. Some Provisioners are not built-in options, but you should be able to find them <a href=\"https:\/\/github.com\/kubernetes-incubator\/external-storage\">here<\/a> or in vendors\u2019 official repositories.<\/li>\n<\/ol>\n<p>The rationale behind this guideline is simple. While operating inside Kubernetes, an object (PV) is easier to manage than a property (Volume), and creating PVs automatically (Provisioner) is much easier than creating them manually.<\/p>\n<p>There is an exception: if you prefer to operate storage outside of Kubernetes, it\u2019s better to stick with Volume. 
Though in that case, you will need to handle creation\/deletion through another set of APIs. Also, you will lose the ability to scale storage automatically with StatefulSet, due to the lack of VolumeClaimTemplate support. I don\u2019t think this will be the choice for most Kubernetes users.<\/p>\n<h2>Why are there so many options to do the same thing?<\/h2>\n<p>This question was one of the first things that came to my mind when I started working with Kubernetes storage. The lack of consistent and intuitive design makes Kubernetes storage look like an afterthought. I\u2019ve tried to research the history behind those design decisions, but it\u2019s hard to find anything before 2016.<\/p>\n<p>In the end, I tend to believe this is due to a few design decisions made very early on, likely combined with the urgent need for vendor support, which resulted in Volume getting far more responsibility than it should have. In my opinion, all the built-in volume plugins that duplicate PV types shouldn\u2019t be there.<\/p>\n<p>While researching the history, I realized dynamic provisioning was already an alpha feature in the Kubernetes v1.2 release in early 2016. It took two release cycles to become beta, and another two to become stable, which is very reasonable.<\/p>\n<p>There is also a huge ongoing effort by SIG Storage (which drives Kubernetes storage development) to <a href=\"https:\/\/github.com\/kubernetes\/community\/blob\/master\/contributors\/design-proposals\/storage\/csi-migration.md\">move Volume plugins out of tree using Provisioner and CSI<\/a>. I think it will be a big step towards a more consistent and less complex system.<\/p>\n<p>Unfortunately, I don\u2019t think the different Volume types will go away. It\u2019s kind of like the flip side of Silicon Valley\u2019s unofficial motto: move fast and break things. Sometimes it\u2019s just too hard to fix the legacy design left by a fast-moving project. 
We can only live with them, work around them cautiously, and take care not to hold them up as the right way to do things.<\/p>\n<h2>What\u2019s next<\/h2>\n<p>In the next part of the series, we will talk about the mechanisms for extending the Kubernetes storage system, namely Flexvolume and CSI. A hint: as you may have noticed already, I am not a fan of Flexvolume. And it\u2019s not the storage subsystem\u2019s fault.<\/p>\n<p><em>[To be continued]<\/em><\/p>\n<p><em>[You can join the discussion <a href=\"https:\/\/medium.com\/@yasker\/the-unexpected-kubernetes-part-2-volume-and-many-ways-of-persisting-data-6d4d19eb2e2a\">here<\/a>]<\/em><\/p>\n<p> <img loading=\"lazy\" decoding=\"async\" alt=\"Sheng Yang\" height=\"100\" src=\"https:\/\/rancher.com\/img\/bio\/sheng-yang.jpg\" width=\"100\" \/><\/p>\n<p>\n Sheng Yang<br \/>\n <br \/> Principal Engineer\n <\/p>\n<p>Sheng Yang currently leads Project Longhorn at Rancher Labs, Rancher&#8217;s open source microservices-based, distributed block storage solution. He is also the author of Convoy, an open source persistent storage solution for Docker. Before Rancher Labs, he joined Citrix through the Cloud.com acquisition, where he worked on the CloudStack project and the CloudPlatform product. Before that, he was a kernel developer at Intel focused on KVM and Xen development. He has worked in the fields of virtualization and cloud computing for the last eleven years.<\/p>\n<p> <a href=\"https:\/\/rancher.com\/blog\/2018\/2018-10-11-unexpected-kubernetes-part-2\/\" target=\"_blank\">Source<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recap Last time we talked about PV, PVC, Storage Class and Provisioner. To quickly recap: Originally PV was designed to be a piece of storage pre-allocated by administrator. Though after the introduction of Storage Class and Provisioner, users are able to dynamically provision PVs now. PVC is a request for a PV. 
When used with &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.appservgrid.com\/paw93\/index.php\/2018\/10\/16\/the-unexpected-kubernetes-part-2-volume-and-many-ways-of-persisting-data\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;The Unexpected Kubernetes: Part 2: Volume and Many Ways of Persisting Data&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[],"class_list":["post-215","post","type-post","status-publish","format-standard","hentry","category-kubernetes"],"_links":{"self":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts\/215","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/comments?post=215"}],"version-history":[{"count":0,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/posts\/215\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/media?parent=215"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/categories?post=215"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.appservgrid.com\/paw93\/index.php\/wp-json\/wp\/v2\/tags?post=215"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}