An Architect’s Guide to Hybrid Cloud Storage

As we ease into 2021, there are a few technology vectors that are dictating the conversation for IT architects. The dominant one is Kubernetes. Related, and quickly becoming “standard,” is the hybrid cloud.

The challenges inherent in an architect’s role are only compounded when planning for the hybrid cloud. First, it is new, and marketing has a tendency to outrun the facts. Second, it is constantly evolving, which requires the architect to have a strong sense of what comes next. Third, organizations are still changing and adapting to the trials and tribulations of a global pandemic. Finally, this is a long-term planning exercise with short-term deliverables; one thing we know for sure is that the modern enterprise will not tolerate a technology vacuum. A vacuum, after all, is what launched the multicloud phenomenon to start with.

Now is an opportunity to bring order to the chaos. At the most fundamental level, the architect must deliver consistency across the various environments: developer consistency, application consistency, user interface consistency, performance consistency. The list goes on, but the success criterion stays the same: consistency.

This post is going to focus on one element, albeit a very critical element in any hybrid cloud architecture: storage. Before we go any further, we should mention that we are only concerned with object storage in this post. Object storage is the storage class of the cloud and of Kubernetes. That makes it the storage class of the hybrid cloud. File and block systems are legacy at this point; one only needs to look at how they are priced in the public cloud to understand that fact.

From an architect’s perspective, it helps to start by defining the playing field. There is a tendency to use the terms “public cloud” and “on-prem” and be done with the definition of the hybrid cloud. The truth, however, is that the hybrid cloud is multidimensional.

To deliver a functional hybrid cloud architecture, you need to have a storage strategy that can operate in the following environments.

Public Clouds: This is an increasingly large field, but starts with Amazon Web Services, Azure, Google Cloud Platform, IBM, Alibaba and Tencent. Your hybrid cloud storage software needs to run everywhere your business runs. Even companies that claim to run on a single cloud don’t — there are always other clouds, you just don’t know about them yet.

Private Cloud: The definition of the private cloud continues to evolve and the role of hybrid cloud storage needs to evolve to support that emerging architecture. The private cloud is a concept, not a place, and the modern private cloud is often found in off-premises data centers, virtual private networks and virtual private clouds. Your hybrid cloud storage needs to run without compromise everywhere your cloud computing infrastructure runs.

The Kubernetes Distributions: Often overlooked, the Kubernetes distributions could be considered a subcategory of the private cloud, but we treat them as separate entities because they don’t lend themselves to a roll-your-own approach. To run here, your hybrid cloud storage solution needs to be object storage, software-defined and cloud native. Options include VMware (Tanzu), HP (Ezmeral), Cisco (IKS), Red Hat (OpenShift) and Rancher/SUSE.

The Edge: Also often overlooked, the edge is a critical part of any hybrid cloud architecture. Your hybrid cloud storage solution needs to be lightweight, powerful, cloud native and fast to run at the edge. While the edge has varying levels of importance today, that importance will only grow — and with it the challenge of small objects. Architects designing hybrid systems need to be thinking clearly about the implications of the edge.

The Attributes of Hybrid Cloud Storage

Given these hybrid cloud parameters — object storage deployed across public, private, Kubernetes distros and the edge — what are the attributes of success? I present the following for consideration:

Consistency

As noted earlier, the goal is consistency in the user experience, application performance and developer experience. Roadmaps are nice, but data is bankable. Which object storage solutions actually run across multiple public clouds, across the Kubernetes distributions, and at the edge? Are there elements that would preclude a solution from succeeding in these environments? An appliance, for example, doesn’t lend itself to orchestration; it cannot provide consistency across the environments. Public clouds are another area where consistency is threatened. The major players increasingly talk not just about their on-prem offerings but also about their ambitions to run in each other’s clouds. How does that square with an operating model built on complete control over the hardware? Can they really guarantee consistency?
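One quick way to probe that consistency in practice (a sketch, assuming an S3-compatible object store; the bucket name and endpoint below are placeholders): the same S3 API call should work unchanged in every environment, with only the endpoint differing.

$ aws s3 ls s3://analytics-data
$ aws s3 ls s3://analytics-data --endpoint-url https://objects.internal.example.com

If the second command, pointed at a private or edge deployment, behaves exactly like the first against the public cloud, developers get one storage interface everywhere.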

Performance

Performance expands the pool of applications that you can pair with object storage. Almost every modern workload demands performance. If you are not performant, you cannot run Spark, Presto, TensorFlow or any of the other AI/ML and big data applications that have come to define the enterprise landscape. Even archival workloads benefit from performance. What enterprise designs a slow restore process?

An architect needs to design not only for performance but for performance at scale. This is where modern object storage shines. Object storage was long known as cheap and slow, but new offerings read and write at hundreds of GB/s on standard hardware. Not every workload demands that performance, but every workload wants it. To serve the broadest audience, architects need to design for speed.

Scale

Scale is often misinterpreted to mean the theoretical limit of the system. While object storage is considered to be infinitely scalable, everyone knows that practically this is not the case. Scalability has multiple dimensions. Architects need to consider the operational efficiency of scaling and the bottlenecks that can arise. For example, object stores that use an external metadata database simply don’t scale past a certain point. They are poor choices for large-scale infrastructure.

A hybrid cloud object storage solution needs to scale in the same way — no matter the environment — and do so simply, with minimal human interaction and maximum automation.

Software-Defined

For an architect thinking about multiple workloads on multiple clouds (public, private, edge), there is only one answer: software. Multiple environments dictate heterogeneous hardware, and software abstracts the underlying physical storage; it is the architect’s primary tool in this effort (see Kubernetes). Software defines the user experience, providing flexibility and extensibility.

Cloud Native

For an architect thinking about storage, this can be a component that one “gives a pass” to, given how few vendors are actually cloud native. Don’t. Just as a leopard cannot change its spots, an appliance vendor does not suddenly become software-defined and cloud native. Cloud native is as much a philosophy as it is a collection of technologies and principles. If Kubernetes, containers, microservices and the S3 API were not part of the plan from the beginning, there will always be friction. This should not completely disqualify non-cloud-native storage vendors, but it should give pause. What worked on-prem doesn’t work for the cloud, and what was a key vendor relationship five years ago may not be relevant to the architectures that are emerging.

A Framework

This post is designed to provide a framework for enterprise cloud architects looking ahead. The key to any successful planning exercise is to challenge your thinking and to create as much detail as possible around the key components of the plan. Planning for hybrid cloud storage architecture requires exceptional discipline and deep evaluation of previously held beliefs. The payoff for enterprises, however, can be massive — both from a cost savings perspective and a competitive perspective.


How to Use Your Own Docker Registry

One of the things that makes the Docker Platform so powerful is how easy it is to use images from a central location. Docker Hub is the premier Image Repository with thousands of Official Images ready for use. It’s also just as easy to push your own images to the Docker Hub registry so that everyone can benefit from your Dockerized applications.

But in certain scenarios, you might not want to push your images outside of your firewall. In this case you can set up a local registry using the open source software project Distribution. In this article we’ll take a look at setting up and configuring a local instance of the Distribution project where your teams can share images with docker commands they already know: docker push and docker pull.

Prerequisites

To complete this tutorial, you will need Docker installed and running on your local machine.

Running the Distribution service

The Distribution project has been packaged as an Official Image on Docker Hub. To run a version locally, execute the following command:

$ docker run -d -p 5000:5000 --name registry registry:2.7

The -d flag will run the container in detached mode. The -p flag publishes port 5000 on your local machine’s network. We also give our container a name using the --name flag. Check out our documentation to learn more about these and all the flags for the docker run command.
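By default, the registry keeps pushed image data inside the container’s filesystem, so it is lost if the container is removed. One optional variation (a sketch; the host path registry-data is an arbitrary choice) bind-mounts a host directory to /var/lib/registry, the path where the registry image stores its data. If you already started the registry with the command above, stop and remove that container before rerunning:

$ docker run -d -p 5000:5000 --name registry \
  -v "$(pwd)/registry-data:/var/lib/registry" \
  registry:2.7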

Pushing and Pulling from a local registry

Now that we have our registry running locally, let’s tail the container’s logs so we can verify that our image is being pushed and pulled locally:

$ docker logs -f registry

Open another terminal and grab the Official Ubuntu Image from Docker Hub. We’ll use this image in our example below:

$ docker pull ubuntu

To push to or pull from our local registry, we need to add the registry’s location to the repository name. The format is as follows: my.registry.address:port/repositoryname.

In our example, we need to replace my.registry.address:port with localhost:5000 because our registry is running on our localhost and is listening on port 5000. Here is the full repository name: localhost:5000/ubuntu. To do this, we’ll run the docker tag command:

$ docker tag ubuntu localhost:5000/ubuntu

Now we can push to our local registry.

$ docker push localhost:5000/ubuntu

NOTE:

Docker looks for either a “.” (domain separator) or “:” (port separator) to learn that the first part of the repository name is a location and not a user name. If you just had localhost without either .localdomain or :5000 (either one would do), then Docker would believe that localhost is a user name, as in localhost/ubuntu or samalba/hipache. It would then try to push to the default registry, which is Docker Hub. Having a dot or colon in the first part tells Docker that this name contains a hostname and that it should push to your specified location instead.
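To illustrate the rule (registry.example.com is a hypothetical hostname used purely for illustration), a tag whose first component contains a dot and a port is treated as a registry address:

$ docker tag ubuntu registry.example.com:5000/ubuntu

A docker push of registry.example.com:5000/ubuntu would therefore go to that host rather than to Docker Hub.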

Switch back to the terminal where our registry logs are being tailed. If you review the logs, you will see entries displaying the request to save our ubuntu image:


172.17.0.1 - - [26/Feb/2021:18:10:57 +0000] "POST /v2/ubuntu/blobs/uploads/ HTTP/1.1" 202 0 "" "docker/20.10.2 go/go1.13.15 git-commit/8891c58 kernel/4.19.121-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.2 \(darwin\))"

172.17.0.1 - - [26/Feb/2021:18:10:57 +0000] "POST /v2/ubuntu/blobs/uploads/ HTTP/1.1" 202 0 "" "docker/20.10.2 go/go1.13.15 git-commit/8891c58 kernel/4.19.121-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.2 \(darwin\))"

172.17.0.1 - - [26/Feb/2021:18:10:57 +0000] "POST /v2/ubuntu/blobs/uploads/ HTTP/1.1" 202 0 "" "docker/20.10.2 go/go1.13.15 git-commit/8891c58 kernel/4.19.121-linuxkit os/linux arch/amd64 UpstreamClient(Docker-Client/20.10.2 \(darwin\))"

Now let’s remove our localhost:5000/ubuntu image and then pull it from our local registry to make sure everything is working properly.

First print a list of images we have locally:

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
registry                2.7      5c4008a25e05   40 hours ago   26.2MB
ubuntu                  latest   f63181f19b2f   5 weeks ago    72.9MB
localhost:5000/ubuntu   latest   f63181f19b2f   5 weeks ago    72.9MB

Now remove the localhost:5000/ubuntu:latest image from our local machine:

$ docker rmi localhost:5000/ubuntu
Untagged: localhost:5000/ubuntu:latest
Untagged: localhost:5000/ubuntu@sha256:3093096ee188f8…8c091c8cb4579c39cc4e

Let’s double-check that the image has been removed:

$ docker images
REPOSITORY   TAG      IMAGE ID       CREATED        SIZE
registry     2.7      5c4008a25e05   40 hours ago   26.2MB
ubuntu       latest   f63181f19b2f   5 weeks ago    72.9MB

Finally, pull the image from our local registry and verify that it now appears in our local image list.

$ docker pull localhost:5000/ubuntu
Using default tag: latest
latest: Pulling from ubuntu
Digest: sha256:3093096ee188f8…8c091c8cb4579c39cc4e
Status: Downloaded newer image for localhost:5000/ubuntu:latest
localhost:5000/ubuntu:latest

$ docker images
REPOSITORY              TAG      IMAGE ID       CREATED        SIZE
registry                2.7      5c4008a25e05   40 hours ago   26.2MB
ubuntu                  latest   f63181f19b2f   5 weeks ago    72.9MB
localhost:5000/ubuntu   latest   f63181f19b2f   5 weeks ago    72.9MB

Summary

In this article, we took a look at running an image registry locally. We also pulled an image from Docker Hub, tagged the image for our local registry and then pushed that image to our local registry running Distribution.
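If you want to clean up when you are done experimenting, you can stop and remove the registry container. One caveat worth flagging: the -v flag below also removes the container’s anonymous volume, which discards any images you pushed, so skip it (or use a bind mount as discussed earlier) if you want to keep them.

$ docker container stop registry
$ docker container rm -v registry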

If you would like to learn more about the Distribution project, head on over to the open source project homepage on GitHub and be sure to check out the documentation.
