
Monday, November 1, 2021

Docker and Containers - Everything you should know

Much has been discussed about Docker, containers, virtualization, microservices and distributed applications. In this post, let's recap the essential concepts and review related technologies.
Photo by chuttersnap on Unsplash

Much has been discussed about Docker, microservices, virtualization and containerized applications. So much, in fact, that most people probably haven't caught up. As the ecosystem matures and new technologies and standards come and go, the container landscape can be confusing at times. In this post we will recap the essential concepts and build a solid reference for the future.

Virtualization

So let's start with a bit of history. More or less 20 years ago, the industry saw a big growth in processing power, memory and storage, along with a significant decrease in hardware prices. Engineers realized that their applications weren't utilizing those resources effectively, so they developed virtual machines (VMs) and hypervisors to run multiple operating systems in parallel on the same server.
Source: Resellers Panel
A hypervisor is computer software, firmware or hardware that creates and runs virtual machines. The computer where the hypervisor runs is called the host, and the VM is called a guest.

The first container technologies

As virtualization grew, engineers realized that VMs were difficult to scale, hard to secure, wasted a lot of redundant resources and maxed out at around a dozen per server. Those limitations led to the first containerization tools, listed below.
  • FreeBSD Jails: FreeBSD jails appeared in 2000, allowing the partitioning of a FreeBSD system into multiple subsystems. Jails were developed so that the same server could be shared among multiple users securely. 
  • Google's lmctfy: Google also had its own container implementation called lmctfy (Let Me Contain That For You). According to the project page, lmctfy used to be Google's container stack; that effort now seems to have moved to runc. 
  • rkt: rkt was another container engine for Linux. The project has ended, with CoreOS transitioning into Fedora CoreOS; most of the effort on that front now happens in Podman. 
  • LXC: released in 2008, the Linux Containers project (LXC) is another container solution for Linux. LXC provides a CLI, tools, libraries and a reference specification that's followed by Docker, LXD, systemd-nspawn and Podman/Buildah. 
  • Podman/Buildah: Podman and Buildah are also tools to create and manage containers. Podman provides an equivalent Docker CLI and improves on Docker by requiring neither a daemon (service) nor root privileges. Podman is available by default on RH-based distros (RHEL, CentOS and Fedora). 
  • LXD: LXD is another system container manager. Developed by Canonical, the company behind Ubuntu, it offers pre-made images for multiple Linux distributions and is built around a REST API. Clients, such as the command line tool provided with LXD itself, then do everything through that REST API. 

Docker

Docker grew out of dotCloud, a company founded in 2008, and was open-sourced in 2013. It is by far the most used container implementation. According to Docker Inc., more than 3.5 million Docker applications have been deployed and over 37 billion containerized applications have been downloaded.

Docker grew so fast because it allowed developers to easily pull, run and share containers via Docker Hub, as simply as:
docker run -it nginx /bin/bash

Differences between containers and VMs

So what's the difference between containers and VMs? While each VM has to have its own kernel, applications, libraries and services, containers don't, as they share some of the host's resources. VMs are also slower to build, provision, deploy and restore. Since containers also provide a way to run isolated services, are lightweight (some are only a few MBs), start fast and are easier to deploy and scale, they became the standard today.

The image below shows a visual comparison between VMs and Containers:
Source: ZDNnet

Why Containers?

Here are guidelines that could help you decide if you should be using containers instead of VMs:
  • containers share the operating system's kernel with other containers
  • containers are designed to run one main process, VMs manage multiple sets of processes
  • containers maximize the host's resource utilization 
  • containers are faster to run, download and start
  • containers are easier to scale
  • containers are more portable than VMs
  • containers are usually more secure due to the reduced attack surface
  • containers are easier to deploy 
  • containers can be very lightweight (some are just a few MBs)
Containers are not all advantages, though. They also bring many technical challenges and will require you to not only rethink how your system is designed but also to use different tools. See the Ecosystem section below to understand the scope.

Usage of Containers

And how much are containers being used? According to a Cloud Native Computing Foundation survey, 84% of companies today use containers in production, a 15% increase from the previous year. Another good metric is provided by the Docker Index:

Open Collaboration

As the ecosystem stabilized, companies such as Amazon, Google, Microsoft and Red Hat collaborated on a shared format under the Open Container Initiative (OCI). The OCI was created from standards and technologies developed by Docker, such as libcontainer. That standardization means that today you can run Docker and other OCI-compliant containers, built with tools such as Podman, on any OS.

The Cloud Native Computing Foundation (CNCF), part of the Linux Foundation, is another significant entity in the area. The CNCF hosts many of the fastest-growing open source projects, including Kubernetes, Prometheus and Envoy. Its mission is to promote, monitor and host critical components of the global technology infrastructure.

The Technologies

Now let's dive into the technologies used by Docker (and OCI containers in general). The image below shows a detailed overview of the internals of a container. For clarity, we'll break the discussion into user space and kernel space.

User space technologies

In user space, Docker and other OCI containers essentially rely on these technologies:
  • runc: runc is a CLI tool for spawning and running containers. runc is a fork of libcontainer, a library developed by Docker that was donated to the OCI and includes all modifications needed to make it run independently of Docker. 
  • containerd: containerd is a project developed by Docker and donated to the CNCF that builds on top of runc adding features, such as image transfer, storage, execution, network and more.
  • CRI: CRI is the Kubernetes Container Runtime Interface; containerd provides a CRI plugin so that Kubernetes can use containerd as its container runtime. 
  • Prometheus: Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus is an independent project and member of the Cloud Native Computing Foundation.
  • gRPC: gRPC is an open source remote procedure call system developed by Google. It uses HTTP/2 for transport, Protocol Buffers as the interface description language, and provides features such as authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts.
  • Go: yes, some of the tools are developed in C but Go shines in the area. Most of the open-source projects around containers use Go including: runc, runtime-tools, Docker CE, containerd, Kubernetes, libcontainer, Podman, Buildah, rkt, CoreDNS, LXD, Prometheus, CRI, etc. 

Kernel space technologies

In order to provide isolation, security and resource management, Docker relies on the following features from the Linux Kernel:
  • Union Filesystem (or UnionFS, UFS): UnionFS is a filesystem that allows files and directories of separate file systems to be transparently overlaid, forming a single file system. Docker supports several storage drivers built on this idea, including overlay2, btrfs and zfs.
  • Namespaces: Namespaces are a feature of the Linux kernel that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set. Specifically for Docker, the pid, net, ipc, mnt and uts namespaces are required.
  • Cgroups: Cgroups allow you to allocate resources — such as CPU time, system memory, network bandwidth, or combinations of these resources — among groups of processes running on a system (see the example after this list). 
  • chroot: chroot changes the apparent root directory for the current running process and its children. A program that is run in such a modified environment cannot name files outside the designated directory tree. 
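
A quick way to see cgroups and namespaces in action is the docker run command below, which caps a container's memory and CPU (cgroups) and gives it its own process view (PID namespace). The image and limits here are arbitrary examples:
docker run -it --memory=256m --cpus=0.5 alpine /bin/sh
# inside the container, ps lists only the container's processes, with the shell as PID 1
ps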

Docker Overview

You have probably installed Docker on your machine, pulled images and executed them. Three distinct components participate in that operation: two local Docker tools and a remote container registry. On your local machine, the two tools are:
  • Docker client: this is the CLI tool you use to run your commands. The CLI is essentially a wrapper to interact with the daemon (service) via a REST API.
  • Docker daemon (service): the daemon is a backend service that runs on your machine. The Docker daemon is the tool that performs most of the jobs such as downloading, running and creating resources on your machine.
The image below shows how the client and the daemon interact with each other:
Source: Docker Overview
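
You can see both halves of this split from the terminal: docker version prints a Client section and a Server section (the daemon), and the Server part only appears when the daemon is running:
docker version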

Remote Registry

And what happens when you push your images to a container registry such as Docker Hub? The next image shows the relationship between the client, the daemon and the remote registry.
Source: Docker Overview

Images and Containers

Moving lower on the stack, it's time to take a quick look at Docker images. Internally, a Docker image can look like this:

Important concepts about images and containers that you should know:
  • Images are built in layers, utilizing the union file system.
  • Images are read-only. Modifications made by the user are stored in a thin writable container layer managed by the Docker daemon, which is removed as soon as you remove the container.
  • Images are managed using docker image <operation> <imageid>
  • An instance of an image is called a container.
  • Containers are managed with docker container <operation> <containerid>
  • You can inspect details about your image with docker image inspect <imageid>
  • Images can be created with docker commit, docker build or Dockerfiles
  • Every image has to have a base image. scratch is the base empty image.
  • Dockerfiles are templates to script images. Developed by Docker, they became the standard for the industry.
  • The docker tool allows you to not only create and run images but also to create volumes, networks and much more.
For more information about how to build your images, check the official documentation.
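
Putting these commands together, a typical image/container lifecycle looks like this (the image and container names are just examples):
docker pull alpine:latest                            # download the image from a registry
docker run -it --name demo alpine:latest /bin/sh     # create and start a container from it
docker container ls -a                               # list containers, including stopped ones
docker image inspect alpine:latest                   # low-level details about the image
docker container rm demo                             # remove the container; the image stays
docker image rm alpine:latest                        # remove the image itself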

Container Security

Containers introduced new practices, and new security measures had to follow. By default, containers rely heavily on the security features of the host operating system's kernel. Docker applies the principle of least privilege to provide isolation and reduce the attack surface. In essence, the best practices around container security are (a minimal example follows this list):
  • signing containers 
  • only use images from trusted registries
  • harden the host operating system
  • enforce the principle of least privilege and do not grant elevated access to devices
  • offer centralized logging and monitoring
  • run automated vulnerability scanning
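
As a minimal example of least privilege in practice, the flags below run a throwaway container as an unprivileged user, with a read-only filesystem and all Linux capabilities dropped (the image and UID are placeholders):
docker run --rm --read-only --cap-drop ALL --user 1000:1000 alpine id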

The Ecosystem

Since this post is primarily about containers, I'll defer the discussion of parts of the ecosystem to future posts. However, it's important to list the main areas people working with containers, microservices and distributed applications should learn:
  • Container Registries: remote registries that allow you to push and share your own images.
  • Orchestration: orchestration tools deploy, manage and monitor your microservices.
  • DNS and Service Discovery: with containers and microservices, you'll probably need DNS and service discovery so that your services can see and talk to each other.
  • Key-Value Stores: provide a reliable way to store data that needs to be accessed by a distributed system or cluster.
  • Routing: routes the communication between microservices.
  • Load Balancing: load balancing in a distributed system is a complex problem. Consider specific tooling for your app.
  • Logging: microservices and distributed applications will require you to rethink your logging strategy so they're available on a central location.
  • Communication Bus: your applications will need to communicate and using a Bus is the preferred way.
  • Redundancy: necessary to guarantee that your system can sustain load and keep operating on crashes.
  • Health Checking: consistent health checking is necessary to guarantee all services are operating.
  • Self-healing: microservices will fail. Self-healing is the process of redeploying services when they crash.
  • Deployments, CI, CD: redeploying microservices is different than the traditional deployment. You'll probably have to rethink your deployments, CI and CD.
  • Monitoring: monitoring should be centralized for distributed applications.
  • Alerting: it's a good practice to have alerting systems on events triggered from your system.
  • Serverless: allows you to build and run applications and services without managing servers.
  • FaaS - Functions as a service: allows you to develop, run, and manage application functionalities without maintaining the infrastructure.

Conclusion

In this post we reviewed the most important concepts about Docker, containers, virtualization and the whole ecosystem. As you probably realized from the length of this post, the ecosystem around containers and microservices is huge - and keeps growing! We will cover many of the topics addressed here in more detail in future posts.

In the next posts, we will start diving into the details of some of these technologies.


Wednesday, September 1, 2021

Docker - 28 facts you should know

Docker is a very mature technology at this point but there's a lot of information that's still confused or ignored. Let's review some facts that everyone working with Docker should know.
Photo by Shunya Koide on Unsplash

Docker is a pretty established technology at this point and most people should know what it is. There are however important facts that everyone should know about. Let's see them.

Docker is not just a Container

In 2013, Docker introduced what would become the industry standard for containers. For millions of developers today, Docker is the standard way to build apps. However, Docker is much more than that command that you use on the terminal. Docker is a set of platform as a service products that uses OS-level virtualization to deliver software in packages called containers.

Today, apart from Docker (the containerization tool) Docker Inc (the company) offers:
  • Docker Engine: an open source containerization technology for building and containerizing your applications. Available for Linux and Windows.
  • Docker Compose: a tool for running and orchestrating containers on a single host.
  • Docker Swarm: A toolkit for orchestrating distributed systems at any scale. It includes primitives for node discovery, raft-based consensus, task scheduling and more
  • Docker Desktop: tools to run Docker on Windows and Mac.
  • Docker Hub: the public container registry
  • Docker Registry: server side application that stores and lets you distribute Docker images
  • Docker Desktop Enterprise: offers enterprise tools for the desktop
  • Docker Enterprise: a full set of tools for enterprise customers.
  • Docker Universal Control Plane (UCP): a cluster management solution
  • Docker Kubernetes Service: a full Kubernetes orchestration feature set
  • Security Scans: available on Docker Enterprise.

The second most loved platform

Developers love Docker (the tool 😉). Docker was voted the second most loved platform in StackOverflow's 2019 survey. In fact, the company has made huge contributions to the development ecosystem and the tool is a pleasure to use. Well deserved!
Source: StackOverflow's 2019 survey

Images != Containers

A lot of people confuse the two and use the terms image and container interchangeably. The correct way to think about it is through the Object-Oriented programming notion of class and instance: a Docker image is your class, whereas a container is an instance of it. Continuing the OO analogy, you can create multiple instances (containers) of your class (image).

As per Docker themselves,
A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.
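
To make the analogy concrete, the commands below start two independent containers (instances) from the same nginx image (class); the names and ports are arbitrary:
docker run -d --name web1 -p 8080:80 nginx
docker run -d --name web2 -p 8081:80 nginx
docker ps    # two containers, one image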

Docker was once named dotCloud

What you know today as Docker Inc. was once called dotCloud Inc. dotCloud changed its name to Docker Inc. to grow the ecosystem by establishing Docker as a new standard for containerization, an alternative approach to virtualization which rapidly gained adoption. The project became one of the fastest-growing open source projects on GitHub. It surely worked!

Docker is neither the first nor the only tool to run containers

Docker is neither the first nor the only tool to run containers. In fact, one of the key technologies Docker is built on, the chroot syscall, was released in 1979 for Unix V7. Later came the first container technologies: FreeBSD jails (2000), Solaris Zones (2004), LXC (2008) and Google's lmctfy. Docker, however, made significant contributions to the segment, notably the establishment of the Open Container Initiative (OCI) in 2015. Today the open standards allow tools such as Podman to offer an equivalent Docker CLI.

Containers are the new unit of deployment

In the past, applications included tens, and in some cases hundreds, of distinct business units and a huge number of lines of code. Developing, maintaining, deploying and even scaling out those big monoliths required a huge effort. Developers were frequently frustrated that their code would fail in production but work locally. Containers alleviate this pain, as they are deployed exactly as intended, are easier to deploy and can be easily scaled.

Containers run everywhere

Because containers run on top of the container runtime, they abstract the platform they're running on. That's a huge improvement from the past, where IT had to replicate the exact same setup in different environments. It also simplifies operations, because today you can deploy your images to your own datacenter, to a cloud service or, even better, to a managed Kubernetes service, with confidence that they'll run as they ran on your machine.
Source: Docker - What's a Container?

Docker Hub

Docker Hub is Docker's official container registry. It is the most popular container registry in the world and one of the catalysts for the enormous growth of Docker and of containers themselves. Users and companies share their images online and everyone can download and run those images as simply as running a docker run command such as:
docker run -it alpine /bin/bash
Docker Hub is also the official repo for some of the world's most popular (and awesome!) technologies.

Docker Hub Alternatives

But Docker Hub isn't the only container registry out there. Multiple vendors including AWS, Google, Microsoft and Red Hat have their own offerings. Currently, the most popular alternatives to Docker Hub are Google Container Registry (GCR), Amazon Elastic Container Registry (ECR), Azure Container Registry (ACR), Quay and Red Hat Container Registry. All of them offer public and private repos.

Speaking of private repos, Docker also has a similar offering called Docker Trusted Registry. Available on Docker Enterprise, you can install it on your intranet and securely store, serve and manage your company's images.

Docker Images

As per Docker, an image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.

There are important concepts about images and containers that are worth repeating:
  • Images are built in layers, utilizing a technology called UnionFS (union filesystem).
  • Images are read-only. Modifications made by the user are stored in a thin writable container layer managed by the Docker daemon, which is removed as soon as the container is removed.
  • Images are managed using docker image <operation> <imageid>
  • An instance of an image is called a container.
  • Containers are managed with docker container <operation> <containerid>
  • You can inspect details about your image with docker image inspect <imageid>
  • Images can be created with docker commit, docker build or Dockerfiles
  • Every image has to have a base image. scratch is the base empty image.
  • Dockerfiles are templates to script images. Developed by Docker, they became the standard for the industry.
  • The docker tool allows you to not only create and run images but also to create volumes, networks and much more.
For more information about how to build your images, check the official documentation.

A Layered Architecture

Docker images are a compilation of read-only layers. The image below shows an example of the multiple layers a Docker image can have. The upper layer is the writable portion: modifications made by the user are stored in that thin container layer managed by the Docker daemon and are removed as soon as you remove the container.

Volumes are your disks

Because images are read-only and the writable layer created for your container is lost as soon as the container is removed, the recommended way to persist your container's data is volumes. As per Docker:
Volumes are the preferred mechanism for persisting data generated by and used by Docker containers. While bind mounts are dependent on the directory structure of the host machine, volumes are completely managed by Docker.
Some advantages of volumes over bind mounts are:
  • Volumes are easier to back up or migrate than bind mounts.
  • You can manage volumes using Docker CLI commands or the Docker API.
  • Volumes work on both Linux and Windows containers.
  • Volumes can be more safely shared among multiple containers.
Creating volumes is as simple as:
docker volume create myvol
And using them with your container should be as simple as:
docker run -it -v myvol:/data alpine:latest /bin/sh
Other commands of interest for volumes are:
  • docker volume inspect <vol>:  inspects the volume
  • docker volume rm <vol>:  removes the volume
You can also mount volumes read-only by appending :ro to the volume mapping.
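For instance, reusing the myvol volume created above (a sketch):
docker run -it -v myvol:/data:ro alpine:latest /bin/sh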

Dockerfiles

A Dockerfile is a text file that contains all the commands a user could call on the command line to assemble an image. Using docker build, users can create an automated build that executes several command-line instructions in succession.

The most common instructions in a Dockerfile are (a sample Dockerfile follows this list):
  • FROM <image_name>[:<tag>]: bases the current image on <image_name>
  • LABEL <key>=<value> [<key>=value>...]: adds metadata to the image
  • EXPOSE <port>: indicates which port should be mapped into the container
  • WORKDIR <path>: sets the current directory for the following commands
  • RUN <command> [ && <command>... ]: executes one or more shell commands
  • ENV <name>=<value>: sets an environment variable to a specific value
  • VOLUME <path>: indicates that <path> should be an externally mounted volume
  • COPY <src> <dest>: copies a local file, a group of files, or a folder into the container
  • ADD <src> <dest>: same as COPY but can handle URIs and local archives
  • USER <user | uid>: sets the runtime context to <user> or <uid> for commands after this one
  • CMD ["<path>", "<arg1>", ...]: defines the command to run when the container is started
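
Tying these instructions together, a minimal Dockerfile could look like the sketch below. The port, user and file names are illustrative, and entrypoint.sh is a hypothetical script sitting in your build context:
FROM alpine:3.14
LABEL maintainer="you@example.com"
WORKDIR /app
COPY entrypoint.sh .
RUN chmod +x entrypoint.sh
ENV APP_ENV=production
EXPOSE 8080
USER nobody
CMD ["./entrypoint.sh"]
You would then build and run it with docker build -t myapp:1.0 . followed by docker run myapp:1.0.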

.dockerignore

You may be familiar with .gitignore. Docker also accepts a .dockerignore file that can be used to ignore files and directories when building your image. Despite what you heard before, Docker recommends using .dockerignore files:
To increase the build’s performance, exclude files and directories by adding a .dockerignore file to the context directory.
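
A .dockerignore file is just a list of patterns, one per line, relative to the build context. A typical sketch (the entries are examples; adjust to your project):
.git
node_modules
*.log
tmp/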

Client-Server Architecture

The Docker tool is split into two parts: a daemon with a RESTful API and a client that talks to the daemon. The docker command you run on the command line is a frontend that essentially interacts with the daemon. The daemon is a service that listens and responds to requests from the client or from authorized services over HTTP. The daemon also manages your images, containers and all operations including image transfer, storage, execution, networking and more.

runc and containerd

A technical overview of the internals of Docker can be seen in the image below. Since we've already discussed some of the technologies in the platform layer, let's now focus on containerd and runc.
runc is a CLI tool for spawning and running containers according to the OCI specification. runc was created from libcontainer, a library developed by Docker and donated to the OCI. libcontainer was open sourced by Docker in 2013 and donated by Docker to the Open Container Initiative (OCI).

containerd is an open source project and the industry-standard container runtime. Developed by Docker and donated to the CNCF, containerd builds on top of runc, is available as a daemon for Linux and Windows, and adds features such as image transfer, storage, execution, networking and more. containerd is by far the most popular container runtime and is the default runtime of Kubernetes 1.8+ and Docker.
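
You normally don't interact with these components directly, but on a machine running Docker you can confirm they're in place by filtering the docker info output (plain output filtering, nothing more):
docker info | grep -iE 'runtime|containerd'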

Linux kernel features

In order to provide isolation, security and resource management, Docker relies on the following features from the Linux Kernel:
  • Union Filesystem (or UnionFS, UFS): UnionFS is a filesystem that allows files and directories of separate file systems to be transparently overlaid, forming a single coherent file system.
  • Namespaces: Namespaces are a feature of the Linux kernel that partitions kernel resources so that one set of processes sees one set of resources while another set of processes sees a different set. Specifically for Docker, the pid, net, ipc, mnt and uts namespaces are required.
  • Cgroups: Cgroups allow you to allocate resources — such as CPU time, system memory, network bandwidth, or combinations of these resources — among groups of processes running on a system. 
  • chroot: chroot changes the apparent root directory for the current running process and its children. A program that is run in such a modified environment cannot name files outside the designated directory tree.

A huge Ecosystem

The ecosystem around containers just keeps growing. The image below lists some of the tools and services in the area.

Today the ecosystem around containers encompasses:
  • Container Registries: remote registries that allow you to push and share your own images.
  • Orchestration: orchestration tools deploy, manage and monitor your microservices.
  • DNS and Service Discovery: with containers and microservices, you'll probably need DNS and service discovery so that your services can see and talk to each other.
  • Key-Value Stores: provide a reliable way to store data that needs to be accessed by a distributed system or cluster.
  • Routing: routes the communication between microservices.
  • Load Balancing: load balancing in a distributed system is a complex problem. Consider specific tooling for your app.
  • Logging: microservices and distributed applications will require you to rethink your logging strategy so they're available on a central location.
  • Communication Bus: your applications will need to communicate and using a Bus is the preferred way.
  • Redundancy: necessary to guarantee that your system can sustain load and keep operating on crashes.
  • Health Checking: consistent health checking is necessary to guarantee all services are operating.
  • Self-healing: microservices will fail. Self-healing is the process of redeploying them when they crash.
  • Deployments, CI, CD: redeploying microservices is different than the traditional deployment. You'll probably have to rethink your deployments, CI and CD.
  • Monitoring: monitoring should be centralized for distributed applications.
  • Alerting: it's a good practice to have alerting systems on events triggered from your system.
  • Serverless: serverless technologies are also growing year over year. Today you can find solid offerings on clouds such as AWS, Google Cloud and Azure.

Containers are way more effective than VMs

While each VM has to have its own kernel, applications, libraries and services, containers do not, since they share the resources of the host. VMs are also slower to provision, deploy and restore. And since containers also provide a way to run isolated services, can be lightweight (some are only a few MBs), start quickly and are quicker to deploy and scale, they are usually the preferred unit of scale these days.
Source: ZDNnet

Containers are booming

According to the latest Cloud Native Computing Foundation survey, 84% of companies use containers in production with 78% using Kubernetes. The Docker Index also provides impressive numbers reporting more than 130 billion pulls just from Docker Hub.
Source: Docker

Open standards

Companies such as Amazon, IBM, Google, Microsoft and Red Hat collaborate under the Open Container Initiative (OCI), which was created from standards and technologies developed by Docker, such as libcontainer. The standardization allows you to run Docker containers and other OCI-compliant containers with third-party tools such as Podman on any operating system.

Go is the container language

You won't see this mentioned elsewhere but I'll make this bold statement: Go is the container language. Apart from kernel features (written in C) and lower level services (written in C++), most of the open-source projects in the container space use Go, including:  runc, runtime-tools, Docker CE, containerd, Kubernetes, libcontainer, Podman, Buildah, rkt, CoreDNS, LXD, Prometheus, CRI, etc. 


gRPC is the standard protocol for synchronous communication

gRPC is an open source remote procedure call system developed by Google. It uses HTTP/2 for transport, Protocol Buffers as the interface description language, and provides features such as authentication, bidirectional streaming and flow control, blocking or nonblocking bindings, and cancellation and timeouts. gRPC is also the preferred protocol when communicating between containers.

Orchestration technologies emerge

The most deployed orchestration tool today is Kubernetes, with 78% of the market share. Kubernetes was developed at Google and then donated to the Cloud Native Computing Foundation (CNCF). There are, however, other container orchestration products, such as Apache Mesos, Rancher, OpenShift, Docker Cluster on Docker Enterprise, and more.

Kubernetes is the Container Operating System

It's impossible to talk about Docker these days without mentioning Kubernetes. Kubernetes is an open source orchestration system for automating the management, placement, scaling and routing of containers that has become popular with developers and IT operations teams in recent years. It was first developed by Google, contributed to open source in 2014, and is now maintained by the Cloud Native Computing Foundation. Through the Container Runtime Interface (CRI), Kubernetes can use container runtimes such as containerd or CRI-O. A container runtime is responsible for managing and running the individual containers of a pod.

Today Kubernetes ships with Docker Desktop, so developers can work with Docker and Kubernetes from the comfort of their desktops. Plus, because Docker containers implement the OCI specification, you can build your containers using tools such as Buildah or LXD and run them on Kubernetes.
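
If you have access to a cluster, you can check which runtime each node reports, assuming kubectl is already configured against it:
kubectl get nodes -o wide    # the CONTAINER-RUNTIME column shows, e.g., containerd://1.4.x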

Security Scans

Docker also offers automated scans via the Docker Enterprise Platform. Because images are read-only, automated scanning is straightforward: it comes down to checking the digests (SHAs) of your image layers. The image below details how this happens. More information is available here.

Windows also has native containers

As a Windows user, you might already be aware that there are so-called Windows containers that run natively on Windows. And you are right. Microsoft has ported the Docker engine to Windows, and it is now possible to run Windows containers directly on Windows Server 2016 without the need for a VM. So now we have two flavors of containers: Linux containers and Windows containers. The former run only on a Linux host and the latter only on a Windows host. In this post we are focusing on Linux containers, but most of what we cover also applies to Windows containers.

Today you can run Windows-based or Linux-based containers on Windows 10 for development and testing using Docker Desktop, which makes use of containers functionality built-in to Windows. You can also run containers natively on Windows Server.

Service meshes

More recently, the trend has been toward service meshes; this is the new buzzword. As we containerize more and more applications, and as we refactor those applications into more microservice-oriented architectures, we run into problems that simple orchestration software can no longer solve in a reliable and scalable way. Topics in this area include service discovery, monitoring, tracing and log aggregation. Many new projects have emerged in this space, the most popular one at this time being Istio, which is also part of the CNCF.

Conclusion

In this post we reviewed 28 facts about Docker that everyone should know. I hope this article was interesting and that you learned something new today. Docker is a fantastic tool and, given its popularity, its ecosystem will only grow bigger. Thus, it's important to learn it well and understand its internals.


Tuesday, August 3, 2021

Why every developer needs to learn Docker and Containers

Want to know why to use Docker and, more importantly, when not to use it? Read on to understand.
Photo by chuttersnap on Unsplash

At this point, everyone working in IT has probably heard about Docker. While its growth has been impressive in the last 3 years, there are important decisions to be made before adopting it. In this post we will recap the essentials of Docker (and containers in general) and list why, when and when not to use Docker.

Virtual Machines

So let's start with a bit of history. More or less 20 years ago, the industry saw a big growth in processing power, memory and storage, along with a significant decrease in hardware prices. Engineers realized that their applications weren't utilizing those resources effectively, so they developed tools such as virtual machines (VMs) and hypervisors to run multiple operating systems in parallel on the same server.
Source: Resellers Panel
A hypervisor is computer software, firmware or hardware that creates and runs virtual machines. The computer where the hypervisor runs is called the host, and the VM is called a guest.

Containers

Containers are the next step in the evolution of VMs. Containers are packages of software that contain the application and all its dependencies, making the application run quickly and reliably in any environment. A container is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. Containers run natively on Linux, share the kernel of the host and don't use more memory than any other process.

Differences between Containers and VMS

So what's the difference between containers and VMs? While each VM has to have its own kernel, applications, libraries and services, containers don't, as they share many of the host's resources. VMs are also slower to start, build, deploy and scale. Since containers also provide a way to run isolated services, are lightweight (some are only a few MBs), start fast and are easier to deploy and scale, they became the standard today.

The image below shows a visual comparison between VMs and Containers:
Source: ZDNnet

Advantages of containers over VMs

Here are guidelines that could help you decide if you should be using containers instead of VMs:
  • lightweight: containers can be very lightweight (some are just a few MBs). It's common to run many at the same time on a development machine. 
  • platform independent: containers are platform independent thus more portable than VMs.
  • reduced costs: containers are lighter and share host resources maximizing the host's resource utilization. You'll probably be able to host 3, 4 or 5 times more containers than you host VMs per server.
  • faster: due to their smaller size, containers are faster to download, build, start, stop and remove.
  • deploy: containers are easier to deploy as the images are lightweight, start quickly and are platform independent.
  • scale out: containers are easier to scale out.
  • security: containers are usually more secure due to the reduced attack surface and isolation features at the kernel level.

Docker

So let's talk Docker. Docker grew out of dotCloud, a company founded in 2008, and was open-sourced in 2013. It is by far the most used container implementation. According to Docker Inc., more than 3.5 million Docker applications have been deployed and over 37 billion containerized applications downloaded. Docker's popularity exploded because it allowed developers to easily share, pull and run images as simply as:
docker run -d nginx
But Docker is neither the first nor the only tool to run containers. Before it, FreeBSD jails, Google's lmctfy and LXC already existed. Docker, however, made significant contributions to the segment, notably the establishment of the Open Container Initiative (OCI) in 2015. Today the open standards allow tools such as Podman to offer an equivalent Docker CLI.

Why use Containers

Here are guidelines that could help you decide if you should be using containers instead of VMs.

Platform-agnostic

Because containers run on top of the container framework (Docker Desktop, for example), they abstract the platform they're running on. That's a huge improvement from the past, where companies had to replicate the exact same setup (OS, patches, libraries and frameworks) in different environments. It also simplifies operations, because today you can deploy your images to your own datacenter, to a cloud service or, even better, to a managed Kubernetes cluster (such as Amazon EKS) and be confident that they'll run as they ran on your machine.

Reduced Costs

Containers share many of the host's resources, minimizing CPU, network and storage overhead. Plus, because they're lightweight, it's common to deploy dozens of containers per server, increasing server utilization and reducing costs.

Development-Friendly

Containers simplify development because they enable developers to easily pull, run, build and deploy any application as a lightweight, portable and independent image. Plus scripts to build a single image (Dockerfiles) or applications (Docker Compose files) can be saved on Git repos or shared on public repositories such as Docker Hub, so that installing complex chains of dependencies is abbreviated by a simple docker pull command.

Faster

Due to their smaller footprint, containers are faster to download, start and run. That's a significant advantage over VMs, which may take up to a couple of minutes to start. Another benefit is that building, removing and recreating a container are no longer tedious operations.

More Secure

Containers are usually more secure because they use Linux features that sandbox and isolate them from other applications running on the same host. Images can be signed and, because they're immutable, they can be easily scanned (or diffed) to ensure they weren't tampered with.

And, in case you detect that one of your images was compromised, removing and recreating it would be very quick and would not affect the SLA of your application.
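
For example, Docker Content Trust can be switched on per shell session to require signed images; the pull below fails if the tag has no valid signature:
export DOCKER_CONTENT_TRUST=1
docker pull nginx:latest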

Lightweight

Containers are usually much more lightweight than VMs. It's possible to have many containers running at the same time on a developer's laptop without overwhelming it. On the server side, too, it's possible to get far more apps running on the same servers, since they'll reuse many of the host's resources.

Simplified Ops

Because containers are platform-agnostic, ops can focus on infrastructure, running and monitoring the applications. Infrastructure can (and should) be easily created and removed and, by using orchestration tools such as Kubernetes, a server becomes just another host. Plus, no libraries or frameworks need to be installed on the servers, just the OS and a container runtime such as Docker.

Deployment

Deployment is also more consistent and reliable as the abstraction provided by Docker allows replicating the exact same setup on any computer and easily replicate a production-like environment on any workstation. Building images is also simple and can be easily started from Dockerfiles and bash scripts.

Scale Out

If you're using the right tools, it's much easier to scale out your services. Since your images are available in a container registry, they can be pulled and executed on any server that has the container engine installed. In theory, this is as simple as requesting the scale-out operation from your orchestration service. For example, if you were using Docker Compose, scaling up a service named web would be as simple as:
docker-compose scale web=3

Automated Testing

While everyone knows the benefits of unit tests, integration testing sometimes is not that trivial. There's a lot to set up, the process is complex and recreating environments from scratch may take a long time. Tools such as Kubernetes or Docker Compose allow you to easily rebuild the exact setup and run the integration tests in an isolated fashion. For example, with Docker Compose, just define a docker-compose.yml file and let it do the hard work for you:
docker-compose up -d
./run_tests
docker-compose down
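
For reference, a minimal docker-compose.yml describing such an environment could look like the sketch below; the service names, images and password are illustrative:
version: "3.8"
services:
  web:
    image: nginx:alpine
    ports:
      - "8080:80"
  db:
    image: postgres:13-alpine
    environment:
      POSTGRES_PASSWORD: example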

Easier CI/CD

We saw previously how containers simplify our testing pipeline, including the dynamic creation of environments. The same setup can be replicated and isolated to run multiple environments on the same host with different combinations of operating systems and configurations. We could even have multiple layers of chained tests cascading and automatically deploying to production without any manual intervention.

Disaster recovery

Disaster recovery with containers is a no-brainer. Because all deployed images are available in the container registry and because the deployment setup is already pre-specified, recovering an environment from scratch is as simple as running a script or, even better, just letting the orchestration tool handle it for you.

When not use Containers

But containers don't bring only advantages. They also introduce many technical challenges and will require you to not only rethink how your system is designed but also to use different tools. Reasons not to use containers are:
  • Increased complexity: containers will significantly increase the complexity of your application, as you'll need to revisit your development, integration, deployment and production practices.
  • Bye bye monolith: when developing containers, it's good practice to break your monolith into smaller services. If your monolith is very big and mixes multiple business units, migrating to a microservice architecture will be a big undertaking. If your team does not have capacity or buy-in from the business, maybe it's not a good idea.
  • Multiple databases: breaking the monolith also means breaking the database. So your development team will have to manage more databases from now on.
  • Complex security: more services means more resources to secure. Orchestration tools usually handle most of the basic security concerns but your ops will have more resources to secure.
  • Single vs. multiple processes per host: containers are designed to run one main process. If your application can't be adapted to that model, maybe you should reconsider sticking with what you have.
  • Orchestration: most of the examples you'll see on the web show containers running in isolation; however, real systems don't work like that. In order to leverage the potential of containers, you'll have to run an orchestration service, which will itself require additional knowledge (and potentially more people) on your team.
  • Complex backups: your resources will now be more distributed than before. While there are good practices in the field, it's important to mention that the complexity may increase.
  • Complex tracing: tracing events and errors in distributed applications is not an easy task.
  • Complex logging: the complexity of logging also grows. You'll have to build a centralized logging framework so you can trace events across your services.
  • Shared kernel: containers share the operating system's kernel with other containers; if your workloads require stronger isolation (their own kernel), VMs may be a better fit.

What else to consider

Other challenges with containers come from the complexity that building them requires. There are areas that you and your team will have to understand to use them effectively, including:
  • Container Registries: remote registries that allow you to push and share your own images.
  • Orchestration: orchestration tools deploy, manage and monitor your microservices.
  • DNS and Service Discovery: with containers and microservices, you'll probably need DNS and service discovery so that your services can see and talk to each other.
  • Key-Value Stores: provide a reliable way to store data that needs to be accessed by a distributed system or cluster.
  • Routing: routes the communication between microservices.
  • Load Balancing: load balancing in a distributed system is a complex problem. Consider specific tooling for your app.
  • Logging: microservices and distributed applications will require you to rethink your logging strategy so they're available on a central location.
  • Communication Bus: your applications will need to communicate and using a Bus is the preferred way.
  • Redundancy: necessary to guarantee that your system can sustain load and keep operating on crashes.
  • Health Checking: consistent health checking is necessary to guarantee all services are operating.
  • Self-healing: microservices will fail. Self-healing is the process of redeploying services when they crash.
  • Deployments, CI, CD: redeploying microservices is different than the traditional deployment. You'll probably have to rethink your deployments, CI and CD.
  • Monitoring: monitoring should be centralized for distributed applications.
  • Alerting: it's a good practice to have alerting systems on events triggered from your system.
  • Serverless: allows you to build and run applications and services without managing servers.
  • FaaS - Functions as a service: allows you to develop, run, and manage application functionalities without maintaining the infrastructure.

Conclusion

In this post we reviewed the most important concepts about Docker and containers and presented reasons why you should and why you shouldn't use them. As with every new technology, transitioning to a microservice / distributed architecture will require significant changes from your team. However, as we also saw, the benefits may be fantastic too.


Monday, June 29, 2020

How to create a custom CentOS Stream VM on Azure

There isn't a one-click experience for creating CentOS VMs in Azure. Learn how to create yours.
Photo by Robert Eklund on Unsplash

Running CentOS on Azure is great. However, getting there requires some work, because none of the CentOS images currently available on Azure are free. In this post we will continue illustrating why one should use CentOS by deploying our own CentOS Stream VM to Azure.

With the news that Red Hat is shutting down the CentOS project, I definitely cannot recommend CentOS for your server anymore. 😌 However, it still has its value if you're developing for RHEL.

Azure Requirements

Before getting our hands dirty let's review requirements to run a custom VM on Azure:
  • Disk format: at the moment, only fixed VHD is supported;
  • Gen: Azure supports Gen1 (BIOS boot) & Gen2 (UEFI boot) Virtual machines. Gen1 worked better for me;
  • Disk Space: Minimum 7Gb of disk space;
  • Partitioning: Use default partitions instead of LVM or raid;
  • Swap: Swap should be disabled as Azure does not support a swap partition on the OS disk;
  • Virtual Size: All VHDs on Azure must have a virtual size aligned to 1 MB;
  • Supported formats: XFS is now the default file system but ext4 is still supported.

Installing CentOS on Hyper-V

The first thing that we have to do is to produce a virtual hard disk (VHD) with a bootable CentOS installed using Hyper-V as explained in detail on a previous post. Today we'll extend that setup adding what's necessary to run it on Azure. On this tutorial we will:
  1. Download the CentOS 8 Stream ISO
  2. Create a virtual hard disk (VHD) in Hyper-V
  3. Create and configure the VM in Hyper-V
  4. Install CentOS on the VM by:
    1. Specifying a software selection
    2. Configuring networking (we'll need to install software after the first boot)
    3. Configuring partitions on disk
    4. Creating accounts
    5. Modify the system to add settings required by Azure

Downloading the CentOS 8 ISO

This should be obvious to most people: in order to get our custom CentOS installed on a VHD with Hyper-V, go ahead and download the latest ISO to your computer. We'll need that ISO to load the installer and install the system onto our VHD. As before, we'll use CentOS Stream.

Creating a Virtual Hard Disk (VHD)

With the ISO downloaded, let's create a virtual hard disk (VHD) on Hyper-V. To do so, open Hyper-V Manager, click New -> Hard Disk and choose VHD on the Choose Disk Format screen:
Next, on Choose Disk Type, choose Fixed size:
In Configure Disk, set the disk size. In my tests, 6GB was a reasonable size for a simple server and enough space on the home partition:

Creating the VM

The process to create the Hyper-V VM remains the same. Make sure to review the previous post in detail, as I'll only describe the essential bits required by the Azure customization here.

Configuring Networking

Make sure that you choose the Default Switch in Configure Networking:

Connecting the Virtual Hard Disk

On Connect Virtual Hard Disk, we'll choose Use an existing virtual hard disk and point it to the one you just created. This is necessary because Hyper-V auto-creates VHDXs by default while Azure requires VHDs:
To finish up, validate on Summary that all looks good and confirm:

Specifying the ISO

The last thing before starting up the VM is to specify the ISO as a DVD drive. That's done on Hyper-V manager by selecting DVD Drive -> Media, choosing Image file and locating yours on disk:
I also like to disable checkpoints and unset automatic start/stop actions.

Installing CentOS Stream

After starting the VM in Hyper-V, you should be prompted with the screen below. Choose Install CentOS Stream 8-stream:

The installer

After the boot ends, you should be running the installer called Anaconda. Choose your language and click Continue:

Installation Summary

On Installation Summary, we'll essentially configure software selection and networking. We'll also need to set up partitions on the Installation Destination screen.

Software selection

For the software selection, we'll go with Minimal Install as I don't want to delay the process by installing what we don't need. The only requirement we'll need to install is a text editor (I'll be using Vim, my favourite text editor but feel free to use yours) so we can make the necessary changes.

During installation, click on Software Selection and choose Minimal Install:

Disk Partitioning

Because Azure requires some special settings (see the requirements above), we'll need to do manual partitioning. But don't be scared, it isn't complicated. We'll divide our disk into three main partitions:
  • /boot, 300Mb - where the boot files will be placed (including the kernel)
  • /, 4Gb - where all the files of the system will be placed (including software, logs, services and libraries)
  • /home, 1Gb - to store user files
  • no swap - we don't need a swap partition, as Azure will provision one for us.
We'll also use XFS whenever applicable since it's the default in Azure now.

Choose your disk and click on Custom:
 On the Manual Partitioning screen, click on Standard Partition:
Add the manual partitions by clicking on the + sign below.
 The first to add is /boot. Enter 300m on the popup so you see:
Add 1GB for /home:
And the remainder (4.7G) for /:
Confirm to complete:

Networking

Enable networking as we'll need to install our text editor (and if you wish, update the instance before uploading to Azure):

Start Installation

After all the settings were entered, click continue to proceed. During install, you will be prompted with a screen similar to:
It's recommended to set a root password and to create an account. I also recommend checking the Make this user administrator option as we should be using root as little as possible:

Before the first boot

Once the installation finishes, eject the virtual ISO by going to Hyper-V Manager, choosing your VM -> Settings -> DVD Drive and setting it to None -> Apply:

First Boot

After ejecting the DVD and starting the VM you should see the following boot loader:
Then the following login screen after the boot finishes:

Azure Configuration

We'll now proceed with our Azure configuration. For CentOS 8, the documentation is specified here (although in less detail than on this blog post). Login as root and follow the next steps.

Testing the network

If you recall, we chose Minimal Install during the installation. That means we don't have a text editor yet, so we'll need to install one to modify our configuration. First, to confirm our network can access the internet, type:
ping github.com

No network?

If no network is available, check the status of your connection with:
nmcli device status
If eth0 is down, we should enable eth0 to auto-get an ip from our Hyper-V Virtual switch with:
nmcli con up eth0
Try pinging again and it should work fine now.
ping github.com

Installing Vim

For editing files I'll install Vim, my favourite text editor. That can be done with:
dnf install vim

Whenever possible, I'll be using DNF, as it's what I'm used to as a Fedora user. Feel free to use Yum if you prefer it.

Configuring the network

To configure the network, the first step is to create or edit the file /etc/sysconfig/network and add the following:
NETWORKING=yes
HOSTNAME=localhost.localdomain
You can run this as a oneliner with:
printf "NETWORKING=yes\nHOSTNAME=localhost.localdomain\n" >> /etc/sysconfig/network

Create or edit the file /etc/sysconfig/network-scripts/ifcfg-eth0 and add the following text:
DEVICE=eth0
ONBOOT=yes
BOOTPROTO=dhcp
TYPE=Ethernet
USERCTL=no
PEERDNS=yes
IPV6INIT=no
NM_CONTROLLED=no
Modify udev rules to avoid generating static rules for the Ethernet interface(s). These rules can cause problems when cloning a virtual machine in Microsoft Azure or Hyper-V:
ln -s /dev/null /etc/udev/rules.d/75-persistent-net-generator.rules

Modifying GRUB

Next, we'll modify the kernel boot line in your grub configuration to include additional kernel parameters for Azure. To do this, open /etc/default/grub and set GRUB_CMDLINE_LINUX to:
GRUB_CMDLINE_LINUX="rootdelay=300 console=ttyS0 earlyprintk=ttyS0 net.ifnames=0"
Rebuild the grub configuration:
grub2-mkconfig -o /boot/grub2/grub.cfg

Installing the Azure Linux Client

We'll now install the Azure Linux Client. This package is required by Azure to perform essential tasks on our VM including provisioning, networking, user management, ssh, swap, diagnostics, etc. Installing it on CentOS is super simple as the package is available in the repo:
dnf install WALinuxAgent
Then modify /etc/waagent.conf making sure you have:
ResourceDisk.Format=y
ResourceDisk.Filesystem=ext4
ResourceDisk.MountPoint=/mnt/resource
ResourceDisk.EnableSwap=y
ResourceDisk.SwapSizeMB=4096    ## setting swap to 4Gb
To finish off, enable it on boot with:
systemctl enable waagent

Deprovisioning and powering off

Almost there. Run the following commands to deprovision the virtual machine and prepare it for Azure:
waagent -force -deprovision
export HISTSIZE=0
systemctl poweroff
The machine will shut down. Let's move to the Azure part now.

Uploading virtual hard disk to Azure

Now that our setup is complete, we'll upload our VHD to Azure so we can create new virtual machines from it. There are two ways to do this:
  1. using AzCopy (only for the brave)
  2. use Azure Storage Explorer (recommended)
Unfortunately I can't recommend using AzCopy at the moment as the tool is full of bugs. It could be that Microsoft is still learning Go 😉.

Uploading using AzCopy (only for the brave)

To upload our VHD, install AzCopy and the Azure CLI. After the installations finish, close and reopen all PowerShell/terminal windows so the environment variables are reloaded.

Login in Azure using the CLI

Let's login to the Azure CLI by typing on a PowerShell window:
az login

Create the disk

In order to create our managed disk, first we need to determine its actual size. To get your disk size, type the command below and copy the output, as it will be needed for the upload:
wc -c <file.vhd>
Now, run a command similar to the below replacing items in <> with your data:
az disk create -n <disk-name> -g <resourcegroup> -l <region> --for-upload --upload-size-bytes <your-vhd-size> --sku standard_lrs
To upload, first we'll need to generate a SAS token with:
az disk grant-access -n <disk-name> -g <resourcegroup> --access-level Write --duration-in-seconds 86400
If you got a JSON response with an accessSas URL in it, copy that URL. We'll use it to upload our VHD file to Azure with the azcopy tool:
azcopy copy "<path-to-vhd>" "<sas-token>" --blob-type PageBlob
After the upload is complete, and you no longer need to write any more data to the disk, revoke the SAS. Revoking the SAS will change the state of the managed disk and allow you to attach the disk to a VM.
az disk revoke-access -n <disk-name> -g <resourcegroup>

Option 2 (Recommended): Using Azure Storage Explorer

I usually don't recommend GUIs, but AzCopy is unusable at the moment. Also, uploading via Azure Storage Explorer was way faster and didn't time out on me 😒. So install Azure Storage Explorer, open a blob container, find or create a folder and click Upload File. Select your VHD and don't forget to set it to Page Blob:
Once it completes, you should see your VHD in your remote blob storage folder:
Right-click your disk -> Properties and copy the URI:
Next, run the following command to create a VHD from that image (it should be quick):
az disk create -n <disk-name> -g <resourcegroup> -l <region> --source <your-vhd-url>
At this point our disk should be recognized by Azure:

Creating the VM

With the disk available, we're ready to create our VM with:
az vm create -g <resourcegroup> -l <region> --name <vmname> --os-type linux --attach-os-disk <disk-name>

You should now see your VM on Azure as expected:

Testing the VM

Now the fun part: let's see if this works. If you look carefully at the image above, you'll see our IP listed there. We can try to ssh into it with:
ssh bruno@<ip>

Yay! Our custom CentOS VM is available on Azure and we can access it remotely. From here, it's up to you to install the services you need. Or just play a little with it, tear it down, recreate it and so on.

Security Considerations

I dream of the day we no longer have to discuss hardening our VMs on public cloud providers. Unfortunately we're not there yet. There's a lot of bots scanning for open ports and vulnerabilities on public IPs so make sure you take the necessary precautions to secure your SSH service such as preventing root login, changing the SSH port number and even banning failed login attempts.
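
As a starting point, a few sshd_config changes go a long way. The excerpt below is a sketch: pick a port you have also opened in Azure's network security group, and make sure a non-root user can log in with keys before applying it:
# /etc/ssh/sshd_config (excerpt)
PermitRootLogin no
Port 2222
PasswordAuthentication no
Apply the changes with systemctl restart sshd.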

There's also some other measures that we can take on Azure to block that traffic but that's beyond the scope of this post.

Troubleshooting

Before wrapping up, I'd like to leave a few tips. It's important to remember that a lot can go wrong here. So far we created a virtual machine locally on Hyper-V, modified the configuration, uploaded our VHD to blob storage and ran some commands using the Azure CLI to create our resources on Azure. What else could go wrong? Let's see next.

What's my public ip?

You can get that information from the VM's overview page in the Azure portal or, from the shell, with:
curl ifconfig.me

Can't access the VM via SSH

I had this error too. There are two possible problems: (1) your VM is not accessing the internet or (2) either a firewall or Azure's Networking is not properly set. Both should be fine if it's your first time accessing your VM.

The VM isn't starting up

Recheck the steps above. A good approach is to run the VM locally on Hyper-V and make sure you didn't break anything while applying the configuration.

Can't access the internet

This error may happen if your VM is incorrectly configured or if it couldn't get an IP from Azure's DHCP server. Try accessing it from the Serial Console to get insights about the eth0 ethernet adapter, IP and connection status with:
ip a                      # check my adapters. Expected: eth0
nmcli device show eth0    # shows the status of the ethernet connection
nmcli con up eth0         # starts the connection

The VM won't get an IP

This is probably Azure's fault as the IP should be auto-given to you by their DHCP servers. Anyhow, we can retry being assigned an ip with:
sudo dhclient

waagent is not working

Using the Serial Console, check if the agent is working and the status is active (running) with the command below:
sudo systemctl status waagent
sudo systemctl start waagent

Can't connect to my VM via SSH

This could happen if your instance can't access the internet or if the SSH service is not running. Try connecting via the Azure Serial Console and check the previous steps to make sure the VM can ping an external site. Also confirm that your public IP is correct: if you did not specify a static IP, Azure releases your previous IP and a new one is not guaranteed to be the same.
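
To confirm the VM's current public IP from your workstation (assuming you're still logged in with the Azure CLI), you can run:
az vm list-ip-addresses -g <resourcegroup> -n <vmname> -o table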

Virtual routing problems

If you think that the problem is related to virtual network routing in Azure, please check these links:

Conclusion

In this post we reviewed in detail how to create a custom CentOS Stream image and run it on Azure. For this demo we used CentOS, my favorite distro for the server, but most of the information described here should also be useful for other distributions. We also demoed how to use the Azure CLI and showed some of the features Azure provides.


About the Author

Bruno Hildenbrand      
Principal Architect, HildenCo Solutions.