Tuesday, August 3, 2021

Why every developer needs to learn Docker and Containers

Want to know why to use Docker and, more importantly, when not to use it? Read on.
Photo by chuttersnap on Unsplash

At this point, everyone working in IT should have heard about Docker. While its growth has been impressive over the last 3 years, there are important decisions to be made before using it. In this post we will recap the essentials of Docker (and containers in general) and list why, when and when not to use Docker.

Virtual Machines

So let's start with a bit of history. More or less 20 years ago, the industry saw big growth in processing power, memory and storage, and a significant decrease in hardware prices. Engineers realized that their applications weren't utilizing the resources effectively, so they developed tools such as virtual machines (VMs) and hypervisors to run multiple operating systems in parallel on the same server.
Source: Resellers Panel
A hypervisor is computer software, firmware or hardware that creates and runs virtual machines. The computer where the hypervisor runs is called the host, and the VM is called a guest.

Containers

Containers are the evolution of VMs. A container is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. Because the application ships with all its dependencies, it runs quickly and reliably in any environment. Containers run natively on Linux, share the kernel of the host and don't use more memory than any other application.

Differences between Containers and VMs

So what's the difference between containers and VMs? While each VM has to have its own kernel, applications, libraries and services, containers don't, as they share many of the host's resources. VMs are also slower to start, build, deploy and scale. Since containers also provide a way to run isolated services, are lightweight (some are only a few MBs), start fast and are easier to deploy and scale, they have become the standard today.

The image below shows a visual comparison between VMs and Containers:
Source: ZDNet

Advantages of containers over VMs

Here are guidelines that could help you decide if you should be using containers instead of VMs:
  • lightweight: containers can be very lightweight (some are just a few MBs). It's common to run many at the same time on a development machine. 
  • platform independent: containers are platform independent and thus more portable than VMs.
  • reduced costs: containers are lighter and share host resources, maximizing the host's resource utilization. You'll probably be able to host 3, 4 or 5 times more containers than VMs per server.
  • faster: due to their smaller size, containers are faster to download, build, start, stop and remove.
  • deploy: containers are easier to deploy as the images are lightweight, start quickly and are platform independent.
  • scale out: containers are easier to scale out.
  • security: containers are usually more secure due to the reduced attack surface and isolation features at the kernel level.

Docker

So let's talk Docker. Docker first appeared in 2008 as dotCloud and became open-source in 2013. Docker is by far the most used container implementation. According to Docker Inc., more than 3.5 million Docker applications have been deployed and over 37 billion containerized applications downloaded. Docker's popularity exploded because it allowed developers to easily share, pull, and run images as simply as:
docker run -d nginx
But Docker is neither the first nor the only tool to run containers. Before it, FreeBSD jails, Google's lmctfy and LXC already existed. However, Docker has made significant contributions to the segment since the establishment of the Open Container Initiative (OCI) in 2015. Today the open standards allow tools such as Podman to offer a Docker-compatible CLI.

Why use Containers

Here are guidelines that could help you decide if you should be using containers instead of VMs.

Platform-agnostic

Because containers run on top of a container runtime (Docker Desktop, for example), they abstract the platform they're running on. That's a huge improvement over the past, when companies had to replicate the exact same setup (OS, patches, libraries and frameworks) across different environments. It also simplifies deployment: today you can deploy your images to your own datacenter, a cloud service or, even better, a managed Kubernetes cluster (such as Amazon EKS) and be confident that they'll run as they ran on your machine.

Reduced Costs

Containers share many of the host's resources, minimizing CPU, network and storage overhead. Plus, because they're lightweight, it's common to deploy dozens of containers per server, increasing server utilization and reducing costs.

Development-Friendly

Containers simplify development because they enable developers to easily pull, run, build and deploy any application as a lightweight, portable and independent image. Plus, the scripts that build a single image (Dockerfiles) or a whole application (Docker Compose files) can be saved in Git repos, and the resulting images shared on public registries such as Docker Hub, so that installing complex chains of dependencies is reduced to a simple docker pull command.
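As a rough illustration of what such a build script looks like, the sketch below writes a minimal Dockerfile for a hypothetical static site. The demo-site directory, the site/ folder and the page content are assumptions made up for this example; nginx:alpine is a public image on Docker Hub.

```shell
# Create a hypothetical project layout: demo-site/site holds the static files.
mkdir -p demo-site/site
printf '<h1>Hello from a container</h1>\n' > demo-site/site/index.html

# Write a two-line Dockerfile: start from the public nginx image and copy
# the static files into the directory nginx serves by default.
cat > demo-site/Dockerfile <<'EOF'
FROM nginx:alpine
COPY site/ /usr/share/nginx/html/
EOF

# Show the generated file.
cat demo-site/Dockerfile
```

With Docker installed, docker build -t demo-site demo-site should build the image, and docker run -d -p 8080:80 demo-site should serve the page on port 8080.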

Faster

Due to their smaller footprint, containers are faster to download, start and run. That's a significant advantage over VMs, which may take up to a couple of minutes to start. Another benefit is that building, removing and recreating a container are no longer tedious operations.

More Secure

Containers are usually more secure because they use Linux features that sandbox and isolate them from other applications running on the same host. Images can be signed and, because they're immutable, they can be easily scanned (or diffed) to ensure they weren't tampered with.
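The tamper check boils down to comparing a content digest against a known-good value. The sketch below shows that idea for any file (for example, an image archive produced by docker save). verify_digest is a hypothetical helper, and it assumes the sha256sum tool from GNU coreutils is installed.

```shell
# verify_digest FILE EXPECTED_SHA256
# Returns 0 when the SHA-256 of FILE matches EXPECTED_SHA256, non-zero otherwise
# (including when FILE cannot be read).
verify_digest() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(sha256sum "$file" 2>/dev/null | awk '{print $1}')"
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

For images pulled from a registry, pinning by digest (docker pull nginx@sha256:<digest>) gives a similar guarantee without a manual check.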

And, in case you detect that one of your images was compromised, removing and recreating it would be very quick and should not affect the SLA of your application.

Lightweight

Containers are usually way more lightweight than VMs. It's possible to have many containers running at the same time on a developer's laptop without overwhelming it. On the server too, it's possible to get far more apps running on the same hardware since they'll reuse many of the host's resources.

Simplified Ops

Because containers are platform-agnostic, ops can focus on infrastructure, running and monitoring the applications. Infrastructure can (and should) be easily created and removed and, by using orchestration tools such as Kubernetes, a server becomes just another host. Plus, no libraries or frameworks need to be installed on the servers, just the OS and a container runtime such as Docker.

Deployment

Deployment is also more consistent and reliable, as the abstraction provided by Docker allows you to replicate the exact same setup on any computer and easily reproduce a production-like environment on any workstation. Building images is also simple and can be easily driven from Dockerfiles and bash scripts.

Scale Out

If you're using the right tools, it's much easier to scale out your services. Since your images are available on a container registry, they can be easily pulled and executed on any server that has the container engine installed. In theory this is as simple as requesting the scale-out operation from your orchestration service. For example, if you were using Docker Compose, scaling out a service named web to three instances would be as simple as the command below (older Compose releases used docker-compose scale web=3, which was later deprecated in favor of the --scale flag):
docker-compose up -d --scale web=3

Automated Testing

While everyone knows the benefits of unit tests, integration testing is sometimes not that trivial. There's a lot to set up, the process is complex and recreating environments from scratch may take a long time. Tools such as Kubernetes or Docker Compose make it easy to rebuild the exact setup and run the integration tests in isolation. For example, with Docker Compose, just define a docker-compose.yml file and let it do the hard work for you:
docker-compose up -d
./run_tests
docker-compose down
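One caveat with the three commands above: if the tests fail inside a script that stops on errors, the environment is never torn down. A minimal sketch of a wrapper that always runs the teardown and still reports the test result follows. run_integration_tests is a hypothetical helper, and the COMPOSE and TEST_CMD variables are assumptions so the commands can be swapped out.

```shell
# Commands are overridable, e.g. COMPOSE="docker compose" for Compose v2.
COMPOSE="${COMPOSE:-docker-compose}"
TEST_CMD="${TEST_CMD:-./run_tests}"

# Start the environment, run the tests, and always tear the environment down.
# Returns the status of "up" or of the tests, whichever failed first
# (0 if both succeeded).
run_integration_tests() {
  local status=0
  # Variables are left unquoted on purpose so "docker compose" splits into two words.
  $COMPOSE up -d || status=$?
  if [ "$status" -eq 0 ]; then
    $TEST_CMD || status=$?
  fi
  # Teardown runs regardless of what happened above.
  $COMPOSE down
  return "$status"
}
```

Calling run_integration_tests from a CI script then leaves no containers behind even when the tests fail.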

Easier CI/CD

We saw previously how containers simplify our testing pipeline, including the dynamic creation of environments. The same setup can be replicated and isolated to run multiple environments on the same host with different combinations of operating systems and configurations. We could even have multiple layers of chained tests cascading and automatically deploying to production without any manual intervention.

Disaster recovery

Disaster recovery with containers is a no-brainer. Because all deployed images are available on the container registry and because the deployment setup is already specified, recovering an environment from scratch is as simple as running a script or, even better, letting the orchestration tool handle it for you.
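What such a recovery script looks like depends heavily on your setup. As one hedged sketch, the function below re-pulls every image listed in a manifest file. restore_from_manifest, the manifest format and the DOCKER override variable are all assumptions for illustration, and starting the services afterwards is left to your deployment tooling.

```shell
# The container CLI is overridable (e.g. DOCKER=podman, or DOCKER=echo for a dry run).
DOCKER="${DOCKER:-docker}"

# restore_from_manifest FILE
# FILE lists one image reference per line; blank lines and lines starting
# with '#' are skipped. Stops at the first pull that fails.
restore_from_manifest() {
  local manifest="$1" image
  while IFS= read -r image || [ -n "$image" ]; do
    case "$image" in
      '' | '#'*) continue ;;
    esac
    # Redirect stdin so the CLI cannot consume the rest of the manifest.
    $DOCKER pull "$image" </dev/null || return 1
  done < "$manifest"
}
```

Running it with DOCKER=echo prints the pull commands without executing them, which is a cheap way to review the manifest first.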

When not to use Containers

But containers are not all advantages. They also bring many technical challenges and will require you not only to rethink how your system is designed but also to use different tools. Reasons not to use containers are:
  • Increased complexity: containers will significantly increase the complexity of your application, as you'll need to revisit your development, integration, deployment and production practices.
  • Bye bye monolith: when developing for containers, it's a good practice to break your monolith into smaller services. If your monolith is very big and mixes multiple business units, migrating to a microservice architecture will be a big undertaking. If your team does not have the capacity or buy-in from the business, it may not be a good idea.
  • Multiple databases: breaking the monolith also means breaking the database. So your development team will have to manage more databases from now on.
  • Complex security: more services means more resources to secure. Orchestration tools usually handle most of the basic security concerns, but your ops team will still have a larger surface to protect.
  • Single vs. multiple processes per container: containers are designed to run one main process. If your application has rigid requirements that conflict with this model, maybe you should reconsider containerizing it.
  • Orchestration: most of the examples you'll see on the web show containers running in isolation; however, we know our systems don't work like that. In order to leverage the potential of containers, you'll have to run an orchestration service, which will by itself require additional knowledge (and potentially more people) from your team.
  • Complex backups: your resources will now be more distributed than before. While there are good practices in the field, it's important to mention that the complexity may increase.
  • Complex tracing: tracing events and errors in distributed applications is not an easy task.
  • Complex logging: the complexity of logging also grows. You'll have to build a centralized logging framework so you can trace events across your services.
  • Shared kernel: containers share the host operating system's kernel with the other containers on the same host. If for some reason you need a dedicated kernel, containers may not be the right fit.

What else to consider

Other challenges with containers come from the complexity that building them requires. There are areas that you and your team will have to understand to use them effectively, including:
  • Container Registries: remote registries that allow you to push and share your own images.
  • Orchestration: orchestration tools deploy, manage and monitor your microservices.
  • DNS and Service Discovery: with containers and microservices, you'll probably need DNS and service discovery so that your services can see and talk to each other.
  • Key-Value Stores: provide a reliable way to store data that needs to be accessed by a distributed system or cluster.
  • Routing: routes the communication between microservices.
  • Load Balancing: load balancing in a distributed system is a complex problem. Consider specific tooling for your app.
  • Logging: microservices and distributed applications will require you to rethink your logging strategy so they're available on a central location.
  • Communication Bus: your applications will need to communicate and using a Bus is the preferred way.
  • Redundancy: necessary to guarantee that your system can sustain load and keep operating on crashes.
  • Health Checking: consistent health checking is necessary to guarantee all services are operating.
  • Self-healing: microservices will fail. Self-healing is the process of redeploying services when they crash.
  • Deployments, CI, CD: redeploying microservices is different than the traditional deployment. You'll probably have to rethink your deployments, CI and CD.
  • Monitoring: monitoring should be centralized for distributed applications.
  • Alerting: it's a good practice to have alerting systems on events triggered from your system.
  • Serverless: allows you to build and run applications and services without managing the servers.
  • FaaS - Functions as a service: allows you to develop, run, and manage application functionalities without maintaining the infrastructure.
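To make the health checking and self-healing items above a little more concrete, here is a minimal sketch of a retry loop that an entrypoint or deployment script could use to wait for a dependency. retry is a hypothetical helper, not part of Docker; orchestrators such as Kubernetes provide their own probes for this.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, retry 10 2 curl -fsS http://localhost:8080/health would poll a hypothetical health endpoint up to ten times, two seconds apart.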

Conclusion

In this post we reviewed the most important concepts about Docker containers and presented reasons why you should and why you shouldn't use them. As with every new technology, transitioning to a microservice / distributed architecture will require significant changes from your team. However, as we also saw, the benefits may be fantastic too.

About the Author

Bruno Hildenbrand