Software development is not only about writing code. From my experience I can say that writing the code itself is not that difficult (get some data from a database, show it in the UI, process it, save it back to the DB, call some external services, etc. - such tasks are quite common). But that is not all. We also need to ensure that non-functional requirements are met, i.e. that the app:
- handles errors/exceptions correctly
- works securely (authentication/authorization/policies, etc.)
- has good performance
- scales well under increased load
- makes it possible to quickly find the cause of an error/problem in the production environment
- supports updates (preferably without downtime)
- is not expensive to maintain and support during its life cycle
- is tolerant to hardware failures
- etc.
And if we put all of this together, the story becomes not that easy. Let's look at an example. Assume that we have a system which consists of several components running on the same single server:
How do we ensure scalability and fault tolerance of this system? The first thing that comes to mind is to move the application components from one physical server to virtual machines, run them on multiple physical servers and provide orchestration between them (set up a cluster):
If one VM or server fails, the rest will continue to work. We won't go into the details of how to organize the orchestration of such a cluster (routing of HTTP requests with a network load balancer, replication of database servers, distributed caching, logging, etc.) - it is out of scope of the current article. Here we just need to understand the problem, so let's continue.
As we just saw, with such a cluster the scaling and fault tolerance of the system improved. However, if we look inside the virtual machines, we'll find that the overall situation with components and dependencies didn't change much: different components use different dependencies, and it is possible that one application needs a certain version of some library while another needs a different version of the same library. I.e. we need to keep several different versions of the same library in one system.
Also, how do we update such a cluster? Upload updates to each VM and update all components one by one? Yes, it is possible to do it that way, but what will happen when the number of components grows and the number of environments where these components need to run also increases? In this case the dependencies of the different applications accumulate and we get the so-called "matrix of hell":
The maintenance cost of such a system grows with every component and environment that is added.
How can we improve that? If we were able to package a component/service of our distributed system along with all the necessary dependencies, configuration, environment variables, etc. into "something" that would allow us to run this component/service in any environment on any OS (on a development machine, a standalone server, in a cluster, on a production stand), then we could simply transfer this "something" between environments:
Here containers and Docker come onto the scene. When we talk about containers, the first thing we may imagine is a huge barge carrying cargo containers:
In the context of software development this image is a pretty good analogy. As we will see below, a Docker image is a kind of cargo container which holds a component and all of its dependencies inside. That's why the Docker logo looks like a whale carrying cargo containers on its back:
The term "container" came from UNIX-based operating systems. Originally term "jail" was used, but "container" has become the preferred term since 2005 with the release of Sun Solaris 10 and Sun Containers. Container is isolated runtime environment for an app that prevents that app from accessing resources outside its container (allowing access only to those resources that are explicitly allowed).
However, manual creation and configuration of containers is a quite complex and error-prone process. Docker solves this problem. In the context of Docker, containers are child processes of the Docker background service (the Docker daemon). Any software running with Docker runs inside a container.
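As a minimal sketch of this (the nginx image and the container name "web" are just examples), we can ask the daemon to start a container and then look at it from the outside:

```
# The CLI asks the Docker daemon to download the nginx image and start a container from it
docker run -d --name web nginx

# The daemon lists it among the running containers
docker ps

# Only the processes belonging to this container are shown here - the app is isolated from the host
docker top web
```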
Containers are launched from images. As mentioned above, a Docker image is a good analogue of a cargo container. Images are stored in repositories, which in turn are organized into registries. The most well-known public image registry is Docker Hub. It is also possible to run your own private image registry within the company.
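For example, a typical flow of moving an image between registries could look like this (the registry address registry.mycompany.local:5000 is made up for illustration):

```
# Download an image from the public Docker Hub registry
docker pull nginx:1.25

# Re-tag it for a private registry inside the company
docker tag nginx:1.25 registry.mycompany.local:5000/nginx:1.25

# Upload it to that private registry
docker push registry.mycompany.local:5000/nginx:1.25

# Any machine that can reach the private registry may now start a container from this image
docker run -d registry.mycompany.local:5000/nginx:1.25
```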
Docker consists of several parts:
- CLI tool
- background service (daemon)
- set of remote services (Docker Hub, JFrog, etc.)
Together they simplify the management of containers and allow you to build your own container management infrastructure:
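The first two parts are easy to see directly from the command line:

```
# Prints a "Client" section (the CLI tool) and a "Server" section (the daemon it talks to)
docker version

# Shows daemon-wide information: number of containers and images, storage driver, default registry, etc.
docker info
```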
Docker is open source. Although it came from the Linux world, it also runs on Windows (on top of Hyper-V or WSL2) and macOS. Note, however, that while it is quite easy to run Linux containers (containers with a Linux runtime) in Docker on Windows (as well as Windows containers, of course), since under the hood WSL2 is a lightweight virtual machine with a real Linux kernel, running a Windows container in Docker on Linux is not that easy:
There are solutions for that, but they are not that straightforward (e.g. you may run a Windows Server Core OS inside VirtualBox, which in turn runs inside a Docker container on Linux, or use the Wine shell). Licensing issues also have to be solved, since Windows is not free.
Note that a container is not the same as a virtual machine:
Virtual machines:
- launch their own OS in which the installed software runs
- require more resources (an average PC can only run a few VMs)
- start slower
- support snapshots, which is good, but snapshots have their own problems: large size, issues with tracking diffs and versioning
- from one set of VMX/VMDK files only one VM can be launched.
Containers, on the other hand:
- run on the same host OS kernel
- require fewer resources (on an average PC you can run many containers at the same time)
- start within a few seconds
- changes are added as an additional layer in a union file system: it is possible to track changes and view the history
- it is possible to start many containers from one image (see the sketch after this list).
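Both of the last two points are easy to check in practice; a small sketch (the postgres:16 image and the container names are just examples):

```
# Each image is built from layers; their history can be inspected
docker history postgres:16

# Several independent containers can be started from the same image
docker run -d --name pg-a -e POSTGRES_PASSWORD=postgres postgres:16
docker run -d --name pg-b -e POSTGRES_PASSWORD=postgres postgres:16
```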
Now with this knowledge we can solve the matrix of hell mentioned above using Docker containers:
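For example, two components that need conflicting versions of the same runtime can now live side by side on one host, each inside its own container (the Python versions here are used purely as an illustration):

```
# Each container brings its own runtime and libraries, so different versions coexist on the same host
docker run --rm python:3.8 python --version    # prints Python 3.8.x
docker run --rm python:3.12 python --version   # prints Python 3.12.x
```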
But there is a new question: how to manage this matrix? 🙂 Here we come to container orchestration technologies like Kubernetes, Docker Swarm, etc. This topic is out of scope of the current article (I plan to write about it later as well).
And finally, an example of how Docker may help developers in everyday work. As a developer you may need to run different versions of some database engine simultaneously in order to test functionality on those versions. Docker is a perfect tool for that. E.g. if you run PostgreSQL 16 on your host OS and want to test code on the older PostgreSQL 10, you need only 2 commands for that:
docker pull postgres:10
docker run -d -p 5432:5432 --name postgres10 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres -it postgres:10
Here I used host port 5432 because I don't have any Postgres version running on my host (to be honest, with Docker I don't want to install any db engines on my host anymore 🙂 ), i.e. this port is not busy. Otherwise just use a different host port and map it to the internal port 5432 used by Postgres inside the container (e.g. "-p 6432:5432"). After that you may connect to the db engine and work with it as usual:
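For completeness, a sketch of that alternative mapping and a connection through it (psql is used here just as an example client):

```
# Host port 5432 is busy, so expose the container's port 5432 on host port 6432 instead
docker run -d -p 6432:5432 --name postgres10 -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=postgres postgres:10

# Connect through the mapped host port
psql -h localhost -p 6432 -U postgres
```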
That's all I wanted to write about Docker here. I hope this information will help you understand this technology and motivate you to learn it further and use it in your work.