Objectives

Upon completion of this lesson, you will be able to:

  • explain the benefits of using containers over running all applications directly on the same unvirtualized operating system.
  • describe the mechanisms used by Linux Containers (LXC) to create isolated process groups, including namespaces and control groups.
  • compare and contrast LXC and Docker, including how Docker manages container file systems through a union file system.
  • create and manage your own containers using either LXC or Docker, including configuring file systems, network interfaces, and isolated processes.
  • evaluate the performance implications and potential security risks of using containers for application deployment and identify scenarios where containers may be appropriate.

Overview

This topic discusses Linux Containers (LXC) and Docker, which are lightweight forms of virtualization that enable multiple isolated user-space instances to run on a single operating system kernel. LXC uses namespaces and control groups to create separate containers, each with its own file system, network interfaces, and isolated processes, while Docker manages container file systems through a union file system that allows various environments to be stacked. Both LXC and Docker provide security, performance isolation, management isolation, and packaging convenience benefits, making them popular tools for deploying and running applications.

Linux Containers and Docker

Running different applications within separate virtual machines provides a number of benefits when compared to running them all on the same unvirtualized operating system:

Security: If one application is compromised, or is untrusted, the only way for it to attack the other applications (other than via the external network) is by subverting the hypervisor. This is difficult, as hypervisors are small and have tended to be quite reliable in practice (i.e., with few bugs that can be exploited).

Performance isolation: Server-class virtualization systems can enforce resource limits (memory, CPU time, disk and network I/O) which ensure that heavy loads on one application do not negatively impact another.

Management isolation: Each virtual machine has its own file system, administrative (root) account, installed libraries, etc. and can be configured without regard to the dependencies of other applications running in other virtual machines.

Packaging convenience: A virtual machine image is a convenient and useful format for storing a virtual machine and all of its configuration, as well as for distributing it to others.

Note, however, that none of these benefits actually requires hardware virtualization. If all of the applications are going to be running on the same operating system, then operating system virtualization can be used: rather than pretending that a single hardware machine is actually multiple virtual machines, we pretend that a single instance of the operating system is actually multiple instances. This approach was first used in FreeBSD Jails and Solaris Containers, but the most widely known examples today are Linux Containers (LXC) and Docker.

Process Groups

LXC allows the creation of isolated process groups: each process in such a group (and any children of those processes) thinks that the group has the entire operating system to itself. This is done via two mechanisms:

Namespaces: In recent Linux versions, any access to the file system, process IDs, networking, user or group IDs, or several more obscure system parameters (e.g., hostname) is relative to a namespace. In a normal system with no containers there will be a single namespace, visible to all processes (or at least those that have sufficient permission, in the case of e.g. file system access). However, you can also create new namespaces with a restricted view of the file system (e.g., only able to see a small subtree), with their own process ID space and user names and IDs, and with separate network interfaces and addresses. Within a namespace you can have a root user which can perform privileged operations within the namespace, but which has no power or visibility outside of it.
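The last point about an in-namespace root user can be made concrete. On Linux, a user namespace is configured by a uid_map file whose lines each map a range of inside IDs to outside IDs, in the form "inside_start outside_start count". The sketch below is a toy Python simulation of that lookup, not the kernel implementation; the mapping values (e.g., outside UID 100000) are merely typical of unprivileged container setups.

```python
def parse_uid_map(text):
    """Parse uid_map-style lines: 'inside_start outside_start count'."""
    ranges = []
    for line in text.strip().splitlines():
        inside, outside, count = (int(field) for field in line.split())
        ranges.append((inside, outside, count))
    return ranges

def to_outside_uid(ranges, inside_uid):
    """Translate an in-namespace UID to the UID the rest of the system sees."""
    for inside, outside, count in ranges:
        if inside <= inside_uid < inside + count:
            return outside + (inside_uid - inside)
    return None  # unmapped IDs have no identity outside the namespace

# A typical unprivileged-container mapping: in-namespace root (UID 0)
# is really the unprivileged UID 100000 outside the namespace.
ranges = parse_uid_map("0 100000 65536")
print(to_outside_uid(ranges, 0))   # root inside maps to 100000 outside
```

This is why the container's root account is harmless to the host: every operation it attempts is checked against its outside identity, an ordinary unprivileged user.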

Control groups (cgroups): These are used to control operating system allocation of resources such as memory, CPU time, or disk and network bandwidth. A cgroup can be associated with a process group, and the process group as a whole will be subject to any limits (e.g., on memory, CPU time, etc.) placed on that cgroup.
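As a concrete sketch of the kind of limit a cgroup expresses, consider CPU bandwidth: in the cgroup-v2 interface, a group's cpu.max file holds a quota and a period (both in microseconds), and the group's processes may run for at most the quota in each period. The helper below is an illustrative assumption, not part of any real tool; only the "quota period" format and the /sys/fs/cgroup path mentioned in the comments are real kernel conventions.

```python
def cpu_max_value(cpu_fraction, period_us=100_000):
    """Return the 'quota period' string used by the cgroup-v2 cpu.max file.

    cpu_fraction is the share of one CPU the group may use: 0.5 allows
    50 ms of CPU time in every 100 ms period. Values above 1.0 allow
    more than one CPU's worth of time on multi-core machines.
    """
    quota_us = int(cpu_fraction * period_us)
    return f"{quota_us} {period_us}"

# Limit a process group to half of one CPU:
print(cpu_max_value(0.5))   # "50000 100000"
# A privileged process would write this string to the group's
# /sys/fs/cgroup/<group>/cpu.max file to put the limit into effect.
```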

The combination of these two features allows the creation of separate containers, each with its own file system, network interfaces, etc., and where processes within a container are isolated from those in other containers or in the “base” or root operating system within which these containers were created. Processes in a container interact with the OS kernel in exactly the same way as in a non-containerized system; the only difference is in what they see, which is controlled via namespaces, and how their CPU time and I/O are scheduled, which is controlled via cgroups. Containers are thus more efficient, as there is no virtualization overhead, and can be created almost as quickly as normal processes.

Since there is still a single operating system kernel, all containers in a system share the same operating system version. Note, however, that they may have entirely different file systems; thus it is quite possible to have both a Red Hat and an Ubuntu distribution running in separate containers on the same machine, although each will be using the kernel of the underlying system.

Docker is based on LXC; however, perhaps its main innovation is the way in which it manages container file systems. A Docker container uses a union file system to join together multiple file systems – the first of which is writable, with one or more read-only ones “behind” it. To understand the operation of a union file system, consider how the Unix shell finds an executable: it searches each directory in the PATH variable in order, and takes the first version of the file that it finds. Thus if the value of PATH is /usr/bin:/sbin:/bin, and you type ls, it will search /usr/bin/ls (not found), /sbin/ls (not found), and then /bin/ls (successful). A union file system operates in a similar fashion: on read access to a file (or directory) it will search through each underlying file system in order until the file is found. When writing to a file, however, it will write to the first writable file system in the list, providing a form of copy-on-write.
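The lookup-then-copy-on-write behavior described above can be sketched in a few lines of Python. This toy model (not Docker's actual storage driver) represents each layer as a dictionary mapping paths to contents; reads search the layers top-down, like the shell searching PATH, and writes always land in the single writable top layer. All names and file contents here are hypothetical.

```python
class UnionFS:
    """Toy union file system: one writable layer over read-only layers."""

    def __init__(self, *readonly_layers):
        self.top = {}                               # writable layer
        self.layers = [self.top, *readonly_layers]  # search order: top first

    def read(self, path):
        # Like the shell searching PATH: the first layer containing
        # the path wins, so upper layers shadow lower ones.
        for layer in self.layers:
            if path in layer:
                return layer[path]
        raise FileNotFoundError(path)

    def write(self, path, data):
        # All writes go to the writable top layer: the read-only copy
        # below is shadowed, never modified (copy-on-write).
        self.top[path] = data

# A base image layer with an installed-packages layer stacked on top:
base = {"/etc/issue": "Minimal Linux", "/bin/sh": "<shell binary>"}
pkgs = {"/usr/bin/python3": "<python binary>"}
fs = UnionFS(pkgs, base)

print(fs.read("/etc/issue"))         # served from the base layer
fs.write("/etc/issue", "My Container")
print(fs.read("/etc/issue"))         # now shadowed by the writable layer
```

Note that after the write, the base layer still holds its original copy of /etc/issue; only this container's view has changed, which is what lets many containers share the same read-only image layers.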

This allows various environments to be stacked: e.g., a base file system containing the files from a minimal Linux installation, with additional file systems “on top” of it containing installed versions of various packages one wishes to use, and a writable file system on top for per-instance configuration parameters, application data, etc.

References

Desnoyers, Peter (Fall 2020). How Operating Systems Work. Unpublished manuscript, Fall 2020 edition, Northeastern University.

Acknowledgement

This lesson is based on How Operating Systems Work by Peter Desnoyers (Fall 2020 edition, Northeastern University), which is provided under a Creative Commons 4.0 license.

Inspiration for some sections was drawn through interactions with ChatGPT-3.5.

Errata

None collected yet. Let us know.