Upon completion of this lesson, you will be able to:
Most programmers have used VirtualBox, Parallels, or VMware Workstation to run Linux, Windows, or macOS in a separate virtual machine on a physical Windows or macOS computer, often to ensure that installing new software does not cause problems or present a security risk.
But what does this mean? How does it work? In this lesson you will learn how a hypervisor can run multiple operating systems (or multiple copies of the same operating system), much like how a normal operating system can run multiple processes. You will learn a little bit about the applications of virtualization, and then will study the different mechanisms that can be used to virtualize the operating-system-specific parts of a CPU (e.g., address translation), using traditional and modern virtualization products as examples.
In addition to virtual machines, containers provide another useful operating system abstraction mechanism. Containers are a lightweight form of virtualization that enables multiple isolated user-space instances to run on a single operating system kernel. Each container shares the same kernel as the host operating system, but has its own file system, process space, and network interface, providing a secure and efficient way to deploy and run applications.
Hardware virtualization is a technique that allows multiple “virtual machines” (i.e., computers implemented in software) to run on the same physical system. You may have run Linux on your macOS machine within VMware or Parallels; that is an example of using a hypervisor (e.g., VMware) to run a virtual machine and then installing an operating system (e.g., Linux) on that virtual machine. From the perspective of Linux, it has no idea that it is running on a “virtual computer”.
But what is a virtual machine (or VM for short)? It’s a machine: able to run a complete operating system, and frequently equivalent in almost every way to the “bare metal” hardware. And it’s virtual: implemented at least in part in software, as shown here:
Virtual machines are commonly used for several purposes:
Multiple operating systems: Running a different operating system on a computer, for instance running Windows in a virtual machine on your Mac so you can use Windows-only software, or running a Linux instance on your Windows machine so you can do development in Linux.
Multiple configurations: Running multiple copies of the same operating system. This may be needed because different copies require different, incompatible versions of system libraries or different configurations, or because you want separately administered machines on the same piece of hardware.
Security: Applications that require administrative privileges are often safer run on a virtual machine, where each client can be given full root or administrative privileges to configure the machine as they wish, without posing a threat to clients on other virtual machines. (For example, Amazon EC2 gives you virtual machines that you can configure, without risk from other users.)
Testing: Developers can build a test environment in a virtual machine and ensure that their development environment does not “pollute” the test environment – and vice versa.
One of the primary uses of virtual machines is for what is known as server consolidation, or running multiple network servers on the same hardware. But wait, you might say, isn’t allowing multiple users the whole goal of a multi-user operating system? Yes and no. Multi-user operating systems like Unix were designed to allow multiple users to access a set of applications at the same time, without interfering with other users. Network servers are different, and although most of the differences could have been handled by multi-user operating systems, they weren’t.
Network addresses: Servers are typically distinguished by address, but neither Unix nor Windows has any mechanism for assigning permissions to addresses and allowing one user but not another to access them. HTTP allows multiple names on the same address (a feature called virtual hosting), but all requests still have to go through the same web server process.
Software installation: Software installation on a multi-user system requires administrative privileges, in part to protect against trojans and other malware that a user might install. Although it is possible to install a program for a single user, proprietary software typically includes a fixed installation script which often does not provide for this.
Dependencies: Software may depend on specific versions of other system programs or libraries, causing programs to be incompatible when they each require a different version. In some cases, two different programs may require different versions of the operating system.
If you are running a low-cost web server business, it is likely that you will choose to put multiple users on the same multi-user system; the users will tolerate restrictions on software and versions in return for a low price. However, if you are maintaining a number of network servers for an enterprise, you may be stuck with the specific software and operating systems of each server. In that case, virtualization will allow you to run them all on a single physical machine.
If you are used to running VirtualBox or VMware on your laptop, it may seem like it’s just another program, maybe using more memory and CPU than most. But it isn’t. To understand why, consider trying to run Linux (the “guest” operating system) on top of Windows. The Linux kernel is an executable file, typically found in /boot/vmlinuz; with the right tools it should be possible to translate it into a Windows executable. However, if you tried to run it you would find the following problems:
Privileged instructions: One of the first things the kernel does on startup is to initialize the virtual memory system, mapping virtual addresses to physical addresses. This requires privileged-mode instructions; as you saw in previous modules, applications aren’t allowed to execute these instructions, as they could use them to bypass any operating system protections. The first such instruction executed by the guest OS will cause an illegal instruction trap and kill the process.
Interference: The problem isn’t just that the guest OS won’t be allowed to modify virtual address mappings. Both operating systems “know” that they have a physical address space available – starting at address 0 for the PC architecture – and that they are responsible for allocating it. Even if the host OS allocated some physical memory for the guest, and trusted the guest to use only that memory, there’s no way to tell the guest what memory to use.
Security: Secure isolation between virtual machines, including memory protection, is at least as important as isolation between processes in a normal operating system. But if a guest operating system has direct access to the CPU address translation mechanisms it can easily access physical memory allocated to another virtual machine (or to the host OS itself), bypassing any security mechanisms.
I/O: A process running under Linux or Windows uses system calls such as open and read to access files, while the operating system itself uses drivers to access physical devices. A guest kernel, however, contains its own drivers and expects direct access to hardware such as disks and network interfaces, hardware that the host operating system already owns and controls.
The remainder of this lesson explains how these problems may be avoided, starting with the simplest (and lowest-performance) solutions and finishing with the mechanisms currently in use.
Desnoyers, Peter (Fall 2020). How Operating Systems Work, Unpublished Manuscript. Fall 2020 Edition, Northeastern University.
This lesson is based on work by Desnoyers, Peter (Fall 2020). How Operating Systems Work, Fall 2020 Edition, Northeastern University, which is provided under a Creative Commons 4.0 license.
Inspiration for some sections was drawn through interactions with ChatGPT-3.5.
None collected yet. Let us know.