For more half-baked ideas, see the ideas tag.
Containers offer a way to do limited virtualization with fewer resources. But a lot of people have belatedly realized that containers aren’t secure, and so there’s a trend for putting containers into real virtual machines.
Unfortunately qemu is not very well suited to just running a single instance of the Linux kernel, as we in the libguestfs community have long known. There are at least a couple of problems:
- You have to allocate a fixed amount of RAM to the VM. This is basically a guess. Do you guess too large and have memory wasted in guest kernel structures, or do you guess too small and have the VM fail at random?
- There’s a large amount of overhead — firmware, legacy device emulation and other nonsense — which is essentially irrelevant to the special case of running a Linux appliance in a VM.
Here’s the half-baked idea: Let’s make a qemu “container mode/machine” which is better for this one task.
Unlike other proposals in this area, I’m not suggesting that we throw away or rewrite qemu. That’s stupid, as qemu gives us lots of useful abilities.
Instead the right way to do this is to implement a special
virtio-ram device where the guest kernel can start off with a very tiny amount of RAM and request more memory on demand. And an empty machine type which is just for running appliances (qemu on ARM already has this: mach-virt).
Libguestfs people and container people, all happy. What’s not to like?
Containers are the future! and lots of misinformed talk.
The truth here is more boring. Both full-fat virtualization (KVM), and containers (LXC) are the future, and which you use will depend on what you want to do. You’ll probably use both. You’re probably using both now, but you don’t know it.
The first thing to say is that full virt and containers are used for different things, although there is a little bit of overlap. Containers are only useful when the kernel of all your VMs is identical. That means you can run 1000 copies of Fedora 18 as containers, probably not something you could do with full virt, but you can’t run Windows or FreeBSD or possibly not even Debian in one of those containers. Nor can your users install their own kernels.
The second important point is that containers in Linux are not secure at all. It’s moderately easy to escape from a container and take control of the host or other containers. I also doubt they’ll ever be secure because any local kernel exploit in the enormous syscall API is a host exploit when you’re using containers. This is in contrast with full virt on Fedora and RHEL where the qemu process is protected by multiple layers: process isolation, Unix permissions, SELinux, seccomp, and, yes, a container.
So containers are useless? Not at all. If you need to run lots and lots of VMs and you don’t care about multiple kernels or security, containers are just about the only game in town (although it turns out that KVM is no slouch these days).
Also containers (or the cgroups technology behind them) are being used in very innovative places. They are used in systemd for resource control, in libvirt to isolate qemu, and for sandboxing.