As of libguestfs 1.23.16, the User-Mode Linux backend is now a supported feature upstream, meaning that at least it gets tested fully for each release.
I did some performance tests on the User-Mode Linux backend compared to the ordinary KVM-based appliance and the results are quite interesting.
The first test is to run the C API test suite using UML and KVM on baremetal. All times are in seconds, averaged over a few runs:
tests/c-api (baremetal) — UML: 630 — KVM: 332
UML is roughly half the speed, but do remember that the test is very system-call intensive, which is one of the worst cases for UML.
The same test again, but performed inside a KVM virtual machine (on the same hardware):
tests/c-api (virtualized) — UML: 334 — KVM: 961
The results of this are so surprising I went back and retested everything several times, but this is completely reproducible. UML runs the C API test suite about twice as fast virtualized as on baremetal.
KVM (no surprise) runs several times slower. Inside the VM there is no hardware virtualization, and so qemu-kvm has to fall back on TCG software emulation of everything.
One conclusion you might draw from this is that UML could be a better choice of backend if you want to use libguestfs inside a VM (eg. in the cloud). As always, you should measure your own workload.
The second test is of start-up times. If you want to use libguestfs to process a lot of disk images, this matters.
start-up (baremetal) — UML: 3.9 — KVM: 3.7
start-up (virtualized) — UML: 3.0 — KVM: 8-11
The start-up time of KVM virtualized was unstable, but appeared to be around 3 times slower than on baremetal. UML performs about the same in both cases.
A couple of conclusions that I take from this:
(1) Most of the time is now spent initializing the appliance, searching for LVM and RAID and so on. The choice of hypervisor makes no difference. This is never going to go away, even if libguestfs was rewritten to use (eg) containers, or if libguestfs linked directly to kernel code. It just takes this time for this kernel & userspace LVM/MD/filesystem code to initialize.
(2) The overhead of starting a KVM VM is not any different from starting a big Linux application. This is no surprise for people who have used KVM for a long time, but it’s counter-intuitive for most people who think that VMs “must” be heavyweight compared to ordinary processes.
The third test is of uploading data from the host into a disk image. I created a 1 GB disk image containing an ext2 filesystem, and I timed how long it took to upload 500 MB of data to a file on this filesystem.
upload (baremetal) — UML: 147 — KVM: 16
upload (virtualized) — UML: 149 — KVM: 73
KVM is predictably much slower when no hardware virtualization is available, by a factor of about 4.5 times.
UML is overall far slower than KVM, but it is at least consistent.
In order to work out why UML is so much slower, I wanted to find out if it was because of the emulated serial port that we push the data through, or because writes to the disk are slow, so I carried out some extra tests:
upload-no-write (baremetal) — UML: 141 — KVM: 11
upload-no-write (virtualized) — UML: 140 — KVM: 20
write-no-upload (baremetal) — UML: 7 — KVM: 13
write-no-upload (virtualized) — UML: 9 — KVM: 25
My conclusion is that the UML emulated serial device is over 10 times slower than KVM’s virtio-serial. This is a problem, but at least it’s a well-defined problem the UML team can fix with an example (virtio-serial) that it’s possible to do much better.
Finally, notice that UML appears faster than KVM at writes.
In fact what’s happening is a difference in caching modes: For safety, libguestfs forces KVM to bypass the host disk cache. This ensures that modifications made to disk images remain consistent even if there is a sudden power failure.
The UML backend currently uses the host cache, so the writes weren’t hitting the disk before the test finished (this is in fact a bug in UML since libguestfs performs an fsync inside the appliance, which UML does not honour).
As always with benchmarks, the moral is to take everything with a pinch of salt and measure your workloads!