nbdkit is our flexible, plug-in based Network Block Device server.
While I was visiting the KVM Forum last week, one of the most respected members of the QEMU development team mentioned to me that he wanted to think about deprecating QEMU’s VVFAT driver. This QEMU driver is a bit of an oddity — it lets you point QEMU to a directory of files, and inside the guest it will see a virtual floppy containing those files:
$ qemu -drive file=fat:/some/directory
That’s not the odd thing. The odd thing is that it also lets you make the drive writable, and the VVFAT driver then turns those writes back into modifications to the host filesystem (remember that these are writes happening to raw FAT32 data structures; the driver has to infer what is happening at the filesystem level just from seeing the writes). Which is both amazing and crazy (and also buggy).
Anyway I have implemented the read-only part of this in nbdkit. I didn’t implement the write stuff because that’s very ambitious, although if you were going to implement that, doing it in nbdkit would be better than qemu since the only thing that can crash is nbdkit, not the whole hypervisor.
Usage is very simple:
$ nbdkit floppy /some/directory
This gives you an NBD source which you can connect straight to a qemu virtual machine:
$ qemu -drive file=nbd:localhost:10809
or examine with guestfish:
$ guestfish --ro --format=raw -a nbd://localhost -m /dev/sda1
Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.
Type: ‘help’ for help on commands
‘man’ to read the manual
‘quit’ to quit the shell
><fs> ll /
drwxr-xr-x 14 root root 16384 Jan 1 1970 .
drwxr-xr-x 19 root root 4096 Oct 28 10:07 ..
-rwxr-xr-x 1 root root 40 Sep 17 21:23 .dir-locals.el
-rwxr-xr-x 1 root root 879 Oct 27 21:10 .gdb_history
drwxr-xr-x 8 root root 16384 Oct 28 10:05 .git
-rwxr-xr-x 1 root root 1383 Sep 17 21:23 .gitignore
-rwxr-xr-x 1 root root 1453 Sep 17 21:23 LICENSE
-rwxr-xr-x 1 root root 34182 Oct 28 10:04 Makefile
-rwxr-xr-x 1 root root 2568 Oct 27 22:17 Makefile.am
-rwxr-xr-x 1 root root 32085 Oct 27 22:18 Makefile.in
-rwxr-xr-x 1 root root 620 Sep 17 21:23 OTHER_PLUGINS
-rwxr-xr-x 1 root root 4628 Oct 16 22:36 README
-rwxr-xr-x 1 root root 4007 Sep 17 21:23 TODO
-rwxr-xr-x 1 root root 54733 Oct 27 22:18 aclocal.m4
drwxr-xr-x 2 root root 16384 Oct 27 22:18 autom4te.cache
drwxr-xr-x 2 root root 16384 Oct 28 10:04 bash
drwxr-xr-x 5 root root 16384 Oct 27 18:07 common
Previously … create ISO images on the fly in nbdkit
I wrote supernested a few years ago to see if I could break nested KVM. It works by repeatedly nesting KVM guests until either something breaks or the whole thing grinds to a halt. Even on my very fastest machine I can only get to an L4 guest (L0 = host, L1 = normal guest).
Kashyap and Thomas Huth resurrected the QEMU Advent Calendar this year, and today (day 13) supernested is featured.
Please note that supernested should only be run on idle machines which aren’t doing anything else, and it can crash the machine.
Red Hat provide RHEL KVM guest and cloud images. At time of writing, the last one was built in Feb 2015, and so undoubtedly contains packages which are out of date or insecure.
You can use virt-customize to update the packages in the cloud image. This requires the libguestfs subscription-manager feature which will only be available in RHEL 7.3, but see here for RHEL 7.3 preview packages. Alternatively you can use Fedora ≥ 22.
$ virt-customize \
-a rhel-guest-image-7.1-20150224.0.x86_64.qcow2 \
--sm-credentials 'USERNAME:password:PASSWORD' \
--sm-register --sm-attach auto \
--update
[ 0.0] Examining the guest ...
[ 17.2] Setting a random seed
[ 17.2] Registering with subscription-manager
[ 28.8] Attaching to compatible subscriptions
[ 61.3] Updating core packages
[ 976.8] Finishing off
- You should probably use --sm-credentials USERNAME:file:FILENAME to specify your password using a file, rather than having it exposed on the command line.
- The command above will leave the image template registered to RHN. To unregister it, add --sm-unregister at the end.
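Putting both notes together, the command might look something like the sketch below. The wrapper function name and its three arguments are made up for illustration; the virt-customize options are the ones discussed above, with --update being the step that shows up as "Updating core packages" in the output.

```shell
# Hypothetical wrapper (name and arguments are illustrative only):
# update a RHEL cloud image, reading the password from a file and
# unregistering the image again once the update has finished.
update_rhel_image() {
    image=$1          # e.g. rhel-guest-image-7.1-20150224.0.x86_64.qcow2
    username=$2       # subscription-manager user name
    password_file=$3  # file containing only the password

    virt-customize \
        -a "$image" \
        --sm-credentials "$username:file:$password_file" \
        --sm-register --sm-attach auto \
        --update \
        --sm-unregister
}
```

You would call it as, for example, update_rhel_image IMAGE USERNAME /path/to/password-file. Keeping the password in a file readable only by you means it never appears in the process listing or in your shell history.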
Assuming HMG can get my passport back to me in time, I am speaking at the KVM Forum 2015 in Seattle USA (full schedule of talks here).
I’m going to be talking about virt-v2v and new features of qemu/KVM that made it possible for virt-v2v to be faster and more reliable than ever.
Regular readers of this blog will of course be familiar with the joys of virtualization. One of those joys is nested virtualization — running a virtual machine in a virtual machine. Nested KVM is a thing too — that is, emulating the virtualization extensions in the CPU so that the second level guest gets at least some of the acceleration benefits that a normal first level guest would get.
My question is: How deeply can you nest KVM?
This is not so easy to test at the moment, so I’ve created a small project / disk image which when booted on KVM will launch a nested guest, which launches a nested guest, and so on until (usually) the host crashes, or you run out of memory, or your patience is exhausted by the poor performance of nested KVM.
The answer, by the way, is just 3 levels [on AMD hardware], which is rather disappointing. Hopefully this will encourage the developers to take a closer look at the bugs in nested virt.
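Before trying any of this it is worth checking that nested virtualization is actually switched on in the L0 host. The following is only a rough sketch: it reads the "nested" parameter of whichever of the kvm_intel or kvm_amd modules is loaded. The function name and the optional sysfs root argument are my own inventions (the argument exists purely so the function can be tried against a fake directory tree).

```shell
# Report whether nested KVM is enabled on this host.
# $1 is an optional (hypothetical) sysfs root, defaulting to /sys.
nested_kvm_status() {
    root=${1:-/sys}
    for mod in kvm_intel kvm_amd; do
        f="$root/module/$mod/parameters/nested"
        if [ -r "$f" ]; then
            # Depending on the module and kernel version the
            # parameter reads back as Y/N or as 1/0.
            case "$(cat "$f")" in
                Y|1) echo "$mod: nested enabled" ;;
                *)   echo "$mod: nested disabled" ;;
            esac
            return 0
        fi
    done
    echo "no kvm_intel or kvm_amd module found"
    return 1
}
```

Running nested_kvm_status with no arguments checks the real /sys. If it reports that nested is disabled, the module has to be reloaded with nested=1 before an L1 guest can start an L2 guest with KVM acceleration.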
Git repo: http://git.annexia.org/?p=supernested.git;a=summary
Binary images: http://oirase.annexia.org/supernested/
How does this work?
Building a simple appliance is easy. I’m using supermin to do that.
The problem is how does the appliance run another appliance? How do you put the same appliance inside the appliance? Obviously that’s impossible (right?)
The way it works is inside the Lx hypervisor it runs the L(x+1) qemu on /dev/sda, with a protective overlay stored in memory so we don’t disrupt the Lx hypervisor. Since /dev/sda literally is the appliance disk image, this all kinda works.
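To make the idea concrete, here is a rough sketch of what the launch step could look like. This is not the actual supernested script: the function name, the OVERLAY and MEM_MB variables, the choice of /dev/shm for the overlay and the exact qemu options are all assumptions, and how the next level boots its kernel is left out entirely.

```shell
# Hypothetical sketch: start the L(x+1) guest from inside the Lx appliance.
launch_next_level() {
    # /dev/shm is normally a RAM-backed tmpfs, so the overlay lives in memory.
    overlay=${OVERLAY:-/dev/shm/overlay.qcow2}

    # Copy-on-write overlay: reads fall through to /dev/sda (our own
    # appliance disk), writes stay in the overlay and never touch it.
    qemu-img create -f qcow2 -b /dev/sda -F raw "$overlay" || return 1

    # An IDE disk shows up as /dev/sda in the next level, so the same
    # trick can be repeated there.
    qemu-system-x86_64 \
        -enable-kvm -cpu host \
        -m "${MEM_MB:-1024}" \
        -nographic \
        -drive "file=$overlay,format=qcow2,if=ide"
}
```

The important design point is that the overlay is the only thing the nested qemu writes to, so the Lx hypervisor keeps running from an unmodified /dev/sda while L(x+1) sees what looks like its own private copy.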
This is mostly adapted from this long thread on the VMware community site.
I got VMware ESXi 5.5.0 running on upstream KVM today.
First I had to disable the “VMware backdoor”. When VMware runs, it detects that qemu underneath is emulating this port and tries to use it to query the machine (instead of using CPUID and so on). Unfortunately qemu’s emulation of the VMware backdoor is very half-assed. There’s no way to disable it except to patch qemu:
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index eaf3e61..ca1c422 100644
@@ -204,7 +204,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
pc_vga_init(isa_bus, pci_enabled ? pci_bus : NULL);
/* init basic PC hardware */
- pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy, xen_enabled(),
+ pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy, 1,
It would be nice if this was configurable in qemu. This is now being fixed upstream.
Secondly I had to tell KVM to ignore guest accesses to MSRs that it doesn’t emulate, instead of injecting a fault into the guest. This is, unfortunately, a machine-wide setting:
# echo 1 > /sys/module/kvm/parameters/ignore_msrs
# cat /sys/module/kvm/parameters/ignore_msrs
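Because the setting affects every guest on the machine, one option is to enable it only while the ESXi guest is running and put the old value back afterwards. The helper below is a minimal sketch of that idea and must be run as root. The function name and the IGNORE_MSRS_PARAM override are hypothetical; the override exists only so the function can be tried against an ordinary file.

```shell
# Run a command with ignore_msrs temporarily set to 1, then restore
# the previous value. Must be run as root.
with_ignore_msrs() {
    # IGNORE_MSRS_PARAM is a hypothetical override used for testing.
    param=${IGNORE_MSRS_PARAM:-/sys/module/kvm/parameters/ignore_msrs}

    old=$(cat "$param") || return 1   # remember the current setting
    echo 1 > "$param" || return 1     # enable for the duration

    "$@"                              # run the wrapped command
    status=$?

    echo "$old" > "$param"            # restore the previous setting
    return $status
}
```

For example, with_ignore_msrs followed by your usual qemu command line would keep the setting enabled only for as long as that qemu process runs. Other guests running on the host during that time are still affected.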
Thirdly I had to give the ESXi virtual machine an IDE disk and a network card. Note also that ESXi requires ≥ 2 vCPUs and at least 2 GB of RAM.
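As a rough illustration, a qemu command line meeting those requirements could look like the sketch below. Everything specific in it is an assumption rather than the exact configuration I used: the function name, the raw disk format, user-mode networking, and in particular the e1000 NIC model, which is just a placeholder. The installer ISO is left out.

```shell
# Hypothetical sketch: launch a guest with the resources ESXi needs.
launch_esxi_guest() {
    disk=$1   # path to the ESXi disk image (assumed to be raw format)

    # -cpu host exposes the host's virtualization extensions to the guest
    # -smp 2 / -m 2048 give it 2 vCPUs and 2 GB of RAM
    # if=ide puts the disk on the IDE bus; e1000 is a placeholder NIC model
    qemu-system-x86_64 \
        -enable-kvm -cpu host \
        -smp 2 -m 2048 \
        -drive "file=$disk,if=ide,format=raw" \
        -netdev user,id=net0 \
        -device e1000,netdev=net0
}
```

This would be invoked as launch_esxi_guest /path/to/esxi-disk.img, on a qemu patched as described above and with ignore_msrs already enabled.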