Tag Archives: qemu

Split block drivers from qemu with nbdkit

One interesting talk at KVM Forum last week was Stefan Hajnoczi’s talk about QEMU security (sorry, it’s not online yet; it should eventually be available alongside all the other talks on this YouTube channel).

One thing Stefan mentioned was whether QEMU might be split into multiple processes. This has advantages for security:

  1. Crashing or corrupting a single process doesn’t automatically expose the whole hypervisor.
  2. You can separately label each process using SELinux and independently control those policies, providing finer-grained security.

For block drivers you can do this today, and in fact we do this already when we run qemu from virt-v2v. Consider the case where we are using a remote HTTPS disk image:

$ qemu -drive file=https://remote/disk.img

The curl driver linked to and running inside QEMU needs to make a remote TCP/IP connection, has to encode and decode TLS, is linked to libcurl and so on, and all those things also apply to the QEMU process. If the curl block driver has problems for any reason, these also affect QEMU. SELinux labels and transitions needed to access the socket are labels and transitions needed by the QEMU process. An exploit in the driver is a QEMU exploit.

With nbdkit we can split this out:

$ nbdkit -U - curl url=https://remote/disk.img \
  --run 'qemu -drive file=$nbd'

From a security point of view this has immediate advantages: If the curl driver crashes or is exploited, only nbdkit is affected. QEMU only needs access to a private Unix domain socket, and conversely nbdkit doesn’t need access to anything else that QEMU uses. You can add resource limits, separate SELinux policy, seccomp, namespaces and anything else you can think of to nbdkit to contain it tightly.
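
As a rough sketch of the "resource limits" idea, the Python below starts nbdkit as a separate (non-captive) process and caps its address space before exec, so the limit applies to nbdkit only and not to QEMU. The function names, the socket path, the 1 GiB figure and the choice of RLIMIT_AS are illustrative assumptions, not something virt-v2v or libvirt is known to do.

```python
import resource
import shlex
import subprocess

def nbdkit_argv(socket_path, url):
    """argv for a standalone nbdkit serving URL over a Unix domain socket."""
    # -f keeps nbdkit in the foreground so the caller owns the process;
    # -U selects the Unix domain socket that qemu will connect to.
    return ["nbdkit", "-f", "-U", socket_path, "curl", "url=" + url]

def cap_address_space(max_bytes):
    """Return a hook that limits virtual memory in the child before exec."""
    def hook():
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return hook

def start_contained(socket_path, url, max_bytes=1 << 30):
    """Start nbdkit with the (hypothetical) cap; needs nbdkit installed."""
    return subprocess.Popen(nbdkit_argv(socket_path, url),
                            preexec_fn=cap_address_space(max_bytes))

# Show the command only; nothing is executed here.
print(shlex.join(nbdkit_argv("/tmp/nbd/disk.sock", "https://remote/disk.img")))
```

SELinux policy, seccomp filters or namespaces would be layered on in the same place, by whatever component launches nbdkit.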

It’s worth pointing out the obvious disadvantages too: there will likely be a performance impact, although don’t discount how efficient NBD is and how this architecture also lets you scale more effectively over NUMA nodes. It also puts all our eggs in one basket, the qemu NBD client, which must therefore be very robust.

I should say also that this is more laborious to set up, and it would only really work if some other component (libvirt ideally) handled the creation of the separate nbdkit process. In the example above I used captive nbdkit, but that only works if you have a single drive, and one of the other mechanisms would be more scalable.
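
To make the multi-drive case concrete, here is a minimal sketch of what a management layer might compute: one nbdkit process and one Unix socket per drive, plus the matching qemu -drive arguments. The socket directory, the file names and the format=raw setting are assumptions for illustration; this is not how libvirt actually does it.

```python
import os

def plan_drives(socket_dir, urls):
    """Pair each remote URL with its own socket, nbdkit argv and qemu args."""
    plan = []
    for index, url in enumerate(urls):
        sock = os.path.join(socket_dir, f"drive{index}.sock")
        plan.append({
            # One foreground nbdkit per drive, each on a separate socket.
            "nbdkit": ["nbdkit", "-f", "-U", sock, "curl", "url=" + url],
            # qemu's NBD client accepts nbd:unix:/path for Unix sockets.
            "qemu": ["-drive", f"file=nbd:unix:{sock},format=raw"],
        })
    return plan

# Hypothetical directory and URLs, printed rather than executed.
for entry in plan_drives("/run/vm1", ["https://remote/a.img",
                                      "https://remote/b.img"]):
    print(" ".join(entry["nbdkit"]), "|", " ".join(entry["qemu"]))
```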

Filed under Uncategorized

New in nbdkit: Create a virtual floppy disk

nbdkit is our flexible, plug-in based Network Block Device server.

While I was visiting the KVM Forum last week, one of the most respected members of the QEMU development team mentioned to me that he wanted to think about deprecating QEMU’s VVFAT driver. This QEMU driver is a bit of an oddity — it lets you point QEMU to a directory of files, and inside the guest it will see a virtual floppy containing those files:

$ qemu -drive file=fat:/some/directory

That’s not the odd thing. The odd thing is that it also lets you make the drive writable, and the VVFAT driver then turns those writes back into modifications to the host filesystem (remember that these are writes happening to raw FAT32 data structures; the driver has to infer, just from seeing the writes, what is happening at the filesystem level). This is both amazing and crazy (and also buggy).

Anyway I have implemented the read-only part of this in nbdkit. I didn’t implement the write stuff because that’s very ambitious, although if you were going to implement that, doing it in nbdkit would be better than qemu since the only thing that can crash is nbdkit, not the whole hypervisor.

Usage is very simple:

$ nbdkit floppy /some/directory

This gives you an NBD source which you can connect straight to a qemu virtual machine:

$ qemu -drive file=nbd:localhost:10809

or examine with guestfish:

$ guestfish --ro --format=raw -a nbd://localhost -m /dev/sda1
Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> ll /
total 2420
drwxr-xr-x 14 root root  16384 Jan  1  1970 .
drwxr-xr-x 19 root root   4096 Oct 28 10:07 ..
-rwxr-xr-x  1 root root     40 Sep 17 21:23 .dir-locals.el
-rwxr-xr-x  1 root root    879 Oct 27 21:10 .gdb_history
drwxr-xr-x  8 root root  16384 Oct 28 10:05 .git
-rwxr-xr-x  1 root root   1383 Sep 17 21:23 .gitignore
-rwxr-xr-x  1 root root   1453 Sep 17 21:23 LICENSE
-rwxr-xr-x  1 root root  34182 Oct 28 10:04 Makefile
-rwxr-xr-x  1 root root   2568 Oct 27 22:17 Makefile.am
-rwxr-xr-x  1 root root  32085 Oct 27 22:18 Makefile.in
-rwxr-xr-x  1 root root    620 Sep 17 21:23 OTHER_PLUGINS
-rwxr-xr-x  1 root root   4628 Oct 16 22:36 README
-rwxr-xr-x  1 root root   4007 Sep 17 21:23 TODO
-rwxr-xr-x  1 root root  54733 Oct 27 22:18 aclocal.m4
drwxr-xr-x  2 root root  16384 Oct 27 22:18 autom4te.cache
drwxr-xr-x  2 root root  16384 Oct 28 10:04 bash
drwxr-xr-x  5 root root  16384 Oct 27 18:07 common
[etc]

Previously … create ISO images on the fly in nbdkit

NBD with TLS-PSK

The Network Block Device (NBD) protocol is really useful to us when we deal with virtual machines and disk images. It lets us share disk images between machines and is also the universal protocol we use for communicating disk images between different bits of software. I wrote a pluggable NBD server called nbdkit to make this even easier.

However there was a problem: The protocol has no concept of logins. If you have an open NBD port, then anyone can connect and read or write your disk image. This is not quite as terrible as it sounds since when two processes are talking NBD to each other, we use a Unix domain socket and we hide the socket in a directory with restrictive permissions. But there are still cases — such as communicating between separate servers — where authentication would be useful.
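
The "hide the socket in a directory with restrictive permissions" trick can be sketched in a few lines of Python. This is only an illustration of the idea, not code from nbdkit or qemu; the directory prefix and socket name are made up.

```python
import os
import socket
import stat
import tempfile

def private_socket():
    """Create a listening Unix socket inside a directory only we can enter."""
    # mkdtemp creates the directory with mode 0700, so other users
    # cannot reach the socket whatever the socket's own permissions are.
    directory = tempfile.mkdtemp(prefix="nbd-")
    path = os.path.join(directory, "nbd.sock")
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    return server, path

server, path = private_socket()
mode = stat.S_IMODE(os.stat(os.path.dirname(path)).st_mode)
print(path, oct(mode))

# Clean up the demonstration socket and directory.
server.close()
os.unlink(path)
os.rmdir(os.path.dirname(path))
```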

NBD does let you upgrade the protocol to use TLS, and all the important NBD servers support that. You can use TLS to do client authentication but it’s seriously clunky and difficult to set up because you have to use X.509 certificates, and if we’ve learned anything from the web we know that X.509 is a plot by the NSA to stop us using encryption (only joking, spooks!)

It turns out there’s a more sensible corner of the TLS specification called TLS-PSK. This uses usernames and randomly generated Pre-Shared Keys (PSK). As long as you can ensure that both the client and server can read a simple username:key file of keys, and the keys are kept secret, you can both authenticate and communicate securely.
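
For illustration, here is a small Python sketch that writes and reads such a file. It assumes the one-entry-per-line username:hex-key layout that psktool produces; the helper names and the 32-byte key length are my own choices, and in practice you would just use psktool.

```python
import os
import secrets
import tempfile

def write_psk_file(path, username, key_bytes=32):
    """Write one 'username:hexkey' line, readable only by the owner."""
    key = secrets.token_hex(key_bytes)
    # O_EXCL refuses to overwrite an existing key file; 0600 keeps it private.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"{username}:{key}\n")
    return key

def read_psk_file(path):
    """Return {username: key} from a username:key file."""
    keys = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                user, _, key = line.partition(":")
                keys[user] = key
    return keys

directory = tempfile.mkdtemp()  # created with mode 0700
psk_path = os.path.join(directory, "keys.psk")
written = write_psk_file(psk_path, "rich")
print(read_psk_file(psk_path)["rich"] == written)
```

The same file has to reach both client and server over some channel you already trust, which is the whole premise of a pre-shared key.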

Unfortunately just implementing TLS doesn’t get you PSK as well, and no existing NBD server supports TLS-PSK.

So I had to add support. To qemu and qemu-nbd. And to nbdkit.

Amazingly it all works, and qemu and nbdkit interoperate too. Here’s how you could use it:

$ mkdir -m 0700 /tmp/keys
$ psktool -u rich -p /tmp/keys/keys.psk
$ nbdkit -n \
    --tls=require --tls-psk=/tmp/keys/keys.psk \
    file file=disk.img
$ qemu-img info \
    --object "tls-creds-psk,id=tls0,endpoint=client,username=rich,dir=/tmp/keys" \
    --image-opts "file.driver=nbd,file.host=localhost,file.port=10809,file.tls-creds=tls0"

The qemu command line is a bit clunky, but it’s overall much simpler than setting up certificates, although not as scalable for large installations.
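
If the clunkiness bothers you, the two option strings are easy to generate. The helper below is a hypothetical convenience wrapper (the function name and parameters are mine) that rebuilds exactly the qemu-img invocation shown above.

```python
import shlex

def qemu_img_info_psk(host, port, username, key_dir):
    """argv for 'qemu-img info' against an NBD server using TLS-PSK."""
    # qemu looks for the key file (keys.psk in the example above) in key_dir.
    creds = ("tls-creds-psk,id=tls0,endpoint=client,"
             f"username={username},dir={key_dir}")
    image = (f"file.driver=nbd,file.host={host},file.port={port},"
             "file.tls-creds=tls0")
    return ["qemu-img", "info", "--object", creds, "--image-opts", image]

# Print the command line; nothing is executed.
print(shlex.join(qemu_img_info_psk("localhost", 10809, "rich", "/tmp/keys")))
```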

Fedora/RISC-V: Runnable stage 4 disk images

We’ve now got:

  1. An autobuilder.
  2. A multithreaded QEMU.
  3. A Fedora RPMs repository.
  4. A bootable disk image.

It’s unpolished and minimal at the moment, but here is what you can do today (if you have a Fedora 27+ x86_64 host):

  1. Enable the rjones/riscv copr and install riscv-qemu.
  2. Download stage4-disk.img and bbl, and uncompress the disk image.
  3. Run this command:
    qemu-system-riscv64 \
        -nographic -machine virt -m 2G -smp 4 \
        -kernel bbl \
        -append "console=ttyS0 ro root=/dev/vda init=/init" \
        -device virtio-blk-device,drive=hd0 \
        -drive file=stage4-disk.img,format=raw,id=hd0 \
        -device virtio-net-device,netdev=usernet \
        -netdev user,id=usernet
    
  4. Inside the guest drop a repo file into /etc/yum.repos.d containing:
    [local]
    name=RPMS
    baseurl=https://fedorapeople.org/groups/risc-v/RPMS/
    enabled=1
    gpgcheck=0
    
  5. Use tdnf --releasever 27 install ... to install more packages.
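
If you are scripting the image setup, the repo file from step 4 can be generated rather than typed. This is a minimal sketch using Python's configparser; the helper name is hypothetical, and you would redirect the output to a file of your choosing under /etc/yum.repos.d (for example local.repo, a name I made up).

```python
import configparser
import io

def local_repo_text(baseurl):
    """Return the text of a [local] repo stanza pointing at baseurl."""
    repo = configparser.ConfigParser(interpolation=None)
    repo["local"] = {
        "name": "RPMS",
        "baseurl": baseurl,
        "enabled": "1",
        "gpgcheck": "0",
    }
    out = io.StringIO()
    # Write key=value without spaces, matching the stanza shown above.
    repo.write(out, space_around_delimiters=False)
    return out.getvalue()

print(local_repo_text("https://fedorapeople.org/groups/risc-v/RPMS/"), end="")
```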

Fedora/RISC-V: the final bootstrap

There are bootable (but very minimal) disk images built cleanly from RPMs: https://fedorapeople.org/groups/risc-v/disk-images/

More soon …

Tip: Changing the qemu product name in libguestfs

20:30 < koike> Hi. Is it possible to configure the dmi codes for libguestfs? I mean, I am running cloud-init inside a libguestfs session (through python-guestfs) in GCE, the problem is that cloud-init reads /sys/class/dmi/id/product_name to determine if the machine is a GCE machine, but the value it read is Standard PC (i440FX + PIIX, 1996) instead of the expected Google Compute Engine so cloud-init fails.

The answer is yes, using the guestfs_config API that lets you set arbitrary qemu parameters:

g.config('-smbios',
         'type=1,product=Google Compute Engine')

virt-builder Debian 9 image available

Debian 9 (“Stretch”) was released last week and now it’s available in virt-builder, the fast way to build virtual machine disk images:

$ virt-builder -l | grep debian
debian-6                 x86_64     Debian 6 (Squeeze)
debian-7                 sparc64    Debian 7 (Wheezy) (sparc64)
debian-7                 x86_64     Debian 7 (Wheezy)
debian-8                 x86_64     Debian 8 (Jessie)
debian-9                 x86_64     Debian 9 (stretch)

$ virt-builder debian-9 \
    --root-password password:123456
[   0.5] Downloading: http://libguestfs.org/download/builder/debian-9.xz
[   1.2] Planning how to build this image
[   1.2] Uncompressing
[   5.5] Opening the new disk
[  15.4] Setting a random seed
virt-builder: warning: random seed could not be set for this type of guest
[  15.4] Setting passwords
[  16.7] Finishing off
                   Output file: debian-9.img
                   Output size: 6.0G
                 Output format: raw
            Total usable space: 3.9G
                    Free space: 3.1G (78%)

$ qemu-system-x86_64 \
    -machine accel=kvm:tcg -cpu host -m 2048 \
    -drive file=debian-9.img,format=raw,if=virtio \
    -serial stdio
