Tag Archives: security

Split block drivers from qemu with nbdkit

One interesting talk at KVM Forum last week was Stefan Hajnoczi‘s talk about QEMU security (sorry, it’s not online — it should eventually be available alongside all the other talks on this youtube channel).

One thing Stefan mentioned was whether QEMU might be split into multiple processes. This has advantages for security:

  1. Crashing or corrupting a single process doesn’t automatically expose the whole hypervisor.
  2. You can separately label each process using SELinux and independently control those policies, providing finer-grained security.

For block drivers you can do this today, and in fact we do this already when we run qemu from virt-v2v. Consider the case where we are using a remote HTTPS disk image:

$ qemu -drive https://remote/disk.img

drawing.svg

The curl driver linked to and running inside QEMU needs to make a remote TCP/IP connection, has to encode and decode TLS, is linked to libcurl and so on, and all those things also apply to the QEMU process. If the curl block driver has problems for any reason, these also affect QEMU. SELinux labels and transitions needed to access the socket are labels and transitions needed by the QEMU process. An exploit in the driver is a QEMU exploit.

With nbdkit we can split this out:

$ nbdkit -U - curl url=https://remote/disk.img \
  --run 'qemu -drive $nbd'

drawing2.svg

From a security point of view this has immediate advantages: If the curl driver crashes or is exploited, only nbdkit is affected. QEMU only needs access to a private Unix domain socket, and conversely nbdkit doesn’t need access to anything else that QEMU uses. You can add resource limits, separate SELinux policy, seccomp, namespaces and anything else you can think of to nbdkit to contain it tightly.

It’s worth pointing out the obvious disadvantages too: It’s likely that there will be a performance impact — although don’t discount how efficient NBD is and how this architecture also lets you scale more effectively over NUMA nodes. And this puts all our eggs into the qemu NBD client which must be very robust.

I should say also that this is more laborious to set up, and it would only really work if some other component (libvirt ideally) handled the creation of the separate nbdkit process. In the example above I used captive nbdkit, but that only works if you have a single drive, and one of the other mechanisms would be more scalable.

Advertisements

Leave a comment

Filed under Uncategorized

NBD with TLS-PSK

The Network Block Device (NBD) protocol is really useful to us when we deal with virtual machines and disk images. It lets us share disk images between machines and is also the universal protocol we use for communicating disk images between different bits of software. I wrote a pluggable NBD server called nbdkit to make this even easier.

However there was a problem: The protocol has no concept of logins. If you have an open NBD port, then anyone can connect and read or write your disk image. This is not quite as terrible as it sounds since when two processes are talking NBD to each other, we use a Unix domain socket and we hide the socket in a directory with restrictive permissions. But there are still cases — such as communicating between separate servers — where authentication would be useful.

NBD does let you upgrade the protocol to use TLS, and all the important NBD servers support that. You can use TLS to do client authentication but it’s seriously clunky and difficult to set up because you have to use X.509 certificates, and if we’ve learned anything from the web we know that X.509 is a plot by the NSA to stop us using encryption (only joking, spooks!)

It turns out there’s a more sensible corner of the TLS specification called TLS-PSK. This uses usernames and randomly generated Pre-Shared Keys (PSK). As long as you can ensure that both the client and server can read a simple username:key file of keys, and the keys are kept secret, you can both authenticate and communicate securely.

Unfortunately just implementing TLS doesn’t get you PSK as well, and no existing NBD server supports TLS-PSK.

So I had to add support. To qemu and qemu-nbd. And to nbdkit.

Amazingly it all works, and qemu and nbdkit interoperate too. Here’s how you could use it:

$ mkdir -m 0700 /tmp/keys
$ psktool -u rich -p /tmp/keys/keys.psk
$ nbdkit -n \
    --tls=require --tls-psk=/tmp/keys/keys.psk \
    file file=disk.img
$ qemu-img info \
    --object "tls-creds-psk,id=tls0,endpoint=client,username=rich,dir=/tmp/keys" \
    --image-opts "file.driver=nbd,file.host=localhost,file.port=10809,file.tls-creds=tls0"

The qemu command line is a bit clunky, but it’s overall much simpler than setting up certificates, although not as scalable for large installations.

3 Comments

Filed under Uncategorized

The boring truth: full virtualization and containerization both have their place

Containers are the future! and lots of misinformed talk.

The truth here is more boring. Both full-fat virtualization (KVM), and containers (LXC) are the future, and which you use will depend on what you want to do. You’ll probably use both. You’re probably using both now, but you don’t know it.

The first thing to say is that full virt and containers are used for different things, although there is a little bit of overlap. Containers are only useful when the kernel of all your VMs is identical. That means you can run 1000 copies of Fedora 18 as containers, probably not something you could do with full virt, but you can’t run Windows or FreeBSD or possibly not even Debian in one of those containers. Nor can your users install their own kernels.

The second important point is that containers in Linux are not secure at all. It’s moderately easy to escape from a container and take control of the host or other containers. I also doubt they’ll ever be secure because any local kernel exploit in the enormous syscall API is a host exploit when you’re using containers. This is in contrast with full virt on Fedora and RHEL where the qemu process is protected by multiple layers: process isolation, Unix permissions, SELinux, seccomp, and, yes, a container.

So containers are useless? Not at all. If you need to run lots and lots of VMs and you don’t care about multiple kernels or security, containers are just about the only game in town (although it turns out that KVM is no slouch these days).

Also containers (or the cgroups technology behind them) are being used in very innovative places. They are used in systemd for resource control, in libvirt to isolate qemu, and for sandboxing.

4 Comments

Filed under Uncategorized

CVE-2011-4127: privilege escalation from qemu / KVM guests

Paolo Bonzini discovered that you can issue SCSI ioctls to virtio devices which are passed down to the host.

The very unfortunate part about this is it easily allows guests to read and write parts of host devices that they are not supposed to. For example, if a guest was confined to host device /dev/sda3, it could read or write other partitions or the boot sector on /dev/sda.

In your guest, try this command which reads the host boot sector:

sg_dd if=/dev/vda blk_sgio=1 bs=512 count=1 of=output

Swap the if and of arguments around to exploit the host.

Here’s Paolo’s write-up on LKML.

Here is the libguestfs mitigation patch. The libvirt mitigation patch.

Leave a comment

Filed under Uncategorized

hivex 1.2.5 released

The latest version of hivex — the library for extracting and modifying Windows Registry hive files has been released. You can get the source from here.

I spent a lot of time examining real hive files from Windows machines and running the library under the awesome valgrind tool, and found one or two places where a corrupt hive file could cause hivex to read uninitialized memory. It’s not clear to me if these are security issues — I think they are not — but everyone is advised to upgrade to this version anyway.

hivex would be a great candidate for fuzz testing if anyone wants to try that.

Leave a comment

Filed under Uncategorized

Tip: Audit virtual machine for setuid files

The script after the fold searches for setuid and setgid files on a disk image or in a virtual machine. When you run it you will see output like this:

# chmod +x findsuid.ml
# ./findsuid.ml RHEL60
/dev/vg_rhel6brewx64/lv_root:/sbin/mount.nfs is setuid file
/dev/vg_rhel6brewx64/lv_root:/sbin/netreport is setgid file
/dev/vg_rhel6brewx64/lv_root:/sbin/pam_timestamp_check is setuid file
/dev/vg_rhel6brewx64/lv_root:/sbin/unix_chkpwd is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/Xorg is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/at is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/chage is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/chfn is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/chsh is setuid file
etc

You could make simple adaptions to this script to audit for all sorts of things of interest: public writeable directories, unusual SELinux labels, hard links to setuid files, over-sized files. With more work you could look for files with unusual/changed checksums, infected files and so on.

Continue reading

Leave a comment

Filed under Uncategorized

New docs: security considerations with libguestfs

… discusses security implications of using libguestfs, particularly with untrusted or malicious guests or disk images.

Leave a comment

Filed under Uncategorized