Tag Archives: security

Split block drivers from qemu with nbdkit

One interesting talk at KVM Forum last week was Stefan Hajnoczi‘s talk about QEMU security (sorry, it’s not online — it should eventually be available alongside all the other talks on this youtube channel).

One thing Stefan mentioned was whether QEMU might be split into multiple processes. This has advantages for security:

  1. Crashing or corrupting a single process doesn’t automatically expose the whole hypervisor.
  2. You can separately label each process using SELinux and independently control those policies, providing finer-grained security.

For block drivers you can do this today, and in fact we do this already when we run qemu from virt-v2v. Consider the case where we are using a remote HTTPS disk image:

$ qemu -drive https://remote/disk.img


The curl driver linked to and running inside QEMU needs to make a remote TCP/IP connection, has to encode and decode TLS, is linked to libcurl and so on, and all those things also apply to the QEMU process. If the curl block driver has problems for any reason, these also affect QEMU. SELinux labels and transitions needed to access the socket are labels and transitions needed by the QEMU process. An exploit in the driver is a QEMU exploit.

With nbdkit we can split this out:

$ nbdkit -U - curl url=https://remote/disk.img \
  --run 'qemu -drive $nbd'


From a security point of view this has immediate advantages: If the curl driver crashes or is exploited, only nbdkit is affected. QEMU only needs access to a private Unix domain socket, and conversely nbdkit doesn’t need access to anything else that QEMU uses. You can add resource limits, separate SELinux policy, seccomp, namespaces and anything else you can think of to nbdkit to contain it tightly.

It’s worth pointing out the obvious disadvantages too: It’s likely that there will be a performance impact — although don’t discount how efficient NBD is and how this architecture also lets you scale more effectively over NUMA nodes. And this puts all our eggs into the qemu NBD client which must be very robust.

I should say also that this is more laborious to set up, and it would only really work if some other component (libvirt ideally) handled the creation of the separate nbdkit process. In the example above I used captive nbdkit, but that only works if you have a single drive, and one of the other mechanisms would be more scalable.

Leave a comment

Filed under Uncategorized


The Network Block Device (NBD) protocol is really useful to us when we deal with virtual machines and disk images. It lets us share disk images between machines and is also the universal protocol we use for communicating disk images between different bits of software. I wrote a pluggable NBD server called nbdkit to make this even easier.

However there was a problem: The protocol has no concept of logins. If you have an open NBD port, then anyone can connect and read or write your disk image. This is not quite as terrible as it sounds since when two processes are talking NBD to each other, we use a Unix domain socket and we hide the socket in a directory with restrictive permissions. But there are still cases — such as communicating between separate servers — where authentication would be useful.

NBD does let you upgrade the protocol to use TLS, and all the important NBD servers support that. You can use TLS to do client authentication but it’s seriously clunky and difficult to set up because you have to use X.509 certificates, and if we’ve learned anything from the web we know that X.509 is a plot by the NSA to stop us using encryption (only joking, spooks!)

It turns out there’s a more sensible corner of the TLS specification called TLS-PSK. This uses usernames and randomly generated Pre-Shared Keys (PSK). As long as you can ensure that both the client and server can read a simple username:key file of keys, and the keys are kept secret, you can both authenticate and communicate securely.

Unfortunately just implementing TLS doesn’t get you PSK as well, and no existing NBD server supports TLS-PSK.

So I had to add support. To qemu and qemu-nbd. And to nbdkit.

Amazingly it all works, and qemu and nbdkit interoperate too. Here’s how you could use it:

$ mkdir -m 0700 /tmp/keys
$ psktool -u rich -p /tmp/keys/keys.psk
$ nbdkit -n \
    --tls=require --tls-psk=/tmp/keys/keys.psk \
    file file=disk.img
$ qemu-img info \
    --object "tls-creds-psk,id=tls0,endpoint=client,username=rich,dir=/tmp/keys" \
    --image-opts "file.driver=nbd,file.host=localhost,file.port=10809,file.tls-creds=tls0"

The qemu command line is a bit clunky, but it’s overall much simpler than setting up certificates, although not as scalable for large installations.


Filed under Uncategorized

The boring truth: full virtualization and containerization both have their place

Containers are the future! and lots of misinformed talk.

The truth here is more boring. Both full-fat virtualization (KVM), and containers (LXC) are the future, and which you use will depend on what you want to do. You’ll probably use both. You’re probably using both now, but you don’t know it.

The first thing to say is that full virt and containers are used for different things, although there is a little bit of overlap. Containers are only useful when the kernel of all your VMs is identical. That means you can run 1000 copies of Fedora 18 as containers, probably not something you could do with full virt, but you can’t run Windows or FreeBSD or possibly not even Debian in one of those containers. Nor can your users install their own kernels.

The second important point is that containers in Linux are not secure at all. It’s moderately easy to escape from a container and take control of the host or other containers. I also doubt they’ll ever be secure because any local kernel exploit in the enormous syscall API is a host exploit when you’re using containers. This is in contrast with full virt on Fedora and RHEL where the qemu process is protected by multiple layers: process isolation, Unix permissions, SELinux, seccomp, and, yes, a container.

So containers are useless? Not at all. If you need to run lots and lots of VMs and you don’t care about multiple kernels or security, containers are just about the only game in town (although it turns out that KVM is no slouch these days).

Also containers (or the cgroups technology behind them) are being used in very innovative places. They are used in systemd for resource control, in libvirt to isolate qemu, and for sandboxing.


Filed under Uncategorized

CVE-2011-4127: privilege escalation from qemu / KVM guests

Paolo Bonzini discovered that you can issue SCSI ioctls to virtio devices which are passed down to the host.

The very unfortunate part about this is it easily allows guests to read and write parts of host devices that they are not supposed to. For example, if a guest was confined to host device /dev/sda3, it could read or write other partitions or the boot sector on /dev/sda.

In your guest, try this command which reads the host boot sector:

sg_dd if=/dev/vda blk_sgio=1 bs=512 count=1 of=output

Swap the if and of arguments around to exploit the host.

Here’s Paolo’s write-up on LKML.

Here is the libguestfs mitigation patch. The libvirt mitigation patch.

Leave a comment

Filed under Uncategorized

hivex 1.2.5 released

The latest version of hivex — the library for extracting and modifying Windows Registry hive files has been released. You can get the source from here.

I spent a lot of time examining real hive files from Windows machines and running the library under the awesome valgrind tool, and found one or two places where a corrupt hive file could cause hivex to read uninitialized memory. It’s not clear to me if these are security issues — I think they are not — but everyone is advised to upgrade to this version anyway.

hivex would be a great candidate for fuzz testing if anyone wants to try that.

Leave a comment

Filed under Uncategorized

Tip: Audit virtual machine for setuid files

The script after the fold searches for setuid and setgid files on a disk image or in a virtual machine. When you run it you will see output like this:

# chmod +x findsuid.ml
# ./findsuid.ml RHEL60
/dev/vg_rhel6brewx64/lv_root:/sbin/mount.nfs is setuid file
/dev/vg_rhel6brewx64/lv_root:/sbin/netreport is setgid file
/dev/vg_rhel6brewx64/lv_root:/sbin/pam_timestamp_check is setuid file
/dev/vg_rhel6brewx64/lv_root:/sbin/unix_chkpwd is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/Xorg is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/at is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/chage is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/chfn is setuid file
/dev/vg_rhel6brewx64/lv_root:/usr/bin/chsh is setuid file

You could make simple adaptions to this script to audit for all sorts of things of interest: public writeable directories, unusual SELinux labels, hard links to setuid files, over-sized files. With more work you could look for files with unusual/changed checksums, infected files and so on.

Continue reading

Leave a comment

Filed under Uncategorized

New docs: security considerations with libguestfs

… discusses security implications of using libguestfs, particularly with untrusted or malicious guests or disk images.

Leave a comment

Filed under Uncategorized

Virtualization and whom you trust

One comment at the test day described virt-cat as “awesome but dangerous”.

This user was surprised that he could do:

virt-cat aguest /etc/shadow

and read the shadow password file from a guest. “Is there”, I was asked, “a security model for this?”

Here’s the news: You can already look at the shadow password file in any disk image using a hex editor. libguestfs, guestfish and virt-cat just make it easier.

Could you encrypt the virtual disk? That will protect the VM while it is (virtually) switched off, but as soon as you boot it up, the encryption key is stored somewhere in guest memory, and the host administrator can read that too.

No security model can help you here. You need to own and manage the hardware yourself, or you need to trust your cloud provider. If your data is at all personally or commercially sensitive, keep it on hardware you physically control.


Filed under Uncategorized

Windows SAM and hivex

On Windows, the file C:\windows\system32\config\SAM contains the users and passwords known to the local machine. hivex can process this file to reveal the usernames and password (hashes):

$ virt-win-reg WinGuest HKLM\\SAM > sam.reg

For each local user you’ll see a key like this:


With typical technical brilliance Microsoft developers have written a zero-length key with the type field (0x3e9) overloaded as a key to use in another part of the registry:


(Apparently the number 0x3e9 is called the “RID” in Microsoft parlance).

My password hint is the “usual”. The “F” key is a dumped C structure containing the last login date amongst other things. The “V” key is another C structure containing my full name, home directory, the password hash and a bunch of other stuff.

With a bit of effort it looks like you could read and even modify these entries.

1 Comment

Filed under Uncategorized

Half-baked ideas: verify the integrity of VMs

For more half-baked ideas, see my ideas tag.

This LWN article on OSSEC reminded me of an idea I had. We need a verify tool that can verify your VMs are not corrupted and don’t contain a rootkit. You can currently run a simple rpm -V command or one of the tools listed in that LWN article, but the problem is you have to run those commands inside the VM, thus relying on the VM itself not to have been corrupted. (You can also reboot the VM into a known-good state, eg. from a rescue ISO, but then you get downtime).

Obviously the answer is to examine the VM from the host, using libguestfs to grab the checksums directly from the filesystem.

You can get the checksums easily this way, but what do you compare them against and how do you know the checksums are good?

You would need to ask the distribution for a list of known-good checksums for the packages they publish. In fact you can do this reasonably easily. That information is available in the raw RPMs, or Red Hat’s RHN, and I’m quite sure you can get it from the Debian repos too. Windows? I don’t know specifically, but I guess either Microsoft publish this or you could derive it in some way.

So now if you are presented with some file from the VM and its checksum, like:

e2ed3c7d6d429716173fbd2d831d6e2855f1d20209da1238f75d1892a3074af5 /sbin/ldconfig

in theory we can verify this file was distributed in the signed package glibc-2.11.1-4.x86_64.rpm built on Thu 18 Mar 2010 04:51:51 PM GMT by a Fedora builder. (Using virt-inspector you can work out that the VM is a Fedora instance).

There’s still a subtle problem with this which I can’t work out. What happens if the attacker doesn’t directly install a rootkit, but instead replaces a binary with a version which has a known vulnerability. Say, a known root exploit in a package which the distro had previously shipped and later found to be vulnerable? This would allow the attacker to revisit the machine and acquire root, and run any software they want entirely in memory (libguestfs only sees the disk). For that you’d need not just a big database of all the files that distributions have shipped, but a list of files that have been obsoleted for security reasons.

There’s also a second problem: Although we can enumerate the goodness (all files are files that the distribution has shipped), we don’t know what to do with all the other files. Like user files, configuration files, /tmp, or files that just shouldn’t be there. To detect rootkits amongst those files, it’s starting to look like we’d have to enumerate badness and we don’t want to go there.


Filed under Uncategorized