Tag Archives: qemu

NBD-backed qemu guest RAM

This seems too crazy to work, but it does:

$ nbdkit memory 1G
$ nbdfuse mem nbd://localhost &
[1] 1053075
$ ll mem
-rw-rw-rw-. 1 rjones rjones 1073741824 May 17 18:31 mem

Now boot qemu with that memory as the backing RAM:

$ qemu-system-x86_64 -m 1024 \
-object memory-backend-file,id=pc.ram,size=1024M,mem-path=/var/tmp/mem,share=on \
-machine memory-backend=pc.ram \
-drive file=fedora-36.img,if=virtio,format=raw

It works! You can even dump the RAM over a second NBD connection and grep for strings which appear on the screen (or passwords etc):

$ nbdcopy nbd://localhost - | strings | grep 'There was 1 failed'
There was 1 failed login attempt since the last successful login.
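
To see why a second connection can read what the guest wrote, it may help to think of share=on as a MAP_SHARED mapping of the backing file. The following is only a Python sketch of that idea, not what qemu does internally: the 1 MiB size, the offset and the function names are made up for illustration, and a temporary file stands in for the nbdfuse-mounted mem file.

```python
import mmap
import os
import tempfile


def find_in_backing_file(path, needle):
    """Open the backing file independently (like a second connection) and search it."""
    with open(path, "rb") as f:
        return f.read().find(needle)


def shared_ram_demo():
    size = 1024 * 1024  # 1 MiB stands in for the 1G of guest RAM
    fd, path = tempfile.mkstemp()
    try:
        os.ftruncate(fd, size)
        # MAP_SHARED is the behaviour share=on asks for: stores reach the
        # file itself instead of a private copy of its pages.
        ram = mmap.mmap(fd, size, flags=mmap.MAP_SHARED)
        message = b"There was 1 failed login attempt"
        ram[4096:4096 + len(message)] = message  # the "guest" writes to its RAM
        ram.flush()
        offset = find_in_backing_file(path, message)
        ram.close()
        return offset
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling shared_ram_demo() returns the offset at which the independent reader found the string, which is the same trick as piping nbdcopy into strings and grep.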

Of course this isn’t very useful on its own: it’s just an awkward way to use a sparse RAM disk as guest RAM. But nbdkit has plenty of other plugins that might be useful here. How about remote RAM? You’ll need a very fast network.


nbdkit + libblkio

Our plugin-based Network Block Device server, nbdkit, now has support for libblkio.

libblkio is a library written by Stefan Hajnoczi, Alberto Faria, Stefano Garzarella and others for accessing somewhat unusual disk protocols including vhost-user, NVMe, vDPA, VFIO and io_uring which I’ll talk about below. It’s important to know that these are not disk formats (like raw or qcow2), but accelerated protocols for talking to virtual or real hardware.

The library is written in Rust (but offers a C API) and I believe it’s intended to replace various bottom-end parts of the qemu block layer at some point in the future.

The library uses a set of property strings to describe how to connect to a device. The nbdkit plugin maps those almost exactly into command line parameters, so you can usually follow the libblkio docs and translate that into an nbdkit command line, eg:

$ nbdkit blkio io_uring path=fedora.img

This sets the libblkio driver to “io_uring” and the path to that of a local file. The io_uring driver uses Linux’s relatively new io_uring facility to access a local file or block device, and it is the simplest way to use libblkio.

The other most frequently used protocol or libblkio driver is vhost-user. This is a protocol that allows a server to share a disk image with client(s) on the same machine. It uses a Unix domain socket for communication, but unlike Network Block Device (NBD) it’s not possible to use this over the network. For greater performance vhost-user uses shared memory between the client and server for data transfer.

qemu-storage-daemon is the most common server:

$ qemu-storage-daemon \
    --blockdev driver=file,node-name=file,filename=fedora.qcow2 \
    --blockdev driver=qcow2,node-name=qcow2,file=file \
    --export type=vhost-user-blk,id=export,addr.type=unix,addr.path=sock,node-name=qcow2

To connect from nbdkit, just use the socket:

$ nbdkit blkio virtio-blk-vhost-user path=sock

You might wonder why we want to add libblkio support to nbdkit (apart from it being fun). There’s a practical reason: it brings all of the scripting support we’ve created around NBD to these somewhat obscure (albeit quite widely used) protocols. I don’t think it was possible before to use Python to script against, e.g., vhost-user, but now it is:

$ nbdsh -u nbd://localhost -c 'print("%r" % h.pread(512,0))'


nbdkit now supports LUKS encryption

nbdkit, our permissively licensed plugin-based Network Block Device server, can now transparently decode encrypted disks, for both reading and writing:

qemu-img create -f luks --object secret,data=SECRET,id=sec0 -o key-secret=sec0 encrypted-disk.img 1G

nbdkit file encrypted-disk.img --filter=luks passphrase=+/tmp/secret

We use LUKSv1 as the encryption format. That’s an older version [more on that in a moment] of the format used for Full Disk Encryption on Linux. It’s much preferable to use LUKS rather than using qemu’s built-in qcow2 encryption, and our implementation is compatible with qemu’s.

You can place the filter on top of other nbdkit plugins, like Curl:

nbdkit curl https://example.com/encrypted-disk.img --filter=luks passphrase=+/tmp/secret

The threat model here is that you can store the encrypted data on a remote server, and the admin of the server cannot decrypt the disk (assuming you don’t give them the passphrase).

If you try this filter (or qemu’s device) with a modern Linux LUKS disk you’ll find that it doesn’t work. This is because modern Linux distributions use LUKSv2 by default, although they can still create, read and write LUKSv1 disks if you set them up that way in advance. Unfortunately LUKSv2 is significantly more complicated than LUKSv1. It requires parsing JSON data(!) stored in the header, and supports a wider range of key derivation functions, typically the very slow and memory-intensive argon2. LUKSv1 by contrast only requires support for PBKDF2 and is generally far more straightforward to implement.
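
To give a feel for how little machinery the LUKSv1 side needs, here is a hedged Python sketch of its key derivation step using the standard library’s hashlib.pbkdf2_hmac. The function name and parameter layout are illustrative assumptions, not nbdkit’s code. In a real LUKSv1 header the salt, iteration count and hash name are read from a key slot, and the derived key is then used to decrypt the stored master key material (steps not shown here).

```python
import hashlib


def derive_slot_key(passphrase: bytes, salt: bytes, iterations: int,
                    key_len: int, hash_name: str = "sha256") -> bytes:
    """Stretch a passphrase into a key with PBKDF2-HMAC.

    All arguments are assumed to come from the caller; this sketch does
    not parse a LUKS header.
    """
    return hashlib.pbkdf2_hmac(hash_name, passphrase, salt, iterations,
                               dklen=key_len)


if __name__ == "__main__":
    # Example values only; a real key slot supplies its own salt and count.
    key = derive_slot_key(b"SECRET", b"\x00" * 32, 1000, 32)
    print(key.hex())
```

An argon2-based LUKSv2 key slot would need a third-party library for the same step, which is part of what makes LUKSv1 the easier target.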

The new filter will be available in nbdkit 1.32, or you can grab the development version now.


nbdkit 1.24 & libnbd 1.6, new copying tool

As well as nbdkit 1.24 being released on Thursday, its sister project libnbd 1.6 was released at the same time. This comes with an enhanced copying tool called nbdcopy designed to replace some uses of qemu-img convert (note: it’s not a general replacement).

nbdcopy lets you copy from and to NBD servers (nbdkit, qemu-nbd, qemu-storage-daemon, nbd-server), local files, local block devices, pipes/sockets, and stdin/stdout. For example to stream the content of an NBD server:

$ nbdcopy nbd://localhost - | hexdump -C

The “-” character streams to stdout. nbd://localhost is an NBD URI referring to an NBD server that is already running. What if you don’t have an already running server? nbdcopy lets you run one from the command line (and cleans up after). For example this is one way to convert a qcow2 file to raw:

$ nbdcopy -- [ qemu-nbd -f qcow2 disk.qcow ] disk.raw

Here the [ ... ] section starts qemu-nbd as a captive NBD server, exposing privately an NBD endpoint, and nbdcopy copies this to local file disk.raw. (“--” is needed to stop nbdcopy trying to interpret qemu-nbd’s own command line arguments.)

However this post is really about the nbdkit release. How did I test and benchmark nbdcopy? Of course I wrote an nbdkit plugin called nbdkit-sparse-random-plugin. This plugin has two clever features for testing copying tools. Firstly it creates random disks which have the same “shape” as virtual machine disk images (but without the overhead of needing to bother with an actual VM). Secondly it can act as both a source and target for testing copies.

Let’s unpack those two things a bit further.

Virtual machine disk images, especially nearly empty ones, are mostly sparse. Here’s part of the sparse map from a Fedora 32 disk image:

$ virt-builder fedora-32
$ filefrag -e fedora-32.img 
 Filesystem type is: 58465342
 File size of fedora-32.img is 6442450944 (1572864 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..       0:    2038672..   2038672:      1:            
    1:        1..      15:    2176040..   2176054:     15:    2038673:
    2:      256..     271:    2188819..   2188834:     16:    2176295:
    3:      512..    3135:    3650850..   3653473:   2624:    2189075:
    4:     3168..    4463:    3781763..   3783058:   1296:    3653506:
[...]
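
A map like this can also be approximated from a script. The sketch below is not how filefrag works: it walks a file with lseek(2) using SEEK_DATA and SEEK_HOLE (available on Linux and some other Unixes), and the helper names are invented here. Extent boundaries depend on the filesystem, which may round them to its block size or report the whole file as data.

```python
import errno
import os
import tempfile


def data_extents(path):
    """Return a list of (start, end) byte ranges that contain data."""
    extents = []
    with open(path, "rb") as f:
        fd = f.fileno()
        size = os.fstat(fd).st_size
        pos = 0
        while pos < size:
            try:
                start = os.lseek(fd, pos, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:  # only a hole remains after pos
                    break
                raise
            end = os.lseek(fd, start, os.SEEK_HOLE)  # EOF counts as a hole
            extents.append((start, end))
            pos = end
    return extents


def sparse_file_demo():
    """Create an 8 MiB sparse file with one 4 KiB island of data and map it."""
    offset, payload = 4 * 1024 * 1024, b"x" * 4096
    with tempfile.NamedTemporaryFile() as tmp:
        tmp.truncate(8 * 1024 * 1024)
        tmp.seek(offset)
        tmp.write(payload)
        tmp.flush()
        return data_extents(tmp.name), offset, len(payload)
```

Running data_extents("fedora-32.img") on the image above should list the same kind of islands, although as byte ranges rather than filesystem blocks.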

The new sparse-random plugin generates a disk image which has a similar shape — islands of random data in a sea of sparseness. The algorithm for doing this is quite neat. Because the plugin doesn’t need to store the data, unlike a real disk image, it can generate huge disk images (eg. a terabyte) while using almost no memory. We use a low-overhead, high-quality random number generator and are smart about seeds so that every run of sparse-random with the same seed produces identical output.
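
The idea of generating content on demand instead of storing it can be sketched in a few lines of Python. This is a toy model under loud assumptions, not the plugin’s real algorithm or random number generator: here each 4096-byte block independently decides whether it is data or a hole, the seed mixing is arbitrary, and the names block_content and pread are made up.

```python
import random

BLOCK_SIZE = 4096


def block_content(seed: int, index: int, percent_data: int = 10) -> bytes:
    """Recompute one block from (seed, index) alone; nothing is stored."""
    rng = random.Random((seed << 40) | index)  # arbitrary, illustrative mixing
    if rng.randrange(100) >= percent_data:
        return bytes(BLOCK_SIZE)  # a hole, which reads as zeroes
    return bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))


def pread(seed: int, offset: int, count: int) -> bytes:
    """Read count bytes at offset from the virtual disk for this seed."""
    out = bytearray()
    end = offset + count
    while offset < end:
        index, skip = divmod(offset, BLOCK_SIZE)
        chunk = block_content(seed, index)[skip:skip + (end - offset)]
        out += chunk
        offset += len(chunk)
    return bytes(out)
```

Because every block is a pure function of the seed and its index, a read at offset one terabyte costs the same as a read at offset zero, and two runs with the same seed return identical bytes.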

The other part of this plugin is how we can use it to test copying tools like nbdcopy and qemu-img convert. My idea was that the plugin could be used both as the source and the target of the copy:

$ nbdkit -U - sparse-random 1T --run ' nbdcopy "$uri" "$uri" '

Here we create a terabyte-sized sparse-random disk, and get nbdcopy to copy from the plugin to the plugin. On reads sparse-random supplies the sparseness and random data. On writes it checks if what is being written matches the content of the plugin, throwing -EIO errors if not. Assuming the copying tool is correctly handling errors, we can both validate the copying tool and benchmark it. And it works with qemu-img convert too:

$ nbdkit -U - sparse-random 1T --run ' qemu-img convert "$uri" "$uri" '

And now we can see which one is faster.

Try it, you may be surprised.
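
The write-side check described above can be modelled roughly as follows. This Python sketch only illustrates the technique of a disk that verifies writes against its own expected content; the class, the SHA-256 based content stream and the copy loop are all assumptions made for the example (and it leaves out sparseness), not code from nbdkit or nbdcopy.

```python
import errno
import hashlib


class SelfCheckingDisk:
    """A virtual disk whose content is a pure function of a seed."""

    def __init__(self, seed: bytes, size: int):
        self.seed = seed
        self.size = size

    def pread(self, count: int, offset: int) -> bytes:
        if offset < 0 or offset + count > self.size:
            raise ValueError("read beyond end of disk")
        first = offset // 32
        last = (offset + count + 31) // 32
        stream = b"".join(
            hashlib.sha256(self.seed + i.to_bytes(8, "little")).digest()
            for i in range(first, last))
        start = offset - first * 32
        return stream[start:start + count]

    def pwrite(self, data: bytes, offset: int) -> None:
        # Nothing is stored: a write succeeds only if it matches what a
        # read of the same range would have returned.
        if data != self.pread(len(data), offset):
            raise OSError(errno.EIO, "written data does not match")


def copy_disk(src, dst, chunk: int = 4096) -> None:
    """A deliberately naive copying tool to put under test."""
    for offset in range(0, src.size, chunk):
        count = min(chunk, src.size - offset)
        dst.pwrite(src.pread(count, offset), offset)
```

A correct copy of the disk onto itself completes silently, while any tool that drops, reorders or corrupts a block gets an EIO back, so the same run serves as both a validation and a benchmark.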


FUSE mounting on top of a file

Our tool nbdfuse lets you mount an NBD block device as a file, using Linux FUSE. For example you could create a directory with a single file in it (called nbd) which contains the contents of the NBD export:

$ mkdir /var/tmp/test
$ nbdfuse /var/tmp/test --command nbdkit -s memory 1G &
$ ls -l /var/tmp/test/
total 0
 -rw-rw-rw-. 1 rjones rjones 1073741824 Nov  4 13:25 nbd
$ fusermount -u /var/tmp/test

This is cool, but wouldn’t it be nice to get rid of the directory and create the file anywhere? Recently Max Reitz found out you can mount a FUSE filesystem over a regular file.

It works! (After a few adjustments to the nbdfuse code)

$ touch /var/tmp/disk.img
$ nbdfuse /var/tmp/disk.img --command nbdkit -s memory 1G &
$ ls -l /var/tmp/disk.img
 -rw-rw-rw-. 1 rjones rjones 1073741824 Nov  4 13:29 /var/tmp/disk.img
$ fusermount -u /var/tmp/disk.img 


Notes to self on frama-c

Frama-C is a giant modular system for writing formal proofs of C code. For months I’ve been on-and-off trying to see if we could use it to do useful proofs for any parts of the projects we write, like qemu, libvirt, libguestfs, nbdkit etc. I got side-tracked at first with this frama-c tutorial which is fine, but I got stuck trying to make the GUI work.

Yesterday I discovered this set of 3 short command-line based tutorials:

https://maniagnosis.crsr.net/2017/06/AFL-brute-force-search.html
https://maniagnosis.crsr.net/2017/06/AFL-bug-in-quicksearch.html
https://maniagnosis.crsr.net/2017/07/AFL-correctness-of-quicksearch.html

I thought I’d start by trying to apply this to a small section of qemu code, the fairly self-contained range functions.

The first problem is how to invoke frama-c:

frama-c -wp -wp-rte -wp-print util/range.c -cpp-extra-args=" -I include -I build -I /usr/include -DQEMU_WARN_UNUSED_RESULT= "

You have to give all the include directories and define out some qemu-isms.

The first time you run it, this won’t work for “reasons”. You have to initialize the why3 verifier using:

why3 config --full-config

Really frama-c should just do this for you, or at least tell you what you need to do in the obscure error message it prints.

This still won’t work because util/range.c includes glib headers which use GCC attributes and builtins and frama-c simply cannot parse any of that. So I ended up hacking on the source to replace the headers with standard C headers and remove the one glib-based function in the file.

At this point it does compile and the frama-c WP plugin runs. Of course without having added any annotations it simply produces a long list of problems. Also it takes a fair bit of time to run, which is interesting. I wonder if it will get faster with annotations?

That’s as far as I’ve got for the moment. I’ll come back later and try to add annotations.


nbdkit with BitTorrent

nbdkit is our high performance Network Block Device server for serving disk images from unusual sources. The usual way to get a Linux installer is to download an ISO from a website like Get Fedora or debian.org. However that costs the host money and is also a central point of failure, so another way to download Linux installers is over BitTorrent. Many Linux distros, including Fedora and Debian, offer torrents of their installers. By using these you are helping to redistribute Linux and defraying the cost of hosting these ISOs.

Now I’ve written a BitTorrent plugin for nbdkit so you can download, redistribute and install Linux all at the same time!

$ url=https://torrent.fedoraproject.org/torrents/Fedora-Server-dvd-x86_64-32.torrent
$ wget $url
$ nbdkit -U - torrent Fedora-Server-*.torrent \
         --run 'qemu-system-x86_64 -m 2048 -cdrom $nbd -boot d'

So what’s the serious use for this? It has the interesting property that the more people who are installing your Linux distro, the less bandwidth it uses and the faster it runs! This could be interesting technology for any kind of distributed environment where you have lots of machines accessing the same fixed/read-only filesystem or disk image.

If you want to get started with nbdkit it’s already in all popular Linux distributions, and compiles from source on Linux, FreeBSD and OpenBSD.


New nbdkit data strings

You can use nbdkit, our infinitely flexible Network Block Device server, to serve small disks and test images with the nbdkit data plugin. For example you can cut and paste this command into your shell to demonstrate a bootable disk image which prints “hello, world”:

nbdkit data data='
    0xb4 0 0xb0 3 0xcd 0x10 0xb4 0x13
    0xb3 0x0a 0xb0 1 0xb9 0x0e 0 0xb6
    0 0xb2 0 0xbd 0x19 0x7c 0xcd 0x10
    0xf4 0x68 0x65 0x6c 0x6c 0x6f 0x2c 0x20
    0x77 0x6f 0x72 0x6c 0x64 0x0d 0x0a
    @0x1fe 0x55 0xaa
' --run 'qemu-system-i386 -fda $nbd'

(As an aside, what is the smallest nbdkit data string that can boot to a “hello, world” message?)
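
For anyone wondering what those bytes are, the Python sketch below rebuilds the same 512-byte sector with the pieces labelled. The instruction comments are one reading of the opcodes rather than anything stated in the data string itself, and the names CODE, MESSAGE and build_sector are invented for this example.

```python
# Machine code from the data string above, one instruction per line.
CODE = bytes([
    0xb4, 0x00,        # mov ah, 0      ; BIOS video: set mode ...
    0xb0, 0x03,        # mov al, 3      ; ... 80x25 text
    0xcd, 0x10,        # int 0x10
    0xb4, 0x13,        # mov ah, 0x13   ; BIOS video: write string
    0xb3, 0x0a,        # mov bl, 0x0a   ; colour attribute
    0xb0, 0x01,        # mov al, 1      ; write mode: move the cursor
    0xb9, 0x0e, 0x00,  # mov cx, 14     ; length of the string
    0xb6, 0x00,        # mov dh, 0      ; row 0
    0xb2, 0x00,        # mov dl, 0      ; column 0
    0xbd, 0x19, 0x7c,  # mov bp, 0x7c19 ; where the string sits once loaded
    0xcd, 0x10,        # int 0x10
    0xf4,              # hlt
])

MESSAGE = b"hello, world\r\n"


def build_sector() -> bytes:
    """Lay out code, message and boot signature in one 512-byte sector."""
    sector = bytearray(512)
    sector[:len(CODE)] = CODE
    sector[len(CODE):len(CODE) + len(MESSAGE)] = MESSAGE
    sector[0x1fe:] = b"\x55\xaa"  # same as "@0x1fe 0x55 0xaa"
    return bytes(sector)
```

The BIOS loads the sector at address 0x7c00, so the message, which starts 0x19 bytes in, ends up at 0x7c19, and the 14 loaded into cx matches len(MESSAGE).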

The data parameter is a mini-language, and I recently extended it in an interesting way. It wasn’t possible to make repeated patterns easily before. If you wanted a disk containing 0x55 0xAA repeated (the binary bit patterns 01010101 10101010) then the only way to get that was to literally write:

nbdkit data data='0x55 0xAA 0x55 0xAA [repeated many times ...]'

but now you can group things together and write:

nbdkit data data='( 0x55 0xAA )*256'

The nesting works by recursively creating a new parser, which means you can use any data expression. For example to get 4 sectors containing half blank and half test data you can now do:

nbdkit data data='( @256 ( 0x55 0xAA )*128 )*4'

This gives you lots of ways to make disks containing test patterns which you could then use to test Linux programs through a kernel NBD device such as /dev/nbd0.
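
To make the recursive-parser idea concrete, here is a small Python sketch that expands a subset of the data mini-language: byte values, @OFFSET, and ( ... )*N groups. It is an illustration written for this note, not nbdkit’s parser; it ignores the rest of the real syntax, and the names expand, _parse and _store are made up.

```python
import re

# Tokens: "(", ")" with an optional "*N" repeat, "@OFFSET", or a byte value.
TOKEN = re.compile(r"\(|\)(?:\*\d+)?|@\w+|\w+")


def _store(buf, offset, chunk):
    """Write chunk at offset, zero-filling any gap, and return the new offset."""
    end = offset + len(chunk)
    if end > len(buf):
        buf.extend(bytes(end - len(buf)))
    buf[offset:end] = chunk
    return end


def _parse(tokens, pos, nested):
    """Parse until end of input, or until the ")" that closes this group."""
    buf = bytearray()
    offset = 0  # offsets are relative to the start of the current group
    while pos < len(tokens):
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            inner, pos = _parse(tokens, pos, nested=True)  # new parser level
            offset = _store(buf, offset, inner)
        elif tok.startswith(")"):
            if not nested:
                raise ValueError("unmatched ')'")
            repeat = int(tok[2:]) if len(tok) > 1 else 1
            return bytes(buf) * repeat, pos
        elif tok.startswith("@"):
            offset = int(tok[1:], 0)
        else:
            value = int(tok, 0)
            if not 0 <= value <= 255:
                raise ValueError("byte out of range: " + tok)
            offset = _store(buf, offset, bytes([value]))
    if nested:
        raise ValueError("missing ')'")
    return bytes(buf), pos


def expand(expr: str) -> bytes:
    data, _ = _parse(TOKEN.findall(expr), 0, nested=False)
    return data
```

With this, expand("( @256 ( 0x55 0xAA )*128 )*4") produces 2048 bytes: four sectors, each with 256 zero bytes followed by 256 bytes of the repeating pattern.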


Pyrit by Řrřola, incredible raytracing demo as a qemu bootable disk image

One of the things I showed at KVM Forum last month was a cool demo by Jan Kadlec (Řrřola). Originally this was a 256 byte MS-DOS COM file. I adapted it very slightly to turn it into a boot sector. Here’s how to run it using nbdkit and qemu:

nbdkit data data="
  49 192 49 219 185 255 0 191 254 255 137 252 190 0 1 189 28 9 79 176 
  19 79 208 233 205 16 15 190 203 48 205 136 233 137 200 247 224 209 
  233 254 195 120 2 134 206 184 16 16 117 228 184 79 176 163 0 1 184 19 
  79 163 2 1 184 208 233 163 4 1 184 205 16 163 6 1 184 15 190 163 8 1 
  184 203 48 163 10 1 184 205 136 163 12 1 184 233 137 163 14 1 184 49 
  71 186 202 159 142 194 96 185 12 0 1 245 96 217 69 254 217 251 217 
  238 132 193 117 2 217 224 221 219 226 246 221 219 217 193 217 69 254 
  217 251 222 204 222 201 4 127 112 241 222 195 222 233 114 233 217 26 
  41 254 123 250 97 226 204 97 66 170 96 219 227 140 195 191 252 255 
  223 6 68 125 221 23 223 69 251 223 69 252 232 14 0 97 129 195 205 204 
  115 225 117 222 228 96 72 224 152 145 0 246 112 78 0 210 112 74 185 
  12 0 1 245 217 236 216 2 86 217 2 216 204 41 254 123 248 94 222 193 
  222 193 83 217 19 133 99 2 120 2 41 251 217 192 216 15 223 242 114 6 
  216 249 217 23 137 40 222 217 91 139 87 6 59 87 2 126 16 226 199 139 
  24 217 1 216 8 216 192 216 235 41 254 123 244 217 192 222 14 70 125 
  219 29 102 193 61 22 120 24 222 60 220 201 216 202 219 27 42 67 1 219 
  27 50 67 1 36 72 4 80 246 37 136 37 195 127 112 97 66 68 78 
  @0x1fe 85 170 
  " size=512 --run 'qemu-system-x86_64 -hda $nbd'

(I would normally put a screenshot here, but it doesn’t do it justice. I suggest actually running that command and also reading the surprisingly clean source code.)


NBD over AF_VSOCK

How do you talk to a virtual machine from the host? How does the virtual machine talk to the host? In one sense the answer is obvious: virtual machines should be thought of just like regular machines so you use the network. However the connection between host and guest is a bit more special. Suppose you want to pass a host directory up to the guest? You could use NFS, but that’s sucky to set up and you’ll have to fiddle around with firewalls and ports. Suppose you run a guest agent reporting stats back to the hypervisor. How do they talk? Network, sure, but again that requires an extra network interface and the guest has to explicitly set up firewall rules.

A few years ago my colleague Stefan Hajnoczi ported VMware’s vsock to qemu. It’s a pure guest⟷host (and guest⟷guest) sockets API. It doesn’t use regular networks so no firewall issues or guest network configuration to worry about.

You can run NFS over vsock [PDF] if you want.

And now you can of course run NBD over vsock. nbdkit supports it, and libnbd is (currently the only!) client.
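
From a program’s point of view vsock looks like any other socket family, with a (CID, port) pair in place of an IP address. A minimal Python sketch of a guest connecting to a server on the host follows. It assumes Linux with vsock support and a Python build that exposes AF_VSOCK, and it assumes the server (for example nbdkit started with its --vsock option) is listening on the standard NBD port 10809; the helper names are made up.

```python
import socket

NBD_PORT = 10809  # standard NBD port, assumed to be where the server listens


def vsock_address(cid=None, port=NBD_PORT):
    """Build a vsock address; with no CID given, target the host."""
    if cid is None:
        cid = socket.VMADDR_CID_HOST  # well-known CID of the host (2)
    return (cid, port)


def connect_to_host(port=NBD_PORT):
    """Open a stream connection from inside a guest to the host.

    No IP addresses, routes or firewall rules are involved.
    """
    sock = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    sock.connect(vsock_address(port=port))
    return sock
```

connect_to_host() only succeeds inside a guest that has a vsock device and with a server listening on the host, so treat it as a shape to adapt rather than something to paste and run.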
