Tag Archives: virtualization

nbdkit 1.24 & libnbd 1.6, new copying tool

As well as nbdkit 1.24 being released on Thursday, its sister project libnbd 1.6 was released at the same time. This comes with an enhanced copying tool called nbdcopy designed to replace some uses of qemu-img convert (note: it’s not a general replacement).

nbdcopy lets you copy from and to NBD servers (nbdkit, qemu-nbd, qemu-storage-daemon, nbd-server), local files, local block devices, pipes/sockets, and stdin/stdout. For example to stream the content of an NBD server:

$ nbdcopy nbd://localhost - | hexdump -C

The “-” character streams to stdout. nbd://localhost is an NBD URI referring to an NBD server that is already running. What if you don’t have an already running server? nbdcopy lets you run one from the command line (and cleans up after). For example this is one way to convert a qcow2 file to raw:

$ nbdcopy -- [ qemu-nbd -f qcow2 disk.qcow ] disk.raw

Here the [ ... ] section starts qemu-nbd as a captive NBD server, exposing privately an NBD endpoint, and nbdcopy copies this to local file disk.raw. (“--” is needed to stop nbdcopy trying to interpret qemu-nbd’s own command line arguments.)

However this post is really about the nbdkit release. How did I test and benchmark nbdcopy? Of course I wrote an nbdkit plugin called nbdkit-sparse-random-plugin. This plugin has two clever features for testing copying tools. Firstly it creates random disks which have the same “shape” as virtual machine disk images (but without the overhead of needing to bother with an actual VM). Secondly it can act as both a source and target for testing copies.

Let’s unpack those two things a bit further.

Virtual machine disk images (especially mostly empty ones) are mostly sparse. Here’s part of the sparse map from a Fedora 32 disk image:

$ virt-builder fedora-32
$ filefrag -e fedora-32.img 
 Filesystem type is: 58465342
 File size of fedora-32.img is 6442450944 (1572864 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..       0:    2038672..   2038672:      1:            
    1:        1..      15:    2176040..   2176054:     15:    2038673:
    2:      256..     271:    2188819..   2188834:     16:    2176295:
    3:      512..    3135:    3650850..   3653473:   2624:    2189075:
    4:     3168..    4463:    3781763..   3783058:   1296:    3653506:
[...]

The new sparse-random plugin generates a disk image which has a similar shape — islands of random data in a sea of sparseness. The algorithm for doing this is quite neat. Because the plugin doesn’t need to store the data, unlike a real disk image, it can generate huge disk images (eg. a terabyte) while using almost no memory. We use a low-overhead, high-quality random number generator and are smart about seeds so that every run of sparse-random with the same seed produces identical output.
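As a rough illustration of how a disk can be generated on demand without storing any data, here is a minimal Python sketch. It is not the plugin's actual algorithm: the block size, the `DATA_PERCENT` figure, the hash-based seeding and every function name are assumptions made for this example, and each block is decided independently, which is cruder than the clustering seen in real disk images.

```python
import hashlib
import random

BLOCK_SIZE = 32768      # hypothetical granularity, not taken from the plugin
DATA_PERCENT = 10       # hypothetical percentage of blocks that hold data


def block_rng(seed: int, block: int) -> random.Random:
    """Return a PRNG whose state depends only on (seed, block)."""
    digest = hashlib.sha256(f"{seed}:{block}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def read_block(seed: int, block: int) -> bytes:
    """Generate the content of one block on demand; nothing is stored."""
    rng = block_rng(seed, block)
    if rng.randrange(100) >= DATA_PERCENT:
        return bytes(BLOCK_SIZE)                      # a hole: reads as zeroes
    return rng.getrandbits(BLOCK_SIZE * 8).to_bytes(BLOCK_SIZE, "big")


def pread(seed: int, offset: int, count: int) -> bytes:
    """Read `count` bytes starting at `offset` from the virtual disk."""
    out = bytearray()
    while count > 0:
        block, skip = divmod(offset, BLOCK_SIZE)
        chunk = read_block(seed, block)[skip:skip + count]
        out += chunk
        offset += len(chunk)
        count -= len(chunk)
    return bytes(out)


first = pread(seed=1, offset=0, count=1 << 20)
again = pread(seed=1, offset=0, count=1 << 20)
print(first == again)   # True: the same seed always gives the same content
```

Because every block is derived from the seed and the block number alone, memory use stays constant however large the virtual disk is, and a read far into a terabyte-sized disk costs the same as a read at offset 0.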

The other part of this plugin is how we can use it to test copying tools like nbdcopy and qemu-img convert. My idea was that the plugin could be used both as the source and the target of the copy:

$ nbdkit -U - sparse-random 1T --run ' nbdcopy "$uri" "$uri" '

Here we create a terabyte-sized sparse-random disk, and get nbdcopy to copy from the plugin to the plugin. On reads sparse-random supplies the sparseness and random data. On writes it checks if what is being written matches the content of the plugin, throwing -EIO errors if not. Assuming the copying tool is correctly handling errors, we can both validate the copying tool and benchmark it. And it works with qemu-img convert too:
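The verify-on-write idea can be sketched in a few lines of Python. This is an illustration only: `expected_content` is a hypothetical stand-in for the plugin's generator, and `VerifyingDisk` and `copy` are made-up names, not part of nbdkit or libnbd.

```python
import errno


def expected_content(offset: int, count: int) -> bytes:
    """Hypothetical deterministic content: byte i depends only on i."""
    return bytes(((i * 2654435761) >> 24) & 0xFF
                 for i in range(offset, offset + count))


class VerifyingDisk:
    """Reads return the expected content; writes must match it exactly."""

    def pread(self, offset: int, count: int) -> bytes:
        return expected_content(offset, count)

    def pwrite(self, buf: bytes, offset: int) -> None:
        if buf != expected_content(offset, len(buf)):
            raise OSError(errno.EIO, "written data does not match disk content")


def copy(src, dst, size: int, chunk: int = 4096) -> None:
    """A trivial copying tool: read each chunk from src, write it to dst."""
    for offset in range(0, size, chunk):
        n = min(chunk, size - offset)
        dst.pwrite(src.pread(offset, n), offset)


disk = VerifyingDisk()
copy(disk, disk, 10000)     # a correct copy completes without raising
```

A copying tool that writes the wrong bytes, or the right bytes at the wrong offset, gets `EIO` on the first bad write, so a clean run doubles as a correctness check.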

$ nbdkit -U - sparse-random 1T --run ' qemu-img convert "$uri" "$uri" '

And now we can see which one is faster.

Try it, you may be surprised.

nbdkit 1.24, new data plugin features

nbdkit 1.24 was released on Thursday. It’s our flexible, fast network block device with loads of features. nbdkit-data-plugin, a plugin that lets you create test patterns from the command line, gained some interesting new functionality:

$ nbdkit data ' ( 0x55 0xAA )*2048 '

This command worked before as a way to create a repeating test pattern in a disk image. A new feature is you can write a shell script snippet to generate the pattern instead:

$ nbdkit data ' <( while :; do printf "%04x" $((i++)); done ) [:2048] '

This command will create a pattern of characters “0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 3 …” (truncated to 2048 bytes). We could turn that into a block device and display the contents:

# nbd-client localhost /dev/nbd0
# blockdev --getsize64 /dev/nbd0
2048
# dd if=/dev/nbd0 | hexdump -C | head
4+0 records in
4+0 records out
2048 bytes (2.0 kB, 2.0 KiB) copied, 0.000167082 s, 12.3 MB/s
00000000  30 30 30 30 30 30 30 31  30 30 30 32 30 30 30 33  |0000000100020003|
00000010  30 30 30 34 30 30 30 35  30 30 30 36 30 30 30 37  |0004000500060007|
00000020  30 30 30 38 30 30 30 39  30 30 30 61 30 30 30 62  |00080009000a000b|
00000030  30 30 30 63 30 30 30 64  30 30 30 65 30 30 30 66  |000c000d000e000f|
00000040  30 30 31 30 30 30 31 31  30 30 31 32 30 30 31 33  |0010001100120013|
00000050  30 30 31 34 30 30 31 35  30 30 31 36 30 30 31 37  |0014001500160017|
00000060  30 30 31 38 30 30 31 39  30 30 31 61 30 30 31 62  |00180019001a001b|
00000070  30 30 31 63 30 30 31 64  30 30 31 65 30 30 31 66  |001c001d001e001f|
00000080  30 30 32 30 30 30 32 31  30 30 32 32 30 30 32 33  |0020002100220023|
00000090  30 30 32 34 30 30 32 35  30 30 32 36 30 30 32 37  |0024002500260027|
# nbd-client -d /dev/nbd0
# killall nbdkit
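For comparison, the byte pattern shown in the hexdump above can be reproduced in plain Python. This sketch only mirrors what the shell snippet prints; `counter_pattern` is a name made up for this example and has nothing to do with how the data plugin is implemented.

```python
from itertools import count, islice


def counter_pattern(size: int) -> bytes:
    """Concatenate "%04x" % i for i = 0, 1, 2, ... and truncate to `size` bytes."""
    def chars():
        for i in count():
            yield from ("%04x" % i).encode("ascii")
    return bytes(islice(chars(), size))


pattern = counter_pattern(2048)
print(len(pattern))       # 2048
print(pattern[:16])       # b'0000000100020003'
```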

The data plugin also lets you read from files, which is useful for making disks with random initial data. For example here’s how to create a disk with 16 identical sectors of random data (notice how /dev/urandom is read in, truncated to 512 bytes, and then 16 copies are made):

$ nbdkit data ' </dev/urandom[:512]*16 '

The plugin can also create sparse disks. You can do this just by moving the current offset using “@”:

$ nbdkit data ' @32768 1 ' --run 'nbdinfo --map "$uri"'
     0       32768    3  hole,zero
 32768           1    0  allocated
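To see why moving the offset produces a hole, here is a toy Python model of a tiny subset of the syntax: whitespace-separated `@OFFSET` moves (forwards only) and single byte values. It is a simplified sketch, not the plugin's parser, and it ignores whatever allocation granularity the real plugin uses; `extents` is a hypothetical name.

```python
def extents(expr: str):
    """Return (offset, length, kind) tuples for "@OFFSET" and byte tokens.

    Toy model: offsets may only move forwards, and each byte value
    allocates exactly one byte.
    """
    offset = 0
    result = []
    for token in expr.split():
        if token.startswith("@"):
            new_offset = int(token[1:], 0)
            if new_offset < offset:
                raise ValueError("this sketch only supports moving forwards")
            if new_offset > offset:
                result.append((offset, new_offset - offset, "hole"))
            offset = new_offset
        else:
            int(token, 0)                     # check that it is a number
            last = result[-1] if result else None
            if last and last[2] == "allocated" and last[0] + last[1] == offset:
                result[-1] = (last[0], last[1] + 1, "allocated")
            else:
                result.append((offset, 1, "allocated"))
            offset += 1
    return result


print(extents(" @32768 1 "))
# [(0, 32768, 'hole'), (32768, 1, 'allocated')]
```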

We use this plugin quite extensively when testing libnbd.

nbdkit tar filter

nbdkit is our high performance liberally licensed Network Block Device server, and OVA files are a common pseudo-standard for exporting virtual machines including their disk images.

A .ova file is really an uncompressed tar file:

$ tar tf rhel.ova
rhel.ovf
rhel-disk1.vmdk
rhel.mf

Since tar files usually store their content unmangled, this opens an interesting possibility for reading (or even writing) the embedded disk image without needing to unpack the tar. You just have to work out the offset of the disk image within the tar file. virt-v2v has used this trick to save a copy when importing OVAs for years.
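The offset trick is easy to try with Python's standard `tarfile` module, which records where each member's data starts. The sketch below builds a small uncompressed tar in memory as a stand-in for an .ova (the payload is made up), then reads the embedded "disk" straight out of the raw bytes without unpacking anything. It illustrates the idea only and is not how virt-v2v or nbdkit do it.

```python
import io
import tarfile


def find_member(tar_bytes: bytes, name: str):
    """Return (offset, size) of a member's data within an uncompressed tar."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        info = tar.getmember(name)
        return info.offset_data, info.size


# Build a small uncompressed tar in memory (a stand-in for an .ova file).
payload = b"pretend this is a disk image" * 100
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as tar:
    info = tarfile.TarInfo("rhel-disk1.vmdk")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
ova = buf.getvalue()

# The member's bytes sit unmodified at [offset, offset + size).
offset, size = find_member(ova, "rhel-disk1.vmdk")
print(ova[offset:offset + size] == payload)   # True
```

Once the offset and size are known, reads and writes can be redirected into that byte range of the tar file, which is what makes in-place access possible.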

nbdkit has also included a tar plugin which can access a file inside a local tar file, but what if the tar file doesn’t happen to be a local file (eg. it’s on a web server)? Or what if it’s compressed?

To fix this I’ve turned the plugin into a filter. Using nbdkit-tar-filter you can unpack even non-local compressed tar files:

$ nbdkit curl http://example.com/qcow2.tar.xz \
         --filter=tar --filter=xz tar-entry=disk.qcow2

(To understand how filters are stacked, see my FOSDEM talk from last year). Because in this example the disk inside the tarball is a qcow2 file, it appears as qcow2 on the wire, so:

$ guestfish --ro --format=qcow2 -a nbd://localhost

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> run
><fs> list-filesystems 
/dev/sda1: ext2
><fs> mount /dev/sda1 /
><fs> ll /
total 19
drwxr-xr-x   3 root root  1024 Jul  6 20:03 .
drwxr-xr-x  19 root root  4096 Jul  9 11:01 ..
-rw-rw-r--.  1 1000 1000    11 Jul  6 20:03 hello.txt
drwx------   2 root root 12288 Jul  6 20:03 lost+found

nbdkit with BitTorrent

nbdkit is our high performance Network Block Device server for serving disk images from unusual sources. One (usual) source for Linux installers is to download an ISO from a website like Get Fedora or debian.org. However that costs the host money and is also a central point of failure, so another way to download Linux installers is over BitTorrent. Many Linux distros offer torrents of their installers including Fedora and Debian. By using these you are helping to redistribute Linux and defraying the cost of hosting these ISOs.

Now I’ve written a BitTorrent plugin for nbdkit so you can download, redistribute and install Linux all at the same time!

$ url=https://torrent.fedoraproject.org/torrents/Fedora-Server-dvd-x86_64-32.torrent
$ wget $url
$ nbdkit -U - torrent Fedora-Server-*.torrent \
         --run 'qemu-system-x86_64 -m 2048 -cdrom $nbd -boot d'

So what’s the serious use for this? It has the interesting property that the more people who are installing your Linux distro, the less bandwidth it uses and the faster it runs! This could be interesting technology for any kind of distributed environment where you have lots of machines accessing the same fixed/read-only filesystem or disk image.

If you want to get started with nbdkit it’s already in all popular Linux distributions, and compiles from source on Linux, FreeBSD and OpenBSD.

Golang bindings for both libnbd and nbdkit

I have to say for full transparency up front that Golang is not my favourite programming language, even less after using it for a while. Nevertheless with a lot of help from Dan Berrangé we now have Golang bindings for libnbd and nbdkit which are respectively client and server software for the Linux Network Block Device protocol.

The Golang bindings for libnbd let you connect to a server and read from and write to it. This is all pretty straightforward so read the manual page if you want to find out more.

The Golang bindings for nbdkit are considerably more interesting because you can use them to write pretty natural and high performance NBD servers to expose “interesting things”.

I’m hoping in particular there are interesting block device sources in the Kubernetes / Docker ecosystem which are probably only available from Golang that we could now expose to other software (although I’m also still researching this area so I don’t yet know what in particular).

You can make a complete Golang NBD server really easily now with only a few lines of code. Minus boilerplate, something like this is sufficient (see this link for complete working examples):

type MyPlugin struct {
	nbdkit.Plugin
}

type MyConnection struct {
	nbdkit.Connection
}

func (p *MyPlugin) Open(readonly bool) (nbdkit.ConnectionInterface, error) {
	return &MyConnection{}, nil
}

func (c *MyConnection) GetSize() (uint64, error) {
	return size, nil
}

func (c *MyConnection) PRead(buf []byte, offset uint64,
	flags uint32) error {
	copy(buf, ... from the source of your data here ...)
	return nil
}

func (c *MyConnection) CanWrite() (bool, error) {
	return true, nil
}

func (c *MyConnection) PWrite(buf []byte, offset uint64,
	flags uint32) error {
	copy(... to the data source here ..., buf)
	return nil
}

Editor note: In an earlier version of these bindings we passed the whole struct to each callback rather than a pointer, hence James’s first comment below.

Pyrit by Řrřola, incredible raytracing demo as a qemu bootable disk image

One of the things I showed at KVM Forum last month was a cool demo by Jan Kadlec (Řrřola). Originally this was a 256 byte MSDOS COM file. I adapted it very slightly to turn it into a boot sector. Here’s how to run it using nbdkit and qemu:

nbdkit data data="
  49 192 49 219 185 255 0 191 254 255 137 252 190 0 1 189 28 9 79 176 
  19 79 208 233 205 16 15 190 203 48 205 136 233 137 200 247 224 209 
  233 254 195 120 2 134 206 184 16 16 117 228 184 79 176 163 0 1 184 19 
  79 163 2 1 184 208 233 163 4 1 184 205 16 163 6 1 184 15 190 163 8 1 
  184 203 48 163 10 1 184 205 136 163 12 1 184 233 137 163 14 1 184 49 
  71 186 202 159 142 194 96 185 12 0 1 245 96 217 69 254 217 251 217 
  238 132 193 117 2 217 224 221 219 226 246 221 219 217 193 217 69 254 
  217 251 222 204 222 201 4 127 112 241 222 195 222 233 114 233 217 26 
  41 254 123 250 97 226 204 97 66 170 96 219 227 140 195 191 252 255 
  223 6 68 125 221 23 223 69 251 223 69 252 232 14 0 97 129 195 205 204 
  115 225 117 222 228 96 72 224 152 145 0 246 112 78 0 210 112 74 185 
  12 0 1 245 217 236 216 2 86 217 2 216 204 41 254 123 248 94 222 193 
  222 193 83 217 19 133 99 2 120 2 41 251 217 192 216 15 223 242 114 6 
  216 249 217 23 137 40 222 217 91 139 87 6 59 87 2 126 16 226 199 139 
  24 217 1 216 8 216 192 216 235 41 254 123 244 217 192 222 14 70 125 
  219 29 102 193 61 22 120 24 222 60 220 201 216 202 219 27 42 67 1 219 
  27 50 67 1 36 72 4 80 246 37 136 37 195 127 112 97 66 68 78 
  @0x1fe 85 170 
  " size=512 --run 'qemu-system-x86_64 -hda $nbd'

(I would normally put a screenshot here, but it doesn’t do it justice. I suggest actually running that command and also reading the surprisingly clean source code.)
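The last line of the data, "@0x1fe 85 170", places the bytes 0x55 0xAA at offset 510, which is the signature a PC BIOS looks for before it will boot a sector. A minimal Python sketch of the same layout follows; `make_boot_sector` is a made-up helper, and the two-byte "code" is just a stand-in infinite loop, not the demo itself.

```python
SECTOR_SIZE = 512
SIGNATURE_OFFSET = 0x1FE             # same offset as "@0x1fe" in the command
BOOT_SIGNATURE = bytes([85, 170])    # 0x55 0xAA


def make_boot_sector(code: bytes) -> bytes:
    """Pad `code` with zeroes to one sector and append the boot signature."""
    if len(code) > SIGNATURE_OFFSET:
        raise ValueError("code does not fit before the boot signature")
    sector = bytearray(SECTOR_SIZE)              # zero-filled, like size=512
    sector[:len(code)] = code
    sector[SIGNATURE_OFFSET:] = BOOT_SIGNATURE   # replaces the last two bytes
    return bytes(sector)


# Stand-in code: EB FE is an x86 "jump to self" instruction.
sector = make_boot_sector(bytes([0xEB, 0xFE]))
print(len(sector), sector[-2:].hex())    # 512 55aa
```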

nbdkit new eval plugin and ip filter

nbdkit is our flexible toolkit for building block devices. I just added a couple of new features which will appear in the next stable release, nbdkit 1.18.

Previously I’ve written on this blog and given a talk at FOSDEM about how you can write block devices in shell script using nbdkit-sh-plugin. But that requires you to use an extra file for the script. What if opening an extra file is too much work? Well now you can specify the script directly on the nbdkit command line using the new eval plugin.

You can write code like:

nbdkit eval \
       config='ln -sf "$(realpath "$3")" $tmpdir/file' \
       get_size='stat -Lc %s $tmpdir/file' \
       pread='dd if=$tmpdir/file skip=$4 count=$3 iflag=count_bytes,skip_bytes' \
       pwrite='dd of=$tmpdir/file seek=$4 conv=notrunc oflag=seek_bytes' \
       file=disk.img

which is a complete NBD server / block device backed by a local file. Of course it’s probably easier to use nbdkit-file-plugin for this, but the shell script gives you more control like letting you simulate failures or delays.
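For readers who find the dd flags opaque, the same three callbacks can be written as a short Python sketch using positional file I/O (Unix only). This is not an nbdkit plugin, just the equivalent file operations; the function names mirror the script's method names, and the `fail_at` parameter is a hypothetical extra showing the kind of failure injection mentioned above.

```python
import errno
import os
import tempfile


def get_size(fd: int) -> int:
    """Like get_size='stat -Lc %s ...': report the size of the backing file."""
    return os.fstat(fd).st_size


def pread(fd: int, count: int, offset: int, fail_at=None) -> bytes:
    """Like the dd in pread=: read `count` bytes at byte offset `offset`."""
    if fail_at is not None and offset <= fail_at < offset + count:
        raise OSError(errno.EIO, "simulated read failure")
    return os.pread(fd, count, offset)


def pwrite(fd: int, buf: bytes, offset: int) -> None:
    """Like the dd in pwrite=: write `buf` at `offset` without truncating."""
    view = memoryview(buf)
    while len(view) > 0:
        n = os.pwrite(fd, view, offset)
        view = view[n:]
        offset += n


with tempfile.TemporaryFile() as f:
    fd = f.fileno()
    os.ftruncate(fd, 1024 * 1024)        # a 1M sparse "disk"
    pwrite(fd, b"12345", 4096)
    print(get_size(fd), pread(fd, 5, 4096))   # 1048576 b'12345'
```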

The other new feature is connected to a CVE we had earlier this year. CVE-2019-14850 happened because nbdkit used to open the plugin as soon as any client established a TCP connection. For some plugins opening them is quite a heavyweight action (eg. it might mean that the plugin has to establish a connection to a second server). This is before NBD negotiation or TLS had started, and it allowed clients potentially to overwhelm the server with requests even if those clients would not be authorized to connect.

To fix this we delay opening plugins until after the NBD handshake (and thus TLS authentication) has completed. But this in turn meant there was no way for plugins to reject connections early, for example based on IP address. So now I have added a preconnect method which gets run on first TCP connection and can be used to do lightweight early filtering. There is a new nbdkit-ip-filter which implements simple TCP-wrappers-style allow/deny lists.
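A TCP-wrappers-style check is cheap enough to run before any expensive work is done. The Python sketch below shows the general shape using the standard `ipaddress` module. The evaluation order (allow list first, then deny list, otherwise allow) is an assumption modelled on TCP wrappers; consult the filter's manual page for its exact rules and syntax. `is_allowed` is a made-up name.

```python
import ipaddress


def is_allowed(client: str, allow=(), deny=()) -> bool:
    """Decide whether `client` may connect.

    Assumed order: a match in `allow` accepts, otherwise a match in
    `deny` rejects, otherwise the connection is accepted.
    """
    addr = ipaddress.ip_address(client)
    if any(addr in ipaddress.ip_network(net) for net in allow):
        return True
    if any(addr in ipaddress.ip_network(net) for net in deny):
        return False
    return True


allow = ["127.0.0.0/8", "::1"]
deny = ["0.0.0.0/0", "::/0"]
print(is_allowed("127.0.0.1", allow, deny))   # True
print(is_allowed("10.1.2.3", allow, deny))    # False
```

The point of doing this in a preconnect step is that the decision needs only the peer address, so rejected clients never cause the plugin to be opened.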

NBD over AF_VSOCK

How do you talk to a virtual machine from the host? How does the virtual machine talk to the host? In one sense the answer is obvious: virtual machines should be thought of just like regular machines so you use the network. However the connection between host and guest is a bit more special. Suppose you want to pass a host directory up to the guest? You could use NFS, but that’s sucky to set up and you’ll have to fiddle around with firewalls and ports. Suppose you run a guest agent reporting stats back to the hypervisor. How do they talk? Network, sure, but again that requires an extra network interface and the guest has to explicitly set up firewall rules.

A few years ago my colleague Stefan Hajnoczi ported VMware’s vsock to qemu. It’s a pure guest⟷host (and guest⟷guest) sockets API. It doesn’t use regular networks so no firewall issues or guest network configuration to worry about.

You can run NFS over vsock [PDF] if you want.

And now you can of course run NBD over vsock. nbdkit supports it, and libnbd is (currently the only!) client.

libnbd + FUSE = nbdfuse

I’ve talked before about libnbd, our NBD client library. New in libnbd 1.2 is a tool called nbdfuse which lets you turn NBD servers into virtual files.

A couple of weeks ago I mentioned you can use libnbd as a C library to edit qcow2 files. Now you can turn qcow2 files into virtual raw files:

$ mkdir dir
$ nbdfuse dir/file.raw \
      --socket-activation qemu-nbd -f qcow2 file.qcow2
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 1073741824 Jan  1 10:10 file.raw

Reads and writes to file.raw are backed by the original qcow2 file which is updated in real time.

Another fun thing to do is to use nbdkit, xz filter and curl to turn xz-compressed remote disk images into uncompressed local files:

$ mkdir dir
$ nbdfuse dir/disk.img \
      --command nbdkit -s curl --filter=xz \
                       http://builder.libguestfs.org/fedora-30.xz
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 6442450944 Jan  1 10:10 disk.img
$ file dir/disk.img
dir/disk.img: DOS/MBR boot sector
$ qemu-system-x86_64 -m 4G \
      -drive file=dir/disk.img,format=raw,if=virtio,snapshot=on

libnbd – A new NBD client library

NBD is a high performance protocol for exporting disks between processes and machines. We use it as a kind of “universal connector” for connecting hypervisors with data sources, and previously Eric Blake and I wrote a general purpose NBD server called nbdkit. (If you’re interested in the topic of nbdkit as a universal connector, watch my FOSDEM talk.)

Up until now our NBD client has been qemu or one of the qemu tools like qemu-img. That was fine if you wanted to expose a disk source as a running virtual machine (ie. running it with qemu), or if you wanted to perform one of the limited copying operations that qemu-img convert can do, but there were many cases where it would have been nice to have a general client library.

For example I started to add NBD support to Jens Axboe’s FIO. Lacking a client library I synthesized NBD request packets as C structs and sent them on the wire using low level socket commands. The performance was, to put it bluntly, crap.

Although NBD is a very simple protocol and you can write it by hand, it would be nicer to have a library wrap the low-level stuff, and that’s why we have written libnbd (downloads).

Getting reasonable performance from NBD requires a few tricks:

  • You must issue as many commands as possible “in flight” (the server may reply to them out of order, but requests and replies are tied together by a unique ID).
  • You may need to open multiple connections to the server, but doing that requires attention to the special MULTI_CONN flag which the server will use to indicate that this is safe.
  • Most crucially you must disable Nagle’s algorithm.

This isn’t an exhaustive list. In fact while writing libnbd over about 3 weeks we improved performance by a factor of more than 15, just by paying attention to system calls, maximizing parallelism and minimizing latency. One advantage of libnbd is that it encodes all this knowledge in an easy to use library so NBD clients won’t have to reinvent it in future.
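Two of the tricks above are simple enough to sketch in Python without any NBD code at all: keeping a table of in-flight requests keyed by a unique ID so that out-of-order replies can be matched up, and switching off Nagle's algorithm on a TCP socket. This is a generic illustration, not libnbd's implementation; `InFlight` and `disable_nagle` are hypothetical names.

```python
import socket
from itertools import count


class InFlight:
    """Track requests awaiting replies, keyed by a unique ID."""

    def __init__(self):
        self._next_id = count(1)
        self._pending = {}

    def issue(self, description: str) -> int:
        """Record a request and return the ID to send with it."""
        request_id = next(self._next_id)
        self._pending[request_id] = description
        return request_id

    def complete(self, request_id: int) -> str:
        """Match a reply (in any order) to its request and retire it."""
        return self._pending.pop(request_id)

    def __len__(self) -> int:
        return len(self._pending)


def disable_nagle(sock: socket.socket) -> None:
    """Send small packets immediately instead of letting TCP batch them."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


queue = InFlight()
a = queue.issue("read 512 bytes at 0")
b = queue.issue("read 512 bytes at 4096")
print(queue.complete(b))    # the second request may well finish first
print(len(queue))           # 1 request still in flight
```

Nagle's algorithm matters here because NBD requests are small: with it enabled, the kernel may hold a request back waiting for more data to send, which adds latency to every command.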

The library has a simple high-level synchronous API which works how you would expect (but doesn’t get the best performance). A typical program might look like:

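#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <libnbd.h>

int
main (void)
{
  struct nbd_handle *nbd;
  int64_t exportsize;
  char buf[512];

  nbd = nbd_create ();                  /* allocate a handle */
  if (!nbd) goto error;
  if (nbd_connect_tcp (nbd, "localhost", "nbd") == -1)
    goto error;
  exportsize = nbd_get_size (nbd);      /* returns -1 on error */
  if (exportsize == -1) goto error;
  if (nbd_pread (nbd, buf, sizeof buf, 0, 0) == -1)  /* first sector */
    goto error;
  nbd_close (nbd);
  exit (EXIT_SUCCESS);
 error:
  fprintf (stderr, "%s\n", nbd_get_error ());
  exit (EXIT_FAILURE);
}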

To get the best performance you have to use the more low-level asynchronous API which allows you to queue up commands and bring your own main loop.

There are also bindings in OCaml and Python (and Rust, soon). There’s also a nice little shell written in Python so you can access NBD servers interactively:

$ nbdsh
nbd> h.connect_command (["nbdkit", "-s", "memory", "1M"])
nbd> print ("%r" % h.get_size ())
1048576
nbd> h.pwrite (b"12345", 0)
nbd> h.pread (5, 0)
b'12345'

libnbd and the shell, nbdsh, are available now in Fedora 29 and above.
