Tag Archives: libnbd

nbdkit 1.24 & libnbd 1.6, new copying tool

As well as nbdkit 1.24 being released on Thursday, its sister project libnbd 1.6 was released at the same time. This comes with an enhanced copying tool called nbdcopy designed to replace some uses of qemu-img convert (note: it’s not a general replacement).

nbdcopy lets you copy from and to NBD servers (nbdkit, qemu-nbd, qemu-storage-daemon, nbd-server), local files, local block devices, pipes/sockets, and stdin/stdout. For example to stream the content of an NBD server:

$ nbdcopy nbd://localhost - | hexdump -C

The “-” character streams to stdout. nbd://localhost is an NBD URI referring to an NBD server that is already running. What if you don’t have an already running server? nbdcopy lets you run one from the command line (and cleans up after). For example this is one way to convert a qcow2 file to raw:

$ nbdcopy -- [ qemu-nbd -f qcow2 disk.qcow ] disk.raw

Here the [ ... ] section starts qemu-nbd as a captive NBD server, exposing privately an NBD endpoint, and nbdcopy copies this to local file disk.raw. (“--” is needed to stop nbdcopy trying to interpret qemu-nbd’s own command line arguments.)

However this post is really about the nbdkit release. How did I test and benchmark nbdcopy? Of course I wrote an nbdkit plugin called nbdkit-sparse-random-plugin. This plugin has two clever features for testing copying tools. Firstly it creates random disks which have the same “shape” as virtual machine disk images (but without the overhead of needing to bother with an actual VM). Secondly it can act as both a source and target for testing copies.

Let’s unpack those two things a bit further.

Virtual machine disk images (especially mostly empty ones) are mostly sparse. Here’s part of the sparse map from a Fedora 32 disk image:

$ virt-builder fedora-32
$ filefrag -e fedora-32.img 
 Filesystem type is: 58465342
 File size of fedora-32.img is 6442450944 (1572864 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..       0:    2038672..   2038672:      1:            
    1:        1..      15:    2176040..   2176054:     15:    2038673:
    2:      256..     271:    2188819..   2188834:     16:    2176295:
    3:      512..    3135:    3650850..   3653473:   2624:    2189075:
    4:     3168..    4463:    3781763..   3783058:   1296:    3653506:
[...]

The new sparse-random plugin generates a disk image which has a similar shape — islands of random data in a sea of sparseness. The algorithm for doing this is quite neat. Because the plugin doesn’t need to store the data, unlike a real disk image, it can generate huge disk images (eg. a terabyte) while using almost no memory. We use a low-overhead, high-quality random number generator and are smart about seeds so that every run of sparse-random with the same seed produces identical output.

The other part of this plugin is how we can use it to test copying tools like nbdcopy and qemu-img convert. My idea was that the plugin could be used both as the source and the target of the copy:

$ nbdkit -U - sparse-random 1T --run ' nbdcopy "$uri" "$uri" '

Here we create a terabyte-sized sparse-random disk, and get nbdcopy to copy from the plugin to the plugin. On reads sparse-random supplies the sparseness and random data. On writes it checks if what is being written matches the content of the plugin, throwing -EIO errors if not. Assuming the copying tool is correctly handling errors, we can both validate the copying tool and benchmark it. And it works with qemu-img convert too:

$ nbdkit -U - sparse-random 1T --run ' qemu-img convert "$uri" "$uri" '

And now we can see which one is faster.

Try it, you may be surprised.

Leave a comment

Filed under Uncategorized

Read and writing VMware .vmdk disks

(This is in answer to an IRC question, but the answer is a bit longer than I can cover in IRC)

Can you read and write at the block level in a .vmdk file? I think the questioner was asking about writing a backup/restore type tool. Using only free software, qemu can do reads. You can attach qemu-nbd to a vmdk file and that will expose the logical blocks as NBD, and you can then read at the block level using libnbd:

#!/usr/bin/python3
import nbd
h = nbd.NBD()
h.connect_systemd_socket_activation(
    ["qemu-nbd", "-t", "/var/tmp/disk.vmdk"])
print("size = %d" % h.get_size())
buf = h.pread(512, 0)
$ ./qemu-test.py 
size = 1073741824

The example is in Python, but libnbd would let you do this from C or other languages just as easily.

While this works fine for reading, I wouldn’t necessarily be sure that writing is safe. The vmdk format is complex, baroque and only lightly documented, and the only implementation I’d trust is the one from VMware.

So as long as you’re prepared to use a bit of closed source software and agree with the (nasty) license, VDDK is the safer choice. You can isolate your own software from VDDK using our nbdkit plugin.

#!/usr/bin/python3
import nbd
h = nbd.NBD()
h.connect_command(
    ["nbdkit", "-s", "--exit-with-parent",
     "vddk", "libdir=/var/tmp/vmware-vix-disklib-distrib",
     "file=/var/tmp/disk.vmdk"])
print("size = %d" % h.get_size())
buf = h.pread(512, 0)
h.pwrite(buf, 512)

I quite like how we’re using small tools and assembling them together into a pipeline in just a few lines of code:

┌─────────┬────────┐          ┌─────────┬────────┐
│ your    │ libnbd │   NBD    │ nbdkit  │ VDDK   │
│ program │     ●──────────────➤        │        │
└─────────┴────────┘          └─────────┴────────┘
                                          disk.vmdk

One advantage of this approach is that it exposes the extents in the disk which you can iterate over using libnbd APIs. For a backup tool this would let you save the disk efficiently, or do change-block tracking.

1 Comment

Filed under Uncategorized

Golang bindings for both libnbd and nbdkit

I have to say for full transparency up front that Golang is not my favourite programming language, even less after using it for a while. Nevertheless with a lot of help from Dan Berrangé we now have Golang bindings for libnbd and nbdkit which are respectively client and server software for the Linux Network Block Device protocol.

The Golang bindings for libnbd let you connect to a server and read and write from it. This is all pretty straightforward so read the manual page if you want to find out more.

The Golang bindings for nbdkit are considerably more interesting because you can use them to write pretty natural and high performance NBD servers to expose “interesting things”.

I’m hoping in particular there are interesting block device sources in the Kubernetes / Docker ecosystem which are probably only available from Golang that we could now expose to other software (although I’m also still researching this area so I don’t yet know what in particular).

You can make a complete Golang NBD server really easily now with only a few lines of code. Minus boilerplate, something like this is sufficient (see this link for complete working examples):

type MyPlugin struct {
	nbdkit.Plugin
}

type MyConnection struct {
	nbdkit.Connection
}

func (p *MyPlugin) Open(readonly bool) (nbdkit.ConnectionInterface, error) {
	return &MyConnection{}, nil
}

func (c *MyConnection) GetSize() (uint64, error) {
	return size, nil
}

func (c *MyConnection) PRead(buf []byte, offset uint64,
	flags uint32) error {
	copy(buf, ... from the source of your data here ...)
	return nil
}

func (c *MyConnection) CanWrite() (bool, error) {
	return true, nil
}

func (c *MyConnection) PWrite(buf []byte, offset uint64,
	flags uint32) error {
	copy(... to the data source here ..., buf)
	return nil
}

Editor note: In an earlier version of these bindings we passed the whole struct to each callback rather than a pointer, hence James’s first comment below.

3 Comments

Filed under Uncategorized

NBD over AF_VSOCK

How do you talk to a virtual machine from the host? How does the virtual machine talk to the host? In one sense the answer is obvious: virtual machines should be thought of just like regular machines so you use the network. However the connection between host and guest is a bit more special. Suppose you want to pass a host directory up to the guest? You could use NFS, but that’s sucky to set up and you’ll have to fiddle around with firewalls and ports. Suppose you run a guest agent reporting stats back to the hypervisor. How do they talk? Network, sure, but again that requires an extra network interface and the guest has to explicitly set up firewall rules.

A few years ago my colleague Stefan Hajnoczi ported VMware’s vsock to qemu. It’s a pure guest⟷host (and guest⟷guest) sockets API. It doesn’t use regular networks so no firewall issues or guest network configuration to worry about.

You can run NFS over vsock [PDF] if you want.

And now you can of course run NBD over vsock. nbdkit supports it, and libnbd is (currently the only!) client.

Leave a comment

Filed under Uncategorized

libnbd + FUSE = nbdfuse

I’ve talked before about libnbd, our NBD client library. New in libnbd 1.2 is a tool called nbdfuse which lets you turn NBD servers into virtual files.

A couple of weeks ago I mentioned you can use libnbd as a C library to edit qcow2 files. Now you can turn qcow2 files into virtual raw files:

$ mkdir dir
$ nbdfuse dir/file.raw \
      --socket-activation qemu-nbd -f qcow2 file.qcow2
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 1073741824 Jan  1 10:10 file.raw

Reads and writes to file.raw are backed by the original qcow2 file which is updated in real time.

Another fun thing to do is to use nbdkit, xz filter and curl to turn xz-compressed remote disk images into uncompressed local files:

$ mkdir dir
$ nbdfuse dir/disk.img \
      --command nbdkit -s curl --filter=xz \
                       http://builder.libguestfs.org/fedora-30.xz
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 6442450944 Jan  1 10:10 disk.img
$ file dir/disk.img
dir/disk.img: DOS/MBR boot sector
$ qemu-system-x86_64 -m 4G \
      -drive file=dir/disk.img,format=raw,if=virtio,snapshot=on

1 Comment

Filed under Uncategorized

Using American Fuzzy Lop on network clients

Previously I’ve fuzzed hivex and nbdkit using my favourite fuzzing tool, Michał Zalewski’s American Fuzzy Lop (AFL).

AFL works by creating test cases which are files on disk, and then feeding those to programs which have been specially compiled so that AFL can trace into them and find out which parts of the code are run by the test case. It then adjusts the test cases and repeats, aiming to run more parts of the code and find ways to crash the program.

This works well for programs that parse files (like hivex, but also binary parsers of all sorts and XML parsers and similar). It can also be used to fuzz some servers where you can feed a file to the server and discard anything the server sends back. In nbdkit we can use the nbdkit -s option to do exactly this, making it easy to fuzz.

However it’s not obvious how you could use this to fuzz network clients. As readers will know we’ve been writing a new NBD client library called libnbd. But can we fuzz this? And find bugs? As it happens yes, and ooops — yes — AFL found a remote code execution bug allowing complete takeover of the client by a malicious server.

The trick to fuzzing a network client is to do the server thing in reverse. We set up a phony server which feeds the test case back to the client socket, while discarding anything that the client writes:

libnbd.svg

This is wrapped up into a single wrapper program which takes the test case on the command line and forks itself to make the client and server sides connected by a socket. This allows easy integration into an AFL workflow.

We found our Very Serious Bug within 3 days of fuzzing.

3 Comments

Filed under Uncategorized

How to edit a qcow2 file from C

Suppose you want to edit or read or write the data inside a qcow2 file? One way is to use libguestfs, and that’s the recommended way if you need to mount a filesystem inside the file.

But for accessing the data blocks alone, you can now use the libnbd API and qemu-nbd together and this has a couple of advantages: It’s faster and you can open snapshots (which libguestfs cannot do).

We start by creating a libnbd handle and connecting it to a qemu-nbd instance. The qemu-nbd instance is linked with qemu’s internal drivers that know how to read and write qcow2.

  const char *filename;
  struct nbd_handle *nbd;

  nbd = nbd_create ();
  if (nbd == NULL) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

  char *args[] = {
    "qemu-nbd", "-f", "qcow2",
    // "-s", snapshot,
    (char *) filename,
    NULL
  };
  if (nbd_connect_systemd_socket_activation (nbd, args) == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

Now you can get the virtual size:

  int64_t size = nbd_get_size (nbd);
  printf ("virtual size = %" PRIi64 "\n", size);

Or read and write sectors from the file:

  if (nbd_pread (nbd, buf, sizeof buf, 0, 0) == -1) {
    fprintf (stderr, "%s\n", nbd_get_error ());
    exit (EXIT_FAILURE);
  }

Once you’re done with the file, call nbd_close on the handle which will also shut down the qemu-nbd process.

A complete example can be found here.

1 Comment

Filed under Uncategorized

nbdkit supports exportnames

(You’ll need the very latest version of libnbd and nbdkit from git for this to work.)

The NBD protocol lets the client send an export name string to the server. The idea is a single server can serve different content to clients based on a requested export. nbdkit has largely ignored export names, but we recently added basic support upstream.

One consequence of this is you can now write a shell plugin which reflects the export name back to the client:

$ cat export.sh
#!/bin/bash -
case "$1" in
    open) echo "$3" ;;
    get_size) LC_ALL=C echo ${#2} ;;
    pread) echo "$2" | dd skip=$4 count=$3 iflag=skip_bytes,count_bytes ;;
    *) exit 2 ;;
esac
$ chmod +x export.sh
$ nbdkit -f sh export.sh

The size of the disk is the same as the export name:

$ nbdsh -u 'nbd://localhost/fooooo' -c 'print(h.get_size())'
6

The content and size of the disk is the exportname:

┌─────────────┐
│ f o o o o o │
└─────────────┘

Not very interesting in itself. But we can now pass the content of small disks entirely in the export name. Using a slightly more advanced plugin which supports base64-encoded export names (so we can pass in NUL bytes):

$ cat export-b64.sh
#!/bin/bash -
case "$1" in
    open) echo "$3" ;;
    get_size) echo "$2" | base64 -d | wc -c ;;
    pread) echo "$2" | base64 -d |
           dd skip=$4 count=$3 iflag=skip_bytes,count_bytes ;;
    can_write) exit 0 ;;
    pwrite) exit 1 ;;
    *) exit 2 ;;
esac
$ chmod +x export-b64.sh
$ nbdkit -f sh export-b64.sh

We can pass in an entire program to qemu:

qemu-system-x86_64 -fda 'nbd:localhost:10809:exportname=uBMAzRD8uACgjtiOwLQEo5D8McC5SH4x//OriwVAQKuIxJK4AByruJjmq7goFLsQJbELq4PAFpOr/sST4vUFjhWA/1x167+g1LEFuAQL6IUBg8c84vW+lvyAfAIgci3+xYD9N3SsrZetPCh0CzwgdQTGRP4o6F4Bgf5y/XXbiPAsAnLSNAGIwojG68qAdAIIRYPlB1JWVXUOtADNGjsWjPx09okWjPy+gPy5BACtPUABl3JD6BcBge9CAYoFLCByKVZXtAT25AHGrZfGBCC4CA7oAgFfXusfrQnAdC09APCXcxTo6ACBxz4BuAwMiXz+gL1AAQt1BTHAiUT+gD0cdQbHBpL8OAroxgDizL6S/K0IwHQMBAh1CLQc/g6R/HhKiUT+izzorgB1LrQCzRaoBHQCT0+oCHQCR0eoA3QNgz6A/AB1Bo1FCKOA/Jc9/uV0Bz0y53QCiQRdXlqLBID6AXYKBYACPYDUchvNIEhIcgODwARQ0eixoPbx/syA/JRYcgOAzhaJBAUGD5O5AwDkQDz8cg2/gvyDPQB0A6/i+Ikd6cL+GBg8JDx+/yQAgEIYEEiCAQC9234kPGbDADxa/6U8ZmYAAAAAAAAAAHICMcCJhUABq8NRV5xQu6R9LteTuQoA+Ij4iPzo4f/Q4+L1gcdsAlhAqAd14J1fWcNPVao='

(Spoiler)

1 Comment

Filed under Uncategorized

libnbd and nbdkit man pages online

Let’s seed those search engines …

http://libguestfs.org/libnbd.3.html

http://libguestfs.org/nbdkit.1.html

And yes I know there are a few broken links.

Leave a comment

Filed under Uncategorized

fio has support for testing NBD servers directly

fio (“flexible I/O tester”) is a key tool for testing performance of filesystems and block devices. A couple of weeks ago I added support for testing NBD servers directly.

You will need libnbd 0.9.8 and to compile fio from git with:

./configure --enable-libnbd
make

Testing either nbdkit or qemu-nbd is trivial. Just follow the instructions here.

2 Comments

Filed under Uncategorized