NBD’s state machine


Eric and I are writing a Linux NBD client library. There were lots of requirements, but the central one for this post is that it has to be a library callable from programs written in C and other programming languages (Python, OCaml and Rust being important). We don’t control those programs, so they may be single- or multi-threaded, or may use non-blocking main loops like gio and glib.

An NBD command involves sending a request over a socket to a remote server and receiving a reply. You can also have multiple requests “in flight”, and the reply can be received in multiple parts. On top of this, the “fixed newstyle” NBD protocol has a complex multi-step initial handshake. Complicating it further, we might be using a TLS transport, which has its own handshake.
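To make “in flight” concrete: each NBD request carries a 64-bit handle which the server echoes back in its reply, and that is how a client pairs replies with requests when several are outstanding. The sketch below encodes the 28-byte request header laid out in the NBD protocol specification (magic 0x25609513, command flags, type, handle, offset, length, all big-endian). The in-flight table and every function name are hypothetical illustrations, not the library’s API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Constants from the NBD protocol specification. */
#define NBD_REQUEST_MAGIC 0x25609513u
#define NBD_CMD_READ      0
#define NBD_REQUEST_SIZE  28

/* Store integers in network (big-endian) byte order. */
static void
put_be16 (unsigned char *p, uint16_t v)
{
  p[0] = (unsigned char) (v >> 8);
  p[1] = (unsigned char) v;
}

static void
put_be32 (unsigned char *p, uint32_t v)
{
  for (int i = 0; i < 4; i++)
    p[i] = (unsigned char) (v >> (24 - 8 * i));
}

static void
put_be64 (unsigned char *p, uint64_t v)
{
  for (int i = 0; i < 8; i++)
    p[i] = (unsigned char) (v >> (56 - 8 * i));
}

/* Encode one request header into buf; returns the number of bytes. */
static size_t
encode_request (unsigned char *buf, uint16_t flags, uint16_t type,
                uint64_t handle, uint64_t offset, uint32_t length)
{
  assert (buf != NULL);
  put_be32 (buf + 0, NBD_REQUEST_MAGIC);
  put_be16 (buf + 4, flags);
  put_be16 (buf + 6, type);
  put_be64 (buf + 8, handle);   /* the server copies this into its reply */
  put_be64 (buf + 16, offset);
  put_be32 (buf + 24, length);
  return NBD_REQUEST_SIZE;
}

/* Hypothetical bookkeeping: remember each handle sent so that replies,
   which need not arrive in order, can be matched to their requests. */
#define MAX_IN_FLIGHT 16
struct in_flight {
  uint64_t handles[MAX_IN_FLIGHT];
  size_t count;
};

static int
in_flight_add (struct in_flight *t, uint64_t handle)
{
  if (t->count == MAX_IN_FLIGHT)
    return -1;                  /* too many outstanding requests */
  t->handles[t->count++] = handle;
  return 0;
}

/* Retire a request when its reply arrives.  Returns -1 if the server
   replied with a handle that is not outstanding. */
static int
in_flight_complete (struct in_flight *t, uint64_t handle)
{
  for (size_t i = 0; i < t->count; i++)
    if (t->handles[i] == handle) {
      t->handles[i] = t->handles[--t->count];
      return 0;
    }
  return -1;
}
```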

It’s complicated and we mustn’t block the caller.

In my experience there are a few ways to deal with this. One is to ignore the problem and insist that the main program use a thread for each NBD connection, but that pushes complexity onto someone else. Another way is to use some variation of coroutines or call/cc: if we get to a place where we would block, then we save the stack, return to the caller, and have some way to restore the stack later. However, this doesn’t necessarily work well with non-C programming languages. It likely won’t work with either OCaml’s or Ruby’s garbage collector, since both involve stack walking to find GC roots. I’d generally want to avoid “tricksy” stuff in a library.

The final way that I know about is to implement a state machine. However, large state machines are hellishly difficult to write. Our state machine has 75 states so far (and it’s nowhere near finished), so we need a lot of help.

I came up with a slightly nicer way to write state machines.

The first idea is that states in a large state machine can be grouped. You can consider each group like a mini state machine: it has its own namespace, lives in a single file, and may only be entered via a single START state (so you don’t need to consider what happens if another part of the state machine jumps into the middle of the group).

Secondly, groups can be hierarchical. This lets us organise the groups logically: for example, there is a group for “fixed newstyle handshaking”, and within that there are separate groups for negotiating each newstyle option. States can refer to each other using either relative or absolute paths in this hierarchy.
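Relative and absolute paths can be resolved with a very small rule. The sketch below assumes a hypothetical convention (not necessarily the real generator’s syntax) where group names are joined with “.”, a reference starting with “.” is absolute from the top level, and anything else is relative to the group the referring state lives in. A code generator would naturally do this at build time; it is written in C here only to show the rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Resolve state reference 'ref', written inside group 'group' (e.g.
   "NEWSTYLE.OPT_STARTTLS", or "" at the top level), to a fully
   qualified name in 'out'.  Hypothetical convention: a leading '.'
   means absolute, otherwise relative to 'group'.
   Returns 0 on success, -1 if 'out' is too small. */
static int
resolve_state (const char *group, const char *ref, char *out, size_t outlen)
{
  assert (group != NULL && ref != NULL && out != NULL);

  if (ref[0] == '.') {          /* absolute: ignore the current group */
    group = "";
    ref++;
  }

  size_t glen = strlen (group), rlen = strlen (ref);
  /* Room for "group" + "." (only if group is non-empty) + ref + NUL. */
  size_t need = glen + (glen ? 1 : 0) + rlen + 1;
  if (need > outlen)
    return -1;

  memcpy (out, group, glen);
  if (glen)
    out[glen++] = '.';
  memcpy (out + glen, ref, rlen + 1);   /* copies the terminating NUL */
  return 0;
}
```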

Thirdly, all states and transitions are defined and checked in a single place, allowing us to enforce rules about which transitions are permitted.
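A minimal sketch of “defined and checked in a single place”, using a hypothetical five-state machine: every permitted transition lives in one table, and a checker walks that table enforcing the grouping rule from earlier, that a group may only be entered from outside through its START state (returning to an enclosing group is allowed). The states, groups and functions are illustrative assumptions only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature machine; "" is the top-level group. */
enum { S_START, S_NEWSTYLE_START, S_NEWSTYLE_RECV, S_READY, S_DEAD,
       NR_STATES };

static const struct { const char *group, *name; } states[NR_STATES] = {
  [S_START]          = { "",         "START" },
  [S_NEWSTYLE_START] = { "NEWSTYLE", "START" },
  [S_NEWSTYLE_RECV]  = { "NEWSTYLE", "RECV"  },
  [S_READY]          = { "",         "READY" },
  [S_DEAD]           = { "",         "DEAD"  },
};

/* The single place where every permitted transition is listed. */
static const struct { int from, to; } transitions[] = {
  { S_START,          S_NEWSTYLE_START },
  { S_NEWSTYLE_START, S_NEWSTYLE_RECV  },
  { S_NEWSTYLE_RECV,  S_READY          },
  { S_NEWSTYLE_RECV,  S_DEAD           },
};
#define NR_TRANSITIONS (sizeof transitions / sizeof transitions[0])

/* True if group 'anc' is 'grp' itself or encloses it. */
static bool
encloses (const char *anc, const char *grp)
{
  size_t n = strlen (anc);
  if (n == 0)
    return true;                /* the top level encloses everything */
  return strncmp (anc, grp, n) == 0 && (grp[n] == '\0' || grp[n] == '.');
}

/* Staying in a group or returning to an enclosing one is fine;
   entering a group from outside must target its START state. */
static bool
respects_group_entry (int from, int to)
{
  if (encloses (states[to].group, states[from].group))
    return true;
  return strcmp (states[to].name, "START") == 0;
}

/* Validate the whole table once (a generator would do this at build time). */
static bool
check_transitions (void)
{
  for (size_t i = 0; i < NR_TRANSITIONS; i++)
    if (!respects_group_entry (transitions[i].from, transitions[i].to))
      return false;
  return true;
}

/* At runtime, refuse any transition that is not in the table. */
static bool
is_permitted (int from, int to)
{
  assert (from >= 0 && from < NR_STATES && to >= 0 && to < NR_STATES);
  for (size_t i = 0; i < NR_TRANSITIONS; i++)
    if (transitions[i].from == from && transitions[i].to == to)
      return true;
  return false;
}
```

Keeping the table as data means adding a state is one edit in one place, and a rule violation is caught by the checker rather than discovered as a mysterious runtime jump.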

Fourthly, the final C code that implements the state machine is (mostly) generated. This lets us generate helper functions to, for example, turn state transitions into debug messages, or report whether the connection is in a mode where it’s expecting to read from or write to the socket (making it easier to integrate with main loops).
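Generated helpers of this kind are little more than switch statements over the state enum. The sketch below is hypothetical (the state names and function names are illustrative, not the library’s): one helper gives each state a printable name for debug messages, and the other reports which poll(2) events the current state is waiting for.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>

/* Hypothetical generator output: an enum plus switch-based helpers. */
enum state {
  STATE_NEWSTYLE_SEND_OPTION,
  STATE_NEWSTYLE_RECV_REPLY,
  STATE_READY,
  STATE_DEAD,
};

/* Printable name, e.g. for "transition: X -> Y" debug messages. */
static const char *
state_name (enum state s)
{
  switch (s) {
  case STATE_NEWSTYLE_SEND_OPTION: return "NEWSTYLE.SEND_OPTION";
  case STATE_NEWSTYLE_RECV_REPLY:  return "NEWSTYLE.RECV_REPLY";
  case STATE_READY:                return "READY";
  case STATE_DEAD:                 return "DEAD";
  }
  return "UNKNOWN";
}

/* Which poll(2) events this state is blocked on (0 = none), so that a
   caller's own main loop knows when to call back into the library. */
static short
state_poll_events (enum state s)
{
  assert (s <= STATE_DEAD);
  switch (s) {
  case STATE_NEWSTYLE_SEND_OPTION: return POLLOUT;  /* waiting to write */
  case STATE_NEWSTYLE_RECV_REPLY:  return POLLIN;   /* waiting to read */
  case STATE_READY:                return 0;        /* idle until a command */
  case STATE_DEAD:                 return 0;
  }
  return 0;
}
```

A caller with its own main loop could copy `state_poll_events (current)` into the `events` field of a `struct pollfd`, wait in poll(2) (or register the socket with glib), and re-enter the state machine once the socket is readable or writable. No thread or saved stack is needed.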

The final generator currently produces 173K lines of C (although, as it’s mostly large switch statements, it compiles down to a reasonably small size).

Has anyone else implemented a large state machine in a similar way?



Filed under Uncategorized

virt-install + nbdkit live install

This seems to be completely undocumented, which is why I’m writing this. It is possible to boot a Linux guest (Fedora in this case) from a live CD on a website without downloading it. I’m using our favourite flexible NBD server, nbdkit, and virt-install.

First of all we’ll run nbdkit and attach it to the Fedora 29 live workstation ISO. To make this work more efficiently I’m going to place a couple of filters on top: one is the readahead (prefetch) filter recently added in nbdkit 1.12, and the other is the cache filter. In combination these filters should reduce the load on the website and improve local performance.

$ rm -f /tmp/socket
$ nbdkit -f -U /tmp/socket --filter=readahead --filter=cache \

I actually replaced that URL with a UK-based mirror to make the process a little faster.

Now comes the undocumented virt-install command:

$ virt-install --name test --ram 2048 \
    --disk /var/tmp/disk.img,size=10 \
    --disk device=cdrom,source_protocol=nbd,source_host_transport=unix,source_host_socket=/tmp/socket \
    --os-variant fedora29

After a bit of grinding that should boot into Fedora 29, and you never had to download the ISO (not explicitly, at least).


To be fair, qemu also has a curl driver which virt-install could use, but nbdkit is better: its filters and plugins system gives you ultimate flexibility. Check out my video about it.


nbdkit 1.12

The new stable release of nbdkit, our flexible Network Block Device server, is out. You can read the announcement and release notes here.

The big new features are SSH support, the linuxdisk plugin, writing plugins in Rust, and extents. Extents allows NBD clients to work out which parts of a disk are sparse or zeroes and skip reading them. It was hellishly difficult to write because of the number of obscure corner cases.
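On the client side, extents let a copying program skip data it already knows about. NBD_STATE_HOLE (1) and NBD_STATE_ZERO (2) are the flag bits the NBD protocol defines for the “base:allocation” metadata context; the struct and function below are a hypothetical sketch. It only trusts the ZERO bit, since HOLE on its own describes allocation rather than promising that the contents read as zeroes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag bits from the NBD "base:allocation" metadata context. */
#define NBD_STATE_HOLE 1   /* no storage allocated */
#define NBD_STATE_ZERO 2   /* contents read as all zeroes */

/* Hypothetical client-side record of one extent reported by the server. */
struct extent {
  uint32_t length;   /* bytes covered by this extent */
  uint32_t flags;    /* NBD_STATE_* bits */
};

/* How many bytes a copying client must actually read.  Extents with
   NBD_STATE_ZERO are skipped: the destination can be zeroed (or left
   sparse) without transferring any data. */
static uint64_t
bytes_to_read (const struct extent *e, size_t n)
{
  uint64_t total = 0;
  assert (e != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    if (!(e[i].flags & NBD_STATE_ZERO))
      total += e[i].length;
  return total;
}
```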

Also in this release are a couple of interesting filters. The rate filter lets you add a bandwidth limit to connections. We will use this in virt-v2v to allow v2v instances to be rate limited (even dynamically). The readahead filter makes sequential copying and scanning of plugins more efficient by prefetching data ahead of time. It is self-configuring, and in most cases simply adding the filter to your filter stack is sufficient to get a nice performance boost, assuming your client’s access patterns are mostly sequential.
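A token bucket is one common way to impose a bandwidth limit like the rate filter’s. The following is a generic sketch of that technique, not the filter’s actual code, and all names are hypothetical. Time is passed in explicitly (in nanoseconds) so the logic can be exercised without sleeping; because the rate is just a field, changing it between calls changes the limit on the fly.

```c
#include <assert.h>
#include <stdint.h>

/* Generic token bucket: holds up to 'burst' bytes of budget and is
   refilled at 'rate' bytes per second. */
struct bucket {
  double rate;        /* bytes per second (may be changed at runtime) */
  double burst;       /* maximum budget the bucket can hold */
  double tokens;      /* current budget in bytes */
  uint64_t last_ns;   /* time of the last refill */
};

static void
bucket_init (struct bucket *b, double rate, double burst, uint64_t now_ns)
{
  assert (rate > 0 && burst > 0);
  b->rate = rate;
  b->burst = burst;
  b->tokens = burst;  /* start full */
  b->last_ns = now_ns;
}

/* Try to spend 'n' bytes of budget.  Returns 0 if the request may go
   ahead now, else how many nanoseconds to wait before retrying.
   Requests larger than 'burst' could never succeed, so callers must
   split them (asserted here). */
static uint64_t
bucket_take (struct bucket *b, double n, uint64_t now_ns)
{
  assert (n <= b->burst && now_ns >= b->last_ns);

  /* Refill in proportion to elapsed time, capped at the burst size. */
  b->tokens += b->rate * (double) (now_ns - b->last_ns) / 1e9;
  if (b->tokens > b->burst)
    b->tokens = b->burst;
  b->last_ns = now_ns;

  if (b->tokens >= n) {
    b->tokens -= n;
    return 0;
  }
  /* Not enough budget yet: time until the shortfall is refilled. */
  return (uint64_t) ((n - b->tokens) / b->rate * 1e9);
}
```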


Tip: Edit grub kernel command line in RHEL 7 or CentOS 7

Easy with virt-customize. In this example I’m adding the nosmt option to the command line:

$ virt-customize -a rhel7.img \
    --edit '/etc/default/grub: s/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="nosmt /' \
    --run-command 'grub2-mkconfig -o /boot/grub2/grub.cfg'


nbdkit linuxdisk plugin

I’m writing a new nbdkit plugin called linuxdisk. nbdkit is our flexible, plugin-based NBD server, and this new plugin lets you create a complete Linux-compatible virtual disk from a host directory on the fly.

One of the many uses for this is booting minimal VMs very quickly. Here’s an example you can set up in a few seconds. It boots to an interactive busybox shell:

$ mkdir /tmp/root /tmp/root/sbin /tmp/root/bin /tmp/root/dev
$ sudo mknod /tmp/root/dev/console c 5 1
$ cp /sbin/busybox /tmp/root/sbin/
$ ln /tmp/root/sbin/busybox /tmp/root/bin/sh
$ ln /tmp/root/sbin/busybox /tmp/root/bin/ls
$ ln /tmp/root/sbin/busybox /tmp/root/sbin/init
$ nbdkit -U - linuxdisk /tmp/root \
    --run 'qemu-kvm -display none -kernel /boot/vmlinuz-4.20.8-200.fc29.x86_64 -drive file=nbd:unix:$unixsocket,snapshot=on -append "console=ttyS0 root=/dev/sda1 rw" -serial stdio'

If you need any extra files in the VM just drop them straight into /tmp/root before booting it.

Edit: How the heck does /dev get populated in this VM?



Write nbdkit plugins in Rust

nbdkit is our flexible, pluggable NBD server. See my FOSDEM video to find out more about some of the marvellous things you can do with it.

The news is you can now write nbdkit plugins in Rust. As with the OCaml bindings, Rust plugins compile to native *.so plugin files which are loaded and called directly from nbdkit. However, it really needs a Rust expert to design a more natural and idiomatic API than the one we have now.


Video: Take your loop mounts to the next level with nbdkit

Loop mounting is popular, but very limited in what it can do on Linux. I gave a talk at FOSDEM on Saturday entitled Better loop mounts with NBD: Take your loop mounts to the next level with nbdkit, and it’s online already!

Download the WebM format or MP4 format files directly.

Also I did subtitles! Download the subtitles directly here. (The subs only cover the first 30 minutes of the talk, not the Summary and Q&A.)

Photo from presentation
(Thanks to Thomas Huth for the photo)

There are a few small problems and corrections:

  1. There’s a part of the talk where I refer to the light blue trimmed blocks. Unfortunately the video feed didn’t capture the colour, so the light blue looks white. If you really want to see it, look at the video in my earlier post.
  2. During the Q&A I mentioned that we could support writing to xz files. This is true, sort of, but I forgot that there’s a problem: nbdkit doesn’t support file resizing (and I believe resizing is still experimental in the NBD protocol), so someone would have to add that first. There are other serious downsides to implementing writable xz too; I doubt it could ever be fast.

For subtitles I used gaupol, which is actually quite nice, although subtitling is inherently slow and tedious. It took me a good 4 hours to subtitle 30 minutes of video.

