Tag Archives: nbdkit

nbdkit supports exportnames

(You’ll need the very latest version of libnbd and nbdkit from git for this to work.)

The NBD protocol lets the client send an export name string to the server. The idea is a single server can serve different content to clients based on a requested export. nbdkit has largely ignored export names, but we recently added basic support upstream.

One consequence of this is you can now write a shell plugin which reflects the export name back to the client:

$ cat export.sh
#!/bin/bash -
case "$1" in
    open) echo "$3" ;;
    get_size) LC_ALL=C echo ${#2} ;;
    pread) echo "$2" | dd skip=$4 count=$3 iflag=skip_bytes,count_bytes ;;
    *) exit 2 ;;
esac
$ chmod +x export.sh
$ nbdkit -f sh export.sh

The size of the disk is the same as the length of the export name:

$ nbdsh -u 'nbd://localhost/fooooo' -c 'print(h.get_size())'
6

The content (and hence the size) of the disk is the export name itself:

┌─────────────┐
│ f o o o o o │
└─────────────┘

Not very interesting in itself, but it means we can now pass the content of small disks entirely in the export name. Here is a slightly more advanced plugin which expects base64-encoded export names (so the content can include NUL bytes and other binary data):

$ cat export-b64.sh
#!/bin/bash -
case "$1" in
    open) echo "$3" ;;
    get_size) echo "$2" | base64 -d | wc -c ;;
    pread) echo "$2" | base64 -d |
           dd skip=$4 count=$3 iflag=skip_bytes,count_bytes ;;
    can_write) exit 0 ;;
    pwrite) exit 1 ;;
    *) exit 2 ;;
esac
$ chmod +x export-b64.sh
$ nbdkit -f sh export-b64.sh
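
For example, here’s a quick sketch of how you might drive it (the string is made up for illustration; it happens to base64-encode to characters which are safe to put in an NBD URI):

$ exportname=$(printf 'hello, world' | base64 -w0)
$ echo $exportname
aGVsbG8sIHdvcmxk
$ nbdsh -u "nbd://localhost/$exportname" -c 'print(h.pread(h.get_size(), 0))'

which should print the original string back (as a Python bytes object).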

We can pass in an entire program to qemu:

qemu-system-x86_64 -fda 'nbd:localhost:10809:exportname=uBMAzRD8uACgjtiOwLQEo5D8McC5SH4x//OriwVAQKuIxJK4AByruJjmq7goFLsQJbELq4PAFpOr/sST4vUFjhWA/1x167+g1LEFuAQL6IUBg8c84vW+lvyAfAIgci3+xYD9N3SsrZetPCh0CzwgdQTGRP4o6F4Bgf5y/XXbiPAsAnLSNAGIwojG68qAdAIIRYPlB1JWVXUOtADNGjsWjPx09okWjPy+gPy5BACtPUABl3JD6BcBge9CAYoFLCByKVZXtAT25AHGrZfGBCC4CA7oAgFfXusfrQnAdC09APCXcxTo6ACBxz4BuAwMiXz+gL1AAQt1BTHAiUT+gD0cdQbHBpL8OAroxgDizL6S/K0IwHQMBAh1CLQc/g6R/HhKiUT+izzorgB1LrQCzRaoBHQCT0+oCHQCR0eoA3QNgz6A/AB1Bo1FCKOA/Jc9/uV0Bz0y53QCiQRdXlqLBID6AXYKBYACPYDUchvNIEhIcgODwARQ0eixoPbx/syA/JRYcgOAzhaJBAUGD5O5AwDkQDz8cg2/gvyDPQB0A6/i+Ikd6cL+GBg8JDx+/yQAgEIYEEiCAQC9234kPGbDADxa/6U8ZmYAAAAAAAAAAHICMcCJhUABq8NRV5xQu6R9LteTuQoA+Ij4iPzo4f/Q4+L1gcdsAlhAqAd14J1fWcNPVao='

(Spoiler)


libnbd and nbdkit man pages online

Let’s seed those search engines …

http://libguestfs.org/libnbd.3.html

http://libguestfs.org/nbdkit.1.html

And yes I know there are a few broken links.


libnbd – A new NBD client library

NBD is a high-performance protocol for exporting disks between processes and machines. We use it as a kind of “universal connector” for connecting hypervisors with data sources, and previously Eric Blake and I wrote a general-purpose NBD server called nbdkit. (If you’re interested in the topic of nbdkit as a universal connector, watch my FOSDEM talk.)

Up until now our NBD client has been qemu or one of the qemu tools like qemu-img. That was fine if you wanted to expose a disk source as a running virtual machine (i.e. running it with qemu), or if you wanted to perform one of the limited copying operations that qemu-img convert can do, but there were many cases where it would have been nice to have a general client library.

For example I started to add NBD support to Jens Axboe’s fio. Lacking a client library I synthesized NBD request packets as C structs and sent them on the wire using low-level socket calls. The performance was, to put it bluntly, crap.

Although NBD is a very simple protocol and you can write it by hand, it would be nicer to have a library wrap the low-level stuff, and that’s why we have written libnbd (downloads).

Getting reasonable performance from NBD requires a few tricks:

  • You must issue as many commands as possible “in flight” (the server will reply to them out of order, but requests and replies are tied together by a unique ID).
  • You may need to open multiple connections to the server, but doing that requires attention to the special MULTI_CONN flag which the server will use to indicate that this is safe.
  • Most crucially you must disable Nagle’s algorithm.

This isn’t an exhaustive list. In fact while writing libnbd over about 3 weeks we improved performance by a factor of more than 15, just by paying attention to system calls, maximizing parallelism and minimizing latency. One advantage of libnbd is that it encodes all this knowledge in an easy-to-use library so NBD clients won’t have to reinvent it in future.
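
As an example of the second point above, libnbd exposes the MULTI_CONN flag so a client can check it before opening extra connections. A minimal sketch with nbdsh (using nbdkit’s memory plugin as a throwaway server):

$ nbdsh -c 'h.connect_command(["nbdkit", "-s", "memory", "1M"]); print(h.can_multi_conn())'

This prints whether the server advertises that parallel connections are safe.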

The library has a simple high-level synchronous API which works how you would expect (but doesn’t get the best performance). A typical program might look like:

#include <stdio.h>
#include <stdint.h>
#include <libnbd.h>

struct nbd_handle *nbd;
int64_t exportsize;
char buf[512];

/* Create the handle and connect to an NBD server over TCP. */
nbd = nbd_create ();
if (!nbd) goto error;
if (nbd_connect_tcp (nbd, "localhost", "nbd") == -1)
  goto error;

/* Get the size of the export and read the first 512 bytes. */
exportsize = nbd_get_size (nbd);
if (nbd_pread (nbd, buf, sizeof buf, 0, 0) == -1) {
 error:
  fprintf (stderr, "%s\n", nbd_get_error ());
}
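
To compile something like this you link against libnbd, for example (assuming the libnbd development files are installed and the snippet above is saved, wrapped in a main function, as prog.c):

$ cc prog.c -o prog $(pkg-config --cflags --libs libnbd)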

To get the best performance you have to use the more low-level asynchronous API which allows you to queue up commands and bring your own main loop.

There are also bindings in OCaml and Python (and Rust, soon), plus a nice little shell written in Python so you can access NBD servers interactively:

$ nbdsh
nbd> h.connect_command (["nbdkit", "-s", "memory", "1M"])
nbd> print ("%r" % h.get_size ())
1048576
nbd> h.pwrite (b"12345", 0)
nbd> h.pread (5, 0)
b'12345'

libnbd and the shell, nbdsh, are available now in Fedora 29 and above.
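
On Fedora, installing should be something like this (assuming the package names libnbd and python3-libnbd, the latter of which ships nbdsh):

$ sudo dnf install libnbd python3-libnbd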


virt-install + nbdkit live install

This seems to be completely undocumented, which is why I’m writing this … It is possible to boot a Linux guest (Fedora in this case) from a live CD on a website without downloading it. I’m using our favourite flexible NBD server, nbdkit, and virt-install.

First of all we’ll run nbdkit and attach it to the Fedora 29 live workstation ISO. To make this work more efficiently I’m going to place a couple of filters on top — one is the readahead (prefetch) filter recently added to nbdkit 1.12, and the other is the cache filter. In combination these filters should reduce the load on the website and improve local performance.

$ rm /tmp/socket
$ nbdkit -f -U /tmp/socket --filter=readahead --filter=cache \
    curl https://download.fedoraproject.org/pub/fedora/linux/releases/29/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-29-1.2.iso

I actually replaced that URL with a UK-based mirror to make the process a little faster.
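
Before starting the install you can sanity-check that the socket is serving the ISO, for example with qemu-img (run from another terminal, since nbdkit is in the foreground):

$ qemu-img info nbd:unix:/tmp/socket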

Now comes the undocumented virt-install command:

$ virt-install --name test --ram 2048 \
    --disk /var/tmp/disk.img,size=10 \
    --disk device=cdrom,source_protocol=nbd,source_host_transport=unix,source_host_socket=/tmp/socket \
    --os-variant fedora29

After a bit of grinding that should boot into Fedora 29, and you never (not explicitly at least) had to download the ISO.

[Screenshot: the guest booted into the Fedora 29 live session]

To be fair, qemu also has a curl driver which virt-install could use, but nbdkit’s filter and plugin system gives you far more flexibility. Check out my video about it.


nbdkit 1.12

The new stable release of nbdkit, our flexible Network Block Device server, is out. You can read the announcement and release notes here.

The big new features are SSH support, the linuxdisk plugin, writing plugins in Rust, and extents. Extents allow NBD clients to work out which parts of a disk are sparse or zeroes and skip reading them. This was hellishly difficult to implement because of the number of obscure corner cases.
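
For example a client such as qemu-img can now map out which parts of an export are allocated. A rough sketch (using the memory plugin, which starts out completely sparse, and assuming a recent qemu with NBD block status support):

$ nbdkit memory 1M
$ qemu-img map --output=json nbd://localhost

The map output should report the whole, still unwritten, disk as a hole rather than as data.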

Also in this release are a couple of interesting filters. The rate filter lets you add a bandwidth limit to connections. We will use this in virt-v2v to allow v2v instances to be rate-limited (even dynamically). The readahead filter makes sequential copying and scanning of plugins more efficient by prefetching data ahead of time. It is self-configuring and in most cases simply adding the filter into your filter stack is sufficient to get a nice performance boost, assuming your client’s access patterns are mostly sequential.
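
For example, to cap an exported file at roughly 1 megabit per second you could combine the file plugin with the rate filter, something like this (a sketch; disk.img is a placeholder, and check the filter’s manual for the exact parameters):

$ nbdkit --filter=rate file disk.img rate=1M

The readahead filter is the one used in the virt-install example further up this page.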


nbdkit linuxdisk plugin

I’m writing a new nbdkit plugin called linuxdisk. nbdkit is our flexible, plugin-based NBD server, and this new plugin lets you create a complete Linux-compatible virtual disk from a host directory on the fly.

One of the many uses for this is booting minimal VMs very quickly. Here’s an example you can set up in a few seconds. It boots to an interactive busybox shell:

$ mkdir /tmp/root /tmp/root/sbin /tmp/root/bin /tmp/root/dev
$ sudo mknod /tmp/root/dev/console c 5 1
$ cp /sbin/busybox /tmp/root/sbin/
$ ln /tmp/root/sbin/busybox /tmp/root/bin/sh
$ ln /tmp/root/sbin/busybox /tmp/root/bin/ls
$ ln /tmp/root/sbin/busybox /tmp/root/sbin/init
$ nbdkit -U - linuxdisk /tmp/root \
    --run 'qemu-kvm -display none -kernel /boot/vmlinuz-4.20.8-200.fc29.x86_64 -drive file=nbd:unix:$unixsocket,snapshot=on -append "console=ttyS0 root=/dev/sda1 rw" -serial stdio'

If you need any extra files in the VM just drop them straight into /tmp/root before booting it.

Edit: How the heck does /dev get populated in this VM?


Write nbdkit plugins in Rust

nbdkit is our flexible, pluggable NBD server. See my FOSDEM video to find out more about some of the marvellous things you can do with it.

The news is you can now write nbdkit plugins in Rust. As with the OCaml bindings, Rust plugins compile to native *.so plugin files which are loaded and called directly from nbdkit. However, it really needs a Rust expert to implement a more natural and idiomatic way to use the API than what we have now.
