Tag Archives: virtualization

nbdkit + libblkio

Our plugin-based Network Block Device server, nbdkit, now has support for libblkio.

libblkio is a library written by Stefan Hajnoczi, Alberto Faria, Stefano Garzarella and others for accessing some somewhat unusual disk protocols including vhost-user, NVMe, vDPA, VFIO and io_uring which I’ll talk about below. It’s important to know that these are not disk formats (like raw or qcow2), but accelerated protocols for talking to virtual or real hardware.

The library is written in Rust (but offers a C API) and I believe it’s intended to replace various bottom-end parts of the qemu block layer at some point in the future.

The library uses a set of property strings to describe how to connect to a device. The nbdkit plugin maps those almost exactly into command line parameters, so you can usually follow the libblkio docs and translate that into an nbdkit command line, eg:

$ nbdkit blkio io_uring path=fedora.img

This sets the libblkio driver to “io_uring” and the path to the path of a local file. This libblkio driver uses Linux’s relatively new io_uring facility to access a local file or block device, the simplest way to use libblkio.

The other most frequently used protocol or libblkio driver is vhost-user. This is a protocol that allows a server to share a disk image to client(s) on the same machine. It uses a Unix domain socket for communication, but unlike Network Block Device (NBD) it’s not possible to use this over the network. For greater performance vhost uses shared memory between the client and server for data transfer.

qemu-storage-daemon is the most common server:

$ qemu-storage-daemon \

--blockdev driver=file,node-name=file,filename=fedora.qcow2 \

--blockdev driver=qcow2,node-name=qcow2,file=file \

--export type=vhost-user-blk,id=export,addr.type=unix,addr.path=sock,node-name=qcow2

To connect from nbdkit, just use the socket:

$ nbdkit blkio virtio-blk-vhost-user path=sock

You might wonder why we want to add libblkio support to nbdkit (apart from it being fun). There’s a practical reason which is this brings along all of the scripting support we’ve created around NBD to these somewhat obscure (albeit quite widely used) protocols. I don’t think it was possible before to use Python to script against, eg., vhost-user, but now it is:

$ nbdsh -u nbd://localhost -c 'print("%r" % h.pread(512,0))'

Leave a comment

Filed under Uncategorized

Creating a modifiable gzipped disk image

Dusty Mabe set me a challenge yesterday. He wants to create several compressed disk images that have slightly different content, but are otherwise mostly the same. The disk images are large and compressing them takes a long time (30 minutes each, apparently), so ideally what we’d want to do is compress the disk image just the once and then do the updates on the gzipped image.

Modifying a file which has already been compressed is not usually possible.

However if we make some relatively uncontroversial assumptions and accept a few limitations then we can create a compressed disk image which is modifiable in this way, certainly for gzip and xz (I need to investigate zstd).

Firstly let’s assume we’re using some kind of block-based compression with fixed size blocks. This is true for gzip and xz already. Secondly let’s assume that we want to only modify a single, small partition of the image (Dusty only wants to modify the /boot partition). Lastly we’ll assume that the partition boundaries are aligned to the compression blocks. Since partitions can be placed under control of whoever creates the disk image this last one is pretty easy to arrange.

The trick is to use uncompressed blocks for the part of the input covering the partition you want to modify, and compress the rest of the file normally. Both gzip and xz have an uncompressed block type. (In fact, just about any reasonable compression algorithm works like this – if the input data doesn’t become smaller after trying to compress it, the algorithm will emit the data uncompressed since that takes less space.)

Normal tools won’t let you create files like this, but I wrote one for gzip here, and I’m confident that one could be written for xz (exercise for readers! … or me if we decide to productise this).

I created a normal Fedora 36 image using virt-builder and used guestfish to find the partition boundaries:

$ guestfish --ro -a /var/tmp/fedora-36.img -i part-list /dev/sda
[0] = {
  part_num: 1
  part_start: 1048576
  part_end: 2097151
  part_size: 1048576
}
[1] = {
  part_num: 2
  part_start: 2097152
  part_end: 1075838975
  part_size: 1073741824
}
[2] = {
  part_num: 3
  part_start: 1075838976
  part_end: 6441402367
  part_size: 5365563392
}

We will compress partition #3 while leaving partitions #1 and #2 uncompressed so they can be modified in place later. The tool is a bit crude, but:

$ ./partial-deflate /var/tmp/fedora-36.img /var/tmp/fedora-36.img.gz 0 1075838976 6441402368 

The output is a regular gzip file (albeit rather large because the first 1GB is uncompressed – if I was doing this for real I’d make that boot partition smaller):

$ ll /var/tmp/fedora-36.img*
-rw-r--r--. 1 rjones rjones 6442450944 Nov 30 17:03 /var/tmp/fedora-36.img
-rw-r--r--. 1 rjones rjones 1698849910 Dec  1 16:23 /var/tmp/fedora-36.img.gz
$ zcat /var/tmp/fedora-36.img.gz | md5sum 
57cbd5ebcbe59613378c8aee7ad9e40d  -
$ md5sum /var/tmp/fedora-36.img
57cbd5ebcbe59613378c8aee7ad9e40d  /var/tmp/fedora-36.img

Then I went ahead and modified some known content inside the gzip file (but not compressed). I used a hex editor for this, but you could play around with guestfish + nbdkit-offset-filter for something more supportable:

And the result is a gzipped file with the modifications:

$ zcat /var/tmp/fedora-36.img.gz > /var/tmp/fedora-36.img-modified
gzip: /var/tmp/fedora-36.img.gz: invalid compressed data--crc error
$ guestfish --ro -a /var/tmp/fedora-36.img-modified -i cat /boot/mydata.txt 
helloworld------

…and a CRC error. That’s to be expected, as I haven’t yet worked out how to update CRC-32 after making changes, but it should be easy to solve (with brute force if necessary).

1 Comment

Filed under Uncategorized

nbdkit for macOS

nbdkit, our high performance, portable Network Block Device server has now been ported to macOS. It’s a command line tool and macOS is sufficiently FreeBSD-like that the port wasn’t very hard. It’s relatively full featured, including a large portion of the plugins and filters, a brand new exit-with-parent implementation, and almost all tests passing.

However one larger problem remains (for performance) which is the lack of atomic CLOEXEC when opening pipes or sockets. Linux has pipe2 and accept4. I wasn’t able to find any good equivalent on macOS, and hence most of the time we are limited to serializing some requests that could otherwise run in parallel.

nbdkit already supported Linux, FreeBSD, OpenBSD, Haiku and Windows!

Leave a comment

Filed under Uncategorized

Composable tools for disk images

Over the past 3 or 4 years, my colleagues and I at Red Hat have been making a set of composable command line tools for handling virtual machine disk images. These let you copy, create, manipulate, display and modify disk images using simple tools that can be connected together in pipelines, while at the same time working very efficiently. It’s all based around the very efficient Network Block Device (NBD) protocol and NBD URI specification.

A basic and very old tool is qemu-img:

$ qemu-img create -f qcow2 disk.qcow2 1G

which creates an empty disk image in qcow2 format. Suppose you want to write into this image? We can compose a few programs:

$ touch disk.raw
$ nbdfuse disk.raw [ qemu-nbd -f qcow2 disk.qcow2 ] &

This serves the qcow2 file up over NBD (qemu-nbd) and then exposes that as a local file using FUSE (nbdfuse). Of interest here, nbdfuse runs and manages qemu-nbd as a subprocess, cleaning it up when the FUSE file is unmounted. We can partition the file using regular tools:

$ gdisk disk.raw
Command (? for help): n
Partition number (1-128, default 1): 
First sector (34-2097118, default = 2048) or {+-}size{KMGTP}: 
Last sector (2048-2097118, default = 2097118) or {+-}size{KMGTP}: 
Current type is 8300 (Linux filesystem)
Hex code or GUID (L to show codes, Enter = 8300): 
Changed type of partition to 'Linux filesystem'
Command (? for help): p
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048         2097118   1023.0 MiB  8300  Linux filesystem
Command (? for help): w

Let’s fill that partition with some files using guestfish and unmount it:

$ guestfish -a disk.raw run : \
  mkfs ext2 /dev/sda1 : mount /dev/sda1 / : \
  copy-in ~/libnbd /
$ fusermount3 -u disk.raw
[1]+  Done    nbdfuse disk.raw [ qemu-nbd -f qcow2 disk.qcow2 ]

Now the original qcow2 file is no longer empty but populated with a partition, a filesystem and some files. We can see the space used by examining it with virt-df:

$ virt-df -a disk.qcow2 -h
Filesystem                Size   Used  Available  Use%
disk.qcow2:/dev/sda1     1006M    52M       903M    6%

Now let’s see the first sector. You can’t just “cat” a qcow2 file because it’s a complicated format understood only by qemu. I can assemble qemu-nbd, nbdcopy and hexdump into a pipeline, where qemu-nbd converts the qcow2 format to raw blocks, and nbdcopy copies those out to a pipe:

$ nbdcopy -- [ qemu-nbd -r -f qcow2 disk.qcow2 ] - | \
  hexdump -C -n 512
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001c0  02 00 ee 8a 08 82 01 00  00 00 ff ff 1f 00 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
00000200

How about instead of a local file, we start with a disk image hosted on a web server, and compressed? We can do that too. Let’s start by querying the size by composing nbdkit’s curl plugin, xz filter and nbdinfo. nbdkit’s --run option composes nbdkit with an external program, connecting them together over an NBD URI ($uri).

$ web=http://mirror.bytemark.co.uk/fedora/linux/development/rawhide/Cloud/x86_64/images/Fedora-Cloud-Base-Rawhide-20220127.n.0.x86_64.raw.xz
$ nbdkit curl --filter=xz $web --run 'nbdinfo $uri'
protocol: newstyle-fixed without TLS
export="":
	export-size: 5368709120 (5G)
	content: DOS/MBR boot sector, extended partition table (last)
	uri: nbd://localhost:10809/
...

Notice it prints the uncompressed (raw) size. Fedora already provides a qcow2 equivalent, but we can also make our own by composing nbdkit, curl, xz, nbdcopy and qemu-nbd:

$ qemu-img create -f qcow2 cloud.qcow2 5368709120 -o preallocation=metadata
$ nbdkit curl --filter=xz $web \
    --run 'nbdcopy -p -- $uri [ qemu-nbd -f qcow2 cloud.qcow2 ]'

Why would you do that instead of downloading and uncompressing? In this case it wouldn’t matter much, but in the general case the disk image might be enormous (terabytes) and you don’t have enough local disk space to do it. Assembling tools into pipelines means you don’t need to keep an intermediate local copy at any point.

We can find out what we’ve got in our new image using various tools:

$ qemu-img info cloud.qcow2 
image: cloud.qcow2
file format: qcow2
virtual size: 5 GiB (5368709120 bytes)
disk size: 951 MiB
$ virt-df -a cloud.qcow2  -h
Filesystem              Size       Used  Available  Use%
cloud.qcow2:/dev/sda2   458M        50M       379M   12%
cloud.qcow2:/dev/sda3   100M       9.8M        90M   10%
cloud.qcow2:/dev/sda5   4.4G       311M       3.6G    7%
cloud.qcow2:btrfsvol:/dev/sda5/root
                        4.4G       311M       3.6G    7%
cloud.qcow2:btrfsvol:/dev/sda5/home
                        4.4G       311M       3.6G    7%
cloud.qcow2:btrfsvol:/dev/sda5/root/var/lib/portables
                        4.4G       311M       3.6G    7%
$ virt-cat -a cloud.qcow2 /etc/redhat-release
Fedora release 36 (Rawhide)

If we wanted to play with the guest in a sandbox, we could stand up an in-memory NBD server populated with the cloud image and connect it to qemu using standard NBD URIs:

$ nbdkit memory 10G
$ qemu-img convert cloud.qcow2 nbd://localhost 
$ virt-customize --format=raw -a nbd://localhost \
    --root-password password:123456 
$ qemu-system-x86_64 -machine accel=kvm \
    -cpu host -m 2048 -serial stdio \
    -drive file=nbd://localhost,if=virtio 
...
fedora login: root
Password: 123456

# lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sr0     11:0    1 1024M  0 rom  
zram0  251:0    0  1.9G  0 disk [SWAP]
vda    252:0    0   10G  0 disk 
├─vda1 252:1    0    1M  0 part 
├─vda2 252:2    0  500M  0 part /boot
├─vda3 252:3    0  100M  0 part /boot/efi
├─vda4 252:4    0    4M  0 part 
└─vda5 252:5    0  4.4G  0 part /home
                                /

We can even find out what changed between the in-memory copy and the pristine qcow2 version (quite a lot as it happens):

$ virt-diff --format=raw -a nbd://localhost --format=qcow2 -A cloud.qcow2 
- d 0755       2518 /etc
+ d 0755       2502 /etc
# changed: st_size
- - 0644        208 /etc/.updated
- d 0750        108 /etc/audit
+ d 0750         86 /etc/audit
# changed: st_size
- - 0640         84 /etc/audit/audit.rules
- d 0755         36 /etc/issue.d
+ d 0755          0 /etc/issue.d
# changed: st_size
... for several pages ...

In conclusion, we’ve got a couple of ways to serve disk content over NBD, a set of composable tools for copying, creating, displaying and modifying disk content either from local files or over NBD, and a way to pipe disk data between processes and systems.

We use this in virt-v2v which can suck VMs out of VMware to KVM systems, efficiently, in parallel, and without using local disk space for even the largest guest.

Leave a comment

Filed under Uncategorized

nbdkit now supports LUKS encryption

nbdkit, our permissively licensed plugin-based Network Block Device server can now transparently decode encrypted disks, for both reading and writing:

qemu-img create -f luks --object secret,data=SECRET,id=sec0 -o key-secret=sec0 encrypted-disk.img 1G

nbdkit file encrypted-disk.img --filter=luks passphrase=+/tmp/secret

We use LUKSv1 as the encryption format. That’s an older version [more on that in a moment] of the format used for Full Disk Encryption on Linux. It’s much preferable to use LUKS rather than using qemu’s built-in qcow2 encryption, and our implementation is compatible with qemu’s.

You can place the filter on top of other nbdkit plugins, like Curl:

nbdkit curl https://example.com/encrypted-disk.img --filter=luks passphrase=+/tmp/secret

The threat model here is that you can store the encrypted data on a remote server, and the admin of the server cannot decrypt the disk (assuming you don’t give them the passphrase).

If you try this filter (or qemu’s device) with a modern Linux LUKS disk you’ll find that it doesn’t work. This is because modern Linux uses LUKSv2, although they are able to create, read and write LUKSv1 if you use set them up that way in advance. Unfortunately LUKSv2 is significantly more complicated than LUKSv1. It requires parsing JSON data(!) stored in the header, and supports a wider range of password derivation functions, typically the very slow and memory-intensive argon2. LUKSv1 by contrast only requires support for PBKDF2 and is generally far more straightforward to implement.

The new filter will be available in nbdkit 1.32, or you can grab the development version now.

2 Comments

Filed under Uncategorized

nbdkit 1.24 & libnbd 1.6, new copying tool

As well as nbdkit 1.24 being released on Thursday, its sister project libnbd 1.6 was released at the same time. This comes with an enhanced copying tool called nbdcopy designed to replace some uses of qemu-img convert (note: it’s not a general replacement).

nbdcopy lets you copy from and to NBD servers (nbdkit, qemu-nbd, qemu-storage-daemon, nbd-server), local files, local block devices, pipes/sockets, and stdin/stdout. For example to stream the content of an NBD server:

$ nbdcopy nbd://localhost - | hexdump -C

The “-” character streams to stdout. nbd://localhost is an NBD URI referring to an NBD server that is already running. What if you don’t have an already running server? nbdcopy lets you run one from the command line (and cleans up after). For example this is one way to convert a qcow2 file to raw:

$ nbdcopy -- [ qemu-nbd -f qcow2 disk.qcow ] disk.raw

Here the [ ... ] section starts qemu-nbd as a captive NBD server, exposing privately an NBD endpoint, and nbdcopy copies this to local file disk.raw. (“--” is needed to stop nbdcopy trying to interpret qemu-nbd’s own command line arguments.)

However this post is really about the nbdkit release. How did I test and benchmark nbdcopy? Of course I wrote an nbdkit plugin called nbdkit-sparse-random-plugin. This plugin has two clever features for testing copying tools. Firstly it creates random disks which have the same “shape” as virtual machine disk images (but without the overhead of needing to bother with an actual VM). Secondly it can act as both a source and target for testing copies.

Let’s unpack those two things a bit further.

Virtual machine disk images (especially mostly empty ones) are mostly sparse. Here’s part of the sparse map from a Fedora 32 disk image:

$ virt-builder fedora-32
$ filefrag -e fedora-32.img 
 Filesystem type is: 58465342
 File size of fedora-32.img is 6442450944 (1572864 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..       0:    2038672..   2038672:      1:            
    1:        1..      15:    2176040..   2176054:     15:    2038673:
    2:      256..     271:    2188819..   2188834:     16:    2176295:
    3:      512..    3135:    3650850..   3653473:   2624:    2189075:
    4:     3168..    4463:    3781763..   3783058:   1296:    3653506:
[...]

The new sparse-random plugin generates a disk image which has a similar shape — islands of random data in a sea of sparseness. The algorithm for doing this is quite neat. Because the plugin doesn’t need to store the data, unlike a real disk image, it can generate huge disk images (eg. a terabyte) while using almost no memory. We use a low-overhead, high-quality random number generator and are smart about seeds so that every run of sparse-random with the same seed produces identical output.

The other part of this plugin is how we can use it to test copying tools like nbdcopy and qemu-img convert. My idea was that the plugin could be used both as the source and the target of the copy:

$ nbdkit -U - sparse-random 1T --run ' nbdcopy "$uri" "$uri" '

Here we create a terabyte-sized sparse-random disk, and get nbdcopy to copy from the plugin to the plugin. On reads sparse-random supplies the sparseness and random data. On writes it checks if what is being written matches the content of the plugin, throwing -EIO errors if not. Assuming the copying tool is correctly handling errors, we can both validate the copying tool and benchmark it. And it works with qemu-img convert too:

$ nbdkit -U - sparse-random 1T --run ' qemu-img convert "$uri" "$uri" '

And now we can see which one is faster.

Try it, you may be surprised.

Leave a comment

Filed under Uncategorized

nbdkit 1.24, new data plugin features

nbdkit 1.24 was released on Thursday. It’s our flexible, fast network block device with loads of features. nbdkit-data-plugin, a plugin that lets you create test patterns from the command line gained some interesting new functionality:

$ nbdkit data ' ( 0x55 0xAA )*2048 '

This command worked before as a way to create a repeating test pattern in a disk image. A new feature is you can write a shell script snippet to generate the pattern instead:

$ nbdkit data ' <( while :; do printf "%04x" $((i++)); done ) [:2048] '

This command will create a pattern of characters “0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 3 …” (truncated to 2048 bytes). We could turn that into a block device and display the contents:

# nbd-client localhost /dev/nbd0
# blockdev --getsize64 /dev/nbd0
2048
# dd if=/dev/nbd0 | hexdump -C | head
4+0 records in
4+0 records out
2048 bytes (2.0 kB, 2.0 KiB) copied, 0.000167082 s, 12.3 MB/s
00000000  30 30 30 30 30 30 30 31  30 30 30 32 30 30 30 33  |0000000100020003|
00000010  30 30 30 34 30 30 30 35  30 30 30 36 30 30 30 37  |0004000500060007|
00000020  30 30 30 38 30 30 30 39  30 30 30 61 30 30 30 62  |00080009000a000b|
00000030  30 30 30 63 30 30 30 64  30 30 30 65 30 30 30 66  |000c000d000e000f|
00000040  30 30 31 30 30 30 31 31  30 30 31 32 30 30 31 33  |0010001100120013|
00000050  30 30 31 34 30 30 31 35  30 30 31 36 30 30 31 37  |0014001500160017|
00000060  30 30 31 38 30 30 31 39  30 30 31 61 30 30 31 62  |00180019001a001b|
00000070  30 30 31 63 30 30 31 64  30 30 31 65 30 30 31 66  |001c001d001e001f|
00000080  30 30 32 30 30 30 32 31  30 30 32 32 30 30 32 33  |0020002100220023|
00000090  30 30 32 34 30 30 32 35  30 30 32 36 30 30 32 37  |0024002500260027|
# nbd-client -d /dev/nbd0
# killall nbdkit

The data plugin also lets you read from files which is useful for making disks with random initial data. For example here’s how to create a disk with 16 identical sectors of random data (notice how /dev/random is read in, truncated to 512 bytes, and then 16 copies are made):

$ nbdkit data ' </dev/urandom[:512]*16 '

The plugin can also create sparse disks. You can do this just by moving the current offset using “@”:

$ nbdkit data ' @32768 1 ' --run 'nbdinfo --map "$uri"'
     0       32768    3  hole,zero
 32768           1    0  allocated

We use this plugin quite extensively when testing libnbd.

1 Comment

Filed under Uncategorized

nbdkit tar filter

nbdkit is our high performance liberally licensed Network Block Device server, and OVA files are a common pseudo-standard for exporting virtual machines including their disk images.

A .ova file is really an uncompressed tar file:

$ tar tf rhel.ova
rhel.ovf
rhel-disk1.vmdk
rhel.mf

Since tar files usually store their content unmangled, this opens an interesting possibility for reading (or even writing) the embedded disk image without needing to unpack the tar. You just have to work out the offset of the disk image within the tar file. virt-v2v has used this trick to save a copy when importing OVAs for years.

nbdkit has also included a tar plugin which can access a file inside a local tar file, but the problem is what if the tar file doesn’t happen to be a local file? (eg. It’s on a webserver). Or what if it’s compressed?

To fix this I’ve turned the plugin into a filter. Using nbdkit-tar-filter you can unpack even non-local compressed tar files:

$ nbdkit curl http://example.com/qcow2.tar.xz \
         --filter=tar --filter=xz tar-entry=disk.qcow2

(To understand how filters are stacked, see my FOSDEM talk from last year). Because in this example the disk inside the tarball is a qcow2 file, it appears as qcow2 on the wire, so:

$ guestfish --ro --format=qcow2 -a nbd://localhost

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> run
><fs> list-filesystems 
/dev/sda1: ext2
><fs> mount /dev/sda1 /
><fs> ll /
total 19
drwxr-xr-x   3 root root  1024 Jul  6 20:03 .
drwxr-xr-x  19 root root  4096 Jul  9 11:01 ..
-rw-rw-r--.  1 1000 1000    11 Jul  6 20:03 hello.txt
drwx------   2 root root 12288 Jul  6 20:03 lost+found

Leave a comment

Filed under Uncategorized

nbdkit with BitTorrent

nbdkit is our high performance Network Block Device server for serving disk images from unusual sources. One (usual) source for Linux installers is to download an ISO from a website like Get Fedora or debian.org. However that costs the host money and is also a central point of failure, so another way to download Linux installers is over BitTorrent. Many Linux distros offer torrents of their installers including Fedora and Debian. By using these you are helping to redistribute Linux and defraying the cost of hosting these ISOs.

Now I’ve written a BitTorrent plugin for nbdkit so you can download, redistribute and install Linux all at the same time!

$ url=https://torrent.fedoraproject.org/torrents/Fedora-Server-dvd-x86_64-32.torrent
$ wget $url
$ nbdkit -U - torrent Fedora-Server-*.torrent \
         --run 'qemu-system-x86_64 -m 2048 -cdrom $nbd -boot d'

So what’s the serious use for this? It has the interesting property that the more people who are installing your Linux distro, the less bandwidth it uses and the faster it runs! This could be interesting technology for any kind of distributed environment where you have lots of machines accessing the same fixed/read-only filesystem or disk image.

If you want to get started with nbdkit it’s already in all popular Linux distributions, and compiles from source on Linux, FreeBSD and OpenBSD.

Leave a comment

Filed under Uncategorized

Golang bindings for both libnbd and nbdkit

I have to say for full transparency up front that Golang is not my favourite programming language, even less after using it for a while. Nevertheless with a lot of help from Dan Berrangé we now have Golang bindings for libnbd and nbdkit which are respectively client and server software for the Linux Network Block Device protocol.

The Golang bindings for libnbd let you connect to a server and read and write from it. This is all pretty straightforward so read the manual page if you want to find out more.

The Golang bindings for nbdkit are considerably more interesting because you can use them to write pretty natural and high performance NBD servers to expose “interesting things”.

I’m hoping in particular there are interesting block device sources in the Kubernetes / Docker ecosystem which are probably only available from Golang that we could now expose to other software (although I’m also still researching this area so I don’t yet know what in particular).

You can make a complete Golang NBD server really easily now with only a few lines of code. Minus boilerplate, something like this is sufficient (see this link for complete working examples):

type MyPlugin struct {
	nbdkit.Plugin
}

type MyConnection struct {
	nbdkit.Connection
}

func (p *MyPlugin) Open(readonly bool) (nbdkit.ConnectionInterface, error) {
	return &MyConnection{}, nil
}

func (c *MyConnection) GetSize() (uint64, error) {
	return size, nil
}

func (c *MyConnection) PRead(buf []byte, offset uint64,
	flags uint32) error {
	copy(buf, ... from the source of your data here ...)
	return nil
}

func (c *MyConnection) CanWrite() (bool, error) {
	return true, nil
}

func (c *MyConnection) PWrite(buf []byte, offset uint64,
	flags uint32) error {
	copy(... to the data source here ..., buf)
	return nil
}

Editor note: In an earlier version of these bindings we passed the whole struct to each callback rather than a pointer, hence James’s first comment below.

3 Comments

Filed under Uncategorized