Tag Archives: nbdkit

nbdkit 1.24 & libnbd 1.6, new copying tool

As well as nbdkit 1.24 being released on Thursday, its sister project libnbd 1.6 was released at the same time. This comes with an enhanced copying tool called nbdcopy designed to replace some uses of qemu-img convert (note: it’s not a general replacement).

nbdcopy lets you copy from and to NBD servers (nbdkit, qemu-nbd, qemu-storage-daemon, nbd-server), local files, local block devices, pipes/sockets, and stdin/stdout. For example to stream the content of an NBD server:

$ nbdcopy nbd://localhost - | hexdump -C

The “-” character streams to stdout. nbd://localhost is an NBD URI referring to an NBD server that is already running. What if you don’t have an already running server? nbdcopy lets you run one from the command line (and cleans up after). For example this is one way to convert a qcow2 file to raw:

$ nbdcopy -- [ qemu-nbd -f qcow2 disk.qcow ] disk.raw

Here the [ ... ] section starts qemu-nbd as a captive NBD server, exposing privately an NBD endpoint, and nbdcopy copies this to local file disk.raw. (“--” is needed to stop nbdcopy trying to interpret qemu-nbd’s own command line arguments.)

However this post is really about the nbdkit release. How did I test and benchmark nbdcopy? Of course I wrote an nbdkit plugin called nbdkit-sparse-random-plugin. This plugin has two clever features for testing copying tools. Firstly it creates random disks which have the same “shape” as virtual machine disk images (but without the overhead of needing to bother with an actual VM). Secondly it can act as both a source and target for testing copies.

Let’s unpack those two things a bit further.

Virtual machine disk images (especially mostly empty ones) are mostly sparse. Here’s part of the sparse map from a Fedora 32 disk image:

$ virt-builder fedora-32
$ filefrag -e fedora-32.img 
 Filesystem type is: 58465342
 File size of fedora-32.img is 6442450944 (1572864 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..       0:    2038672..   2038672:      1:            
    1:        1..      15:    2176040..   2176054:     15:    2038673:
    2:      256..     271:    2188819..   2188834:     16:    2176295:
    3:      512..    3135:    3650850..   3653473:   2624:    2189075:
    4:     3168..    4463:    3781763..   3783058:   1296:    3653506:

The new sparse-random plugin generates a disk image which has a similar shape — islands of random data in a sea of sparseness. The algorithm for doing this is quite neat. Because the plugin doesn’t need to store the data, unlike a real disk image, it can generate huge disk images (eg. a terabyte) while using almost no memory. We use a low-overhead, high-quality random number generator and are smart about seeds so that every run of sparse-random with the same seed produces identical output.

The other part of this plugin is how we can use it to test copying tools like nbdcopy and qemu-img convert. My idea was that the plugin could be used both as the source and the target of the copy:

$ nbdkit -U - sparse-random 1T --run ' nbdcopy "$uri" "$uri" '

Here we create a terabyte-sized sparse-random disk, and get nbdcopy to copy from the plugin to the plugin. On reads sparse-random supplies the sparseness and random data. On writes it checks if what is being written matches the content of the plugin, throwing -EIO errors if not. Assuming the copying tool is correctly handling errors, we can both validate the copying tool and benchmark it. And it works with qemu-img convert too:

$ nbdkit -U - sparse-random 1T --run ' qemu-img convert "$uri" "$uri" '

And now we can see which one is faster.

Try it, you may be surprised.

Leave a comment

Filed under Uncategorized

nbdkit 1.24, new data plugin features

nbdkit 1.24 was released on Thursday. It’s our flexible, fast network block device with loads of features. nbdkit-data-plugin, a plugin that lets you create test patterns from the command line gained some interesting new functionality:

$ nbdkit data ' ( 0x55 0xAA )*2048 '

This command worked before as a way to create a repeating test pattern in a disk image. A new feature is you can write a shell script snippet to generate the pattern instead:

$ nbdkit data ' <( while :; do printf "%04x" $((i++)); done ) [:2048] '

This command will create a pattern of characters “0 0 0 0 0 0 0 1 0 0 0 2 0 0 0 3 …” (truncated to 2048 bytes). We could turn that into a block device and display the contents:

# nbd-client localhost /dev/nbd0
# blockdev --getsize64 /dev/nbd0
# dd if=/dev/nbd0 | hexdump -C | head
4+0 records in
4+0 records out
2048 bytes (2.0 kB, 2.0 KiB) copied, 0.000167082 s, 12.3 MB/s
00000000  30 30 30 30 30 30 30 31  30 30 30 32 30 30 30 33  |0000000100020003|
00000010  30 30 30 34 30 30 30 35  30 30 30 36 30 30 30 37  |0004000500060007|
00000020  30 30 30 38 30 30 30 39  30 30 30 61 30 30 30 62  |00080009000a000b|
00000030  30 30 30 63 30 30 30 64  30 30 30 65 30 30 30 66  |000c000d000e000f|
00000040  30 30 31 30 30 30 31 31  30 30 31 32 30 30 31 33  |0010001100120013|
00000050  30 30 31 34 30 30 31 35  30 30 31 36 30 30 31 37  |0014001500160017|
00000060  30 30 31 38 30 30 31 39  30 30 31 61 30 30 31 62  |00180019001a001b|
00000070  30 30 31 63 30 30 31 64  30 30 31 65 30 30 31 66  |001c001d001e001f|
00000080  30 30 32 30 30 30 32 31  30 30 32 32 30 30 32 33  |0020002100220023|
00000090  30 30 32 34 30 30 32 35  30 30 32 36 30 30 32 37  |0024002500260027|
# nbd-client -d /dev/nbd0
# killall nbdkit

The data plugin also lets you read from files which is useful for making disks with random initial data. For example here’s how to create a disk with 16 identical sectors of random data (notice how /dev/random is read in, truncated to 512 bytes, and then 16 copies are made):

$ nbdkit data ' </dev/urandom[:512]*16 '

The plugin can also create sparse disks. You can do this just by moving the current offset using “@”:

$ nbdkit data ' @32768 1 ' --run 'nbdinfo --map "$uri"'
     0       32768    3  hole,zero
 32768           1    0  allocated

We use this plugin quite extensively when testing libnbd.

1 Comment

Filed under Uncategorized

Read and writing VMware .vmdk disks

(This is in answer to an IRC question, but the answer is a bit longer than I can cover in IRC)

Can you read and write at the block level in a .vmdk file? I think the questioner was asking about writing a backup/restore type tool. Using only free software, qemu can do reads. You can attach qemu-nbd to a vmdk file and that will expose the logical blocks as NBD, and you can then read at the block level using libnbd:

import nbd
h = nbd.NBD()
    ["qemu-nbd", "-t", "/var/tmp/disk.vmdk"])
print("size = %d" % h.get_size())
buf = h.pread(512, 0)
$ ./qemu-test.py 
size = 1073741824

The example is in Python, but libnbd would let you do this from C or other languages just as easily.

While this works fine for reading, I wouldn’t necessarily be sure that writing is safe. The vmdk format is complex, baroque and only lightly documented, and the only implementation I’d trust is the one from VMware.

So as long as you’re prepared to use a bit of closed source software and agree with the (nasty) license, VDDK is the safer choice. You can isolate your own software from VDDK using our nbdkit plugin.

import nbd
h = nbd.NBD()
    ["nbdkit", "-s", "--exit-with-parent",
     "vddk", "libdir=/var/tmp/vmware-vix-disklib-distrib",
print("size = %d" % h.get_size())
buf = h.pread(512, 0)
h.pwrite(buf, 512)

I quite like how we’re using small tools and assembling them together into a pipeline in just a few lines of code:

┌─────────┬────────┐          ┌─────────┬────────┐
│ your    │ libnbd │   NBD    │ nbdkit  │ VDDK   │
│ program │     ●──────────────➤        │        │
└─────────┴────────┘          └─────────┴────────┘

One advantage of this approach is that it exposes the extents in the disk which you can iterate over using libnbd APIs. For a backup tool this would let you save the disk efficiently, or do change-block tracking.

1 Comment

Filed under Uncategorized

Loop mount an S3 or Ceph object

This is a fun, small nbdkit Python plugin using the Boto3 AWS SDK:

#!/usr/sbin/nbdkit python

import nbdkit
import boto3
from contextlib import closing


def thread_model():
    return nbdkit.THREAD_MODEL_PARALLEL

def config(key, value):
    global access_key, secret_key, endpoint_url, bucket_name, key_name

    if key == "access-key" or key == "access_key":
        access_key = value
    elif key == "secret-key" or key == "secret_key":
        secret_key = value
    elif key == "endpoint-url" or key == "endpoint_url":
        endpoint_url = value
    elif key == "bucket":
        bucket_name = value
    elif key == "key":
        key_name = value
        raise Exception("unknown parameter %s" % key)

def open(readonly):
    global access_key, secret_key, endpoint_url

    s3 = boto3.client("s3",
                      aws_access_key_id = access_key,
                      aws_secret_access_key = secret_key,
                      endpoint_url = endpoint_url)
    if s3 is None:
        raise Exception("could not connect to S3")
    return s3

def get_size(s3):
    global bucket_name, key_name

    resp = s3.get_object(Bucket = bucket_name, Key = key_name)
    size = resp['ResponseMetadata']['HTTPHeaders']['content-length']
    return int(size)

def pread(s3, buf, offset, flags):
    global bucket_name, key_name

    size = len(buf)
    rnge = 'bytes=%d-%d' % (offset, offset+size-1)
    resp = s3.get_object(Bucket = bucket_name, Key = key_name, Range = rnge)
    body = resp['Body']
    with closing(body):
        buf[:] = body.read(size)

This lets you loop mount a single object (file):

$ ./nbdkit-S3-plugin -f -v -U /tmp/sock \
  access_key="XYZ" secret_key="XYZ" \
  bucket="my_files" key="fedora-28.iso"
$ sudo nbd-client -b 2048 -unix /tmp/sock /dev/nbd0
Negotiation: ..size = 583MB
$ ls /dev/nbd0
 nbd0    nbd0p1  nbd0p2  
$ sudo mount -o ro /dev/nbd0p1 /tmp/mnt
$ ls -l /tmp/mnt
 total 11
 dr-xr-xr-x. 3 root root 2048 Apr 25  2018 EFI
 -rw-r--r--. 1 root root 2532 Apr 23  2018 Fedora-Legal-README.txt
 dr-xr-xr-x. 3 root root 2048 Apr 25  2018 images
 drwxrwxr-x. 2 root root 2048 Apr 25  2018 isolinux
 -rw-r--r--. 1 root root 1063 Apr 21  2018 LICENSE
 -r--r--r--. 1 root root  454 Apr 25  2018 TRANS.TBL

I should note this is a bit different from s3fs which is a FUSE driver that mounts all the files in a bucket.

Leave a comment

Filed under Uncategorized

Ridiculously big “files”

In the last post I showed how you can combine nbdfuse with nbdkit’s RAM disk to mount a RAM disk as a local file. In a talk I gave at FOSDEM last year I described creating these absurdly large RAM-backed filesystems and you can do the same thing now to create ridiculously big “files”. Here’s a 7 exabyte file:

$ touch /var/tmp/disk.img
$ nbdfuse /var/tmp/disk.img --command nbdkit -s memory 7E &
$ ll /var/tmp/disk.img 
 -rw-rw-rw-. 1 rjones rjones 8070450532247928832 Nov  4 13:37 /var/tmp/disk.img
$ ls -lh /var/tmp/disk.img 
 -rw-rw-rw-. 1 rjones rjones 7.0E Nov  4 13:37 /var/tmp/disk.img

What can you actually do with this file, and more importantly does anything break? As in the talk, creating a Btrfs filesystem boringly just works. mkfs.ext4 spins using 100% of CPU. I let it go for 15 minutes but it seemed no closer to either succeeding or crashing. Emacs said:

File disk.img is large (7 EiB), really open? (y)es or (n)o or (l)iterally

and I was too chicken to find out what it would do if I really opened it.

I do wonder if there’s a DoS attack here if I leave this seemingly massive regular file lying around in a public directory.


Filed under Uncategorized

nbdkit Windows port contd.

We ported nbdkit to Windows. That port is now upstream and should appear in the next stable release (1.24). There is also a new native file plugin for Windows which supports Windows files and volumes, hole punching for sparse files, querying file sparseness, and efficient zeroing.


Filed under Uncategorized

nbdkit now ported to Windows

This week I ported nbdkit, our high performance plugin-based Network Block Device server, to Windows. Currently it’s not upstream but you can download the Windows branch from here.

There were several possible ways we could have done this including Cygwin which might have been easier, but in the end I did a port to the raw Win32/Winsock APIs. You can compile it on Fedora using the mingw-w64-based Fedora Windows Cross Compiler and run it using Wine. Familiar commands like this work:

$ ./nbdkit.exe -f -v memory 1G

Windows is such a trash pile of awful APIs it’s a wonder how anyone can use it. Like what, and huh and WTF having to split a 64 bit int across two fields in a struct?? Not to mention the whole mess which is HANDLEs vs SOCKETs vs file descriptors and errno handling in Winsock. But I got there in the end.

I got many existing plugins and filters compiled. Not all of those listed will be working, but the main features are fine. You can also write your own plugins to the same API as Linux ones. I’m hoping that someone can write a Windows block device plugin (especially one which integrates with features like VSS).

$ find \( -name '*.exe' -o -name '*.dll' \) -a -printf "%f\n" | sort -u


Filed under Uncategorized

nbdkit tar filter

nbdkit is our high performance liberally licensed Network Block Device server, and OVA files are a common pseudo-standard for exporting virtual machines including their disk images.

A .ova file is really an uncompressed tar file:

$ tar tf rhel.ova

Since tar files usually store their content unmangled, this opens an interesting possibility for reading (or even writing) the embedded disk image without needing to unpack the tar. You just have to work out the offset of the disk image within the tar file. virt-v2v has used this trick to save a copy when importing OVAs for years.

nbdkit has also included a tar plugin which can access a file inside a local tar file, but the problem is what if the tar file doesn’t happen to be a local file? (eg. It’s on a webserver). Or what if it’s compressed?

To fix this I’ve turned the plugin into a filter. Using nbdkit-tar-filter you can unpack even non-local compressed tar files:

$ nbdkit curl http://example.com/qcow2.tar.xz \
         --filter=tar --filter=xz tar-entry=disk.qcow2

(To understand how filters are stacked, see my FOSDEM talk from last year). Because in this example the disk inside the tarball is a qcow2 file, it appears as qcow2 on the wire, so:

$ guestfish --ro --format=qcow2 -a nbd://localhost

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> run
><fs> list-filesystems 
/dev/sda1: ext2
><fs> mount /dev/sda1 /
><fs> ll /
total 19
drwxr-xr-x   3 root root  1024 Jul  6 20:03 .
drwxr-xr-x  19 root root  4096 Jul  9 11:01 ..
-rw-rw-r--.  1 1000 1000    11 Jul  6 20:03 hello.txt
drwx------   2 root root 12288 Jul  6 20:03 lost+found

Leave a comment

Filed under Uncategorized

nbdkit with BitTorrent

nbdkit is our high performance Network Block Device server for serving disk images from unusual sources. One (usual) source for Linux installers is to download an ISO from a website like Get Fedora or debian.org. However that costs the host money and is also a central point of failure, so another way to download Linux installers is over BitTorrent. Many Linux distros offer torrents of their installers including Fedora and Debian. By using these you are helping to redistribute Linux and defraying the cost of hosting these ISOs.

Now I’ve written a BitTorrent plugin for nbdkit so you can download, redistribute and install Linux all at the same time!

$ url=https://torrent.fedoraproject.org/torrents/Fedora-Server-dvd-x86_64-32.torrent
$ wget $url
$ nbdkit -U - torrent Fedora-Server-*.torrent \
         --run 'qemu-system-x86_64 -m 2048 -cdrom $nbd -boot d'

So what’s the serious use for this? It has the interesting property that the more people who are installing your Linux distro, the less bandwidth it uses and the faster it runs! This could be interesting technology for any kind of distributed environment where you have lots of machines accessing the same fixed/read-only filesystem or disk image.

If you want to get started with nbdkit it’s already in all popular Linux distributions, and compiles from source on Linux, FreeBSD and OpenBSD.

Leave a comment

Filed under Uncategorized

Compressed RAM disks

There was a big discussion last week about whether zram swap should be the default in a future version of Fedora.

This lead me to think about the RAM disk implementation in nbdkit. In nbdkit up to 1.20 it supports giant virtual disks up to 8 exabytes using a sparse array implemented with a 2-level page table. However it’s still a RAM disk and so you can’t actually store more real data in these disks than you have available RAM (plus swap).

But what if we compressed the data? There are some fine, very fast compression libraries around nowadays — I’m using Facebook’s Zstandard — so the overhead of compression can be quite small, and this lets you make limited RAM go further.

So I implemented allocators for nbdkit ≥ 1.22, including:

$ nbdkit memory 1T allocator=zstd

Compression ratios can be really good. I tested this by creating a RAM disk and filling it with a filesystem containing text and source files, and was getting 10:1 compression. (Note that filesystems start with very regular, easily compressible metadata, so you’d expect this ratio to quickly drop if you filled the filesystem up with a lot of files).

The compression overhead is small, although the current nbdkit-memory-plugin isn’t very smart about locking so it has rather poor performance under multi-threaded loads anyway. (A fun little project to fix that for someone who loves pthread and C.)

I also implemented allocator=malloc which is a non-sparse direct-mapped RAM disk. This is simpler and a bit faster, but has rather obvious limitations compared to using the sparse allocator.


Filed under Uncategorized