Tag Archives: disk image

nbdkit 1.24 & libnbd 1.6, new copying tool

As well as nbdkit 1.24 being released on Thursday, its sister project libnbd 1.6 was released at the same time. This comes with an enhanced copying tool called nbdcopy designed to replace some uses of qemu-img convert (note: it’s not a general replacement).

nbdcopy lets you copy from and to NBD servers (nbdkit, qemu-nbd, qemu-storage-daemon, nbd-server), local files, local block devices, pipes/sockets, and stdin/stdout. For example to stream the content of an NBD server:

$ nbdcopy nbd://localhost - | hexdump -C

The “-” character streams to stdout. nbd://localhost is an NBD URI referring to an NBD server that is already running. What if you don’t have an already running server? nbdcopy lets you run one from the command line (and cleans up after). For example this is one way to convert a qcow2 file to raw:

$ nbdcopy -- [ qemu-nbd -f qcow2 disk.qcow ] disk.raw

Here the [ ... ] section starts qemu-nbd as a captive NBD server, exposing privately an NBD endpoint, and nbdcopy copies this to local file disk.raw. (“--” is needed to stop nbdcopy trying to interpret qemu-nbd’s own command line arguments.)

However this post is really about the nbdkit release. How did I test and benchmark nbdcopy? Of course I wrote an nbdkit plugin called nbdkit-sparse-random-plugin. This plugin has two clever features for testing copying tools. Firstly it creates random disks which have the same “shape” as virtual machine disk images (but without the overhead of needing to bother with an actual VM). Secondly it can act as both a source and target for testing copies.

Let’s unpack those two things a bit further.

Virtual machine disk images (especially mostly empty ones) are mostly sparse. Here’s part of the sparse map from a Fedora 32 disk image:

$ virt-builder fedora-32
$ filefrag -e fedora-32.img 
 Filesystem type is: 58465342
 File size of fedora-32.img is 6442450944 (1572864 blocks of 4096 bytes)
  ext:     logical_offset:        physical_offset: length:   expected: flags:
    0:        0..       0:    2038672..   2038672:      1:            
    1:        1..      15:    2176040..   2176054:     15:    2038673:
    2:      256..     271:    2188819..   2188834:     16:    2176295:
    3:      512..    3135:    3650850..   3653473:   2624:    2189075:
    4:     3168..    4463:    3781763..   3783058:   1296:    3653506:

The new sparse-random plugin generates a disk image which has a similar shape — islands of random data in a sea of sparseness. The algorithm for doing this is quite neat. Because the plugin doesn’t need to store the data, unlike a real disk image, it can generate huge disk images (eg. a terabyte) while using almost no memory. We use a low-overhead, high-quality random number generator and are smart about seeds so that every run of sparse-random with the same seed produces identical output.

The other part of this plugin is how we can use it to test copying tools like nbdcopy and qemu-img convert. My idea was that the plugin could be used both as the source and the target of the copy:

$ nbdkit -U - sparse-random 1T --run ' nbdcopy "$uri" "$uri" '

Here we create a terabyte-sized sparse-random disk, and get nbdcopy to copy from the plugin to the plugin. On reads sparse-random supplies the sparseness and random data. On writes it checks if what is being written matches the content of the plugin, throwing -EIO errors if not. Assuming the copying tool is correctly handling errors, we can both validate the copying tool and benchmark it. And it works with qemu-img convert too:

$ nbdkit -U - sparse-random 1T --run ' qemu-img convert "$uri" "$uri" '

And now we can see which one is faster.

Try it, you may be surprised.

Leave a comment

Filed under Uncategorized

Read and writing VMware .vmdk disks

(This is in answer to an IRC question, but the answer is a bit longer than I can cover in IRC)

Can you read and write at the block level in a .vmdk file? I think the questioner was asking about writing a backup/restore type tool. Using only free software, qemu can do reads. You can attach qemu-nbd to a vmdk file and that will expose the logical blocks as NBD, and you can then read at the block level using libnbd:

import nbd
h = nbd.NBD()
    ["qemu-nbd", "-t", "/var/tmp/disk.vmdk"])
print("size = %d" % h.get_size())
buf = h.pread(512, 0)
$ ./qemu-test.py 
size = 1073741824

The example is in Python, but libnbd would let you do this from C or other languages just as easily.

While this works fine for reading, I wouldn’t necessarily be sure that writing is safe. The vmdk format is complex, baroque and only lightly documented, and the only implementation I’d trust is the one from VMware.

So as long as you’re prepared to use a bit of closed source software and agree with the (nasty) license, VDDK is the safer choice. You can isolate your own software from VDDK using our nbdkit plugin.

import nbd
h = nbd.NBD()
    ["nbdkit", "-s", "--exit-with-parent",
     "vddk", "libdir=/var/tmp/vmware-vix-disklib-distrib",
print("size = %d" % h.get_size())
buf = h.pread(512, 0)
h.pwrite(buf, 512)

I quite like how we’re using small tools and assembling them together into a pipeline in just a few lines of code:

┌─────────┬────────┐          ┌─────────┬────────┐
│ your    │ libnbd │   NBD    │ nbdkit  │ VDDK   │
│ program │     ●──────────────➤        │        │
└─────────┴────────┘          └─────────┴────────┘

One advantage of this approach is that it exposes the extents in the disk which you can iterate over using libnbd APIs. For a backup tool this would let you save the disk efficiently, or do change-block tracking.

1 Comment

Filed under Uncategorized

Golang bindings for both libnbd and nbdkit

I have to say for full transparency up front that Golang is not my favourite programming language, even less after using it for a while. Nevertheless with a lot of help from Dan Berrangé we now have Golang bindings for libnbd and nbdkit which are respectively client and server software for the Linux Network Block Device protocol.

The Golang bindings for libnbd let you connect to a server and read and write from it. This is all pretty straightforward so read the manual page if you want to find out more.

The Golang bindings for nbdkit are considerably more interesting because you can use them to write pretty natural and high performance NBD servers to expose “interesting things”.

I’m hoping in particular there are interesting block device sources in the Kubernetes / Docker ecosystem which are probably only available from Golang that we could now expose to other software (although I’m also still researching this area so I don’t yet know what in particular).

You can make a complete Golang NBD server really easily now with only a few lines of code. Minus boilerplate, something like this is sufficient (see this link for complete working examples):

type MyPlugin struct {

type MyConnection struct {

func (p *MyPlugin) Open(readonly bool) (nbdkit.ConnectionInterface, error) {
	return &MyConnection{}, nil

func (c *MyConnection) GetSize() (uint64, error) {
	return size, nil

func (c *MyConnection) PRead(buf []byte, offset uint64,
	flags uint32) error {
	copy(buf, ... from the source of your data here ...)
	return nil

func (c *MyConnection) CanWrite() (bool, error) {
	return true, nil

func (c *MyConnection) PWrite(buf []byte, offset uint64,
	flags uint32) error {
	copy(... to the data source here ..., buf)
	return nil

Editor note: In an earlier version of these bindings we passed the whole struct to each callback rather than a pointer, hence James’s first comment below.


Filed under Uncategorized

New nbdkit data strings

You can use nbdkit, our infinitely flexible Network Block Device server to serve small disks and test images with the nbdkit data plugin. For example you can cut and paste this command into your shell to demonstrate a bootable disk image which prints “hello, world”:

nbdkit data data='
    0xb4 0 0xb0 3 0xcd 0x10 0xb4 0x13
    0xb3 0x0a 0xb0 1 0xb9 0x0e 0 0xb6
    0 0xb2 0 0xbd 0x19 0x7c 0xcd 0x10
    0xf4 0x68 0x65 0x6c 0x6c 0x6f 0x2c 0x20
    0x77 0x6f 0x72 0x6c 0x64 0x0d 0x0a
    @0x1fe 0x55 0xaa
' --run 'qemu-system-i386 -fda $nbd'

(As an aside, what is the smallest nbdkit data string that can boot to a “hello, world” message?)

The data parameter is a mini-language, and I recently extended it in an interesting way. It wasn’t possible to make repeated patterns easily before. If you wanted a disk containing 0x55 0xAA repeated (the binary bit patterns 01010101 10101010) then the only way to get that was to literally write:

nbdkit data data='0x55 0xAA 0x55 0xAA [repeated many times ...]'

but now you can group things together and write:

nbdkit data data='( 0x55 0xAA )*256'

The nesting works by recursively creating a new parser, which means you can use any data expression. For example to get 4 sectors containing half blank and half test data you can now do:

nbdkit data data='( @256 ( 0x55 0xAA )*128 )*4'

This gives you lots of way to make disks containing test patterns which you could then use to test Linux programs using /dev/nbd0 loop devices.

1 Comment

Filed under Uncategorized

New in nbdkit: Create a virtual floppy disk

nbdkit is our flexible, plug-in based Network Block Device server.

While I was visiting the KVM Forum last week, one of the most respected members of the QEMU development team mentioned to me that he wanted to think about deprecating QEMU’s VVFAT driver. This QEMU driver is a bit of an oddity — it lets you point QEMU to a directory of files, and inside the guest it will see a virtual floppy containing those files:

$ qemu -drive file=fat:/some/directory

That’s not the odd thing. The odd thing is that it also lets you make the drive writable, and the VVFAT driver then turns those writes back into modifications to the host filesystem (remember that these are writes happening to raw FAT32 data structures, the driver has to infer from just seeing the writes what is happening at the filesystem level). Which is both amazing and crazy (and also buggy).

Anyway I have implemented the read-only part of this in nbdkit. I didn’t implement the write stuff because that’s very ambitious, although if you were going to implement that, doing it in nbdkit would be better than qemu since the only thing that can crash is nbdkit, not the whole hypervisor.

Usage is very simple:

$ nbdkit floppy /some/directory

This gives you an NBD source which you can connect straight to a qemu virtual machine:

$ qemu -drive nbd:localhost:10809

or examine with guestfish:

$ guestfish --ro --format=raw -a nbd://localhost -m /dev/sda1
Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

> ll /
total 2420
drwxr-xr-x 14 root root  16384 Jan  1  1970 .
drwxr-xr-x 19 root root   4096 Oct 28 10:07 ..
-rwxr-xr-x  1 root root     40 Sep 17 21:23 .dir-locals.el
-rwxr-xr-x  1 root root    879 Oct 27 21:10 .gdb_history
drwxr-xr-x  8 root root  16384 Oct 28 10:05 .git
-rwxr-xr-x  1 root root   1383 Sep 17 21:23 .gitignore
-rwxr-xr-x  1 root root   1453 Sep 17 21:23 LICENSE
-rwxr-xr-x  1 root root  34182 Oct 28 10:04 Makefile
-rwxr-xr-x  1 root root   2568 Oct 27 22:17 Makefile.am
-rwxr-xr-x  1 root root  32085 Oct 27 22:18 Makefile.in
-rwxr-xr-x  1 root root    620 Sep 17 21:23 OTHER_PLUGINS
-rwxr-xr-x  1 root root   4628 Oct 16 22:36 README
-rwxr-xr-x  1 root root   4007 Sep 17 21:23 TODO
-rwxr-xr-x  1 root root  54733 Oct 27 22:18 aclocal.m4
drwxr-xr-x  2 root root  16384 Oct 27 22:18 autom4te.cache
drwxr-xr-x  2 root root  16384 Oct 28 10:04 bash
drwxr-xr-x  5 root root  16384 Oct 27 18:07 common

Previously … create ISO images on the fly in nbdkit

Leave a comment

Filed under Uncategorized

nbdkit for loopback pt 6: giant file-backed disks for testing

In part 1 and part 5 of this series I created some giant disks with a virtual size of 263-1 bytes (8 exabytes). However these were stored in memory using nbdkit-memory-plugin so you could never allocate more space in these disks than available RAM plus swap.

This is a problem when testing some filesystems because the filesystem overhead (the space used to store superblocks, inode tables, block free maps and so on) can be 1% or more.

The solution to this is to back the virtual disks using a sparse file instead. XFS lets you create sparse files up to 263-1 bytes and you can serve them using nbdkit-file-plugin instead:

$ rm -f temp
$ truncate -s $(( 2**63 - 1 )) temp
$ stat -c %s temp
$ nbdkit file file=temp

nbdkit-file-plugin recently got a lot of updates to ensure it always maintains sparseness where possible and supports efficient zeroing, so make sure you’re using at least nbdkit ≥ 1.6.

Now you can serve this in the ordinary way and you should be able to allocate as much space as is available on the host filesystem:

# nbd-client -b 512 localhost /dev/nbd0
Negotiation: ..size = 8796093022207MB
Connected /dev/nbd0
# blockdev --getsize64 /dev/nbd0
# sgdisk -n 1 /dev/nbd0
# gdisk -l /dev/nbd0
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048  18014398509481948   8.0 EiB     8300

This command will still probably fail unless you have a lot of patience and a huge amount of space on your host:

# mkfs.xfs -K /dev/nbd0p1

Leave a comment

Filed under Uncategorized

Partitioning a 7 exabyte disk

In the latest nbdkit (and at the time of writing you will need nbdkit from git) you can type this magical incantation:

nbdkit data data="
       @0x1c0 2 0 0xee 0xfe 0xff 0xff 0x01 0  0 0 0xff 0xff 0xff 0xff
       @0x1fe 0x55 0xaa
       @0x200 0x45 0x46 0x49 0x20 0x50 0x41 0x52 0x54
                     0 0 1 0 0x5c 0 0 0
              0x9b 0xe5 0x6a 0xc5 0 0 0 0  1 0 0 0 0 0 0 0
              0xff 0xff 0xff 0xff 0xff 0xff 0x37 0  0x22 0 0 0 0 0 0 0
              0xde 0xff 0xff 0xff 0xff 0xff 0x37 0
                     0x72 0xb6 0x9e 0x0c 0x6b 0x76 0xb0 0x4f
              0xb3 0x94 0xb2 0xf1 0x61 0xec 0xdd 0x3c  2 0 0 0 0 0 0 0
              0x80 0 0 0 0x80 0 0 0  0x79 0x8a 0xd0 0x7e 0 0 0 0
       @0x400 0xaf 0x3d 0xc6 0x0f 0x83 0x84 0x72 0x47
                     0x8e 0x79 0x3d 0x69 0xd8 0x47 0x7d 0xe4
              0xd5 0x19 0x46 0x95 0xe3 0x82 0xa8 0x4c
                     0x95 0x82 0x7a 0xbe 0x1c 0xfc 0x62 0x90
              0x80 0 0 0 0 0 0 0  0x80 0xff 0xff 0xff 0xff 0xff 0x37 0
              0 0 0 0 0 0 0 0  0x70 0 0x31 0 0 0 0 0
              0xaf 0x3d 0xc6 0x0f 0x83 0x84 0x72 0x47
                     0x8e 0x79 0x3d 0x69 0xd8 0x47 0x7d 0xe4
              0xd5 0x19 0x46 0x95 0xe3 0x82 0xa8 0x4c
                     0x95 0x82 0x7a 0xbe 0x1c 0xfc 0x62 0x90
              0x80 0 0 0 0 0 0 0  0x80 0xff 0xff 0xff 0xff 0xff 0x37 0
              0 0 0 0 0 0 0 0  0x70 0 0x31 0 0 0 0 0
              0x45 0x46 0x49 0x20 0x50 0x41 0x52 0x54
                     0 0 1 0 0x5c 0 0 0
              0x6c 0x76 0xa1 0xa0 0 0 0 0
                     0xff 0xff 0xff 0xff 0xff 0xff 0x37 0
              1 0 0 0 0 0 0 0  0x22 0 0 0 0 0 0 0
              0xde 0xff 0xff 0xff 0xff 0xff 0x37 0
                     0x72 0xb6 0x9e 0x0c 0x6b 0x76 0xb0 0x4f
              0xb3 0x94 0xb2 0xf1 0x61 0xec 0xdd 0x3c
                     0xdf 0xff 0xff 0xff 0xff 0xff 0x37 0
              0x80 0 0 0 0x80 0 0 0  0x79 0x8a 0xd0 0x7e 0 0 0 0
" size=7E

When nbdkit starts up you can connect to it in a few ways. If you have a qemu virtual machine running an installed operating system, attach a second NBD drive. On the command line that would look like this:

$ qemu-system-x86_64 ... -file drive=nbd:localhost:10809,if=virtio

Or you could use guestfish:

$ guestfish --format=raw -a nbd://localhost
><fs> run

What this creates is a 7 exabyte disk with a single, empty GPT partition.

7 exabytes is a lot. It’s 8,070,450,532,247,928,832 bytes, or about 7 billion gigabytes. In fact even with ever increasing storage capacities in hard disk drives it’ll be a very long time before we get exabyte drives.

Peculiar things happen when you try to use this disk in Linux. For sure the kernel has no problem finding the partition, creating a /dev/sda1 device, and returning the right size. Ext4 has a maximum filesystem size of merely 1 exabyte so it won’t even try to make a filesystem, and on my laptop trying to write an XFS filesystem on the partition just caused qemu to grind away at 200% CPU making no apparent progress even after many minutes.

Why not throw your own favourite disk analysis tools at this image and see what they make of it.

Finally how did I create the magic command line above?

I used the nbdkit memory plugin to make an empty 7 EB disk. Note this requires a recent version of the plugin which was rewritten with support for sparse arrays.

$ nbdkit memory size=7E

Then I could connect to it with guestfish to create the GPT partition:

$ guestfish --format=raw -a nbd://localhost
><fs> run
><fs> part-disk /dev/sda gpt

GPT uses a partition table at the beginning and end of the disk. So – still in guestfish – I could sample what the partitioning tool had written to both ends of the disk:

><fs> pread-device /dev/sda 1M 0 | cat > start
><fs> pread-device /dev/sda 1M 8070450532246880256 | cat > end

I then used hexdump + manual inspection of the hexdump output to write the long data string:

$ hexdump -C start
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001c0  02 00 ee fe ff ff 01 00  00 00 ff ff ff ff 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|

translates to …

@0x1c0 2 0 0xee 0xfe 0xff 0xff 0x01 0  0 0 0xff 0xff 0xff 0xff
@0x1fe 0x55 0xaa

1 Comment

Filed under Uncategorized

Tip: guestmount (FUSE mount) every filesystem in a disk image

Maxim asks an interesting question which is if you’ve got a disk image, how do you mount every filesystem onto your host. Like this:

$ ./fs-mount.pl rhel-5.11.img /tmp/fs &
$ cd /tmp/fs
/tmp/fs$ ls
/tmp/fs$ cd dev
/tmp/fs/dev$ ls
sda1  sda2  sda3
/tmp/fs/dev$ cd sda2
/tmp/fs/dev/sda2$ ls
bin   dev  home  lib64       media  mnt  proc  sbin     srv  tmp  var
boot  etc  lib   lost+found  misc   opt  root  selinux  sys  usr
$ cd /tmp
$ guestunmount /tmp/fs

The answer is this surprisingly short Perl script.


use warnings;
use strict;

use Sys::Guestfs;

die "usage: $0 disk1 [disk2 ...] mountpoint\n" if @ARGV <= 1;

my $mp = pop;

my $g = Sys::Guestfs->new ();
foreach (@ARGV) {
    $g->add_drive ($_);
$g->launch ();

# Examine the filesystems.
my %fses = $g->list_filesystems ();

# Create the mountpoint directories (in the libguestfs namespace)
# and mount the filesystems on them.
foreach my $fs (sort keys %fses) {
    # mkmountpoint is really the same as mkdir.  Unfortunately there
    # is no 'mkdir -p' equivalent, so we have to do this instead:
    my @components = split ("/", $fs);
    for (my $i = 1; $i < @components; ++$i) {
        my $dir = "/" . join ("/", @components[1 .. $i]);
        eval { $g->mkmountpoint ($dir) }

    # Don't fail if the filesystem can't be mounted, eg. it's swap.
    eval { $g->mount ($fs, $fs) }

# Export the filesystem on the host.
$g->mount_local ($mp);
$g->mount_local_run ();

# Close nicely since we mounted everything writable.
$g->shutdown ();
$g->close ();

Leave a comment

Filed under Uncategorized

Importing KVM guests to oVirt or RHEV

One of the tools I maintain is virt-v2v. It’s a program to import guests from foreign hypervisors like VMware and Xen, to KVM. It only does conversions to KVM, not the other way. And a feature I intentionally removed in RHEL 7 was importing KVM → KVM.

Why would you want to “import” KVM → KVM? Well, no reason actually. In fact it’s one of those really bad ideas for V2V. However it used to have a useful purpose: oVirt/RHEV can’t import a plain disk image, but virt-v2v knows how to import things to oVirt, so people used virt-v2v as backdoor for this missing feature.

Removing this virt-v2v feature has caused a lot of moaning, but I’m adamant it’s a very bad idea to use virt-v2v as a way to import disk images. Virt-v2v does all sorts of complex filesystem and Windows Registry manipulations, which you don’t want and don’t need if your guest already runs on KVM. Worst case, you could even end up breaking your guest.

However I have now written a replacement script that does the job: http://git.annexia.org/?p=import-to-ovirt.git

If your guest is a disk image that already runs on KVM, then you can use this script to import the guest. You’ll need to clone the git repo, read the README file, and then read the tool’s man page. It’s pretty straightforward.

There are a few shortcomings with this script to be aware of:

  1. The guest must have virtio drivers installed already, and must be able to boot off virtio-blk (default) or virtio-scsi. For virtio-scsi, you’ll need to flip the checkbox in the ‘Advanced’ section of the guest parameters in the oVirt UI.
  2. It should be possible to import guests that don’t have virtio drivers installed, but can use IDE. This is a missing feature (patches welcome).
  3. No network card is added to the guest, so it probably won’t have network when it boots. It should be possible to add a network card through the UI, but really this is something that needs to be fixed in the script (patches welcome).
  4. It doesn’t handle all the random packaging formats that guests come in, like OVA. You’ll have to extract these first and import just the disk image.
  5. It’s not in any way supported or endorsed by Red Hat.


Filed under Uncategorized

Tip: compress raw disk images using qcow2

$ qemu-img convert -c -f raw -O qcow2 win.img winq.img
$ ls -lh win*
-rw-r--r--. 1 root   root    10G May 18 14:34 win.img
-rw-r--r--. 1 rjones rjones 6.5G May 18 14:59 winq.img

Of course the degree of compression you get depends on the amount of zeroed free space in the image, and the amount by which qcow2 is able to compress the other blocks containing data.

qcow2 uses zlib for compression, so the compression won’t be that spectacular. It’s better to keep the filesystems “sparse” in the first place, by ensuring unused disk blocks are zeroed.

For ext2/3 filesystems, Fedora ships a utility called zerofree, which you can either run inside the guest, or run offline from guestfish. This turns unused filesystem blocks into zeroes, which will make outside compression eg with qcow2 much more efficient. For other filesystems, the usual trick is to create a large file of all zeroes until you fill up the free space, then delete it.

qcow2 files are completely interchangeable with raw disk images:

$ virt-df -h win.img
Filesystem                                Size       Used  Available  Use%
win.img:/dev/vda1                       100.0M      24.1M      75.9M   25%
win.img:/dev/vda2                         9.9G       7.4G       2.5G   75%
$ virt-df -h winq.img
Filesystem                                Size       Used  Available  Use%
winq.img:/dev/vda1                      100.0M      24.1M      75.9M   25%
winq.img:/dev/vda2                        9.9G       7.4G       2.5G   75%


Filed under Uncategorized