nbdkit Windows port contd.

We ported nbdkit to Windows. That port is now upstream and should appear in the next stable release (1.24). There is also a new native file plugin for Windows which supports Windows files and volumes, hole punching for sparse files, querying file sparseness, and efficient zeroing.

Leave a comment

Filed under Uncategorized

nbdkit now ported to Windows

This week I ported nbdkit, our high performance plugin-based Network Block Device server, to Windows. Currently it’s not upstream but you can download the Windows branch from here.

There were several possible ways we could have done this including Cygwin which might have been easier, but in the end I did a port to the raw Win32/Winsock APIs. You can compile it on Fedora using the mingw-w64-based Fedora Windows Cross Compiler and run it using Wine. Familiar commands like this work:

$ ./nbdkit.exe -f -v memory 1G

Windows is such a trash pile of awful APIs it’s a wonder how anyone can use it. Like what, and huh and WTF having to split a 64 bit int across two fields in a struct?? Not to mention the whole mess which is HANDLEs vs SOCKETs vs file descriptors and errno handling in Winsock. But I got there in the end.

I got many existing plugins and filters compiled. Not all of those listed will be working, but the main features are fine. You can also write your own plugins to the same API as Linux ones. I’m hoping that someone can write a Windows block device plugin (especially one which integrates with features like VSS).

$ find \( -name '*.exe' -o -name '*.dll' \) -a -printf "%f\n" | sort -u
nbdkit-blocksize-filter.dll
nbdkit-cacheextents-filter.dll
nbdkit-cow-filter.dll
nbdkit-data-plugin.dll
nbdkit-ddrescue-filter.dll
nbdkit-delay-filter.dll
nbdkit-error-filter.dll
nbdkit-example1-plugin.dll
nbdkit.exe
nbdkit-exitlast-filter.dll
nbdkit-extentlist-filter.dll
nbdkit-file-plugin.dll
nbdkit-fua-filter.dll
nbdkit-full-plugin.dll
nbdkit-gzip-filter.dll
nbdkit-gzip-plugin.dll
nbdkit-info-plugin.dll
nbdkit-ip-filter.dll
nbdkit-limit-filter.dll
nbdkit-memory-plugin.dll
nbdkit-nocache-filter.dll
nbdkit-noextents-filter.dll
nbdkit-nofilter-filter.dll
nbdkit-noparallel-filter.dll
nbdkit-nozero-filter.dll
nbdkit-null-plugin.dll
nbdkit-offset-filter.dll
nbdkit-partition-filter.dll
nbdkit-partitioning-plugin.dll
nbdkit-pattern-plugin.dll
nbdkit-random-plugin.dll
nbdkit-rate-filter.dll
nbdkit-readahead-filter.dll
nbdkit-retry-filter.dll
nbdkit-split-plugin.dll
nbdkit-stats-filter.dll
nbdkit-swab-filter.dll
nbdkit-tls-fallback-filter.dll
nbdkit-truncate-filter.dll
nbdkit-zero-plugin.dll

5 Comments

Filed under Uncategorized

AMD Zen 2 laptop

AMD Zen 2 laptops are a thing, and they’re blazingly fast.

I just bought the HP Envy x360 which has a 6 core AMD Ryzen 5 4500U. Measuring some real world compiles it’s comfortably two and half times faster than my year old Intel-based Thinkpad T480s (which has 4 cores but 8 threads, and cost at least twice as much).

2 Comments

Filed under Uncategorized

nbdkit tar filter

nbdkit is our high performance liberally licensed Network Block Device server, and OVA files are a common pseudo-standard for exporting virtual machines including their disk images.

A .ova file is really an uncompressed tar file:

$ tar tf rhel.ova
rhel.ovf
rhel-disk1.vmdk
rhel.mf

Since tar files usually store their content unmangled, this opens an interesting possibility for reading (or even writing) the embedded disk image without needing to unpack the tar. You just have to work out the offset of the disk image within the tar file. virt-v2v has used this trick to save a copy when importing OVAs for years.

nbdkit has also included a tar plugin which can access a file inside a local tar file, but the problem is what if the tar file doesn’t happen to be a local file? (eg. It’s on a webserver). Or what if it’s compressed?

To fix this I’ve turned the plugin into a filter. Using nbdkit-tar-filter you can unpack even non-local compressed tar files:

$ nbdkit curl http://example.com/qcow2.tar.xz \
         --filter=tar --filter=xz tar-entry=disk.qcow2

(To understand how filters are stacked, see my FOSDEM talk from last year). Because in this example the disk inside the tarball is a qcow2 file, it appears as qcow2 on the wire, so:

$ guestfish --ro --format=qcow2 -a nbd://localhost

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> run
><fs> list-filesystems 
/dev/sda1: ext2
><fs> mount /dev/sda1 /
><fs> ll /
total 19
drwxr-xr-x   3 root root  1024 Jul  6 20:03 .
drwxr-xr-x  19 root root  4096 Jul  9 11:01 ..
-rw-rw-r--.  1 1000 1000    11 Jul  6 20:03 hello.txt
drwx------   2 root root 12288 Jul  6 20:03 lost+found

Leave a comment

Filed under Uncategorized

nbdkit with BitTorrent

nbdkit is our high performance Network Block Device server for serving disk images from unusual sources. One (usual) source for Linux installers is to download an ISO from a website like Get Fedora or debian.org. However that costs the host money and is also a central point of failure, so another way to download Linux installers is over BitTorrent. Many Linux distros offer torrents of their installers including Fedora and Debian. By using these you are helping to redistribute Linux and defraying the cost of hosting these ISOs.

Now I’ve written a BitTorrent plugin for nbdkit so you can download, redistribute and install Linux all at the same time!

$ url=https://torrent.fedoraproject.org/torrents/Fedora-Server-dvd-x86_64-32.torrent
$ wget $url
$ nbdkit -U - torrent Fedora-Server-*.torrent \
         --run 'qemu-system-x86_64 -m 2048 -cdrom $nbd -boot d'

So what’s the serious use for this? It has the interesting property that the more people who are installing your Linux distro, the less bandwidth it uses and the faster it runs! This could be interesting technology for any kind of distributed environment where you have lots of machines accessing the same fixed/read-only filesystem or disk image.

If you want to get started with nbdkit it’s already in all popular Linux distributions, and compiles from source on Linux, FreeBSD and OpenBSD.

Leave a comment

Filed under Uncategorized

Compressed RAM disks

There was a big discussion last week about whether zram swap should be the default in a future version of Fedora.

This lead me to think about the RAM disk implementation in nbdkit. In nbdkit up to 1.20 it supports giant virtual disks up to 8 exabytes using a sparse array implemented with a 2-level page table. However it’s still a RAM disk and so you can’t actually store more real data in these disks than you have available RAM (plus swap).

But what if we compressed the data? There are some fine, very fast compression libraries around nowadays — I’m using Facebook’s Zstandard — so the overhead of compression can be quite small, and this lets you make limited RAM go further.

So I implemented allocators for nbdkit ≥ 1.22, including:

$ nbdkit memory 1T allocator=zstd

Compression ratios can be really good. I tested this by creating a RAM disk and filling it with a filesystem containing text and source files, and was getting 10:1 compression. (Note that filesystems start with very regular, easily compressible metadata, so you’d expect this ratio to quickly drop if you filled the filesystem up with a lot of files).

The compression overhead is small, although the current nbdkit-memory-plugin isn’t very smart about locking so it has rather poor performance under multi-threaded loads anyway. (A fun little project to fix that for someone who loves pthread and C.)

I also implemented allocator=malloc which is a non-sparse direct-mapped RAM disk. This is simpler and a bit faster, but has rather obvious limitations compared to using the sparse allocator.

2 Comments

Filed under Uncategorized

nbdkit C script plugins

Should you want to, you can now write your nbdkit plugins like scripts, chmod +x plugin.c and run them …

#if 0
exec nbdkit cc "$0" "$@"
#endif
#include <stdint.h>
#include <string.h>
#include <nbdkit-plugin.h>

char data[100*1024*1024];

#define THREAD_MODEL NBDKIT_THREAD_MODEL_PARALLEL

static void *
my_open (int readonly)
{
  return NBDKIT_HANDLE_NOT_NEEDED;
}

static int64_t
my_get_size (void *handle)
{
  return (int64_t) sizeof (data);
}

static int
my_pread (void *handle, void *buf,
          uint32_t count, uint64_t offset,
          uint32_t flags)
{
  memcpy (buf, data+offset, count);
  return 0;
}

static int
my_pwrite (void *handle, const void *buf,
           uint32_t count, uint64_t offset,
           uint32_t flags)
{
  memcpy (data+offset, buf, count);
  return 0;
}

static struct nbdkit_plugin plugin = {
  .name              = "myplugin",
  .open              = my_open,
  .get_size          = my_get_size,
  .pread             = my_pread,
  .pwrite            = my_pwrite,
};

NBDKIT_REGISTER_PLUGIN(plugin)
$ chmod +x plugin.c
$ ./plugin.c

Leave a comment

Filed under Uncategorized

OCaml RISC-V port is now upstream!

https://github.com/ocaml/ocaml/commit/8f3833c4d0ef656c826359f4137c1eb3d46ea0ef

We’ve been using this patch in Fedora since Nov 2016.

Leave a comment

Filed under Uncategorized

Golang bindings for both libnbd and nbdkit

I have to say for full transparency up front that Golang is not my favourite programming language, even less after using it for a while. Nevertheless with a lot of help from Dan Berrangé we now have Golang bindings for libnbd and nbdkit which are respectively client and server software for the Linux Network Block Device protocol.

The Golang bindings for libnbd let you connect to a server and read and write from it. This is all pretty straightforward so read the manual page if you want to find out more.

The Golang bindings for nbdkit are considerably more interesting because you can use them to write pretty natural and high performance NBD servers to expose “interesting things”.

I’m hoping in particular there are interesting block device sources in the Kubernetes / Docker ecosystem which are probably only available from Golang that we could now expose to other software (although I’m also still researching this area so I don’t yet know what in particular).

You can make a complete Golang NBD server really easily now with only a few lines of code. Minus boilerplate, something like this is sufficient (see this link for complete working examples):

type MyPlugin struct {
	nbdkit.Plugin
}

type MyConnection struct {
	nbdkit.Connection
}

func (p *MyPlugin) Open(readonly bool) (nbdkit.ConnectionInterface, error) {
	return &MyConnection{}, nil
}

func (c *MyConnection) GetSize() (uint64, error) {
	return size, nil
}

func (c *MyConnection) PRead(buf []byte, offset uint64,
	flags uint32) error {
	copy(buf, ... from the source of your data here ...)
	return nil
}

func (c *MyConnection) CanWrite() (bool, error) {
	return true, nil
}

func (c *MyConnection) PWrite(buf []byte, offset uint64,
	flags uint32) error {
	copy(... to the data source here ..., buf)
	return nil
}

Editor note: In an earlier version of these bindings we passed the whole struct to each callback rather than a pointer, hence James’s first comment below.

3 Comments

Filed under Uncategorized

New nbdkit “remote tmpfs” (tmpdisk plugin)

I was making some thin clients for the Fedora RISC-V project a few weeks ago. These are based on the HiFive Unleashed U540 board and so they have no local SATA, only slow, unreliable SD cards. Any filesystems that might be heavily used must be network filesystems.

As these are Linux clients we essentially have three possible choices for network filesystems: NFS, NBD or a cluster FS. As I didn’t want to set up multiple server nodes or need the redundancy, cluster filesystems are immediately discounted. NFS is — “fine”. And indeed I selected that for /home where performance is not a problem. But for the clients that are build servers I selected NBD as a high-performance block-based storage. I’m using nbdkit, the high performance flexible NBD server.

However nbdkit didn’t quite do what I wanted, so I had to write a new plugin. I needed a “remote tmpfs“.

To get a new “remote tmpfs”, a fresh filesystem each time, the client does:

modprobe nbd
nbd-client server /dev/nbd0
mount /dev/nbd0 /var/scratch

Backing this is a flexible new plugin called tmpdisk. By default it creates a new disk for each connection, formats it with mkfs and serves it to the client. But it’s also scriptable, so you can substitute any command you like instead of mkfs to create your own custom disks (I recommend looking at mke2fs -d in case you need to have pre-populated scratch disks).

It’s important to note that these are scratch disks so when the client unmounts the filesystem it is completely deleted from the server. But that’s fine for temporary directories and (for my purposes) package builds.

To find out more about nbdkit, watch my video from FOSDEM 2019.

2 Comments

Filed under Uncategorized