nbdkit + xz + curl

I’ve submitted a talk about nbdkit, our flexible, pluggable NBD server, to FOSDEM next year about how you can use nbdkit as a replacement for loopback mounts (or “loop mounts” as I was told off for not calling them last week). In preparation for that talk I ran through it in private to a small Red Hat audience on Monday. If I can I will release that video some time, but I may have to edit out Red Hat “super-secret” stuff first (or most likely not because there aren’t any secrets in it, but I’m still waiting for the internal video to be released).

Anyway this attracted a lot of interest and one question that was asked was why the xz plugin which lets you transparently open and uncompress XZ files on the fly was a plugin at all. Surely it would make more sense for it to be a filter? So it could be used not just to uncompress local files, but also xz-compressed cloud images over HTTPS.

The answer is yes it would! So I fixed it. XZ is now a filter (the plugin is left around but we’ll deprecate it eventually).

You can use it on top of the file plugin, curl plugin or other plugins:

$ nbdkit --filter=xz file file.xz
$ nbdkit --filter=xz curl https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.raw.xz

This is fun and you can use this to boot the cloud image entirely remotely:

$ qemu-system-x86_64 -machine accel=kvm:tcg \
    -cpu host -m 2048 \
    -drive file=nbd:localhost:10809,if=virtio

However it’s incredibly slow. One problem is that the Fedora mirror sites aren’t very happy about you issuing lots of small HTTP Range requests and I observed that they throttle the connection quite aggressively. The second problem is that the xz block size for these cloud images is too large.

The XZ format (or rather, LZMA format) is divided into streams and blocks. We don’t normally use streams, and many XZ files use a single block. But it’s possible to tell the xz program to use a smaller than default block size, and in that case the output is divided into indexed blocks. Note the block size applies to the uncompressed input, the compressed blocks will have varying sizes, but the index that is created lets us find the block boundaries easily. When a byte is requested we can use a binary search to take us quickly to the compressed block, uncompress it (and cache it), and answer the request. We will only uncompress at most one block instead of the whole file.

For disk images I normally advocate a 16M block size. The current cloud images use (I think) a 192M block size, so both a huge amount of data has to be read over HTTPS to read one uncompressed byte, plus we have to cache very large blocks in RAM.

As an experiment I recompressed the cloud image using xz --block-size=$((16 * 1024 * 1024)) and hosted it locally, and booting is much quicker (albeit still slow because the cloud image contains cloud-init).

But even better we already ship a variety of disk images compressed with a 16M block size for virt-builder here, and these can be booted directly too:

$ nbdkit -U - --filter=xz curl \
        http://builder.libguestfs.org/fedora-29.xz \
        --run \
    'qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 -drive file=$nbd,if=virtio'

… although you can’t log in because they all have locked root accounts (virt-builder normally customizes them after download).

2 Comments

Filed under Uncategorized

2 responses to “nbdkit + xz + curl

  1. Great!Do you think Nbdkit is ready to substitute iSCSI’s targetcli? Thanks!

  2. Pingback: libnbd + FUSE = nbdfuse | Richard WM Jones

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.