Tag Archives: cloud

nbdkit + xz + curl

I’ve submitted a talk about nbdkit, our flexible, pluggable NBD server, to FOSDEM next year about how you can use nbdkit as a replacement for loopback mounts (or “loop mounts” as I was told off for not calling them last week). In preparation for that talk I ran through it in private to a small Red Hat audience on Monday. If I can I will release that video some time, but I may have to edit out Red Hat “super-secret” stuff first (or most likely not because there aren’t any secrets in it, but I’m still waiting for the internal video to be released).

Anyway this attracted a lot of interest and one question that was asked was why the xz plugin which lets you transparently open and uncompress XZ files on the fly was a plugin at all. Surely it would make more sense for it to be a filter? So it could be used not just to uncompress local files, but also xz-compressed cloud images over HTTPS.

The answer is yes it would! So I fixed it. XZ is now a filter (the plugin is left around but we’ll deprecate it eventually).

You can use it on top of the file plugin, curl plugin or other plugins:

$ nbdkit --filter=xz file file.xz
$ nbdkit --filter=xz curl https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.raw.xz

This is fun and you can use this to boot the cloud image entirely remotely:

$ qemu-system-x86_64 -machine accel=kvm:tcg \
    -cpu host -m 2048 \
    -drive file=nbd:localhost:10809,if=virtio

However it’s incredibly slow. One problem is that the Fedora mirror sites aren’t very happy about you issuing lots of small HTTP Range requests and I observed that they throttle the connection quite aggressively. The second problem is that the xz block size for these cloud images is too large.

The XZ format (or rather, LZMA format) is divided into streams and blocks. We don’t normally use streams, and many XZ files use a single block. But it’s possible to tell the xz program to use a smaller than default block size, and in that case the output is divided into indexed blocks. Note the block size applies to the uncompressed input, the compressed blocks will have varying sizes, but the index that is created lets us find the block boundaries easily. When a byte is requested we can use a binary search to take us quickly to the compressed block, uncompress it (and cache it), and answer the request. We will only uncompress at most one block instead of the whole file.

For disk images I normally advocate a 16M block size. The current cloud images use (I think) a 192M block size, so both a huge amount of data has to be read over HTTPS to read one uncompressed byte, plus we have to cache very large blocks in RAM.

As an experiment I recompressed the cloud image using xz --block-size=$((16 * 1024 * 1024)) and hosted it locally, and booting is much quicker (albeit still slow because the cloud image contains cloud-init).

But even better we already ship a variety of disk images compressed with a 16M block size for virt-builder here, and these can be booted directly too:

$ nbdkit -U - --filter=xz curl \
        http://builder.libguestfs.org/fedora-29.xz \
        --run \
    'qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 -drive file=$nbd,if=virtio'

… although you can’t log in because they all have locked root accounts (virt-builder normally customizes them after download).

Advertisements

1 Comment

Filed under Uncategorized

Tip: Updating RHEL 7.1 cloud images using virt-customize and subscription-manager

Red Hat provide RHEL KVM guest and cloud images. At time of writing, the last one was built in Feb 2015, and so undoubtedly contains packages which are out of date or insecure.

You can use virt-customize to update the packages in the cloud image. This requires the libguestfs subscription-manager feature which will only be available in RHEL 7.3, but see here for RHEL 7.3 preview packages. Alternatively you can use Fedora ≥ 22.

$ virt-customize \
  -a rhel-guest-image-7.1-20150224.0.x86_64.qcow2 \
  --sm-credentials 'USERNAME:password:PASSWORD' \
  --sm-register --sm-attach auto \
  --update
[   0.0] Examining the guest ...
[  17.2] Setting a random seed
[  17.2] Registering with subscription-manager
[  28.8] Attaching to compatible subscriptions
[  61.3] Updating core packages
[ 976.8] Finishing off
  1. You should probably use --sm-credentials USERNAME:file:FILENAME to specify your password using a file, rather than having it exposed on the command line.
  2. The command above will leave the image template registered to RHN. To unregister it, add --sm-unregister at the end.

3 Comments

Filed under Uncategorized

Mini Cloud/Cluster v2.0

Last year I wrote and rewrote a little command line tool for managing my virtualization cluster.

Of course I could use OpenStack RDO but OpenStack is a vast box of somewhat working bits and pieces. I think for a small cluster like mine you can get the essential functionality of OpenStack a lot more simply — in 1300 lines of code as it turns out.

The first thing that small cluster management software doesn’t need is any permanent daemon running on the nodes. The reason is that we already have sshd (for secure management access) and libvirtd (to manage the guests) out of the box. That’s quite sufficient to manage all the state we care about. My Mini Cloud/Cluster software just goes out and queries each node for that information whenever it needs it (in parallel of course). Nodes that are switched off are handled by ignoring them.

The second thing is that for a small cloud we can toss features that aren’t needed at all: multi-user/multi-tenant, failover, VLANs, a nice GUI.

The old mclu (Mini Cluster) v1.0 was written in Python and used Ansible to query nodes. If you’re not familiar with Ansible, it’s basically parallel ssh on steroids. This was convenient to get the implementation working, but I ended up rewriting this essential feature of Ansible in ~ 60 lines of code.

The huge down-side of Python is that even such a small program has loads of hidden bugs, because there’s no safety at all. The rewrite (in OCaml) is 1,300 lines of code, so a fraction larger, but I have a far higher confidence that it is mostly bug free.

I also changed around the way the software works to make it more “cloud like” (and hence the name change from “Mini Cluster” to “Mini Cloud”). Guests are now created from templates using virt-builder, and are stateless “cattle” (although you can mix in “pets” and mclu will manage those perfectly well because all it’s doing is remote libvirt-over-ssh commands).

$ mclu status
ham0                     on
                           total: 8pcpus 15.2G
                            used: 8vcpus 8.0G by 2 guest(s)
                            free: 6.2G
ham1                     on
                           total: 8pcpus 15.2G
                            free: 14.2G
ham2                     on
                           total: 8pcpus 30.9G
                            free: 29.9G
ham3                     off

You can grab mclu v2.0 from the git repository.

2 Comments

Filed under Uncategorized