Great new changes coming to nbdkit

Eric Blake has been doing some great stuff for nbdkit, the flexible plugin-based NBD server.

  • Full parallel request handling.
    You’ve always been able to tell nbdkit that your plugin can handle multiple requests in parallel from a single client, but until now that didn’t actually do anything (only parallel requests from multiple clients worked).
  • An NBD forwarding plugin, so if you have another NBD server which doesn’t support a feature like encryption or new-style protocol, then you can front that server with nbdkit which does.

As well as that he’s fixed lots of small bugs with NBD compliance so hopefully we’re now much closer to the protocol spec (we always check that we interoperate with qemu’s nbd client, but it’s nice to know that we’re also complying with the spec). He also fixed a potential DoS where nbdkit would try to handle very large writes which would delay a thread in the server indefinitely.


Also this week, I wrote an nbdkit plugin for handling the weird Xen XVA file format. The whole thread is worth reading because 3 people came up with 3 unique solutions to this problem.

Advertisements

Leave a comment

Filed under Uncategorized

Fedora 27 virt-builder images

Fedora 27 has just been released, and I’ve just uploaded virt-builder images so you can try it right away:

$ virt-builder -l | grep fedora-27
fedora-27                aarch64    Fedora® 27 Server (aarch64)
fedora-27                armv7l     Fedora® 27 Server (armv7l)
fedora-27                i686       Fedora® 27 Server (i686)
fedora-27                ppc64      Fedora® 27 Server (ppc64)
fedora-27                ppc64le    Fedora® 27 Server (ppc64le)
fedora-27                x86_64     Fedora® 27 Server
$ virt-builder fedora-27 \
      --root-password password:123456 \
      --install emacs \
      --selinux-relabel \
      --size 30G
$ qemu-system-x86_64 \
      -machine accel=kvm:tcg \
      -cpu host -m 2048 \
      -drive file=fedora-27.img,format=raw,if=virtio &

Leave a comment

Filed under Uncategorized

nbdkit finally supports TLS (encryption)

nbdkit is a liberally licensed NBD server which a stable plugin API for serving disks from unconventional sources.

Finally I got around to adding TLS (encryption and authentication) support. The support is complete and appears to interoperate with QEMU. It also supports a certificate authority, client certificate verification, certificate revocation, server verification (by the client), and configurable algorithms.

Actually using TLS with NBD is no easy matter. It takes a few pages of instructions just to explain how to set up the public-key infrastructure. On the client (QEMU) side, the command line parameter for connecting to a TLS-enabled NBD server is lengthy.

Then there’s the question of how you ensure TLS is being used. In nbdkit as in other NBD servers you can either turn on TLS in which case it’s used when the client requests it, or you can require TLS. In the latter case nbdkit will reject non-TLS connections (thus ensuring TLS is really being used), but most clients won’t be able to connect to such a server.

As usual, where SSH got it right, SSL/TLS/HTTPS got it all horribly wrong.

7 Comments

Filed under Uncategorized

Tip: Changing the qemu product name in libguestfs

20:30 < koike> Hi. Is it possible to configure the dmi codes for libguestfs? I mean, I am running cloud-init inside a libguestfs session (through python-guestfs) in GCE, the problem is that cloud-init reads /sys/class/dmi/id/product_name to determine if the machine is a GCE machine, but the value it read is Standard PC (i440FX + PIIX, 1996) instead of the expected Google Compute Engine so cloud-init fails.

The answer is yes, using the guestfs_config API that lets you set arbitrary qemu parameters:

g.config('-smbios',
         'type=1,product=Google Compute Engine')

Leave a comment

Filed under Uncategorized

Downloading all the 78rpm rips at the Internet Archive

I’m a bit of a fan of 1930s popular music on gramophone records, so much so that I own an original early-30s gramophone player and an extensive collection of discs. So the announcement that the Internet Archive had released a collection of 29,000 records was pretty amazing.

[Edit: If you want a light introduction to this, I recommend this double CD]

I wanted to download it … all!

But apart from this gnomic explanation it isn’t obvious how, so I had to work it out. Here’s how I did it …

Firstly you do need to start with the Advanced Search form. Using the second form on that page, in the query box put collection:georgeblood, select the identifier field (only), set the format to CSV. Set the limit to 30000 (there are about 25000+ records), and download the huge CSV:

$ ls -l search.csv
-rw-rw-r--. 1 rjones rjones 2186375 Aug 14 21:03 search.csv
$ wc -l search.csv
25992 search.csv
$ head -5 search.csv
"identifier"
"78_jeannine-i-dream-of-you-lilac-time_bar-harbor-society-orch.-irving-kaufman-shilkr_gbia0010841b"
"78_a-prisoners-adieu_jerry-irby-modern-mountaineers_gbia0000549b"
"78_if-i-had-the-heart-of-a-clown_bobby-wayne-joe-reisman-rollins-nelson-kane_gbia0004921b"
"78_how-many-times-can-i-fall-in-love_patty-andrews-and-tommy-dorsey-victor-young-an_gbia0013066b"

A bit of URL exploration found a fairly straightforward way to turn those identifiers into directory listings. For example:

78_jeannine-i-dream-of-you-lilac-time_bar-harbor-society-orch.-irving-kaufman-shilkr_gbia0010841bhttps://archive.org/download/78_jeannine-i-dream-of-you-lilac-time_bar-harbor-society-orch.-irving-kaufman-shilkr_gbia0010841b

What I want to do is pick the first MP3 file in the directory and download it. I’m not fussy about how to do that, and Python has both a CSV library and an HTML fetching library. This turns the CSV file of links into a list of MP3 URLs. You could easily adapt this to download FLAC files instead.

#!/usr/bin/python

import csv
import re
import urllib2
import urlparse
from BeautifulSoup import BeautifulSoup

with open('search.csv', 'rb') as csvfile:
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in csvreader:
        if row[0] == "identifier":
            continue
        url = "https://archive.org/download/%s/" % row[0]
        page = urllib2.urlopen(url).read()
        soup = BeautifulSoup(page)
        links = soup.findAll('a', attrs={'href': re.compile("\.mp3$")})
        # Only want the first link in the page.
        link = links[0]
        link = link.get('href', None)
        link = urlparse.urljoin(url, link)
        print link

When you run this it converts each identifier into a download URL:

Edit: Amusingly WordPress turns the next pre section with MP3 URLs into music players. I recommend listening to them!

$ ./download.py | head -10











And after that you can download as many 78s as you can handle 🙂 by doing:

$ ./download.py > downloads
$ wget -nc -i downloads

Update

I only downloaded about 5% of the tracks, but it looks as if downloading it all would be ~ 100 GB. Also most of these tracks are still in copyright (thanks to insane copyright terms), so they may not be suitable for sampling on your next gramophone-rap record.

Update #2

Don’t forget to donate to the Internet Archive. I gave them $50 to continue their excellent work.

3 Comments

Filed under Uncategorized

Fedora 26 is out, virt-builder images available

Fedora 26 is released today. virt-builder images are already available for almost all architectures:

$ virt-builder -l | grep fedora-26
fedora-26                aarch64    Fedora® 26 Server (aarch64)
fedora-26                armv7l     Fedora® 26 Server (armv7l)
fedora-26                i686       Fedora® 26 Server (i686)
fedora-26                ppc64      Fedora® 26 Server (ppc64)
fedora-26                ppc64le    Fedora® 26 Server (ppc64le)
fedora-26                x86_64     Fedora® 26 Server

For example:

$ virt-builder fedora-26
$ qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 \
    -drive file=fedora-26.img,format=raw,if=virtio

Why not s390x? That’s because qemu doesn’t yet emulate enough of the s390x instruction set / architecture so that we can run Fedora under TCG emulation.

Leave a comment

Filed under Uncategorized

Patch review and message brokers

One thing I’ve wanted to do for a long time is get better at patch review. It’s pretty important for successful open source projects to provide feedback to developers quickly, and as anyone who follows me on the libguestfs mailing list will know, I’m terrible at it.

One thing I could do to make it a bit better is to automate the boring bits: Does the patch series apply? Does it compile? Does it pass the test suite? If one of those things isn’t true then we tell the submitter to fix it.

Some projects — the Linux Kernel Mailing List (LKML) for instance — provide basic feedback automatically. For LKML this is provided by Intel’s 0-day test service. If you post a patch on LKML then sooner or later you’ll receive an automated reply like this one.

Today I thought I’d write something like this, partly to reinvent the wheel, but mostly to learn more about the RabbitMQ message broker.

You see, if you have to receive emails, run large tests, and send more emails, then at least two and possibly more machines and going to be involved, and as soon as you are using two or more machines, you are writing a distributed system and you need to use the right tools. Message brokers and RabbitMQ in particular make writing distributed systems easy — trust me, I’ll show you how!

Receiving emails

Our first task is going to be how to get the emails into the system. We can use a procmail rule to copy emails to a command of our choice, but the command only sees one email at a time. Patch series are spread over many individual emails, you don’t always get them at once, and you certainly aren’t guaranteed to get them in order.

So first of all I set up a RabbitMQ queue which just takes in emails in any order and queues them:

The input to this queue is a simple script which can inject single emails or (for testing) mboxes.

Threading emails

Reconstructing the email thread, filtering out non-patch emails, and putting the patches into the right order, is done by a second task which runs periodically from a cron job.

The threading task examines the input queue and tries to reconstruct whole patch series from it. If it gets a patch series, it removes those messages from the input queue and places the whole patch series as a single message on a second queue:

What makes this possible is that RabbitMQ allows you to get messages from a queue, and then acknowledge (or not acknowledge) them later. So the threader gets all the available messages, tries to assemble them into threads. If it finds a complete patch series, then it acknowledges all of those emails — which deletes them from the input queue. For incomplete patch series, it doesn’t bother to acknowledge them, so they stay on the queue for next time.

By the magic of message brokers, the threader doesn’t even need to run on the same machine. Conceivably you could even run it on multiple machines if there was a very high load situation, and it would still work reliably.

Performing the tests

Once we have our fully assembled patch series threads, these are distributed to multiple queues by a RabbitMQ fanout exchange. There is one queue for each type of test we need to run:

An instance of a third task called perform-tests.py picks up the patches and tests them using custom scripts that you have to write.

The tests usually run inside a virtual machine to keep them out of harm’s way. Again the use of a message broker makes this trivial, and you can even distribute the tests over many machines if you want with no extra programming.

Reporting

There is a final queue: When tests finish, we don’t necessarily want to email out the report from the test machine. There would be several problems with that: it would reveal details of your testing infrastructure in email headers; SMTP servers aren’t necessarily available all the time; and you don’t always want your test machines to have access to the public internet.

Instead the result is placed on the patchq_reports queue, and a final task called send-reports.py picks these reports up and sends them out periodically. The report emails have the proper headers so they are threaded into the original mailing list postings.

Conclusion

It’s a simple but powerful multi-machine test framework, all in under 600 lines of code.

3 Comments

Filed under Uncategorized