Tag Archives: virtualization

NBD over AF_VSOCK

How do you talk to a virtual machine from the host? How does the virtual machine talk to the host? In one sense the answer is obvious: virtual machines should be thought of just like regular machines so you use the network. However the connection between host and guest is a bit more special. Suppose you want to pass a host directory up to the guest? You could use NFS, but that’s sucky to set up and you’ll have to fiddle around with firewalls and ports. Suppose you run a guest agent reporting stats back to the hypervisor. How do they talk? Network, sure, but again that requires an extra network interface and the guest has to explicitly set up firewall rules.

A few years ago my colleague Stefan Hajnoczi ported VMware’s vsock to qemu. It’s a pure guest⟷host (and guest⟷guest) sockets API. It doesn’t use regular networks so no firewall issues or guest network configuration to worry about.
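
As a rough sketch of what this sockets API looks like from a program, here is a minimal Python example using the standard library's AF_VSOCK support on Linux. The port number and the function names are made up for illustration. The main thing to notice is that addresses are (CID, port) pairs rather than (IP address, port) pairs, with CID 2 being the well-known ID of the host.

```python
import socket

# Well-known context IDs (CIDs) from the Linux vsock headers.
VMADDR_CID_ANY = 0xFFFFFFFF   # wildcard: accept connections on any CID
VMADDR_CID_HOST = 2           # the host, as seen from inside a guest

EXAMPLE_PORT = 9999           # hypothetical port, chosen only for this sketch


def vsock_listen(port=EXAMPLE_PORT):
    """Return a listening vsock stream socket (Linux only)."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.bind((VMADDR_CID_ANY, port))
    s.listen()
    return s


def vsock_connect_to_host(port=EXAMPLE_PORT):
    """From inside a guest, connect to a service listening on the host."""
    s = socket.socket(socket.AF_VSOCK, socket.SOCK_STREAM)
    s.connect((VMADDR_CID_HOST, port))
    return s
```

Neither side needs an IP address, a network interface or a firewall rule. Whether the calls succeed depends on the kernel having vsock support and the guest having been given a CID.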

You can run NFS over vsock [PDF] if you want.

And now you can of course run NBD over vsock. nbdkit supports it as a server, and libnbd is the client (currently the only one!).

Filed under Uncategorized

libnbd + FUSE = nbdfuse

I’ve talked before about libnbd, our NBD client library. New in libnbd 1.2 is a tool called nbdfuse which lets you turn NBD servers into virtual files.

A couple of weeks ago I mentioned you can use libnbd as a C library to edit qcow2 files. Now you can turn qcow2 files into virtual raw files:

$ mkdir dir
$ nbdfuse dir/file.raw \
      --socket-activation qemu-nbd -f qcow2 file.qcow2
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 1073741824 Jan  1 10:10 file.raw

Reads and writes to file.raw are backed by the original qcow2 file which is updated in real time.

Another fun thing to do is to use nbdkit with its curl plugin and xz filter to turn xz-compressed remote disk images into uncompressed local files:

$ mkdir dir
$ nbdfuse dir/disk.img \
      --command nbdkit -s curl --filter=xz \
                       http://builder.libguestfs.org/fedora-30.xz
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 6442450944 Jan  1 10:10 disk.img
$ file dir/disk.img
dir/disk.img: DOS/MBR boot sector
$ qemu-system-x86_64 -m 4G \
      -drive file=dir/disk.img,format=raw,if=virtio,snapshot=on
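
The "DOS/MBR boot sector" that file reports can be checked by hand, since an MBR ends with the two signature bytes 0x55 0xAA at offsets 510 and 511 of the first sector. Here is a minimal Python sketch of that check (the function name is just an illustrative choice), which you could point at dir/disk.img while nbdfuse is running:

```python
def has_mbr_signature(path):
    """Return True if the first 512-byte sector ends with 0x55 0xAA."""
    with open(path, "rb") as f:
        sector = f.read(512)
    # A short read means the file is smaller than one sector.
    return len(sector) == 512 and sector[510:512] == b"\x55\xaa"
```

Only the first sector is read, so with the curl and xz setup above this should only pull down the part of the remote image needed to decompress that sector.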

libnbd – A new NBD client library

NBD is a high performance protocol for exporting disks between processes and machines. We use it as a kind of “universal connector” for connecting hypervisors with data sources, and previously Eric Blake and I wrote a general purpose NBD server called nbdkit. (If you’re interested in the topic of nbdkit as a universal connector, watch my FOSDEM talk.)

Up until now our NBD client has been qemu or one of the qemu tools like qemu-img. That was fine if you wanted to expose a disk source as a running virtual machine (i.e. running it with qemu), or if you wanted to perform one of the limited copying operations that qemu-img convert can do, but there were many cases where it would have been nice to have a general client library.

For example I started to add NBD support to Jens Axboe’s fio. Lacking a client library I synthesized NBD request packets as C structs and sent them on the wire using low level socket commands. The performance was, to put it bluntly, crap.

Although NBD is a very simple protocol and you can write it by hand, it would be nicer to have a library wrap the low-level stuff, and that’s why we have written libnbd (downloads).

Getting reasonable performance from NBD requires a few tricks:

  • You must issue as many commands as possible “in flight” (the server may reply to them out of order, but requests and replies are tied together by a unique ID).
  • You may need to open multiple connections to the server, but doing that requires attention to the special MULTI_CONN flag which the server will use to indicate that this is safe.
  • Most crucially you must disable Nagle’s algorithm.

This isn’t an exhaustive list. In fact while writing libnbd over about 3 weeks we improved performance by a factor of more than 15, just by paying attention to system calls, maximizing parallelism and minimizing latency. One advantage of libnbd is that it encodes all this knowledge in an easy to use library so NBD clients won’t have to reinvent it in future.
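
To make the first and third tricks in the list above a bit more concrete, here is a small Python sketch. It is not how libnbd is implemented. It only shows the shape of the idea: each request carries a unique 64-bit ID, replies are matched back to requests by that ID whatever order they arrive in, and the TCP socket has Nagle's algorithm turned off with TCP_NODELAY. The header layouts are the NBD simple request and simple reply formats. The class and function names are my own invention for this example.

```python
import socket
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ = 0


def open_nodelay_socket():
    """Create a TCP socket with Nagle's algorithm disabled."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return s


def pack_read_request(cookie, offset, length):
    """Build a 28-byte NBD read request carrying a unique ID (cookie)."""
    return struct.pack(">IHHQQI", NBD_REQUEST_MAGIC, 0, NBD_CMD_READ,
                       cookie, offset, length)


def unpack_simple_reply(header):
    """Parse a 16-byte simple reply header into (error, cookie)."""
    magic, error, cookie = struct.unpack(">IIQ", header)
    if magic != NBD_SIMPLE_REPLY_MAGIC:
        raise ValueError("not an NBD simple reply")
    return error, cookie


class InFlight:
    """Track commands that have been sent but not yet answered."""

    def __init__(self):
        self.next_cookie = 1
        self.pending = {}          # cookie -> (offset, length)

    def issue_read(self, offset, length):
        """Record a new read and return the bytes to put on the wire."""
        cookie = self.next_cookie
        self.next_cookie += 1
        self.pending[cookie] = (offset, length)
        return pack_read_request(cookie, offset, length)

    def complete(self, header):
        """Match a reply header to its request, in whatever order it came."""
        error, cookie = unpack_simple_reply(header)
        offset, length = self.pending.pop(cookie)
        return error, offset, length
```

A real client would also have to read the data payload that follows each successful read reply, and handle the newer structured replies, both of which this sketch leaves out.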

The library has a simple high-level synchronous API which works how you would expect (but doesn’t get the best performance). A typical program might look like:

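#include <stdio.h>
#include <stdint.h>
#include <libnbd.h>

int
main (void)
{
  struct nbd_handle *nbd;
  int64_t exportsize;
  char buf[512];

  nbd = nbd_create ();
  if (!nbd) goto error;
  /* "nbd" is the service name; a port number such as "10809" also works. */
  if (nbd_connect_tcp (nbd, "localhost", "nbd") == -1) goto error;
  exportsize = nbd_get_size (nbd);
  if (exportsize == -1) goto error;
  /* Synchronously read the first 512 bytes of the export. */
  if (nbd_pread (nbd, buf, sizeof buf, 0, 0) == -1) goto error;
  nbd_close (nbd);
  return 0;
 error:
  fprintf (stderr, "%s\n", nbd_get_error ());
  return 1;
}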

To get the best performance you have to use the more low-level asynchronous API which allows you to queue up commands and bring your own main loop.

There are also bindings in OCaml and Python (and Rust, soon). There’s also a nice little shell written in Python so you can access NBD servers interactively:

$ nbdsh
nbd> h.connect_command (["nbdkit", "-s", "memory", "1M"])
nbd> print ("%r" % h.get_size ())
1048576
nbd> h.pwrite (b"12345", 0)
nbd> h.pread (5, 0)
b'12345'

libnbd and the shell, nbdsh, are available now in Fedora 29 and above.

virt-install + nbdkit live install

This seems to be completely undocumented which is why I’m writing this … It is possible to boot a Linux guest (Fedora in this case) from a live CD on a website without downloading it. I’m using our favourite flexible NBD server, nbdkit and virt-install.

First of all we’ll run nbdkit and attach it to the Fedora 29 live workstation ISO. To make this work more efficiently I’m going to place a couple of filters on top — one is the readahead (prefetch) filter recently added to nbdkit 1.12, and the other is the cache filter. In combination these filters should reduce the load on the website and improve local performance.

$ rm -f /tmp/socket
$ nbdkit -f -U /tmp/socket --filter=readahead --filter=cache \
    curl https://download.fedoraproject.org/pub/fedora/linux/releases/29/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-29-1.2.iso

I actually replaced that URL with a UK-based mirror to make the process a little faster.

Now comes the undocumented virt-install command:

$ virt-install --name test --ram 2048 \
    --disk /var/tmp/disk.img,size=10 \
    --disk device=cdrom,source_protocol=nbd,source_host_transport=unix,source_host_socket=/tmp/socket \
    --os-variant fedora29

After a bit of grinding that should boot into Fedora 29, and you never (not explicitly at least) had to download the ISO.

To be fair qemu does also have a curl driver which virt-install could use, but nbdkit is better with the filters and plugins system giving you ultimate flexibility — check out my video about it.

nbdkit 1.12

The new stable release of nbdkit, our flexible Network Block Device server, is out. You can read the announcement and release notes here.

The big new features are SSH support, the linuxdisk plugin, writing plugins in Rust, and extents. Extents allow NBD clients to work out which parts of a disk are sparse or zeroes and skip reading them. It was hellishly difficult to write because of the number of obscure corner cases.
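
To give a feel for what a client does with extents, here is a small Python sketch. It assumes the client has already asked the server for block status and received a list of (length, flags) pairs in the NBD "base:allocation" form, where bit 0 means "hole" and bit 1 means "reads as zeroes". The function name and the way the extents are passed in are my own choices for illustration, not an nbdkit or libnbd API.

```python
STATE_HOLE = 1   # bit 0: the range is unallocated (sparse)
STATE_ZERO = 2   # bit 1: the range reads back as zeroes


def ranges_to_read(extents, start=0):
    """Return the (offset, length) ranges a copy actually has to read.

    extents is a list of (length, flags) pairs describing consecutive
    ranges of the disk beginning at byte offset start.
    """
    ranges = []
    offset = start
    for length, flags in extents:
        # Only skip ranges the server promises will read as zeroes.
        if not (flags & STATE_ZERO):
            if ranges and ranges[-1][0] + ranges[-1][1] == offset:
                # Adjacent to the previous range: merge into one read.
                prev_offset, prev_length = ranges.pop()
                ranges.append((prev_offset, prev_length + length))
            else:
                ranges.append((offset, length))
        offset += length
    return ranges
```

Note that a hole which is not also marked as zero still has to be read, because the server is not promising anything about its contents. That is one small example of the corner cases.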

Also in this release are a couple of interesting filters. The rate filter lets you add a bandwidth limit to connections. We will use this in virt-v2v to allow v2v instances to be rate limited (even dynamically). The readahead filter makes sequential copying and scanning of plugins more efficient by prefetching data ahead of time. It is self-configuring and in most cases simply adding the filter into your filter stack is sufficient to get a nice performance boost, assuming your client’s access patterns are mostly sequential.

Tip: Edit grub kernel command line in RHEL 7 or CentOS 7

Easy with virt-customize. In this example I’m adding the nosmt option to the command line:

$ virt-customize -a rhel7.img \
    --edit '/etc/default/grub:
      s/^GRUB_CMDLINE_LINUX="/GRUB_CMDLINE_LINUX="nosmt /' \
    --run-command 'grub2-mkconfig -o /boot/grub2/grub.cfg'
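
If the substitution in that --edit expression looks cryptic, here is the same transformation written as a small Python function, so you can see what happens to /etc/default/grub before grub2-mkconfig regenerates the config. This is only an illustration of the regular expression. virt-customize itself runs the expression as Perl inside the guest image, and the function name here is made up.

```python
import re


def add_kernel_option(grub_defaults, option):
    """Insert option at the front of the GRUB_CMDLINE_LINUX value."""
    return re.sub(r'^GRUB_CMDLINE_LINUX="',
                  # Use a function so the option text is inserted literally.
                  lambda m: m.group(0) + option + " ",
                  grub_defaults,
                  flags=re.MULTILINE)
```

For example, a line reading GRUB_CMDLINE_LINUX="rhgb quiet" becomes GRUB_CMDLINE_LINUX="nosmt rhgb quiet", and every other line in the file is left alone.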

nbdkit / FOSDEM test presentation about better loop mounts for Linux

I’ve submitted a talk about nbdkit, our flexible pluggable NBD server, to FOSDEM next February. This is going to be about using NBD as a better way to do loop mounts in Linux.

In preparation I gave a very early version of the talk to a small Red Hat audience.

Video link: http://oirase.annexia.org/rwmj.wp.com/rjones-nbdkit-tech-talk-2018-11-19.mp4

Sorry about the slow start. You may want to skip to 2 mins to get past the intro.

Summary of what’s in the talk:

  1. Demo of regular, plain loop mounting.
  2. Demo of loop mounting an XZ-compressed disk image using NBD + nbdkit.
  3. Slides about how loop device compares to NBD.
  4. Slides about nbdkit plugins and filters.
  5. Using VMware VDDK to access a VMDK file.
  6. Creating a giant disk costing EUR 300 million(!)
  7. Visualizing a single filesystem.
  8. Visualizing RAID 5.
  9. Writing a plugin in shell script (live demo).
  10. Summary.
