Linux Kernel Library backend for libguestfs

For a very long time I’ve been keeping an eye on rump kernels. “Rump kernel” is a fancy name for ordinary kernel code which has been ported or altered in some way so that you can link it into userspace programs, usually as a library. There have been claims that rump kernels make libguestfs obsolete, which is a load of nonsense for reasons I’ll talk about in a minute. But rump kernels are interesting, and finally I’ve been able to add a rump kernel library to libguestfs.

The first time I encountered rump kernels was in 2010 with Prof. Renzo Davoli’s virtualsquare View-OS project. The more recent rump kernel project was based on NetBSD. Since NetBSD doesn’t use the Linux drivers but its own reimplementations, it was never very interesting to us. We couldn’t have included it in RHEL, and it would have been only a matter of time before someone found an important feature gap or a data-corrupting difference between the real ext2/3 driver from Linux and the reimplementation in BSD. Last I looked, NetBSD didn’t even support ext4. The choice of NetBSD made it a non-starter for us.

libos arrived earlier this year and, being based on Linux, was a lot more interesting. But its aim was only the Linux network stack, so it wasn’t directly relevant to libguestfs, which is only concerned with disk and filesystem access.

However, last week Richard Weinberger (Linux UML maintainer) CC’d me on a much more interesting project. LKL (Linux Kernel Library) is a relatively small patchset on top of Linux that turns Linux into a library. Quite literally, you just link your program with -llkl and you can start a Linux kernel and make Linux system calls. A few example programs can be found here.

Today I wrote a work-in-progress LKL backend for libguestfs. [Note for those keen to jump in and use it, it doesn’t do very much at the moment. There is a small to-do list. But it’s only a few hours more work to have it as a useful backend for libguestfs.]

What could an LKL-enabled libguestfs do? Well, it could open a raw disk image that contains only a Linux filesystem, or possibly a partitioned disk image with a Linux filesystem. And it can do that very quickly: in about 0.03 seconds in my tests (compared to 2.5 seconds for the qemu-based libguestfs).

There is a long list of things that LKL cannot do, however. It cannot open anything containing LVM (because LVM requires userspace tools, which would all have to be modified to use liblkl, and there would be all kinds of synchronization problems between the LVM-linked liblkl kernel and the libguestfs-linked kernel). It cannot open anything that requires a userspace tool. That includes btrfs, ntfs-3g (FUSE based), some MD devices, encrypted disks, dm-thinp and Windows Dynamic Disks.

It cannot create new LVs or filesystems or any of the other disk structures that libguestfs can create.

It cannot open disk formats like qcow2 and vmdk. The qcow2 code in particular is very specific to qemu, and cannot possibly be ported to the kernel.

It cannot open remote devices like Ceph, Gluster, https, ssh or ftp (although there is an nbd client in the kernel, so one day that may be possible).

It cannot run commands in the context of the guest.

It’s also less secure than libguestfs with the qemu backend, because it doesn’t isolate your host from exploits in the guest or filesystem using virtualization and sVirt.

All of those things can be done by libguestfs today, but may never be possible (or only in limited cases) with LKL.
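The limits above could be folded into a small helper that decides whether a given job is a candidate for LKL at all. What follows is only a rough sketch in Python under some loud assumptions: the Image and Task types, the feature labels and the backend name "lkl" are all made up for illustration (the LKL backend is still work in progress), while "direct" is the existing libguestfs name for the backend that runs qemu directly.

```python
from dataclasses import dataclass

# On-disk features which, per the list above, depend on userspace tools.
# The labels are invented for this sketch. "md" is included conservatively,
# although only *some* MD devices are affected.
NEEDS_USERSPACE = frozenset(
    {"lvm", "btrfs", "ntfs", "md", "encrypted", "dm-thinp", "ldm"}
)


@dataclass(frozen=True)
class Image:
    """Hypothetical description of a disk image (not a libguestfs type)."""
    disk_format: str = "raw"           # "raw", "qcow2", "vmdk", ...
    remote: bool = False               # Ceph, Gluster, https, ssh, ftp, ...
    features: frozenset = frozenset()  # e.g. frozenset({"ext4"})
    untrusted: bool = False            # caller wants virtualization + sVirt


@dataclass(frozen=True)
class Task:
    """Hypothetical description of what the caller wants to do."""
    creates_structures: bool = False   # new LVs, filesystems, ...
    runs_guest_commands: bool = False  # commands in the guest's context


def reasons_against_lkl(image, task=None):
    """Return a list of reasons why LKL is unsuitable (empty if none)."""
    task = task or Task()
    reasons = []
    if image.disk_format != "raw":
        reasons.append("disk format is %s, not raw" % image.disk_format)
    if image.remote:
        reasons.append("remote devices are not supported")
    blocked = sorted(NEEDS_USERSPACE & set(image.features))
    if blocked:
        reasons.append("needs userspace tools: " + ", ".join(blocked))
    if task.creates_structures:
        reasons.append("cannot create LVs, filesystems or similar")
    if task.runs_guest_commands:
        reasons.append("cannot run commands in the guest")
    if image.untrusted:
        reasons.append("no isolation from a malicious guest or filesystem")
    return reasons


def choose_backend(image, task=None):
    """Pick the (hypothetical) "lkl" backend only when nothing rules it out."""
    return "direct" if reasons_against_lkl(image, task) else "lkl"
```

For instance, choose_backend(Image()) picks "lkl" for a plain raw image, while anything in qcow2 format or containing LVM falls back to "direct".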

But within the limits of what LKL can do, it should become an excellent choice (assuming it gets upstream). Adding an LKL backend to libguestfs brings to the table the large but clean and well-tested libguestfs API, the language bindings and the virt tools. So libguestfs gives you the best of both worlds: the performance of LKL by using that backend, or the massive feature set of libguestfs by switching over to the qemu or libvirt backend, which for most programs is a single-line code change.
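To give an idea of how small the switch is: libguestfs reads its backend from the LIBGUESTFS_BACKEND environment variable (or from a set_backend call on the handle), so a wrapper script can choose the backend without touching the program. Here is a minimal sketch in Python. The guestfs_env helper is hypothetical, and the value "lkl" is only a guess at what the work-in-progress backend would be called; "direct" and "libvirt" are existing backend names.

```python
import os


def guestfs_env(backend, base=None):
    """Return a copy of the environment with LIBGUESTFS_BACKEND set.

    `backend` is passed through unchecked: "direct" and "libvirt" exist
    today, "lkl" is an assumed name for the work-in-progress backend.
    """
    env = dict(os.environ if base is None else base)
    env["LIBGUESTFS_BACKEND"] = backend
    return env


# Possible use (not run here): start a libguestfs tool with that backend.
#   subprocess.run(["guestfish", "-a", "disk.img"], env=guestfs_env("direct"))
#
# Inside a program using the Python bindings, the single-line change
# would be something like:
#   g.set_backend("direct")
```

The helper copies the environment and does not modify os.environ, so different child processes can be given different backends side by side.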



13 responses to “Linux Kernel Library backend for libguestfs”

  1. > It cannot run anything that requires a userspace tool. That includes btrfs
    Pardon my ignorance, but what is the userspace tool that is a requirement to run btrfs? I’m assuming xfs and ext4 don’t have this limitation.

    • rich

      While simply opening and reading/writing btrfs would work, as soon as you try to do anything even mildly complicated (e.g. listing subvolumes, or showing the “true” space usage of a filesystem with btrfs filesystem df) then you need to use the external btrfs program.

      How would that work? Well, we could link btrfs to liblkl.a (it would also require extensive modifications to the tool). But then you’ve got to somehow coordinate between the Linux kernel library which is running linked to btrfs and the Linux kernel library which is running at the same time linked to libguestfs. And both trying to update the same disk image. How that would work is very unclear to me. And this is not some sort of libguestfs problem, this is an LKL problem that affects any LKL user.

      • Is that because a lot of the btrfs functionality and features aren’t mapped directly to system calls, but rather are expressed as patterns and flows of system calls wrapped into a bunch of userspace logic in the btrfs tool?

      • rich

        Yes, that’s essentially the problem.

        The issue is a lot more clear cut if we talk about LVM, where there is a kernel component (device-mapper, dmsetup table etc), but in order to use LVM at all you have to have all the userspace infrastructure to parse the on-disk metadata and work out which DM table commands to send to the kernel to set it all up.

        Given a disk image, LVM isn’t going to work at all until you run the relevant lvm vgchange commands which scan and parse metadata.

        But again, how’s that actually going to work? You modify the lvm binary extensively so it can be linked with liblkl.a, but it is running its own copy of the Linux kernel. How does that get to talk to our copy of the Linux kernel running in another process?

        And the story is the same for a bunch of other formats like MD and Windows Dynamic Disks. (But Windows Dynamic Disks aren’t really a problem, since our only NTFS support uses ntfs-3g which requires FUSE, which unlocks a whole new level of userspace complexity!)

  2. DDD

    Back in the 70s, it was a test of “theoretical purity” to see if an OS could run itself as a nested process. I am a little surprised that UML, chroot jails, docker and virtualisation all seem to circle around this idea without ever taking it on fully…

    Any process should be able to create a subprocess and provide a comprehensive selection of low-level services to it.

  3. Conrad

    How did you build lkl? Is it involved or straightforward? If it’s complicated, would you write a short post about that?

  4. How about running a process inside the lkl kernel? If that becomes possible then you can run anything in the context of the same kernel. But that would mean having all the userspace packaged somewhere. Or maybe lkl can be made to run processes from a standard file system tree (delegating fs reading to the host kernel)?
    In any case, I think it would be worthwhile if docker could run processes inside lkl, so that one could run a debian kernel with debian userspace in docker while the host is rhel/centos.
    But then how would that be different from para-virtualization?

  5. For Btrfs, the system calls for initialisation are very simple, and can easily be moved to a library (which has already been done, see udev/udev-builtin-btrfs.c). For the rest of the Btrfs API, while the btrfs CLI tool isn’t reentrant, most of the code could probably be extracted to a library. That library can be linked into the backend process, along with LKL.

    The LVM userspace is a lot more complicated. If I had to port it to work with LKL, I would maybe extract a platform support library and perform IPC so that the actual work is done on the LKL backend. No idea how large the useful syscall surface area is.
