Half-baked ideas: inject syscalls into virtual machines

For more half-baked ideas, see my “ideas” tag.

After we wrote virt-df and later libguestfs, what customers kept asking me for was the ability to read /proc and /sys in a running virtual machine.

Of course that’s not possible with libguestfs. libguestfs reads the filesystem, but /proc is a synthetic “filesystem” that exists only in the live C structs of the running Linux kernel. What’s worse, those structs change with every release and every vendor-specific patch. Following C structs is not easy, although we did it (with the help of a giant database) for virt-mem.

The prize of being able to “read” /proc is great — reading out statistics, process tables, network configuration, and much more information besides.

To do this tractably, we need to be able to inject syscalls into the virtual machine. If we could inject the following sequence of syscalls, we’d be able to read /proc in a completely portable manner without needing to chase kernel structs:

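dirfd = open ("/proc", O_RDONLY|O_DIRECTORY);
getdents (dirfd, ...);         /* list the entries in /proc */
fd = open ("/proc/...", O_RDONLY);
read (fd, ...);                /* read a file's contents */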
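As a reference point, this is roughly what the injected calls amount to when the same work is done by an ordinary program running natively inside a Linux guest. It is a sketch, assuming Linux with glibc: it calls getdents64 through syscall(2) rather than relying on a libc wrapper (newer architectures such as arm64 provide only getdents64, not the older getdents), and the function names count_processes and read_proc_file are made up for illustration.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* One record as returned by getdents64 (layout from getdents(2)). */
struct linux_dirent64 {
  uint64_t       d_ino;
  int64_t        d_off;
  unsigned short d_reclen;   /* total size of this record in bytes */
  unsigned char  d_type;
  char           d_name[];   /* NUL-terminated entry name */
};

/* True if a /proc entry name looks like a PID (all decimal digits). */
static int is_pid_name (const char *s)
{
  return s[0] != '\0' && strspn (s, "0123456789") == strlen (s);
}

/* open + getdents64: count the per-process directories in /proc.
   Returns -1 on error. */
static int count_processes (void)
{
  _Alignas (8) char buf[4096];
  int count = 0;
  int dirfd = open ("/proc", O_RDONLY | O_DIRECTORY);
  if (dirfd == -1)
    return -1;

  for (;;) {
    long n = syscall (SYS_getdents64, dirfd, buf, sizeof buf);
    if (n == -1) { close (dirfd); return -1; }
    if (n == 0)                         /* end of directory */
      break;
    for (long pos = 0; pos < n; ) {     /* walk the variable-length records */
      const struct linux_dirent64 *d =
        (const struct linux_dirent64 *) (buf + pos);
      if (is_pid_name (d->d_name))
        count++;
      pos += d->d_reclen;
    }
  }
  close (dirfd);
  return count;
}

/* open + read: copy a /proc file into buf and NUL-terminate it.
   /proc files usually report a size of 0, so loop until read
   returns 0 (EOF) instead of trusting stat. Returns -1 on error. */
static ssize_t read_proc_file (const char *path, char *buf, size_t size)
{
  if (size == 0)
    return -1;
  int fd = open (path, O_RDONLY);
  if (fd == -1)
    return -1;
  size_t total = 0;
  while (total < size - 1) {
    ssize_t n = read (fd, buf + total, size - 1 - total);
    if (n == -1) { close (fd); return -1; }
    if (n == 0)
      break;
    total += (size_t) n;
  }
  close (fd);
  buf[total] = '\0';
  return (ssize_t) total;
}
```

Inside the guest this is trivial; the whole difficulty of the idea is getting these same calls executed from the outside, with no cooperating program in the guest.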

Here is my half-baked idea for how to do it.

  1. Wait for a userspace program to be running. Then pause the VM.
  2. Fork qemu, so we have a complete copy of the VM, its state, memory and so on. The original (parent) process can now be resumed, and hence the VM resumes. The rest of this discussion concerns only the child process.
  3. Disconnect qemu from any outside influence. This means disconnecting any block devices, network devices, and perhaps other devices. This ensures our private copy of the VM can’t accidentally overwrite any state from the real, running VM.
  4. At this point we have a “captured” userspace process in the VM. It doesn’t particularly matter which process we happened to capture. We now set up the stack frame and registers for the system call we want to execute. Any previous contents of the memory and registers can be discarded.
  5. Set the emulation running. (The captured userspace process now runs and performs the syscall).
  6. Trap back into qemu when the syscall exits.
  7. Capture the result of the syscall: the return value in a register (a file descriptor, byte count or error code) and, for read, the data it copied into the buffer in guest memory. Either way, we’ve successfully injected a syscall into the VM and this has allowed us to read something out of /proc.
  8. Discard the qemu child process.
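To make steps 4, 6 and 7 a little more concrete, here is a sketch of the register bookkeeping, assuming an x86-64 Linux guest. It follows the kernel’s x86-64 syscall convention: the syscall number goes in rax, arguments in rdi, rsi, rdx, r10, r8 and r9, and the result comes back in rax, with -errno (in the range -4095 to -1) on failure. struct guest_regs and the helper names are hypothetical, not qemu’s real CPU state API. The caller is assumed to have already written the 2-byte syscall instruction (0f 05) and any argument data, such as the "/proc" string, into the captured process’s memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the guest's general-purpose registers.
   This is NOT qemu's real CPU state structure. */
struct guest_regs {
  uint64_t rax, rdi, rsi, rdx, r10, r8, r9;
  uint64_t rcx, r11;          /* clobbered by the syscall instruction */
  uint64_t rip, rsp, rflags;
};

#define SYSCALL_INSN_LEN 2    /* x86-64 "syscall" is encoded as 0f 05 */

/* Step 4: load a syscall into the captured process's registers.
   insn_addr is the guest virtual address where the syscall
   instruction was written (assumed executable in that process).
   Syscall numbers are per-architecture; on x86-64, read = 0,
   open = 2, getdents64 = 217. */
static void load_syscall (struct guest_regs *r, uint64_t insn_addr,
                          uint64_t nr, const uint64_t args[6])
{
  r->rax = nr;
  r->rdi = args[0];
  r->rsi = args[1];
  r->rdx = args[2];
  r->r10 = args[3];           /* 4th argument is r10, not rcx */
  r->r8  = args[4];
  r->r9  = args[5];
  r->rip = insn_addr;         /* old register contents are discarded */
}

/* Step 6 (assumed trap condition): on return to userspace, rip
   points just past the syscall instruction. */
static bool syscall_returned (const struct guest_regs *r, uint64_t insn_addr)
{
  return r->rip == insn_addr + SYSCALL_INSN_LEN;
}

/* Step 7: decode rax. Values in [-4095, -1] are -errno; anything
   else is a successful result (fd, byte count, ...). */
static int64_t decode_result (const struct guest_regs *r, int *err)
{
  if (r->rax >= (uint64_t) -4095) {
    *err = (int) (UINT64_C (0) - r->rax);   /* e.g. 2 == ENOENT */
    return -1;
  }
  *err = 0;
  return (int64_t) r->rax;
}
```

How qemu would actually arrange the trap in step 6 is the open question the post raises; checking rip is only one plausible condition.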

We make the modest assumption that the syscall we chose will run without scheduling. Even if it does schedule, the fact that we have disconnected qemu from any block devices (writes effectively go to /dev/null) should mean at least it won’t damage anything.

Notice that we’re using the public syscall interface to the Linux kernel, not depending on the details of changing internal structures.

As ideas go, this seems tractable, although the implementation is both technically difficult and probably hard to get upstream. We need a way to trap-and-pause when a VM switches to userspace. We need to be able to fork the VM and make all sorts of modifications to our copy. Then we would need some nice wrappers around this so the user just has to type “virt-ifconfig myvm” (note: previously we implemented virt-ifconfig as part of the virt-mem project by chasing kernel structs).
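As an illustration of the last stage of such a wrapper, here is a sketch that turns the text of /proc/net/dev, once injected open and read calls have copied it out of the guest, into per-interface traffic counters. It assumes a wrapper would read /proc/net/dev at all (the virt-mem version chased kernel structs instead), and it only covers counters: interface addresses are not in that file. struct iface_stats and parse_net_dev are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-interface record for a virt-ifconfig-style tool. */
struct iface_stats {
  char name[32];
  unsigned long long rx_bytes, rx_packets, tx_bytes, tx_packets;
};

/* Parse /proc/net/dev text. After two header lines, each line is
   "name: <8 receive fields> <8 transmit fields>". Returns the number
   of interfaces stored in out (at most max). */
static size_t parse_net_dev (const char *text, struct iface_stats *out,
                             size_t max)
{
  size_t n = 0;
  int lineno = 0;
  const char *line = text;

  while (line && *line && n < max) {
    const char *eol = strchr (line, '\n');
    const char *colon = strchr (line, ':');
    if (lineno >= 2 && colon && (!eol || colon < eol)) {
      const char *start = line;
      while (*start == ' ')                /* names are right-aligned */
        start++;
      size_t len = (size_t) (colon - start);
      if (len >= sizeof out[n].name)
        len = sizeof out[n].name - 1;
      memcpy (out[n].name, start, len);
      out[n].name[len] = '\0';
      /* rx bytes, rx packets, skip 6 rx fields, tx bytes, tx packets */
      if (sscanf (colon + 1,
                  "%llu %llu %*llu %*llu %*llu %*llu %*llu %*llu %llu %llu",
                  &out[n].rx_bytes, &out[n].rx_packets,
                  &out[n].tx_bytes, &out[n].tx_packets) == 4)
        n++;
    }
    line = eol ? eol + 1 : NULL;
    lineno++;
  }
  return n;
}
```

Because this parses the kernel’s public text format rather than its internal structs, it is the portability argument above applied to one concrete tool.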


6 responses to “Half-baked ideas: inject syscalls into virtual machines”

  1. Wow, the most expensive system call ever =)

    I wonder though…if you’re at the point of running arbitrary code in the guest VM, why not just inject virt-guest-tools.rpm?

    • rich

      The problem is that guests mostly don’t have any special tools installed. I’ve been working on this for a couple of years and there’s been no change at all.

  2. René

    Sounds like the job of some daemon running on the guest.

    Or maybe it would be possible to write a driver for getting this kind of information. Like a hardware driver that does something with an interrupt and some memory which makes the guest OS report the status of whatever you want.
    I must admit I don’t know enough about the other virtual drivers to know if this is easier. It just seems easier and cleaner to me.

    Cheers 🙂

  3. Rich-

    VMware engineers have documented the difficulties of inspecting kernel structures “from the outside” at http://stackframe.blogspot.com/ (see especially the references to “getlinuxoffsets”). It sounds like you’ve done something similar for virt-mem.

    I’m ignorant of KVM’s support for kernel debugging, but I /do/ know that VMware includes a robust GDB stub in their free products. Powerful stuff.

    I think the “non-intrusive” approach (perhaps aided by some heuristics tailored for “recent” kernels) is slick, but the method you outlined above certainly sounds more powerful.

    • rich

      I don’t think it’s so hard. It’s actually easy if you have kernel symbols; the hard/research bit is doing it blind on an unknown Linux kernel.

      Our “crash” analysis tool can already inspect live kernels (with symbols supplied) to do things like listing processes.
