For more half-baked ideas, see my “ideas” tag.
After we wrote virt-df and later libguestfs, what customers asked me for was the ability to read /proc and /sys from a running virtual machine.
Of course that’s not possible with libguestfs. libguestfs reads the filesystem, but /proc is a synthetic “filesystem” that exists only in the living C structs of the Linux kernel. What’s worse, those structs change with every release and every vendor-specific patch. Following C structs is not easy, although we did it (with the help of a giant database) for virt-mem.
The prize of being able to “read” /proc is great: statistics, process tables, network configuration, and much more information besides.
To do this tractably, we need to be able to inject syscalls into the virtual machine. If we could inject the following sequence of syscalls, we’d be able to read /proc in a completely portable manner without needing to chase kernel structs:
open ("/proc", O_RDONLY); getdents (fd, ...); close (fd);
open ("/proc/...", O_RDONLY); read (fd, ...); close (fd);
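For reference, here is roughly what that sequence does when run natively inside a Linux guest, as an ordinary C program rather than as injected calls. It is a sketch assuming Linux with glibc; count_proc_pids and read_proc_file are made-up helper names. It uses getdents64, which (unlike plain getdents) exists on every Linux architecture, and it opens a file under /proc before calling read, because read on the /proc directory itself fails with EISDIR.

```c
#define _GNU_SOURCE
#include <ctype.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Layout of one record returned by getdents64(2), as documented in the
   man page; declared here because we call the raw syscall. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* open + getdents64 on /proc, counting numeric entries (process IDs).
   Returns the count, or -1 on error. */
int count_proc_pids(void)
{
    _Alignas(uint64_t) char buf[8192];
    int count = 0;
    int fd = open("/proc", O_RDONLY | O_DIRECTORY);
    if (fd == -1)
        return -1;
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n == -1) {
            close(fd);
            return -1;
        }
        if (n == 0)                       /* end of directory */
            break;
        for (long off = 0; off < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            if (isdigit((unsigned char)d->d_name[0]))
                count++;
            off += d->d_reclen;           /* records are variable length */
        }
    }
    close(fd);
    return count;
}

/* open + read on a single /proc file.  Stores at most size-1 bytes plus
   a terminating NUL; returns the number of bytes stored, or -1. */
long read_proc_file(const char *path, char *out, size_t size)
{
    if (size == 0)
        return -1;
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    size_t total = 0;
    while (total < size - 1) {
        ssize_t n = read(fd, out + total, size - 1 - total);
        if (n == -1) {
            close(fd);
            return -1;
        }
        if (n == 0)                       /* end of file */
            break;
        total += (size_t)n;
    }
    close(fd);
    out[total] = '\0';
    return (long)total;
}
```

Injecting syscalls would mean making the guest kernel execute exactly these calls on our behalf, without our code ever being installed in the guest.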
Here is my half-baked idea for how to do it.
- Wait until a virtual CPU is executing a userspace program. Then pause the VM.
- Fork qemu, so we have a complete copy of the VM, its state, memory and so on. The original (parent) process can now be resumed, and hence the VM resumes. The rest of this discussion concerns only the child process.
- Disconnect qemu from any outside influence. This means disconnecting any block devices, network devices, and perhaps other devices. This ensures our private copy of the VM can’t accidentally overwrite any state from the real, running VM.
- At this point we have a “captured” userspace process in the VM. It doesn’t particularly matter which process we happened to capture. We now set up the stack frame and registers for the system call we want to execute. Any previous contents of the memory and registers can be discarded.
- Set the emulation running. (The captured userspace process now runs and performs the syscall.)
- Trap back into qemu when the syscall exits.
- Capture the result of the syscall: the return value (a status code, byte count or error) and, for calls like read, the buffer it filled in guest memory. In any case, we’ve successfully injected a syscall into the VM and this has allowed us to read something out of /proc.
- Discard the qemu child process.
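The register set-up step is the most concrete part of the list above, so here is a sketch of what it might look like on x86-64. The Linux syscall ABI there puts the syscall number in rax and the first arguments in rdi, rsi and rdx, and the kernel returns the result, or a negated errno, in rax. struct guest_regs, struct guest_mem, inject_open and decode_result are hypothetical names, not qemu APIs, and guest memory is modelled as a flat buffer, ignoring the page-table translation a real implementation would need.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of vCPU state the child qemu would edit.
   rax: syscall number in, return value out; rdi, rsi, rdx, r10, r8, r9:
   arguments 1..6.  (SYSCALL itself also clobbers rcx and r11.) */
struct guest_regs {
    uint64_t rax, rdi, rsi, rdx, r10, r8, r9, rip, rsp;
};

/* Hypothetical flat view of the captured process's memory. */
struct guest_mem {
    uint8_t *base;
    uint64_t size;
};

enum { X86_64_NR_OPEN = 2 };    /* __NR_open on x86-64 */

/* Arrange for the next instruction to be open(path, O_RDONLY): copy
   path below the stack pointer, point rdi at it, and overwrite the
   bytes at rip with the 2-byte SYSCALL instruction (0F 05).  Clobbering
   the captured process is fine because this is a throwaway copy.
   Returns 0 on success, -1 if anything falls outside guest memory. */
int inject_open(struct guest_regs *r, struct guest_mem *m, const char *path)
{
    uint64_t len = strlen(path) + 1;              /* include the NUL */
    if (r->rsp < len)
        return -1;
    uint64_t addr = (r->rsp - len) & ~UINT64_C(15);   /* 16-byte aligned */
    if (addr + len > m->size || r->rip + 2 > m->size)
        return -1;
    memcpy(m->base + addr, path, len);
    m->base[r->rip] = 0x0f;
    m->base[r->rip + 1] = 0x05;
    r->rax = X86_64_NR_OPEN;
    r->rdi = addr;                                /* const char *pathname */
    r->rsi = 0;                                   /* flags: O_RDONLY */
    r->rdx = 0;                                   /* mode: unused */
    return 0;
}

/* Linux reports failure as -errno in rax, always in -4095..-1.
   Returns the errno (0 on success) and stores the value in *value. */
int decode_result(uint64_t rax, int64_t *value)
{
    int64_t v = (int64_t)rax;
    if (v < 0 && v >= -4095) {
        *value = -1;
        return (int)-v;
    }
    *value = v;                                   /* fd, byte count, ... */
    return 0;
}
```

In this model, the simple case of “trap back into qemu when the syscall exits” would be stopping the child once rip reaches the old rip + 2, just past the SYSCALL instruction, and then passing rax to decode_result.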
We make the modest assumption that the syscall we chose will complete without the kernel scheduling another task. Even if it does schedule, the fact that we have disconnected qemu from any block devices (so writes effectively go to /dev/null) should at least mean it can’t damage anything.
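The safety argument also leans on fork(2) semantics: after fork the child has its own copy-on-write copy of the parent’s memory, so whatever it scribbles over, guest RAM included, is invisible to the real VM. This sketch demonstrates only that property in miniature, with a plain buffer standing in for guest RAM and a pipe carrying one result back. run_in_throwaway_copy and RAM_SIZE are hypothetical names; real qemu is multithreaded, and fork copies only the calling thread, so forking it would be considerably messier.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define RAM_SIZE 4096   /* hypothetical stand-in for the VM's RAM size */

/* Fork; the child reads ram[0] (it sees the parent's state), then
   clobbers its private copy and reports the value through a pipe.
   The parent reaps the child and returns the reported value, or -1. */
long run_in_throwaway_copy(unsigned char *ram)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                       /* child: the disposable copy */
        close(fds[0]);
        long result = ram[0];
        memset(ram, 0xff, RAM_SIZE);      /* affects only the child's copy */
        ssize_t w = write(fds[1], &result, sizeof result);
        _exit(w == (ssize_t)sizeof result ? 0 : 1);
    }
    close(fds[1]);                        /* parent: the real VM carries on */
    long result = -1;
    if (read(fds[0], &result, sizeof result) != (ssize_t)sizeof result)
        result = -1;
    close(fds[0]);
    int status;
    waitpid(pid, &status, 0);             /* discard the child */
    return result;
}
```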
Notice that we’re using the public syscall interface to the Linux kernel, not depending on the details of changing internal structures.
As ideas go this seems tractable, although the implementation is both technically difficult and probably hard to get upstream. We need a way to trap-and-pause when a VM switches to userspace. We need to be able to fork the VM and make all sorts of modifications to our copy. Then we would need some nice wrappers around this so the user just has to type “virt-ifconfig myvm” (note: previously we implemented virt-ifconfig as part of the virt-mem project by chasing kernel structs).
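Part of the appeal is that the wrapper’s job shrinks to text parsing. Assuming the injected open and read calls hand back the contents of /proc/net/dev, a hypothetical virt-ifconfig could pull out per-interface counters with something like this sketch; struct if_stats and parse_net_dev are illustrative names, and only the receive and transmit byte and packet columns are picked out.

```c
#include <stdio.h>
#include <string.h>

/* Per-interface counters a hypothetical virt-ifconfig might print. */
struct if_stats {
    char name[32];
    unsigned long long rx_bytes, rx_packets, tx_bytes, tx_packets;
};

/* Parse /proc/net/dev text into at most max entries.  The first two
   lines are column headers; each later line is "name: 8 receive
   fields, then transmit fields".  Returns the number parsed. */
int parse_net_dev(const char *text, struct if_stats *out, int max)
{
    int n = 0, lineno = 0;
    const char *line = text;
    while (*line && n < max) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[512];
        if (len >= sizeof buf)
            len = sizeof buf - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';
        char *colon = strchr(buf, ':');
        if (++lineno > 2 && colon) {          /* skip the header lines */
            *colon = '\0';
            char *name = buf;
            while (*name == ' ')              /* names are right-aligned */
                name++;
            struct if_stats *s = &out[n];
            /* rx: bytes packets, skip errs drop fifo frame compressed
               multicast; tx: bytes packets */
            if (sscanf(colon + 1,
                       "%llu %llu %*llu %*llu %*llu %*llu %*llu %*llu %llu %llu",
                       &s->rx_bytes, &s->rx_packets,
                       &s->tx_bytes, &s->tx_packets) == 4) {
                snprintf(s->name, sizeof s->name, "%s", name);
                n++;
            }
        }
        if (!eol)
            break;
        line = eol + 1;
    }
    return n;
}
```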