Tag Archives: inspection

Inspection, now with added Prolog

You can give libguestfs an unknown disk image, and it tries to guess what’s on it, in terms of operating systems, Linux distro, Windows drive letter mappings and so on, a process that we call inspection. This is an important part of many of the virt tools, because when you type a command like

$ virt-cat -a linux.img /var/log/messages

how is virt-cat to know that on this particular disk image the sysadmin put /var on a separate partition? Because, inspection.

Given that inspection is such an important part of many tools, and vital for standalone programs like virt-inspector you might wonder how it works.

The answer, right now, is 6000+ lines of hairy, intricate C code, which is difficult to maintain and a source of hard to fix bugs and hard to implement feature requests.

$ wc -l src/inspect*.c
   823 src/inspect-apps.c
   725 src/inspect.c
   777 src/inspect-fs.c
   543 src/inspect-fs-cd.c
  2092 src/inspect-fs-unix.c
   704 src/inspect-fs-windows.c
   600 src/inspect-icon.c
  6264 total

How can we make this better?

Getting back to basics, inspection is really a lot of heuristics. Things like:

  • If this filesystem contains a file /etc/fstab then it could be a Linux root filesystem. And:
  • If this thing we think is a Linux root filesystem contains /etc/debian_version then it could be a Debian root filesystem.

These heuristics can be expressed in a logic language inspired by Prolog:

LinuxRoot(fs) :-
    Filesystem(fs),
    File(fs, "/etc/fstab").
DebianRoot(fs) :-
    LinuxRoot(fs),
    File(fs, "/etc/debian_version").

What we’re doing here is collecting a set of facts (Prolog calls them “compound terms”), like:

Filesystem("/dev/sda1").
File("/dev/sda1", "/etc/fstab").
File("/dev/sda1", "/etc/debian_version").

and deriving new facts using the rules:

LinuxRoot("/dev/sda1").
DebianRoot("/dev/sda1").

(I should say at this point that I’m simplifying things a bit. If you want to get a flavour of what the inspection rules might finally look like, then take a look at this file.)

So far I have written a compiler that compiles inspection rules into fairly efficient C code (and hence to binaries), using a forward chaining strategy. It has some nice features like transparently embedding C code into the rules, allowing you to do more complicated operations directly in C:

Distro(fs, distro) :-
    LinuxRootWithOSRelease(fs), /* has /etc/os-release */
    (distro)?={{
      int r;
      CLEANUP_FREE char *distro = NULL;
      if ((r = get_distro_from_os_release (fs, &distro))
           <= 0)
        return r;
      set_distro (distro);
      return 0;
    }}.

and

BlockDevice(dev) :-
    (dev)*={{
      CLEANUP_FREE_STRING_LIST char **devs =
        get_all_block_devices ();
      if (devs == NULL) return -1;
      for (size_t i = 0; devs[i] != NULL; ++i)
        set_dev (devs[i]);
      return 0;
    }}.

My inspection rules run to < 500 lines of code so far, although it’s hard to compare that to the current code because (a) the inspection rules will likely double or triple in size once they are able to do everything that the current code can do, and (b) there’s a lot of supporting runtime code like get_all_block_devices above.

Nevertheless I hope the new rules system will be faster, more supportable and extensible, and easier to understand than the current code. It will also be 100% backwards compatible with existing libguestfs users (since we never break compatibility).

You can follow development in this branch on github.

Update: Hacker News discussion of this article.

Leave a comment

Filed under Uncategorized

Fuzz-testing libguestfs inspection code

There are a lot of security issues with dealing with untrusted disk images especially since for historical reasons a lot of the code used to parse filesystems sits in the kernel. Libguestfs avoids these by wrapping the kernel code inside a VM (and that VM inside an sVirt container if you’re using Fedora or RHEL).

However the library side of things could still be vulnerable, especially complicated operations like inspection. Last week we found several vulnerabilities in inspection which could allow an untrusted guest to perform a denial of service attack on a host.

The first vulnerability was identified by Coverity. The second was found by Olaf Hering by looking at similar code paths.

This made me wonder if we could find more inspection bugs semi-automatically. To do this I’ve written an inspection fuzz tester.

The idea is we run inspection on an empty disk image. Normally this wouldn’t find any operating systems. But we intercept certain libguestfs calls (which happen as a side-effect of inspection) and use them to create fake operating system files on the fly.

To give you an example: Inspection might look for a file called /etc/redhat-release and then try to parse it. To do this it will first test if the file exists (guestfs_is_file ("/etc/redhat-release")) and if it does read it. In the empty disk this file won’t exist, but we capture the is_file call, randomly create a file, and then see what happens when inspection tries to parse it.

Libguestfs has a trace mechanism but if we decided to do this sort of thing regularly we’d probably want to add a cleaner way to find the arguments and perhaps even replace the return value from a method call.

The result is a fuzz tester which now runs as part of the ordinary test suite.

I also ran many tens of thousands of iterations over the weekend. The test found Olaf Hering’s bug, which is encouraging, but it didn’t find any other bugs, which means there is room for refinement of the test. In particular I think we could push more malformed registry hives at the inspection code to see what it does.

Leave a comment

Filed under Uncategorized

Multiple libguestfs appliances in parallel, part 4

[Part 1, part 2, part 3.]

Finally I modified the test to do some representative work: We now load a real Windows XP guest, inspect it (a heavyweight operation), and mount and stat each filesystem. I won’t reproduce the entire test program again because only the test subroutine has changed:

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/tmp/winxp.img");
    $g->launch ();

    # Inspect the guest (ignore the result).
    $g->inspect_os ();

    # Approximate what virt-df does.
    my %fses = $g->list_filesystems ();
    foreach (keys %fses) {
        my $mounted = 0;
        eval { $g->mount_ro ($_, "/"); $mounted = 1; };
        if ($mounted) {
            $g->statvfs ("/");
            $g->umount_all ();
        }
    }

    return $g;
}

Even with all that work going on, I was able to inspect more than 1 disk per second on my laptop, and run 60 threads in parallel with good performance and scalability:

data

Leave a comment

Filed under Uncategorized