Inspection, now with added Prolog

You can give libguestfs an unknown disk image, and it tries to guess what’s on it, in terms of operating systems, Linux distro, Windows drive letter mappings and so on, a process that we call inspection. This is an important part of many of the virt tools, because when you type a command like

$ virt-cat -a linux.img /var/log/messages

how is virt-cat to know that on this particular disk image the sysadmin put /var on a separate partition? Because, inspection.

Given that inspection is such an important part of many tools, and vital for standalone programs like virt-inspector you might wonder how it works.

The answer, right now, is 6000+ lines of hairy, intricate C code, which is difficult to maintain and a source of hard to fix bugs and hard to implement feature requests.

$ wc -l src/inspect*.c
   823 src/inspect-apps.c
   725 src/inspect.c
   777 src/inspect-fs.c
   543 src/inspect-fs-cd.c
  2092 src/inspect-fs-unix.c
   704 src/inspect-fs-windows.c
   600 src/inspect-icon.c
  6264 total

How can we make this better?

Getting back to basics, inspection is really a lot of heuristics. Things like:

  • If this filesystem contains a file /etc/fstab then it could be a Linux root filesystem. And:
  • If this thing we think is a Linux root filesystem contains /etc/debian_version then it could be a Debian root filesystem.

These heuristics can be expressed in a logic language inspired by Prolog:

LinuxRoot(fs) :-
    Filesystem(fs),
    File(fs, "/etc/fstab").
DebianRoot(fs) :-
    LinuxRoot(fs),
    File(fs, "/etc/debian_version").

What we’re doing here is collecting a set of facts (Prolog calls them “compound terms”), like:

Filesystem("/dev/sda1").
File("/dev/sda1", "/etc/fstab").
File("/dev/sda1", "/etc/debian_version").

and deriving new facts using the rules:

LinuxRoot("/dev/sda1").
DebianRoot("/dev/sda1").

(I should say at this point that I’m simplifying things a bit. If you want to get a flavour of what the inspection rules might finally look like, then take a look at this file.)

So far I have written a compiler that compiles inspection rules into fairly efficient C code (and hence to binaries), using a forward chaining strategy. It has some nice features like transparently embedding C code into the rules, allowing you to do more complicated operations directly in C:

Distro(fs, distro) :-
    LinuxRootWithOSRelease(fs), /* has /etc/os-release */
    (distro)?={{
      int r;
      CLEANUP_FREE char *distro = NULL;
      if ((r = get_distro_from_os_release (fs, &distro))
           <= 0)
        return r;
      set_distro (distro);
      return 0;
    }}.

and

BlockDevice(dev) :-
    (dev)*={{
      CLEANUP_FREE_STRING_LIST char **devs =
        get_all_block_devices ();
      if (devs == NULL) return -1;
      for (size_t i = 0; devs[i] != NULL; ++i)
        set_dev (devs[i]);
      return 0;
    }}.

My inspection rules run to < 500 lines of code so far, although it’s hard to compare that to the current code because (a) the inspection rules will likely double or triple in size once they are able to do everything that the current code can do, and (b) there’s a lot of supporting runtime code like get_all_block_devices above.

Nevertheless I hope the new rules system will be faster, more supportable and extensible, and easier to understand than the current code. It will also be 100% backwards compatible with existing libguestfs users (since we never break compatibility).

You can follow development in this branch on github.

Update: Hacker News discussion of this article.

Leave a comment

Filed under Uncategorized

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s