Tag Archives: computer science

Goals – an experimental new tool which generalizes “make”

For the past few weeks I’ve been working on a new tool called goals which generalizes make. I did a quick talk at Red Hat about this tool which you can download from the link below:

Video link (MP4 format, 31 minutes, 217 MB)

There are also my written notes from the talk here.

3 Comments

Filed under Uncategorized

Inspection, now with added Prolog

You can give libguestfs an unknown disk image, and it tries to guess what’s on it, in terms of operating systems, Linux distro, Windows drive letter mappings and so on, a process that we call inspection. This is an important part of many of the virt tools, because when you type a command like

$ virt-cat -a linux.img /var/log/messages

how is virt-cat to know that on this particular disk image the sysadmin put /var on a separate partition? Because, inspection.

Given that inspection is such an important part of many tools, and vital for standalone programs like virt-inspector you might wonder how it works.

The answer, right now, is 6000+ lines of hairy, intricate C code, which is difficult to maintain and a source of hard to fix bugs and hard to implement feature requests.

$ wc -l src/inspect*.c
   823 src/inspect-apps.c
   725 src/inspect.c
   777 src/inspect-fs.c
   543 src/inspect-fs-cd.c
  2092 src/inspect-fs-unix.c
   704 src/inspect-fs-windows.c
   600 src/inspect-icon.c
  6264 total

How can we make this better?

Getting back to basics, inspection is really a lot of heuristics. Things like:

  • If this filesystem contains a file /etc/fstab then it could be a Linux root filesystem. And:
  • If this thing we think is a Linux root filesystem contains /etc/debian_version then it could be a Debian root filesystem.

These heuristics can be expressed in a logic language inspired by Prolog:

LinuxRoot(fs) :-
    Filesystem(fs),
    File(fs, "/etc/fstab").
DebianRoot(fs) :-
    LinuxRoot(fs),
    File(fs, "/etc/debian_version").

What we’re doing here is collecting a set of facts (Prolog calls them “compound terms”), like:

Filesystem("/dev/sda1").
File("/dev/sda1", "/etc/fstab").
File("/dev/sda1", "/etc/debian_version").

and deriving new facts using the rules:

LinuxRoot("/dev/sda1").
DebianRoot("/dev/sda1").

(I should say at this point that I’m simplifying things a bit. If you want to get a flavour of what the inspection rules might finally look like, then take a look at this file.)

So far I have written a compiler that compiles inspection rules into fairly efficient C code (and hence to binaries), using a forward chaining strategy. It has some nice features like transparently embedding C code into the rules, allowing you to do more complicated operations directly in C:

Distro(fs, distro) :-
    LinuxRootWithOSRelease(fs), /* has /etc/os-release */
    (distro)?={{
      int r;
      CLEANUP_FREE char *distro = NULL;
      if ((r = get_distro_from_os_release (fs, &distro))
           <= 0)
        return r;
      set_distro (distro);
      return 0;
    }}.

and

BlockDevice(dev) :-
    (dev)*={{
      CLEANUP_FREE_STRING_LIST char **devs =
        get_all_block_devices ();
      if (devs == NULL) return -1;
      for (size_t i = 0; devs[i] != NULL; ++i)
        set_dev (devs[i]);
      return 0;
    }}.

My inspection rules run to < 500 lines of code so far, although it’s hard to compare that to the current code because (a) the inspection rules will likely double or triple in size once they are able to do everything that the current code can do, and (b) there’s a lot of supporting runtime code like get_all_block_devices above.

Nevertheless I hope the new rules system will be faster, more supportable and extensible, and easier to understand than the current code. It will also be 100% backwards compatible with existing libguestfs users (since we never break compatibility).

You can follow development in this branch on github.

Update: Hacker News discussion of this article.

Leave a comment

Filed under Uncategorized

“make” and queuing theory

One directory in the libguestfs sources has 101 source files with a wide range of sizes:

$ ls -gGS *.c
-r--r--r--. 1 498686 Aug  4 20:01 stubs.c
-rw-rw-r--. 2 203439 Sep 18 14:52 guestfs_protocol.c
-rw-rw-r--. 1  51723 Jul 28 14:15 btrfs.c
-rw-rw-r--. 1  36644 Jul 28 14:15 guestfsd.c
-rw-rw-r--. 1  32477 Jul 28 14:15 ext2.c
                 ...
-rw-rw-r--. 1   1120 Feb 14  2015 rename.c
-rw-rw-r--. 1   1073 Feb 14  2015 sleep.c
-rw-rw-r--. 1   1065 Feb 14  2015 echo-daemon.c
-rw-rw-r--. 1    961 Feb 14  2015 pingdaemon.c

If we take file size as a proxy for compilation time, stubs.c is probably going to take the longest time to compile.

The current Makefile builds them in alphabetical order. Unfortunately because stubs.c is near the end of the list, this means the final link step has to wait for stubs.c to finish. While waiting, only a single core is being used and all the other cores are idle.

Can we organize builds to get an overall faster compile?

Simple queuing theory suggests two (obvious) possibilities: starting the builds from the shortest first to the longest last will minimize the amount of time we have to wait for any job to complete.

But what we care about is overall compile time, so starting the longest jobs first should be better, since that way the final link shouldn’t need to wait for a late-started long compile.

Alphabetical 10.5 s
Shortest (smallest) job first 10.3 s
Longest (largest) job first 8.5 s
A random order 8.5 s

So my conclusion is that make could do better by having some way to reorder the list of input files by file size. Even randomly reordering the files could improve some compiles (although that would depend on the particular output of your PRNG on the day you ran the compile).

GNU make has no $(sort_by_file_size) function, but we can get the same effect by using $(shell ls -1S list of source files).

Unfortunately using GNU make functions is incompatible with automake. Grrrrrr. This is the partial solution I added to libguestfs.

An earlier version of this post had the wrong times in the table, leading to the wrong conclusion.

4 Comments

Filed under Uncategorized