Tag Archives: make

“make” and queuing theory

One directory in the libguestfs sources has 101 source files with a wide range of sizes:

$ ls -gGS *.c
-r--r--r--. 1 498686 Aug  4 20:01 stubs.c
-rw-rw-r--. 2 203439 Sep 18 14:52 guestfs_protocol.c
-rw-rw-r--. 1  51723 Jul 28 14:15 btrfs.c
-rw-rw-r--. 1  36644 Jul 28 14:15 guestfsd.c
-rw-rw-r--. 1  32477 Jul 28 14:15 ext2.c
                 ...
-rw-rw-r--. 1   1120 Feb 14  2015 rename.c
-rw-rw-r--. 1   1073 Feb 14  2015 sleep.c
-rw-rw-r--. 1   1065 Feb 14  2015 echo-daemon.c
-rw-rw-r--. 1    961 Feb 14  2015 pingdaemon.c

If we take file size as a proxy for compilation time, stubs.c is probably going to take the longest time to compile.

The current Makefile builds them in alphabetical order. Unfortunately because stubs.c is near the end of the list, this means the final link step has to wait for stubs.c to finish. While waiting, only a single core is being used and all the other cores are idle.

Can we organize builds to get an overall faster compile?

Simple queuing theory suggests two (obvious) possibilities: starting the builds from the shortest first to the longest last will minimize the amount of time we have to wait for any job to complete.

But what we care about is overall compile time, so starting the longest jobs first should be better, since that way the final link shouldn’t need to wait for a late-started long compile.

Alphabetical 10.5 s
Shortest (smallest) job first 10.3 s
Longest (largest) job first 8.5 s
A random order 8.5 s

So my conclusion is that make could do better by having some way to reorder the list of input files by file size. Even randomly reordering the files could improve some compiles (although that would depend on the particular output of your PRNG on the day you ran the compile).

GNU make has no $(sort_by_file_size) function, but we can get the same effect by using $(shell ls -1S list of source files).

Unfortunately using GNU make functions is incompatible with automake. Grrrrrr. This is the partial solution I added to libguestfs.

An earlier version of this post had the wrong times in the table, leading to the wrong conclusion.

4 Comments

Filed under Uncategorized

Goaljobs, part 3

In part 2 I introduced an example goaljobs script that can rebuild a set of packages in Fedora in the right order.

It’s time to take a closer look at targets — the promise that you make that some condition will be true by the time a goal has run.

In the Fedora rebuild script the goal targets looked like this:

let goal rebuilt pkg =
  target (koji_build_state (fedora_verrel pkg branch)
               == `Complete);
  ...

koji_build_state is a regular function. It’s implemented using the koji buildinfo command line tool for querying the Koji build system. (The koji command line tool is annoyingly hard to automate, but as we’ve got a complete programming language available — not just bash — the implementation of koji_build_state is tedious and long, but doable).

Querying Koji takes a few seconds and we don’t want to do it every time we check a goal. Goaljobs offers a feature called “The Memory” which lets you memoize functions. “The Memory” is just a fancy name for a key/value store which is kept in ~/.goaljobs-memory and persists across goaljobs sessions:

let koji_build_state verrel =
  let key = sprintf "koji_build_complete_%s" verrel in
  if memory_exists key then
    `Complete
  else (
    (* tedious code to query koji *)
    if state == `Complete then
      memory_set key "1";
    state
  )

With strategic use of memoization, evaluating goaljobs goals can be very fast and doesn’t change the fundamental contract of targets.

Finally in this part: a note on how targets are implemented.

A target is a boolean expression which is evaluated once near the beginning of the goal. If it evaluates to true at the beginning of the goal then the rest of the goal can be skipped because the goal has already been achieved / doesn’t need to be repeated.

And since targets are just general expressions, they can be anything at all, from accessing a remote server (as here) to checking the existence of a local file (like make). As long as something can be tested quickly, or can be tested slowly and memoized, it’s suitable to be a target.

1 Comment

Filed under Uncategorized

Goaljobs, part 2

In part 1 I showed how a simple make rule could be converted to a special “goal” function and I hinted that we were not limited to just the “file is older than” semantics implicit in make.

So let’s have a look at the goals I wrote to automate the recent OCaml rebuild in Fedora.

Recall from part 1: Targets are a contractual promise that you make in goaljobs. They are a promise that some condition will be true after running the goal. Requirements are conditions that must be true before the goal can start running.

For a Fedora package to achieve the goal of being rebuilt, the target is that the Koji build state of the current release must be “Completed”. The requirements are that every dependency of the package has been rebuilt. So:

let goal rebuilt pkg =
  target (koji_build_state (fedora_verrel pkg branch)
               == `Complete);

  (* Require the rebuild to have started: *)
  require (rebuild_started pkg);

  ... some code to wait for the build to finish ...

The above code is not complete (it’s a complex, real-world working example after all).

I split the rebuilt goal into two separate goals for reasons that will become clear later. But the first goal above says that the package rebuild must have been started off, and we’ll wait for the package build to complete.

Note that once the build is complete, the target promise is true.

The subgoal rebuild_started is defined like this:

let goal rebuild_started pkg =
  (* The dependencies of this package: *)
  let deps = List.assoc pkg pkg_deps in

  target (
     match koji_build_state (fedora_verrel pkg branch) with
          | `Building | `Complete -> true
          | _ -> false
    );

  (* All dependent packages must have been fully rebuilt: *)
  List.iter (fun dep -> require (rebuilt dep)) deps;

  (* Rebuild the package in Koji. *)
  koji_build pkg branch

It’s saying that the target (promise) will be that the Koji package will either be building or may even be complete. And that we first of all require that every build dependency of this package has been completely, successfully rebuilt. If those requirements are met, we tell Koji to start building the package (but in this goal we don’t need to wait for it to complete).

Why did I split the goal into two parts?

The reason is that I want to define a make-like all goal:

let goal all () =
  List.iter (fun pkg -> require (rebuild_started pkg))
    source_packages

This iterates over all my source packages and starts rebuilding them.

Note it doesn’t wait for each one to be rebuilt … unless they are required as dependencies of another package, in which case the require (rebuilt dep) will kick in and wait for them before rebuilding the dependent package.

In other words, this code automatically resolves dependencies, waiting where necessary, but otherwise just kicking off builds, which is exactly what I wanted.


Finally a bit about how you use a goaljobs script. Unlike make you have to compile the script into a binary. To compile the script, use the convenient wrapper goaljobs (it’s a simple shell script that invokes the OCaml compiler):

goaljobs fedora_ocaml_rebuild.ml

This makes a binary called fedora_ocaml_rebuild which is the program for mass-rebuilding the whole of Fedora’s OCaml subset.

When you run it with no arguments, it searches for a goal called all and “requires” that goal (just like make).

You can also run other goals directly. Any goal which is “published” can be run from the command line. All goals that have no parameters — such as all — are published automatically.

For goals that take parameters, if you want to use them from the command line you have to publish them manually. The reason is that you have to provide a small code snippet to convert the command line parameters to goal parameters, which may involve type conversion or other checks (since OCaml is strongly typed and parameters can be any type, not just strings or filenames).

1 Comment

Filed under Uncategorized

Goaljobs, part 1

A little more than a year ago I released whenjobs which was an attempt to create a practical language for automating complex “business rules”. The kind of thing I’m talking about is managing the many diverse steps between me tagging a libguestfs commit with a version number and a fully tested tarball appearing on the website. Or the hundreds of steps that go into 100 OCaml packages being updated and rebuilt for Rawhide.

Whenjobs wasn’t the right answer. Goaljobs [very early alpha] might possibly be.

What I need is something which is flexible, can deal with failures (both hard and intermittent), and can be killed and restarted at any point.

The first observation is that make is nearly the right tool. It’s goal-based, meaning that you set down a target that you want to have happen, and some rules to make that happen, and this lets you break down a problem from the larger goal (“build my program!”) to smaller subgoals (“compile this source file”).

program: main.o utils.o
  cc $^ -o $@

The goal is “program is built”. There are some requirements (main.o, utils.o), and there’s a recipe (run cc). You can also kill make in the middle and restart it, and it’ll usually continue from where it left off.

Make also lets you parameterize goals, although only in very simple ways:

%.o: %.c
  cc -c $< -o $@

Implicit in the “:” (colon) character is make’s one simple rule, which is roughly this: “if the target file doesn’t exist, or the prerequisite files are newer than the target, run the recipe below”.

In fact you could translate the first make rule into an ordinary function which would look something like this:

function build_program ()
{
  if (!file_exists ("program") ||
      file_older ("program", "main.o") ||
      file_older ("program", "utils.o")) {
    shell ("cc -c %s -o %s", "main.o utils.o",
           "program");
  }
}

Some points arise here:

  • Why can’t we change the target test to something other than “file exists or is newer”?
    How about “remote URL exists” (and if not, we need to upload a file)?
    How about “Koji build completed successfully” (and if not we need to do a Fedora build)?
  • What could happen if we could add parameters to build_program?

Goaljobs attempts to answer these questions by turning make-style rules into “goals”, where goals are specialized functions similar to the one above that have a target, requirement(s), a recipe to implement them, and any number of parameters.

For example, a “compile *.c to *.o” goal looks like this:

let goal compiled c_file =
  (* convert c_file "foo.c" -> "foo.o": *)
  let o_file = change_file_extension "o" c_file in

  target (more_recent [o_file] [c_file]);

  sh "
    cd $builddir
    cc -c %s -o %s
  " c_file o_file

The goal is called compiled and it has exactly one parameter, the name of the C source file that must be compiled.

The target is a promise that after the recipe has been run the *.o file will be more recent than the *.c file. The target is both a check used to skip the rule if it’s already true, but also a contractual promise that the developer makes (and which is checked by goaljobs) that some condition holds true at the end of the goal.

sh is a lightweight way to run a shell script fragment, with printf-like semantics.

And the whole thing is wrapped in a proper programming language (preprocessed OCaml) so you can do things which are more complicated than are easily done in shell.

6 Comments

Filed under Uncategorized

Still looking for good workflow management

whenjobs [previous posting] was an attempt to implement workflow management for libguestfs builds initially, then for other stuff. It’s not worked out too well, so time to start thinking about what to replace it with.

If you concentrate on the narrow task of how to build, test and deploy libguestfs, then it could be broken down into goals (here written in pseudocode):

goal (the tarball is available on the website)
goal (the website has been updated)

and rules for how to fulfil those goals:

rule (the tarball is available on the website) :=
  requires (the tarball has been tested);
  rsync tarball ssh://website.example.com/

rule (the tarball has been tested) :=
  rule (the tarball has been built);
  tar zxf tarball;
  cd libguestfs-*;
  ./configure;
  make;
  make check;
  set_flag (the tarball has been tested)

So far so similar to make. But I tried to encode something like this into (GNU) make and didn’t get too far.

One problem is that make can only test the existence of local files. It can’t test any condition more complex than “local file exists” or “local file is newer than other local file”. Which is why people use “stamp” files frequently, although even those have limits.

Another problem is that make patterns are limited to basically file extensions. ie. You can write:

%.o: %.c

but not a lot else. I also tried Peter Miller’s cook and although it’s certainly cleaner than make, it too is really about doing local software builds.

What it’d be nice to write is something like this:

let guestfs_version = ` git describe --abbrev=0 --tags --match '1.*' `

let make_url v =
  sprintf "http://libguestfs.org/download/1.23-development/%s.tar.gz" v

let http_exists url =
  // some function that checks URL exists

goal (http_exists (make_url guestfs_version))

rule (http_exists url when prefixed url "http://libguestfs.org/download/") (
  let v = get_version_from_url url in
  require (tested v);
  rsync tarball ssh://website.example.com/
)

1 Comment

Filed under Uncategorized