
Writing a Planner to solve a tricky programming optimization problem

Suppose a monkey is in one corner of a room, a box is in another corner of the room, and a banana is hanging from the ceiling in the middle of the room. The monkey can’t reach the banana without standing on the box, so he first has to move the box under the banana. The problem of how to get a computer to work out that the monkey has to move the box first, then climb on the box second, was solved by Richard Fikes and Nils Nilsson’s STRIPS system in 1971. STRIPS is now an A.I. standard, and is used in game A.I. and elsewhere.

Suppose you have a disk image template that you want to uncompress, convert to another format, and resize. You can run xzcat, followed by qemu-img convert followed by virt-resize. But virt-resize can also do format conversion, so you don’t need to run qemu-img convert. Unless the user was happy with the original size, in which case qemu-img convert is faster than virt-resize. But what if the original template isn’t compressed and is already in the right format and size? You can just run cp.

How can a computer work out the right sequence of steps to convert the disk image most efficiently? Virt-builder has exactly this problem, and it solves it using a STRIPS-inspired planner.

The STRIPS planner in virt-builder is only 50 lines of code, was easy to write, and finds a near-optimal plan for almost any user input; the technique is a useful one that can be applied to many programming problems. This article will explain how it works. I have changed some of the academic terms and simplified things to make this easier to understand.

First of all I’ll introduce tags on the original template. These define the state of that template:

Input tags: ✚xz ✚template ✚size=4G ✚format=raw

Secondly I’ll set up my goal state:

Goal tags: ❌xz ❌template ✚size=4G ✚format=qcow2

where ❌ means the tag MUST NOT exist in the final state (and ✚ means it must exist).

I want my planner to find me the best path from my input state to my goal. As it can’t go straight from the input to the goal in one step, I have to tell the planner what transitions are possible, using a function:

transitions (input_tags) {
  if ✚xz then {
    you could run 'xzcat'
        which will ❌xz and ❌template;
  }
  else /* no xz tag */ {
    you could run 'virt-resize'
        which will change ✚format and ✚size, and ❌template;
    you could run 'qemu-img convert'
        which will change ✚format, and ❌template;
  }

  you could run 'cp'
      which will ❌template;
}

Notice that the transitions function returns a list of all possible transitions from the input state. It’s not judgemental about which one should be taken, although it won’t return impossible transitions (for example, running virt-resize is not possible on xz-compressed files). The actual transitions function also returns a weight for each transition, so that the planner can choose the least expensive plan if there are several plans possible.
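
To make this concrete, here is a minimal OCaml sketch of how the tags and the transitions function might be represented. This is not virt-builder’s actual code (that is linked below); the helper names and the weights are invented for illustration:

(* A minimal sketch, not virt-builder's actual code.  A state is an
   association list of tags; purely boolean tags map to "".  The
   weights are invented for illustration. *)
type state = (string * string) list

let has tag state = List.mem_assoc tag state
let remove tag state = List.remove_assoc tag state
let set tag value state = (tag, value) :: remove tag state

(* All possible single steps from a given state.  ofmt and osize are
   the format and size the user asked for. *)
let transitions ~ofmt ~osize state : (string * int * state) list =
  let copy = [ "cp", 10, remove "template" state ] in
  if has "xz" state then
    (* Uncompressing always copies, so it also drops "template". *)
    ("xzcat", 80, remove "template" (remove "xz" state)) :: copy
  else
    ("virt-resize", 100,
       set "format" ofmt (set "size" osize (remove "template" state))) ::
    ("qemu-img convert", 60,
       set "format" ofmt (remove "template" state)) :: copy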

The ✚template tag may appear a bit mysterious. It’s there to make sure that the planner always copies the original template, even if the original template already has the desired goal size and format. Since xzcat, virt-resize and qemu-img convert always copy the disk image, they drop the template tag (❌template).

The transitions function in virt-builder can be found here.

The planner does a breadth-first search over the tree of transitions, starting with the input state, finishing when it finds any branch that satisfies the output goals, or when it reaches a maximum depth in which case it gives up (and the user sees an error message).
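
As a rough illustration of that search (again a sketch, building on the representation above, and not the actual 50-line planner linked below), the core could look like this; weights are ignored here for brevity:

(* Sketch only.  A goal is (tags that must exist with these values,
   tags that must not exist).  Weights are ignored for brevity; the
   real planner uses them to pick the cheapest candidate plan. *)
let satisfies (must, must_not) state =
  List.for_all (fun (tag, v) -> List.mem (tag, v) state) must &&
  List.for_all (fun tag -> not (List.mem_assoc tag state)) must_not

let plan ?(max_depth = 10) transitions goal start =
  let q = Queue.create () in
  Queue.add (start, [], 0) q;        (* state, reversed ops, depth *)
  let result = ref None in
  while !result = None && not (Queue.is_empty q) do
    let state, ops, depth = Queue.pop q in
    if satisfies goal state then result := Some (List.rev ops)
    else if depth < max_depth then
      List.iter
        (fun (op, _weight, state') ->
           Queue.add (state', op :: ops, depth + 1) q)
        (transitions state)
  done;
  !result

(* For example, an uncompressed raw 4G template converted to qcow2:
     plan (transitions ~ofmt:"qcow2" ~osize:"4G")
          (["size", "4G"; "format", "qcow2"], ["template"; "xz"])
          ["template", ""; "size", "4G"; "format", "raw"]
   returns a one-step plan such as Some ["virt-resize"]. *)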

The planner in virt-builder (50 lines of code) can be found here.

If the planner finds several paths that satisfy the goals, it chooses the one with the smallest weight. However, my planner is not clever enough to look deeper in the tree to see if a longer path might have a smaller weight (not very likely in virt-builder).

Also my planner is not smart enough to prune bogus paths. For example, if a path runs cp in adjacent steps, then that path should be pruned.

Nevertheless the planner always gets the right result, and it is considerably simpler than the original hand-written code. The old code had become unmaintainable and wasn’t even efficient: it sometimes made unnecessary copies in order to make the code simpler, wasting end-user time. Because of the ease of maintenance I was able to add new functionality: virt-builder can now run qemu-img resize to expand a disk by < 256 MB, a case where virt-resize doesn’t work (previously the user would have got an error).

Applying old academic techniques like this one doesn’t need to be hard and can help with real world problems. I hope this technique helps others with similar optimization problems.

Edit: The Hacker News discussion includes links to alternative solving tools.



Why is virt-builder written in OCaml?

Docker is written in Go. virt-builder is written in OCaml. Why? (Or as someone at work asked me — apparently seriously — why did you write it in a language which only you can use?)

Virt-builder is a fairly thin wrapper around libguestfs, and libguestfs has bindings for a dozen languages; I’m pretty handy in most programming languages, so it could have been done in Python or C or even Go. In this case there are reasons why OCaml is a much better choice:

  • It’s a language I’m familiar with and happy programming in. Never underestimate how much that matters.
  • OCaml is strongly typed, helping to eliminate many errors. If it had been written in Python we’d be running into bugs at customer sites that could have been eliminated by the compiler before anything shipped. That doesn’t mean virt-builder is bug free, but if the compiler can help to remove a bug, why not have the compiler do that?
  • Virt-builder has to be fast, and OCaml is fast. Python is fucking slow.
  • I had some C code for doing parallel xzcat and with OCaml I can just link the C code and the OCaml code together directly into a single native binary. Literally you just mix C object files and OCaml object files together on the linker command line (see the sketch just after this list). Doing this in, say, Perl/Python/Ruby would be far more hassle. We would have ended up with either a slow interpreted implementation, or having to ship a separate .so file and have the virt-builder program find it and dynamically load it. Ugh.
  • There was a little bit of common code used by another utility called virt-sysprep which started out as a shell script but is now also written in OCaml. Virt-sysprep regularly gets outside contributions, despite being written in OCaml. I could have written the small amount of common code in C to get around this, but every little helps.
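
To expand on the point about mixing C and OCaml object files (the file names and the C function below are hypothetical, not the actual virt-builder code): the OCaml side is just an external declaration, and the C object file goes straight on the link line.

(* pxzcat.ml: a made-up example of calling C from OCaml. *)
external parallel_xzcat : string -> string -> unit = "parallel_xzcat"

let () =
  (* Uncompress argv.(1) into argv.(2) using the C implementation. *)
  parallel_xzcat Sys.argv.(1) Sys.argv.(2)

(* The C stub (parallel_xzcat, written with the usual caml/mlvalues.h
   conventions) lives in pxzcat_stub.c.  Link everything into one
   native binary, e.g.:
     ocamlfind ocamlopt -c pxzcat_stub.c
     ocamlfind ocamlopt pxzcat.ml pxzcat_stub.o -cclib -llzma -o pxzcat *)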

Is OCaml a language that only I can understand? Judge for yourself by looking at the source code. I think if you cannot understand that enough to at least make small changes, you should hand in your programmer’s card at the door.

Edit: Hacker News discussion of this article.



Goaljobs, part 4

In part 3 I described how to write targets which can access network resources, and how to use memoization to make them run fast. In this (last) part of the series, I’ll describe the final feature of goaljobs — periodic jobs.

If you wanted to use make to monitor a git repository and do a build when a new commit appears, there would, I guess, be three choices: you could just run the make command manually over and over again; you could have a git hook that runs make; or you could have a cron job that periodically checks the git repository.

The git hook is the ideal solution for goaljobs too, but goaljobs also has cron-like periodic jobs built in, and they are very easy to use:

every 30 minutes (fun () ->
  let commit =
    shout "cd %s && git rev-parse HEAD" repo in
  require (git_commit_tested commit)
)

every 30 minutes is self-explanatory (right?). The function that runs every half-an-hour is two lines of code. The first line uses shout to run a shell command and capture the output. In this case git prints the current commit. The second command requires that the git_commit_tested goal is reached for this commit.

One way to implement this goal would be:

let goal git_commit_tested commit =
  let key = sprintf "repo-tested-%s" commit in
  target (memory_exists key);

  sh "
      git clone %s test
      cd test
      make check
  " repo_url;

  memory_set key "1"

This code clones the repository and runs make check to test it. It uses the Memory (i.e. memoization) to ensure that the tests are run at most once per commit.

Actually that is not quite true: if the tests succeed they run only once per commit, but if they fail, the goal will keep running every 30 minutes and nagging you about it. It’s trivial to change the memoization to remember failures as well as successes, or you could consider the repeated nagging to be a feature not a bug …
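
For example, here is a hedged sketch of how the goal above could remember failures as well; it only uses the memory_exists/memory_set calls already shown, and the key names are just illustrative:

let goal git_commit_tested commit =
  let ok_key = sprintf "repo-tested-%s" commit in
  let failed_key = sprintf "repo-test-failed-%s" commit in
  (* Reached if the commit was tested, whether it passed or failed: *)
  target (memory_exists ok_key || memory_exists failed_key);

  (try
     sh "
         git clone %s test
         cd test
         make check
     " repo_url;
     memory_set ok_key "1"
   with exn ->
     memory_set failed_key "1";
     raise exn)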

That’s it! The goaljobs website will be here (I’ve not uploaded it yet, but will do in the next day or two).


You can also download the code from the git repository and the goals I’ve written from this repository.



Goaljobs, part 3

In part 2 I introduced an example goaljobs script that can rebuild a set of packages in Fedora in the right order.

It’s time to take a closer look at targets — the promise that you make that some condition will be true by the time a goal has run.

In the Fedora rebuild script the goal targets looked like this:

let goal rebuilt pkg =
  target (koji_build_state (fedora_verrel pkg branch)
               == `Complete);

koji_build_state is a regular function. It’s implemented using the koji buildinfo command line tool for querying the Koji build system. (The koji command line tool is annoyingly hard to automate, but as we’ve got a complete programming language available — not just bash — the implementation of koji_build_state is tedious and long, but doable).

Querying Koji takes a few seconds and we don’t want to do it every time we check a goal. Goaljobs offers a feature called “The Memory” which lets you memoize functions. “The Memory” is just a fancy name for a key/value store which is kept in ~/.goaljobs-memory and persists across goaljobs sessions:

let koji_build_state verrel =
  let key = sprintf "koji_build_complete_%s" verrel in
  if memory_exists key then `Complete
  else (
    let state = (* tedious code to query koji *) ... in
    if state == `Complete then
      memory_set key "1";
    state
  )

With strategic use of memoization, evaluating goaljobs goals can be very fast and doesn’t change the fundamental contract of targets.

Finally in this part: a note on how targets are implemented.

A target is a boolean expression which is evaluated once near the beginning of the goal. If it evaluates to true at the beginning of the goal then the rest of the goal can be skipped because the goal has already been achieved / doesn’t need to be repeated.

And since targets are just general expressions, they can be anything at all, from accessing a remote server (as here) to checking the existence of a local file (like make). As long as something can be tested quickly, or can be tested slowly and memoized, it’s suitable to be a target.
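
For instance, here is a hedged sketch of a goal whose target is “the tarball exists on the remote web server”. The url_exists helper, the host name and the paths are all made up for this example; goal, target and sh come from goaljobs as before.

(* Sketch only: url_exists shells out to curl to test the target. *)
let url_exists url =
  Sys.command (sprintf "curl -f -s -I %s >/dev/null" (Filename.quote url)) = 0

let goal uploaded version =
  let file = sprintf "libguestfs-%s.tar.gz" version in
  target (url_exists (sprintf "http://example.com/download/%s" file));

  sh "scp %s example.com:/var/www/download/" file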



Goaljobs, part 2

In part 1 I showed how a simple make rule could be converted to a special “goal” function and I hinted that we were not limited to just the “file is older than” semantics implicit in make.

So let’s have a look at the goals I wrote to automate the recent OCaml rebuild in Fedora.

Recall from part 1: Targets are a contractual promise that you make in goaljobs. They are a promise that some condition will be true after running the goal. Requirements are conditions that must be true before the goal can start running.

For a Fedora package to achieve the goal of being rebuilt, the target is that the Koji build state of the current release must be “Completed”. The requirements are that every dependency of the package has been rebuilt. So:

let goal rebuilt pkg =
  target (koji_build_state (fedora_verrel pkg branch)
               == `Complete);

  (* Require the rebuild to have started: *)
  require (rebuild_started pkg);

  ... some code to wait for the build to finish ...

The above code is not complete (it’s a complex, real-world working example after all).

I split the rebuilt goal into two separate goals for reasons that will become clear later. But the first goal above says that the package rebuild must have been started off, and we’ll wait for the package build to complete.

Note that once the build is complete, the target promise is true.

The subgoal rebuild_started is defined like this:

let goal rebuild_started pkg =
  (* The dependencies of this package: *)
  let deps = List.assoc pkg pkg_deps in

  target (
     match koji_build_state (fedora_verrel pkg branch) with
     | `Building | `Complete -> true
     | _ -> false
  );

  (* All dependent packages must have been fully rebuilt: *)
  List.iter (fun dep -> require (rebuilt dep)) deps;

  (* Rebuild the package in Koji. *)
  koji_build pkg branch

It’s saying that the target (promise) will be that the Koji package will either be building or may even be complete. And that we first of all require that every build dependency of this package has been completely, successfully rebuilt. If those requirements are met, we tell Koji to start building the package (but in this goal we don’t need to wait for it to complete).

Why did I split the goal into two parts?

The reason is that I want to define a make-like all goal:

let goal all () =
  (* 'packages' is the list of source packages to rebuild; the name
     here is illustrative. *)
  List.iter (fun pkg -> require (rebuild_started pkg)) packages

This iterates over all my source packages and starts rebuilding them.

Note it doesn’t wait for each one to be rebuilt … unless they are required as dependencies of another package, in which case the require (rebuilt dep) will kick in and wait for them before rebuilding the dependent package.

In other words, this code automatically resolves dependencies, waiting where necessary, but otherwise just kicking off builds, which is exactly what I wanted.

Finally, a bit about how you use a goaljobs script. Unlike make, you have to compile the script into a binary. To compile the script, use the convenient wrapper goaljobs (it’s a simple shell script that invokes the OCaml compiler):

goaljobs fedora_ocaml_rebuild.ml

This makes a binary called fedora_ocaml_rebuild which is the program for mass-rebuilding the whole of Fedora’s OCaml subset.

When you run it with no arguments, it searches for a goal called all and “requires” that goal (just like make).

You can also run other goals directly. Any goal which is “published” can be run from the command line. All goals that have no parameters — such as all — are published automatically.

For goals that take parameters, if you want to use them from the command line you have to publish them manually. The reason is that you have to provide a small code snippet to convert the command line parameters to goal parameters, which may involve type conversion or other checks (since OCaml is strongly typed and parameters can be any type, not just strings or filenames).
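
I won’t show the exact publishing syntax here, but conceptually the snippet you supply is just a function from the command line strings to a typed require, something like this sketch (the argument convention is assumed for illustration):

(* Sketch only: convert command line arguments (strings) into the
   typed parameter of the 'rebuilt' goal, then require it. *)
let rebuilt_from_cmdline args =
  match args with
  | [pkg] -> require (rebuilt pkg)
  | _ -> failwith "usage: fedora_ocaml_rebuild rebuilt <package>"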



Goaljobs, part 1

A little more than a year ago I released whenjobs which was an attempt to create a practical language for automating complex “business rules”. The kind of thing I’m talking about is managing the many diverse steps between me tagging a libguestfs commit with a version number and a fully tested tarball appearing on the website. Or the hundreds of steps that go into 100 OCaml packages being updated and rebuilt for Rawhide.

Whenjobs wasn’t the right answer. Goaljobs [very early alpha] might possibly be.

What I need is something which is flexible, can deal with failures (both hard and intermittent), and can be killed and restarted at any point.

The first observation is that make is nearly the right tool. It’s goal-based, meaning that you set down a target that you want to have happen, and some rules to make that happen, and this lets you break down a problem from the larger goal (“build my program!”) to smaller subgoals (“compile this source file”).

program: main.o utils.o
  cc $^ -o $@

The goal is “program is built”. There are some requirements (main.o, utils.o), and there’s a recipe (run cc). You can also kill make in the middle and restart it, and it’ll usually continue from where it left off.

Make also lets you parameterize goals, although only in very simple ways:

%.o: %.c
  cc -c $< -o $@

Implicit in the “:” (colon) character is make’s one simple rule, which is roughly this: “if the target file doesn’t exist, or the prerequisite files are newer than the target, run the recipe below”.

In fact you could translate the first make rule into an ordinary function which would look something like this:

function build_program ()
{
  if (!file_exists ("program") ||
      file_older ("program", "main.o") ||
      file_older ("program", "utils.o")) {
    shell ("cc %s -o %s", "main.o utils.o", "program");
  }
}
Some points arise here:

  • Why can’t we change the target test to something other than “file exists or is newer”?
    How about “remote URL exists” (and if not, we need to upload a file)?
    How about “Koji build completed successfully” (and if not we need to do a Fedora build)?
  • What would happen if we could add parameters to build_program?

Goaljobs attempts to answer these questions by turning make-style rules into “goals”, where goals are specialized functions similar to the one above that have a target, requirement(s), a recipe to implement them, and any number of parameters.

For example, a “compile *.c to *.o” goal looks like this:

let goal compiled c_file =
  (* convert c_file "foo.c" -> "foo.o": *)
  let o_file = change_file_extension "o" c_file in

  target (more_recent [o_file] [c_file]);

  sh "
    cd $builddir
    cc -c %s -o %s
  " c_file o_file

The goal is called compiled and it has exactly one parameter, the name of the C source file that must be compiled.

The target is a promise that after the recipe has been run the *.o file will be more recent than the *.c file. The target is both a check used to skip the rule if it’s already true, but also a contractual promise that the developer makes (and which is checked by goaljobs) that some condition holds true at the end of the goal.

sh is a lightweight way to run a shell script fragment, with printf-like semantics.

And the whole thing is wrapped in a proper programming language (preprocessed OCaml) so you can do things which are more complicated than are easily done in shell.
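
To round this off, here is a hedged sketch (not from the goaljobs documentation) of a linking goal built on top of compiled, mirroring the make rule near the start of this post; it reuses the helpers shown above:

let goal built_program () =
  let sources = ["main.c"; "utils.c"] in
  let objects = List.map (change_file_extension "o") sources in

  (* Promise: "program" ends up newer than all of its object files. *)
  target (more_recent ["program"] objects);

  (* Every object file must be compiled first: *)
  List.iter (fun c_file -> require (compiled c_file)) sources;

  sh "
    cd $builddir
    cc %s -o program
  " (String.concat " " objects)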



OCaml 4.01.0 entering Rawhide

I have been using OCaml for around 10 years and it is still my favourite language; it’s amazing how far ahead of other programming languages it remains to this day.

OCaml 4.01.0 was released on Thursday and I’m putting it into Fedora Rawhide over this weekend.

Debuginfo is now (partially) enabled. The OCaml code generator has produced good quality DWARF information for a while, and now you are able to debug OCaml programs in gdb under Fedora:

$ sudo debuginfo-install ocaml ocaml-findlib
$ gdb /usr/bin/ocamlfind
Reading symbols from /usr/bin/ocamlfind...
Reading symbols from /usr/lib/debug/usr/bin/ocamlfind.debug...done.
(gdb) break frontend.ml:469
Breakpoint 1 at 0x432500: file frontend.ml, line 469.
(gdb) run query findlib -l
Starting program: /usr/bin/ocamlfind query findlib -l

Breakpoint 1, camlFrontend__query_package_1199 () at frontend.ml:469
469	let query_package () =
(gdb) bt
#0  camlFrontend__query_package_1199 () at frontend.ml:469
#1  0x000000000043a4b4 in camlFrontend__main_1670 () at frontend.ml:2231
#2  0x000000000043aa86 in camlFrontend__entry () at frontend.ml:2283
#3  0x000000000042adc9 in caml_program ()
#4  0x00000000004834be in caml_start_program ()
#5  0x000000000048365d in __libc_csu_init ()
#6  0x0000003979821b75 in __libc_start_main (main=0x42aa60 <main>, argc=4, 
    ubp_av=0x7fffffffde38, init=<optimized out>, fini=<optimized out>, 
    rtld_fini=<optimized out>, stack_end=0x7fffffffde28) at libc-start.c:258
#7  0x000000000042aaa9 in _start ()
(gdb) list
464	;;
467	(************************** QUERY SUBCOMMAND ***************************)
469	let query_package () =
471	  let long_format =
472	    "package:     %p\ndescription: %D\nversion:     %v\narchive(s):  %A\nlinkopts:    %O\nlocation:    %d\n" in
473	  let i_format =

GDB only understands location data at the moment, so you can’t yet query variables (although I understand OCaml generates the correct DWARF info for this, GDB just doesn’t know how to print OCaml expressions).

There will also be some limitations on the debuginfo built at first. At the moment it doesn’t include debuginfo for OCaml libraries called from an OCaml program, because of problems that need to be worked out with the toolchain. Mixed OCaml binary / C library debuginfo does work.

