Tag Archives: ocaml

PG’OCaml 2.0 has been released

PG’OCaml is a type-safe macro binding to PostgreSQL from OCaml that I wrote many moons ago.

You can write code like:

let hostid = 33 in
let name = "john.smith" in
let rows = PGSQL(dbh)
    "select id, subject from contacts
     where hostid = $hostid and name = $name"

and the compiler checks (at compile time) that hostid and name have the correct types in the program to match the database schema. And it’ll ensure that the type of rows is something like (int * string) list, and integrate that with type inference in the rest of the program.

The program won’t compile if you use the wrong types. It integrates OCaml’s type safety and type inference with the PostgreSQL database engine.

It also avoids SQL injection by automatically creating a safe prepared statement. What is executed when the program runs will have: ... where hostid = ? and name = ?.

As a side-effect of the type checking, it also verifies that the SQL code is syntactically correct.



Filed under Uncategorized

Fedora 21 has a working OCaml ARM64

Update: Thanks to Peter Robinson, there is now a build of OCaml for aarch64 in the Fedora repository.

I have backported the upstream ARM64 support into Fedora 21’s OCaml, so you can now use it to generate native ARM64/AArch64 binaries. If you don’t have hardware, use qemu to emulate it instead.



Writing a Planner to solve a tricky programming optimization problem

Suppose a monkey is in one corner of a room, a box is in another corner of a room, and a banana is hanging from the ceiling in the middle of the room. The monkey can’t reach the banana without standing on the box, but he first has to move the box under the banana. The problem of how to get a computer to work out that the monkey has to move the box first, then climb on the box second, was solved by Richard Fikes and Nils Nilsson’s STRIPS system in 1971. STRIPS is now an A.I. standard, and is used in game A.I. and elsewhere.

Suppose you have a disk image template that you want to uncompress, convert to another format, and resize. You can run xzcat, followed by qemu-img convert followed by virt-resize. But virt-resize can also do format conversion, so you don’t need to run qemu-img convert. Unless the user was happy with the original size, in which case qemu-img convert is faster than virt-resize. But what if the original template isn’t compressed and is already in the right format and size? You can just run cp.

How can a computer work out the right sequence of steps to convert the disk image most efficiently? Virt-builder has exactly this problem, and it solves it using a STRIPS-inspired planner.

The STRIPS planner in virt-builder is only 50 lines of code, was easy to write, finds a near-optimal plan for almost any user input, and uses a technique that can be applied to many programming problems. This article will explain how it works. I have changed some of the academic terms and simplified things to make this easier to understand.

First of all I’ll introduce tags on the original template. These define the state of that template:

Input tags: ✚xz ✚template ✚size=4G ✚format=raw

Secondly I’ll set up my goal state:

Goal tags: ❌xz ❌template ✚size=4G ✚format=qcow2

where ❌ means the tag MUST NOT exist in the final state.

I want my planner to find me the best path from my input state to my goal. As it can’t go straight from the input to the goal in one step, I have to tell the planner what transitions are possible, using a function:

transitions (input_tags) {
  if ✚xz then {
    you could run 'xzcat'
        which will ❌xz and ❌template;
  }
  else /* no xz tag */ {
    you could run 'virt-resize'
       which will change ✚format and ✚size, and ❌template;
    you could run 'qemu-img convert'
       which will change ✚format, and ❌template;
    you could run 'cp'
        which will ❌template;
  }
}

Notice that the transitions function returns a list of all possible transitions from the input state. It’s not judgemental about which one should be taken, although it won’t return impossible transitions (for example, running virt-resize is not possible on xz-compressed files). The actual transitions function also returns a weight for each transition, so that the planner can choose the least expensive plan if there are several plans possible.

The ✚template tag may appear a bit mysterious. It’s there to make sure that the planner always copies the original template, even if the original template already has the desired goal size and format. Since xzcat, virt-resize and qemu-img convert always copy the disk image, they drop the template tag (❌template).

The transitions function in virt-builder can be found here.

The planner does a breadth-first search over the tree of transitions, starting with the input state, finishing when it finds any branch that satisfies the output goals, or when it reaches a maximum depth in which case it gives up (and the user sees an error message).
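The breadth-first search described above can be sketched in a few lines of OCaml. This is a simplified illustration, not virt-builder’s actual code: a state is a set of tags, the `transitions` function loosely follows the article’s example (with weights omitted), and `plan` searches until the goal tags are satisfied or a maximum depth is reached.

```ocaml
(* A minimal sketch of a STRIPS-style breadth-first planner.  States are
   sets of tags; tag names and transitions here are illustrative. *)
module S = Set.Make (String)

let transitions state =
  if S.mem "xz" state then
    [ "xzcat", S.remove "template" (S.remove "xz" state) ]
  else
    [ "virt-resize",
      S.remove "template" (S.add "format=ok" (S.add "size=ok" state));
      "qemu-img convert", S.remove "template" (S.add "format=ok" state);
      "cp", S.remove "template" state ]

(* Breadth-first search for a shortest action sequence reaching a state
   that contains every [must] tag and none of the [must_not] tags,
   giving up at [max_depth]. *)
let plan ~max_depth ~must ~must_not start =
  let satisfied st = S.subset must st && S.is_empty (S.inter must_not st) in
  let q = Queue.create () in
  Queue.add ([], start) q;
  let rec loop () =
    if Queue.is_empty q then None            (* give up: no plan found *)
    else
      let path, st = Queue.pop q in
      if satisfied st then Some (List.rev path)
      else if List.length path >= max_depth then loop ()
      else (
        List.iter (fun (act, st') -> Queue.add (act :: path, st') q)
          (transitions st);
        loop ())
  in
  loop ()
```

For example, starting from ✚xz ✚template with the goal ✚format=ok ❌xz ❌template, this sketch returns `Some ["xzcat"; "virt-resize"]`. (The real planner also attaches a weight to each transition so it can prefer the cheaper of two plans of the same length.)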

The planner in virt-builder (50 lines of code) can be found here.

If the planner finds several paths that satisfy the goals, the planner chooses the one with the smallest weight. However my planner is not clever enough to look deeper in the tree to see if a longer path might have a smaller weight (it’s not very likely in virt-builder).

Also my planner is not smart enough to prune bogus paths. For example, if a path runs cp in adjacent steps, then that path should be pruned.

Nevertheless the planner always gets the right result, and it is considerably simpler than the original hand-written code. The old code had become unmaintainable and wasn’t even efficient: it sometimes made unnecessary copies in order to make the code simpler, wasting end-user time. Because of the ease of maintenance I was able to add new functionality: virt-builder can now run qemu-img resize to expand a disk by < 256 MB, a case where virt-resize doesn’t work (previously the user would have got an error).

Applying old academic techniques like this one doesn’t need to be hard and can help with real world problems. I hope this technique helps others with similar optimization problems.

Edit: The Hacker News discussion includes links to alternative solving tools.



Why is virt-builder written in OCaml?

Docker is written in Go. virt-builder is written in OCaml. Why? (Or as someone at work asked me — apparently seriously — why did you write it in a language which only you can use?)

Virt-builder is a fairly thin wrapper around libguestfs, and libguestfs has bindings for a dozen languages. I’m pretty handy in most programming languages, so it could have been done in Python or C or even Go. In this case there are reasons why OCaml is a much better choice:

  • It’s a language I’m familiar with and happy programming in. Never underestimate how much that matters.
  • OCaml is strongly typed, helping to eliminate many errors. If it had been written in Python we’d be running into bugs at customer sites that could have been eliminated by the compiler before anything shipped. That doesn’t mean virt-builder is bug free, but if the compiler can help to remove a bug, why not have the compiler do that?
  • Virt-builder has to be fast, and OCaml is fast. Python is fucking slow.
  • I had some C code for doing parallel xzcat and with OCaml I can just link the C code and the OCaml code together directly into a single native binary. Literally you just mix C object files and OCaml object files together on the linker command line. Doing this in, say, Perl/Python/Ruby would be far more hassle. We would have ended up with either a slow interpreted implementation, or having to ship a separate .so file and have the virt-builder program find it and dynamically load it. Ugh.
  • There was a little bit of common code used by another utility called virt-sysprep which started out as a shell script but is now also written in OCaml. Virt-sysprep regularly gets outside contributions, despite being written in OCaml. I could have written the small amount of common code in C to get around this, but every little helps.

Is OCaml a language that only I can understand? Judge for yourself by looking at the source code. I think if you cannot understand that enough to at least make small changes, you should hand in your programmer’s card at the door.

Edit: Hacker News discussion of this article.



Goaljobs, part 4

In part 3 I described how to write targets which can access network resources, and how to use memoization to make them run fast. In this (last) part of the series, I’ll describe the final feature of goaljobs — periodic jobs.

If you wanted to use make to monitor a git repository and do a build when a new commit appears, there would, I guess, be three choices: you could run the make command manually over and over again; you could have a git hook that runs make; or you could have a cron job that periodically checks the git repository.

The git hook is the ideal solution for goaljobs too, but goaljobs also has cron-like periodic jobs built in, and they are very easy to use:

every 30 minutes (fun () ->
  let commit =
    shout "cd %s && git rev-parse HEAD" repo in
  require (git_commit_tested commit)
)

every 30 minutes is self-explanatory (right?). The function that runs every half an hour is two lines of code. The first line uses shout to run a shell command and capture the output; in this case git prints the current commit. The second line requires that the git_commit_tested goal is reached for this commit.

One way to implement this goal would be:

let goal git_commit_tested commit =
  let key = sprintf "repo-tested-%s" commit in
  target (memory_exists key);

  sh "
      git clone %s test
      cd test
      make check
  " repo_url;

  memory_set key "1"

This code clones the repository and runs make check to test it. It uses the Memory (ie. memoization) to ensure that the tests are run at most once per commit.

Actually this is not quite true: if the tests succeed, they run only once per commit; but if they fail, the goal keeps running every 30 minutes and nagging you about it. It’s trivial to change the memoization to remember failures as well as successes, or you could consider the repeated nagging to be a feature not a bug …
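
Remembering failures as well as successes might look like the following self-contained sketch. The Hashtbl-backed `memory_*` helpers here are stand-ins for goaljobs’ real key/value store, and `test_commit_once` is a hypothetical name:

```ocaml
(* Stand-ins for goaljobs' key/value "Memory" (normally persisted in
   ~/.goaljobs-memory; here just an in-process Hashtbl). *)
let memory : (string, string) Hashtbl.t = Hashtbl.create 16
let memory_exists k = Hashtbl.mem memory k
let memory_get k = Hashtbl.find memory k
let memory_set k v = Hashtbl.replace memory k v

(* Run [test] at most once per commit; later calls return the cached
   result, whether the tests passed or failed. *)
let test_commit_once commit test =
  let key = "repo-tested-" ^ commit in
  if memory_exists key then memory_get key = "pass"
  else (
    let ok = (try test commit with _ -> false) in
    memory_set key (if ok then "pass" else "fail");
    ok)
```

With this change a failing commit is tested once and the failure is remembered, instead of re-running (and nagging) every 30 minutes.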

That’s it! The goaljobs website will be this (I’ve not uploaded it yet, but will do in the next day or two):


You can also download the code from the git repository and the goals I’ve written from this repository.


Goaljobs, part 3

In part 2 I introduced an example goaljobs script that can rebuild a set of packages in Fedora in the right order.

It’s time to take a closer look at targets — the promise that you make that some condition will be true by the time a goal has run.

In the Fedora rebuild script the goal targets looked like this:

let goal rebuilt pkg =
  target (koji_build_state (fedora_verrel pkg branch)
               == `Complete);

koji_build_state is a regular function. It’s implemented using the koji buildinfo command line tool for querying the Koji build system. (The koji command line tool is annoyingly hard to automate, but as we’ve got a complete programming language available, not just bash, the implementation of koji_build_state is tedious but doable.)

Querying Koji takes a few seconds and we don’t want to do it every time we check a goal. Goaljobs offers a feature called “The Memory” which lets you memoize functions. “The Memory” is just a fancy name for a key/value store which is kept in ~/.goaljobs-memory and persists across goaljobs sessions:

let koji_build_state verrel =
  let key = sprintf "koji_build_complete_%s" verrel in
  if memory_exists key then `Complete
  else (
    (* tedious code to query koji *)
    if state == `Complete then
      memory_set key "1";
    state
  )

With strategic use of memoization, evaluating goaljobs goals can be very fast and doesn’t change the fundamental contract of targets.

Finally in this part: a note on how targets are implemented.

A target is a boolean expression which is evaluated once near the beginning of the goal. If it evaluates to true at the beginning of the goal then the rest of the goal can be skipped because the goal has already been achieved / doesn’t need to be repeated.

And since targets are just general expressions, they can be anything at all, from accessing a remote server (as here) to checking the existence of a local file (like make). As long as something can be tested quickly, or can be tested slowly and memoized, it’s suitable to be a target.
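The target contract described above can be modelled in a few lines. This is a toy illustration of the idea, not goaljobs’ real internals:

```ocaml
(* A target is a boolean expression evaluated once near the start of a
   goal; if it is already true, the rest of the goal body is skipped. *)
exception Goal_already_met

let target cond = if cond then raise Goal_already_met
let run_goal body = try body () with Goal_already_met -> ()
```

For example, a goal whose body sets a flag runs once; re-running it hits the (now true) target and skips the body entirely.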

1 Comment

Goaljobs, part 2

In part 1 I showed how a simple make rule could be converted to a special “goal” function and I hinted that we were not limited to just the “file is older than” semantics implicit in make.

So let’s have a look at the goals I wrote to automate the recent OCaml rebuild in Fedora.

Recall from part 1: Targets are a contractual promise that you make in goaljobs. They are a promise that some condition will be true after running the goal. Requirements are conditions that must be true before the goal can start running.

For a Fedora package to achieve the goal of being rebuilt, the target is that the Koji build state of the current release must be `Complete. The requirements are that every dependency of the package has been rebuilt. So:

let goal rebuilt pkg =
  target (koji_build_state (fedora_verrel pkg branch)
               == `Complete);

  (* Require the rebuild to have started: *)
  require (rebuild_started pkg);

  ... some code to wait for the build to finish ...

The above code is not complete (it’s a complex, real-world working example after all).

I split the rebuilt goal into two separate goals for reasons that will become clear later. But the first goal above says that the package rebuild must have been started off, and we’ll wait for the package build to complete.

Note that once the build is complete, the target promise is true.

The subgoal rebuild_started is defined like this:

let goal rebuild_started pkg =
  (* The dependencies of this package: *)
  let deps = List.assoc pkg pkg_deps in

  target (
     match koji_build_state (fedora_verrel pkg branch) with
          | `Building | `Complete -> true
          | _ -> false
  );

  (* All dependent packages must have been fully rebuilt: *)
  List.iter (fun dep -> require (rebuilt dep)) deps;

  (* Rebuild the package in Koji. *)
  koji_build pkg branch

It’s saying that the target (promise) will be that the Koji package will either be building or may even be complete. And that we first of all require that every build dependency of this package has been completely, successfully rebuilt. If those requirements are met, we tell Koji to start building the package (but in this goal we don’t need to wait for it to complete).

Why did I split the goal into two parts?

The reason is that I want to define a make-like all goal:

let goal all () =
  List.iter (fun pkg -> require (rebuild_started pkg))
    (List.map fst pkg_deps)

This iterates over all my source packages and starts rebuilding them.

Note it doesn’t wait for each one to be rebuilt … unless they are required as dependencies of another package, in which case the require (rebuilt dep) will kick in and wait for them before rebuilding the dependent package.

In other words, this code automatically resolves dependencies, waiting where necessary, but otherwise just kicking off builds, which is exactly what I wanted.
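The require-driven ordering can be seen in a toy model, where requiring a package first requires all of its dependencies, so builds naturally happen in dependency order. The package names and `pkg_deps` contents below are made up for illustration (and the real tool waits on Koji rather than appending to a list):

```ocaml
(* Hypothetical dependency table: package -> build dependencies. *)
let pkg_deps = [
  "ocaml-extlib", [];
  "ocaml-calendar", [ "ocaml-extlib" ];
  "pgocaml", [ "ocaml-extlib"; "ocaml-calendar" ];
]

let build_order = ref []

(* require (rebuilt pkg): rebuild dependencies first, then pkg itself,
   doing nothing if pkg has already been rebuilt. *)
let rec require_rebuilt pkg =
  if not (List.mem pkg !build_order) then (
    List.iter require_rebuilt (List.assoc pkg pkg_deps);
    build_order := !build_order @ [ pkg ]
  )
```

Kicking off every package with `List.iter require_rebuilt (List.map fst pkg_deps)` rebuilds each package exactly once, dependencies first.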

Finally, a bit about how you use a goaljobs script. Unlike make, you have to compile the script into a binary. To compile the script, use the convenient wrapper goaljobs (it’s a simple shell script that invokes the OCaml compiler):

goaljobs fedora_ocaml_rebuild.ml

This makes a binary called fedora_ocaml_rebuild which is the program for mass-rebuilding the whole of Fedora’s OCaml subset.

When you run it with no arguments, it searches for a goal called all and “requires” that goal (just like make).

You can also run other goals directly. Any goal which is “published” can be run from the command line. All goals that have no parameters — such as all — are published automatically.

For goals that take parameters, if you want to use them from the command line you have to publish them manually. The reason is that you have to provide a small code snippet to convert the command line parameters to goal parameters, which may involve type conversion or other checks (since OCaml is strongly typed and parameters can be any type, not just strings or filenames).
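A toy illustration of why that conversion snippet is needed: command-line arguments arrive as strings, but a goal’s parameters are typed. The `publish` table and function below are made-up stand-ins, not goaljobs’ real API:

```ocaml
(* Registry of published goals, keyed by command-line name. *)
let published : (string, string list -> unit) Hashtbl.t = Hashtbl.create 8
let publish name f = Hashtbl.replace published name f

(* A goal taking typed parameters (a string and an int). *)
let rebuilt_log = ref []
let rebuilt pkg jobs = rebuilt_log := (pkg, jobs) :: !rebuilt_log

(* The conversion snippet: check arity and convert string -> int
   before calling the typed goal. *)
let () =
  publish "rebuilt" (function
    | [ pkg; jobs ] -> rebuilt pkg (int_of_string jobs)
    | _ -> failwith "usage: rebuilt PKG JOBS")

(* Dispatch a command-line invocation to the published goal. *)
let run name args = (Hashtbl.find published name) args
```

Calling `run "rebuilt" ["pgocaml"; "4"]` invokes the goal with the string converted to an int; a malformed invocation fails with a usage error instead of reaching the goal.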

1 Comment