Docker is written in Go. virt-builder is written in OCaml. Why? (Or as someone at work asked me — apparently seriously — why did you write it in a language which only you can use?)
Virt-builder is a fairly thin wrapper around libguestfs and libguestfs has bindings for a dozen languages, and I’m pretty handy in most programming languages, so it could have been done in Python or C or even Go. In this case there are reasons why OCaml is a much better choice:
- It’s a language I’m familiar with and happy programming in. Never underestimate how much that matters.
- OCaml is strongly typed, helping to eliminate many errors. If it had been written in Python we’d be running into bugs at customer sites that could have been eliminated by the compiler before anything shipped. That doesn’t mean virt-builder is bug free, but if the compiler can help to remove a bug, why not have the compiler do that?
- Virt-builder has to be fast, and OCaml is fast. Python is fucking slow.
- I had some C code for doing parallel xzcat and with OCaml I can just link the C code and the OCaml code together directly into a single native binary. Literally you just mix C object files and OCaml object files together on the linker command line. Doing this in, say, Perl/Python/Ruby would be far more hassle. We would have ended up with either a slow interpreted implementation, or having to ship a separate .so file and have the virt-builder program find it and dynamically load it. Ugh.
- There was a little bit of common code used by another utility called virt-sysprep which started out as a shell script but is now also written in OCaml. Virt-sysprep regularly gets outside contributions, despite being written in OCaml. I could have written the small amount of common code in C to get around this, but every little helps.
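To make the linking point above concrete, here is a minimal sketch of how OCaml binds a name to a C function. So that it runs as a single file, it binds to caml_ml_string_length, a C function that already ships in the OCaml runtime. It is not the real pxzcat binding; the pxzcat declaration and file names in the comment are hypothetical and only show the shape a hand-written stub would take.

```ocaml
(* Bind an OCaml name to a C symbol. caml_ml_string_length lives in the
   OCaml runtime, so this file links without any extra C source. *)
external c_string_length : string -> int = "caml_ml_string_length"

(* A hand-written stub is declared the same way. Hypothetical example:

     external pxzcat : string -> string -> unit = "my_pxzcat_stub"

   with the C file (or its .o) simply added to the compiler command line:

     ocamlopt main.ml my_pxzcat_stub.c -o prog

   The names pxzcat, my_pxzcat_stub, main.ml and prog are assumptions
   made for illustration. *)

let () =
  let s = "virt-builder" in
  Printf.printf "%s is %d bytes long\n" s (c_string_length s)
```

The result is one native binary with no separate shared library to locate and load at run time.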
Is OCaml a language that only I can understand? Judge for yourself by looking at the source code. I think if you cannot understand that enough to at least make small changes, you should hand in your programmer’s card at the door.
7 responses to “Why is virt-builder written in OCaml?”
Frankly, I was just about to start working on a patch for virt-builder when I realized it’s written in OCaml, so I tried to read some tutorials about it. The language is beautiful and I totally understand your reasoning. I like that it is fast, statically linked and everything.
After half an hour I just gave up. I just wanted to send a small patch, not start learning a new language. Now OCaml is on my TODO list, because this is the kind of language that is not that obvious. My message is: it is limiting. I see you have contributors, which is cool, but do you really know how many potential contributors gave up?
Anyway, I am interested in your opinion on this: what would be your 2nd and 3rd language for virt-builder (and others) if there were no OCaml?
It’s really not too hard, and if you get an error message, post it and we’ll see what we can do.
Second would be C. Not sure what the third would be. I would be tempted to say Perl, since many earlier libguestfs tools started out in Perl, but we’re trying to move away from it for various reasons (not necessarily to do with Perl itself).
In my experience most of the effort in contributing a small patch involves grepping the git log effectively, finding a similar existing feature or patch. The exact syntax is the same as the one two lines before, once you’ve found the right place. I’ve contributed to libguestfs, but also to projects in languages I was completely unfamiliar with.
Frankly, I must say that’s a pretty weird way to reason:
1. When contributing a piece of code to a system, as part of a team, choice of programming language shouldn’t be, and usually isn’t, a matter of individual personal preference, for many good reasons:
– Runtime complexity (footprint, number of dependencies, etc.)
2. I’m also a fan of “strong” typing (http://en.wikipedia.org/wiki/Strong_typing), but I highly doubt that it would catch many errors in this particular application, primarily because of the simple fact that it doesn’t exploit strong typing. The APIs it’s using are not strongly typed, so the application ends up dealing mostly with strings, and the occasional integer anyway.
3. The majority of the processing time is spent outside virt-builder (i.e. inside the libraries it’s using), so I fail to understand how the choice of language for virt-builder would have any meaningful performance impact from a user experience point-of-view. Hence, I don’t understand your statement that it “has to be fast”.
Although I’m not a huge fan of Python, the claim that it’s “fucking slow” is just stupid. Maybe if you were doing high performance computing, interrupt processing or something, but not in this kind of context.
4. That is indeed one of the benefits of static compilation. In this particular case, however, the problem could have been solved in other ways, such as simply firing up a bunch of xz commands in parallel (the obvious solution for a shell script), or simply rewriting the pxzcat wrapper in the target language.
5. While this indeed proves that you’re not the /only/ one who can use OCaml – looking at the history tells us we’re talking about a whopping 5 contributors here, of which 3 only contributed minor changes.
I’d say it would just as well support the view that OCaml is a bad language of choice if you’re looking for a broad base of contributors. After all, there is a big difference between understanding what the code does and being able to make quality contributions to the code.
For this kind of wrapper/glue stuff, I’d stick to a shell script. Second choice C, though it’s quite obvious that it doesn’t really cut it any more as a standard language for implementing system tools.
Having said that, I do find OCaml an interesting alternative. Just not for those reasons and perhaps not for this kind of application.
On (1) it doesn’t add any problems with consistency, maintainability or runtime complexity because libguestfs already has virt tools written in OCaml.
(2) While the guestfs API does use strings for simplicity, that’s hardly the whole story here. Internal strong typing of the program itself matters a great deal. Look at the use of `mode` for one (of many) examples. Why would you want to introduce the possibility of errors, when the compiler can check for these errors and get rid of them?
(3) True, to some extent. But the ability to call C (for pxzcat) directly from OCaml, and easily, has made virt-builder much faster.
(4) Except no, that doesn’t work.
(5) Compare this to the number of contributors who write C extensions to libguestfs and I think you’ll find it’s of the same order.
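As a rough illustration of point (2), here is a sketch of how a variant type lets the compiler check a program's internal logic even when the external API deals in strings. The constructors and the describe function are made up for this example and are not taken from the virt-builder source.

```ocaml
(* Hypothetical operating modes; the names are illustrative only. *)
type mode =
  | Install of string   (* name of the OS template to build *)
  | List_templates
  | Delete_cache

(* Every constructor must be handled. If a new mode is added to the type
   and this match is not updated, the compiler reports a non-exhaustive
   match (warning 8) at build time rather than failing at run time. *)
let describe (m : mode) : string =
  match m with
  | Install os -> "install " ^ os
  | List_templates -> "list templates"
  | Delete_cache -> "delete cache"

let () =
  List.iter
    (fun m -> print_endline (describe m))
    [ Install "fedora-20"; List_templates; Delete_cache ]
```

Passing a bare string where a mode is expected, or misspelling a constructor, is rejected at compile time. With plain strings standing in for modes, the same mistake would only surface when that code path runs.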
On (3), Python is good at systems programming thanks to CFFI. It has some rough edges wrt distribution (the default model is compile on demand), but otherwise it’s impressively simple and elegant.
I really don’t like solving errors at customer sites which could easily be detected by the compiler.