We’ve been using this patch in Fedora since Nov 2016.
Tag Archives: ocaml
libguestfs is a C library for creating and editing disk images. In the most common (but not the only) configuration, it uses KVM to sandbox access to disk images. The C library talks to a separate daemon running inside a KVM appliance, as in this Unicode-art diagram taken from the fine manual:
 ┌───────────────────┐
 │ main program      │
 │                   │
 │                   │           child process / appliance
 │                   │          ┌──────────────────────────┐
 │                   │          │ qemu                     │
 ├───────────────────┤   RPC    │      ┌─────────────────┐ │
 │ libguestfs  ◀╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍▶ guestfsd        │ │
 │                   │          │      ├─────────────────┤ │
 └───────────────────┘          │      │ Linux kernel    │ │
                                │      └────────┬────────┘ │
                                └───────────────│──────────┘
                                                │
                                                │ virtio-scsi
                                         ┌──────┴──────┐
                                         │  Device or  │
                                         │  disk image │
                                         └─────────────┘
The library has to be written in C because it needs to be linked to any main program. The daemon (guestfsd in the diagram) is also written in C. There is no specific reason for that, other than that’s what we did historically.
The daemon is essentially a big pile of functions, most corresponding to a libguestfs API. Writing the daemon in C is painful to say the least. Because it’s a long-running process running in a memory-constrained environment, we have to be very careful about memory management, religiously checking every return from strdup etc., making even the simplest task non-trivial and full of untested code paths.
So last week I modified libguestfs so you can now write APIs in OCaml if you want to. OCaml is a high level language that compiles down to object files, and it’s entirely possible to link the daemon from a mix of C object files and OCaml object files. Another advantage of OCaml is that you can call from C ↔ OCaml with relatively little glue code (although a disadvantage is that you still need to write that glue mostly by hand). Most simple calls turn into direct CALL instructions with just a simple bitshift required to convert between ints and bools on the C and OCaml sides. More complex calls passing strings and structures are not too difficult either.
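To make the int/bool conversion concrete: OCaml represents an integer n as the machine word 2n+1, so the low bit distinguishes integers from pointers. The sketch below mimics that encoding in plain C without the OCaml headers. The names tag_int, untag_int, tag_bool and untag_bool are hypothetical stand-ins for the real Val_int, Int_val, Val_bool and Bool_val macros from <caml/mlvalues.h>, and this is an illustration only, not the glue code used in the daemon.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for OCaml's `value` type: a single machine word. */
typedef intptr_t value;

/* C integer -> OCaml-style tagged integer: shift left one bit, set the
 * low bit.  The shift is done on the unsigned type so that negative
 * numbers do not trigger undefined behaviour.
 */
static inline value
tag_int (intptr_t n)
{
  return (value) (((uintptr_t) n << 1) | 1);
}

/* OCaml-style tagged integer -> C integer: shift right one bit.
 * This relies on arithmetic right shift of negative values, which is
 * implementation-defined in C but is what mainstream compilers do.
 */
static inline intptr_t
untag_int (value v)
{
  assert (v & 1);               /* low bit set means "not a pointer" */
  return v >> 1;
}

/* Booleans are just the integers 0 (false) and 1 (true), tagged. */
static inline value
tag_bool (int b)
{
  return tag_int (b != 0);
}

static inline int
untag_bool (value v)
{
  return untag_int (v) != 0;
}
```

With this encoding, false is the word 1 and true is the word 3, which is why converting an argument costs only a shift and an OR (or a single shift in the other direction).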
OCaml also turns memory errors into a single exception, which unwinds the stack cleanly, so we don’t litter the code with memory handling. We can still run the mixed C/OCaml binary under valgrind.
Code gets quite a bit shorter. For example the case_sensitive_path API — all string handling and directory lookups — goes from 183 lines of C code to 56 lines of OCaml code (and much easier to understand too).
I’m reimplementing a few APIs in OCaml, but the plan is definitely not to convert them all. I think we’ll have C and OCaml APIs in the daemon for a very long time to come.
- New, upstream POWER (ppc64, ppc64le) backend, replacing the downstream one that we have maintained for a few years. I was quite apprehensive about this change because I had tried the new backend during the OCaml 4.03 release cycle and found it to be quite unstable. However the latest version looks rock solid and has no problem compiling the entire Fedora+OCaml software suite.
- New, upstream S/390x backend. I actually found and fixed a bug, go me!
- New, non-upstream RISC-V backend. I found a bug in this backend too, but it proved to be easy to fix. You can now install and run most of the OCaml packages on Fedora/RISC-V.
And talking about Fedora/RISC-V, it took a month, but the mass-rebuild of all Fedora packages completed, and now we’ve got about two-thirds of all Fedora packages available for RISC-V. That’s quite a lot:
For more half-baked ideas, see the ideas tag.
If you prefer just to see the code, then it’s here.
Chris notes an alternative is a length + string representation, as used in Pascal. Although there are libraries for this in C, there are several drawbacks and approximately no one uses them.
However it’s possible to have the best of both worlds: strings using an implicit length field that takes up no extra storage. These strings are backwards compatible with ordinary C strings (you can literally pass them to legacy functions or cast them to char *), yet the equivalent of a strlen operation is O(1).
There are two ideas here: Firstly, when you use the C malloc function, malloc stashes some extra metadata about your allocation, and with most malloc implementations there is a function to obtain the size of the allocation from a pointer. In glibc, the function is called malloc_usable_size. Note that because of alignment concerns, the amount allocated is usually larger than the amount you originally requested.
The second idea comes from OCaml. OCaml stores strings in a clever internal representation which is both backwards compatible with C (a fancy way to say they are null terminated), and it allows you to get the real length of the string even though OCaml — like C — allocates more than requested for alignment reasons.
So here’s how we do it: When allocating an “implicit length string” (ilenstr) we store extra data in the final byte of the “full” malloced space, in the byte marked B in the diagram below:
+-------------------------+----+------------+----+
| the string              | \0 |    ....    | B  |
+-------------------------+----+------------+----+
<----- malloc we requested ---->
<----------- malloc actually allocated ---------->
If malloc allocated exactly the same amount of space as is used by our string + terminating null, then B is simply the terminating \0 (that is, B = 0):

+-------------------------+----+
| the string              | \0 |
+-------------------------+----+
If malloc allocated 1 spare byte, we store B = 1:
+-------------------------+----+----+
| the string              | \0 | 1  |
+-------------------------+----+----+
If malloc allocated 4 spare bytes, we store B = 4:
+-------------------------+----+----+----+----+----+
| the string              | \0 |     ....     | 4  |
+-------------------------+----+----+----+----+----+
Getting the true length of the string is simply a matter of asking malloc for the allocated length (ie. calling malloc_usable_size), reading the last byte (B), and subtracting B plus one (for the terminating \0) from the allocated length. So we can get the true string length in an O(1) operation (usually, although this may depend on your malloc implementation).
ilenstr strings can contain \0 characters within the string.
ilenstr strings are also backwards compatible, in that we can pass one to any “legacy” C function, and assuming the string itself doesn’t contain any \0 inside it, everything just works.
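The scheme can be sketched in a few lines of C. This is a minimal illustration written for this description, not the code from the repository. It assumes glibc (for malloc_usable_size in <malloc.h>), and the function names ilenstr_new and ilenstr_len are hypothetical. It also assumes the spare space fits in one byte, and simply gives up when it does not, which can happen for very large allocations.

```c
#include <assert.h>
#include <malloc.h>             /* malloc_usable_size: glibc-specific */
#include <stdlib.h>
#include <string.h>

/* Allocate an implicit-length string holding a copy of data[0..len-1].
 * The bytes may include \0.  Returns NULL if malloc fails or if the
 * spare space malloc handed back does not fit in the single byte B.
 */
static char *
ilenstr_new (const char *data, size_t len)
{
  char *s = malloc (len + 1);   /* string + terminating \0 */
  if (s == NULL)
    return NULL;

  size_t usable = malloc_usable_size (s);
  assert (usable >= len + 1);

  size_t spare = usable - (len + 1);
  if (spare > 255) {            /* B is only one byte wide */
    free (s);
    return NULL;
  }

  memcpy (s, data, len);
  s[len] = '\0';
  /* Store B in the final usable byte.  When spare == 0 that byte is
   * the terminator itself, and we just write 0 over it again.
   */
  s[usable - 1] = (char) spare;
  return s;
}

/* O(1) length: allocated size, minus the terminator, minus B. */
static size_t
ilenstr_len (const char *s)
{
  size_t usable = malloc_usable_size ((void *) s);
  unsigned char b = (unsigned char) s[usable - 1];
  return usable - 1 - b;
}
```

For example, ilenstr_new ("a\0b", 3) gives a string for which ilenstr_len returns 3 while strlen returns 1, which is exactly the mismatch that makes passing such strings to legacy functions risky.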
Alright. This is terrible. DO NOT USE IT IN PRODUCTION CODE! It breaks all kinds of standards, is unportable etc. There are security issues with allowing \0-containing strings to be passed to legacy functions. Still, it’s a nice idea. With proper cooperation from libc, standards authorities and so on, it could be made to work.
Here is my git repo:
Of course I could use OpenStack RDO but OpenStack is a vast box of somewhat working bits and pieces. I think for a small cluster like mine you can get the essential functionality of OpenStack a lot more simply — in 1300 lines of code as it turns out.
The first thing that small cluster management software doesn’t need is any permanent daemon running on the nodes. The reason is that we already have sshd (for secure management access) and libvirtd (to manage the guests) out of the box. That’s quite sufficient to manage all the state we care about. My Mini Cloud/Cluster software just goes out and queries each node for that information whenever it needs it (in parallel of course). Nodes that are switched off are handled by ignoring them.
The second thing is that for a small cloud we can toss features that aren’t needed at all: multi-user/multi-tenant, failover, VLANs, a nice GUI.
The old mclu (Mini Cluster) v1.0 was written in Python and used Ansible to query nodes. If you’re not familiar with Ansible, it’s basically parallel ssh on steroids. This was convenient to get the implementation working, but I ended up rewriting this essential feature of Ansible in ~ 60 lines of code.
The huge down-side of Python is that even such a small program has loads of hidden bugs, because there’s no safety at all. The rewrite (in OCaml) is 1,300 lines of code, so a fraction larger, but I have a far higher confidence that it is mostly bug free.
I also changed around the way the software works to make it more “cloud like” (and hence the name change from “Mini Cluster” to “Mini Cloud”). Guests are now created from templates using virt-builder, and are stateless “cattle” (although you can mix in “pets” and mclu will manage those perfectly well because all it’s doing is remote libvirt-over-ssh commands).
$ mclu status
ham0    on
          total: 8pcpus 15.2G
           used: 8vcpus 8.0G by 2 guest(s)
           free: 6.2G
ham1    on
          total: 8pcpus 15.2G
           free: 14.2G
ham2    on
          total: 8pcpus 30.9G
           free: 29.9G
ham3    off
You can grab mclu v2.0 from the git repository.
If you ever used the old version of virt-v2v, our software that converts guests to run on KVM, then you probably found it slow. Worse still, it could fail right at the end of the conversion (after possibly an hour or more). No one liked that, least of all the developers and support people who had to help people use it.
A V2V conversion is intrinsically going to take a long time, because it always involves copying huge disk images around. These can be gigabytes or even terabytes in size.
My main aim with the rewrite was to do all the work up front (and if the conversion is going to fail, then fail early), and leave the huge copy to the last step. The second aim was to work much harder to minimize the amount of data that we need to copy, so the copy is quicker. I achieved both of these aims using a lot of new technology that we developed for qemu in RHEL 7.
Virt-v2v works (now) by putting an overlay on top of the source disk. This overlay protects the source disk from being modified. All the writes done to the source disk during conversion (eg. modifying config files and adding device drivers) are saved into the overlay. Then we qemu-img convert the overlay to the final target. Although this sounds simple and possibly obvious, none of this could have been done when we wrote old virt-v2v. It is possible now because:
- qcow2 overlays can now have virtual backing files that come from HTTPS or SSH sources. This allows us to place the overlay on top of (eg) a VMware vCenter Server source without having to copy the whole disk from the source first.
- qcow2 overlays can perform copy-on-read. This means you only need to read each block of data from the source once, and then it is cached in the overlay, making things much faster.
- qemu now has excellent discard and trim support. To minimize the amount of data that we copy, we first fstrim the filesystems. The overlay then records which parts of the filesystem are in use, and only those parts are copied.
- I added support for fstrim to ntfs-3g so this works for Windows guests too.
- libguestfs has support for remote storage, cachemode, discard, copy-on-read and more, meaning we can use all these features in virt-v2v.
- We use OCaml — not C, and not type-unsafe languages — to ensure that the compiler is helping us to find bugs in the code that we write, and also to ensure that we end up with an optimized, standalone binary that requires no runtime support/interpreters and can be shipped everywhere.
Pictured above is my 64 bit ARM server. It’s under NDA so I cannot tell you who supplied it or even show you a proper photo.
However it runs Fedora 21 & Rawhide:
Linux arm64.home.annexia.org 3.16.0-0.rc6.git1.1.efirtcfix1.fc22.aarch64 #1 SMP Wed Jul 23 12:15:58 BST 2014 aarch64 aarch64 aarch64 GNU/Linux
libvirt and libguestfs run fine, with full KVM acceleration, although right now you have to use qemu from git as the Rawhide version of qemu is not new enough.
Also OCaml 4.02.0 beta works (after we found and fixed a few bugs in the arm64 native code generator last week).
PG’OCaml is a type-safe macro binding to PostgreSQL from OCaml that I wrote many moons ago.
You can write code like:
let hostid = 33 in
let name = "john.smith" in
let rows = PGSQL(dbh) "select id, subject from contacts where hostid = $hostid and name = $name"
and the compiler checks (at compile time) that hostid and name have the correct types in the program to match the database schema. And it’ll ensure that the type of rows is something like (int * string) list, and integrate that with type inference in the rest of the program.
The program won’t compile if you use the wrong types. It integrates OCaml’s type safety and type inference with the PostgreSQL database engine.
It also avoids SQL injection by automatically creating a safe prepared statement. What is executed when the program runs will have:
... where hostid = ? and name = ?.
As a side-effect of the type checking, it also verifies that the SQL code is syntactically correct.
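The reason a prepared statement is safe is that the values never become part of the SQL text: the $variable references are swapped for placeholders, and the values travel separately. The C sketch below illustrates only that rewriting step. It is not how PG'OCaml is implemented (that is an OCaml macro running at compile time); rewrite_placeholders, struct rewritten and the size limits are hypothetical, and it makes no attempt to skip $ signs inside SQL string literals.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

#define MAX_PARAMS 16
#define MAX_NAME   32

struct rewritten {
  char sql[256];                        /* statement with ? placeholders */
  char names[MAX_PARAMS][MAX_NAME];     /* variable names, in order */
  size_t nparams;
};

/* Replace each $identifier in `in` with ? and record the identifier.
 * Returns 0 on success, -1 if the result does not fit in `out`.
 */
static int
rewrite_placeholders (const char *in, struct rewritten *out)
{
  size_t o = 0;

  assert (in != NULL && out != NULL);
  out->nparams = 0;

  while (*in) {
    if (o + 1 >= sizeof out->sql)       /* keep room for the final \0 */
      return -1;

    if (in[0] == '$' && (isalpha ((unsigned char) in[1]) || in[1] == '_')) {
      size_t n = 0;

      if (out->nparams == MAX_PARAMS)
        return -1;
      in++;                             /* skip the $ */
      while (isalnum ((unsigned char) *in) || *in == '_') {
        if (n + 1 >= MAX_NAME)
          return -1;
        out->names[out->nparams][n++] = *in++;
      }
      out->names[out->nparams][n] = '\0';
      out->nparams++;
      out->sql[o++] = '?';
    }
    else
      out->sql[o++] = *in++;
  }

  out->sql[o] = '\0';
  return 0;
}
```

Applied to the query above, this yields the statement text ending in "where hostid = ? and name = ?" plus the ordered names hostid and name, whose values would then be bound as parameters rather than spliced into the string.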
Update: Thanks to Peter Robinson, there is now a build of OCaml for aarch64 in the Fedora repository.