Tag Archives: ocaml internals

OCaml internals part 5: Garbage collection

In part 1 and part 2 I talked about values and blocks, and in part 3 and part 4 I discussed where those are stored in the OCaml heap.

In this part I want to talk about garbage collection, but actually not about how the heap is scanned or colouring algorithms because there are much better books out there on the subject.

Instead I want to talk about some peculiarities of the OCaml garbage collector that might affect the performance of your OCaml programs.

The refs list

Update Although the code calls this the refs list, it’s more commonly known in GC terminology as the remembered set.

The first is mutable references which point “backwards” from the major to minor heap. In a functional language which doesn’t allow any mutable types, there’s one guarantee you can make which is there could never be a pointer going from the major heap to something in the minor heap. The reason should be fairly clear: You can’t create a new object that points to a future object that’s not been created yet, and because the language doesn’t allow mutability you couldn’t update the old object when the new object gets created. So when an object in an immutable language graduates from the minor heap to the major heap, it is fixed forever (until it becomes unreachable), and can’t ever point back to the minor heap.

Of course OCaml allows mutable objects:

type a = { mutable mb : b }
and b = { m : int }

let set a b =
  a.mb <- b

So in OCaml we can indeed imagine a situation where an object a has migrated to the major heap, and we call set to point its mb field at an object b on the minor heap.

What happens next? If the minor heap collection worked exactly as I described it in part 4, then the outcome wouldn’t be good. b isn’t pointed at by any local root, so it would be “unreachable” and would disappear, leaving a dangling pointer.

One solution would be to check the major heap, but that would be massively time-consuming: minor collections are supposed to be very quick.

What OCaml does instead is to have a separate “refs” list. This contains a list of pointers that point from the major heap to the minor heap. During a minor heap collection, the refs list is consulted for additional roots (and after the minor heap collection, the refs list can be started anew).

The refs list however has to be updated, and it gets updated potentially every time we modify a mutable field in a struct, such as a above. The code calls the C function caml_modify which both mutates the struct and decides whether this is a major->minor pointer to be added to the refs list.

Needless to say, if you use mutable fields then this is much slower than a simple assignment.

The compiler helps out somewhat. Mutable integers are OK and don’t trigger the extra call. Of course non-mutable objects don’t pay any penalty (but on the other hand, if you have to “modify” one, then that entails a complete copy). You can also mutate fields yourself, eg. from C functions or using Obj, provided you can guarantee that this won’t generate a pointer between the major and minor heaps.

Slicing and dicing

The OCaml garbage collector doesn’t collect the major heap in one go. It spreads the work over small slices, and slices are grouped into whole phases of work.

A slice is just a defined amount of work, normally calculated automatically by the garbage collector based on how fast it thinks the program is allocating (and other factors).

The phases are mark and sweep, and some additional subphases dealing with weak pointers and finalization.

Finally there is a compaction phase which is triggered when there is no other work to do and the estimate of free space in the heap has reached some threshold.

There is a penalty to compacting the heap. If you don’t want to pay that, you can disable it completely using a tunable in the Gc module. Alternatively you can schedule when to compact the heap — while waiting for a keypress or between frames in a live simulation.

There is also a penalty for doing a slice of the major heap. This cannot be avoided so easily — for example if the minor heap is exhausted, then some activity in the major heap is unavoidable. However if you make the minor heap large enough, and schedule time to do some slice of work on the major heap, you can completely[?] control when GC work is done. You can also move large structures out of the major heap entirely (thus they won’t be scanned) using my ancient heap module.

That concludes our discussion of OCaml internals. Part 6 to follow tomorrow will supply further reading, interesting links, and a guide to the source code if you want to explore further.

5 Comments

Filed under Uncategorized

OCaml internals part 4: The major heap

[What’s this all about? Start at part 1]

In part 3 we looked at the minor heap. In this part we’ll look at the other bit, not surprisingly called the major heap.

The major heap is the main OCaml memory store. It is composed of some number of large chunks of memory (allocated by C malloc).

When the minor heap runs out, it triggers a minor collection.

The minor collection starts at all the local roots and “oldifies” them, basically copies them by reallocting those objects (recursively) to the major heap. After this, any objects left in the minor heap are unreachable, so the minor heap can be reused just by resetting caml_young_ptr.

           reachable        reachable
+---------+---------+------+----+----+-----------------+
|         |/////////|      |///-->///|                 |
+---------+---------+------+----+----+-----------------+
           ^                ^
           |                |
        local root       local root

Most objects in a typical program are short-lived and never make it out of the minor heap. At least, that is the assumption, and allocation in functional programs is only cheap if this assumption holds. A few programs won’t work like this at all, and some programs may require adjustment of the minor heap using tunables in the OCaml Gc module.

One thing you should notice from the discussion of values and blocks in parts 1 and 2 is that at runtime the garbage collector always knows what is a pointer, and what is an int or opaque data (like a string). Pointers get scanned so the GC can find unreachable blocks. Ints and opaque data must not be scanned. This is the reason for having a tag bit for integer-like values, and one of the uses of the tag byte in the header.

                  "Tag byte space"
+----------+
| 0        | Array, tuple, etc.
+----------+
| 1        |
+----------+
| 2        |
~          ~
|          | Tags in the range 0..245 are used for variants
~          ~
| 245      |
+----------+
| 246      | Lazy (before being forced)
+----------+
| 247      | Closure
+----------+
| 248      | Object                            ^
+----------+                                   |  Block contains
| 249      | Used to implement closures        |  values which the
+----------+                                   |  GC should scan
| 250      | Used to implement lazy values     |
+----------+ <---------------------------  No_scan_tag
| 251      | Abstract data                     |
+----------+                                   |  Block contains
| 252      | String                            |  opaque data
+----------+                                   |  which GC must
| 253      | Double                            V  not scan
+----------+
| 254      | Array of doubles
+----------+
| 255      | Custom block
+----------+

So in the normal course of events, a small, long-lived object will start on the minor heap and be copied once into the major heap. Large objects go straight to the major heap.

The major heap as I said is composed of large chunks of memory allocated through C malloc. Inside these chunks there is quite a complex allocation scheme used, which I won’t go into. But there is another important structure used in the major heap, called the page table. The garbage collector must at all times know which pieces of memory belong to the major heap, and which pieces of memory do not, and it uses the page table to track this.

One reason why we always want to know where the major heap lies is so we can avoid scanning pointers which point to C structs outside the OCaml heap. The GC will not stray beyond its own heap, and treats all pointers outside as opaque (it doesn’t touch them or follow them).

In OCaml ≤ 3.10 the page table was implemented as a simple bitmap, with 1 bit per page of virtual memory (major heap chunks are always page-aligned). This was unsustainable for 64 bit address spaces where memory allocations can be very very far apart, so in OCaml 3.11 this was changed to a sparse hash table.

bitmap       pages of main memory
+-----+      +-------------------+
| 1   |      | chunk of major    |
+-----+      + heap              +
| 1   |      |                   |
+-----+      +-------------------+
| 0   |        other memory
+-----+      +-------------------+
| 1   |      | chunk of major    |
+-----+      + heap              +
| 1   |      |                   |
+-----+      +                   +
| 1   |      |                   |
+-----+      +-------------------+

Because of the page table, C pointers can be stored directly as values, which saves time and space. (However if your C pointer later gets freed, you must NULL out the value — the reason is that the same memory address might later get malloc’d for the OCaml major heap, thus suddenly becoming a “valid” address again. This usually results in a crash).

In part 5 I’ll look at the workings of the garbage collector in a little more detail.

1 Comment

Filed under Uncategorized

OCaml internals part 3: The minor heap

In part 1 we learned about values and in part 2 we learned about blocks. Values and blocks are the toolkit used to make every OCaml type. But where do OCaml blocks live? In this part we’ll start to look at the OCaml heap.

Most OCaml blocks are created in the minor (or young) heap.

The minor heap is a very simple structure, easy to understand but also very fast when used. It is simply a block of memory, allocated using the C malloc function. The memory is 32 K-words by default, which is 128K or 256K on 32 and 64 bit platforms respectively (you can change this size if you want).

So it’s just a block of memory. This is how it works:

+---------------------------------------------------------+
| unallocated                  |///allocated part/////////|
+---------------------------------------------------------+
 ^                              ^
 |                              |
caml_young_limit             caml_young_ptr
                      <----- allocation proceeds
                            in this direction

Remember from part 2 that blocks contain a header and some words. Consider an array of two elements. The total size of this object will be three words (header + 2 words), so 12 bytes on an i386. To allocate this object is very simple — you just subtract 12 from caml_young_ptr and that gives you the address to store the new object.

Of course it’s not quite so simple because you have to check that you don’t run out of space, so the fast path for allocations is:

  1. Subtract size from caml_young_ptr.
  2. If caml_young_ptr < caml_young_limit, then take the slow path through the garbage collector.
  3. Erm, that’s it!

The fast path is just five machine instructions and no branches. This is a good thing because functional languages do a lot more allocations than imperative languages, so allocation has to be very cheap.

The OCaml compiler (ocamlopt) on the other hand compiles your code very literally. For example if you write:

let f (a,b) = (a,b)

then you are asking the compiler to pattern match (a, b) and construct (ie. allocate) another tuple (a, b), and that’s exactly what the compiler will do.

These sorts of “hidden” allocations are a common problem when people are trying to wring the last drop of performance from their OCaml inner loops. Even 5 instructions in an inner loop can be costly, particularly when it’s unnecessary as in the function above.

You are better off writing this function like so:

let f (x : ('a*'b)) = x                                                         

which has the same type signature but will be probably an order of magnitude faster.

Tomorrow, in part 4, I’ll look at what happens when the minor heap needs to be garbage collected, and introduce the major heap.

3 Comments

Filed under Uncategorized

OCaml internals part 2: Strings and other types

In part 1 we saw how integers and some integer-like things are represented at runtime in OCaml.

Objects which are large or more complex than simple integers are stored in OCaml blocks. An OCaml block consists of a header followed by an array of words. Word in this context means a 4 or 8 byte quantity (for 32 and 64 bit platforms respectively) which can contain a value or something else.

Actually if you have a value which is a pointer to an OCaml block, then (for performance reasons) it points to the zeroth element of the array, and the header is at a negative offset, -4 or -8.

+---------------+---------------+---------------+- - - - -
| header        | word[0]       | word[1]       | ....
+---------------+---------------+---------------+- - - - -
                  ^
                  |
              pointer (a value)

What we’ve described so far (values and blocks) turns out to be a complete description of how everything is stored in an OCaml program. [OK, not 100% true: you can also have pointers to C structs]

So how do all the OCaml types map to values and blocks? At the end of this part you’ll find a comprehensive table showing the mapping for every type. Let’s start with some easy ones first.

An OCaml string is just stored as a byte array, with enough words allocated for the required size of the string:

+---------------+----------------+------------------+
| header        | 'a' 'b' 'c' 'd' 'e' 'f' '\O' '\1' |
+---------------+----------------+------------------+
                  ^
                  |
              an OCaml string

(There’s a bit of extra cleverness with strings described here).

An array is just stored as an array of values, with each word corresponding to one value:

+---------------+---------------+---------------+- - - - -
| header        | value[0]      | value[1]      | ....
+---------------+---------------+---------------+- - - - -
                  ^
                  |
              an OCaml array

As I described in the previous part, simple variants are just stored as integers, but if a variant has any parameter, then it’s stored as a block:

+---------------+---------------+
| header        | arg[0]        |
+---------------+---------------+
                  ^
                  |
              a variant with one arg

Let’s take a closer look at the header:

+---------------+---------------+----------+--+--+---------------+
| size of the block in words               | col | tag byte      |
+---------------+---------------+----------+--+--+---------------+
 ^                                         <- 2b-><--- 8 bits --->
 |
offset -4 or -8

The size part is simple enough – it just records the number of words in the block, eg. the length of the array. Note on 32 bit platforms, this is 22 bits long, which is the reason for the annoying 16 MByte limit on strings (16 MBytes = 4 M-words = 222 words). Solution: use a 64 bit platform for serious OCaml.

The tag byte is multipurpose. For example, in the variant-with-parameter example above, it tells you which variant it is. In the string case, it contains a little bit of runtime type information (it’s a string). In other cases it can tell the garbage collector that it’s a lazy value, or opaque data that the garbage collector shouldn’t scan.

“col” (color) is two bits used by the garbage collector which I won’t go into, but you can find out about coloring algorithms in any good book on garbage collection.

One interesting little optimization that OCaml does is for arrays of floating point numbers. So that OCaml does well at numeric programs, arrays of floats are stored inline like this:

+---------------+---------------+---------------+- - - - -
| header        | float[0]                      | ....
+---------------+---------------+---------------+- - - - -
                  ^
                  |
              an OCaml float array

Where do these OCaml blocks live? That’s the subject of part 3

Reference table

Chapter 18 of the OCaml manual describes the representation of blocks in some detail, but not very clearly, so here is a table showing how OCaml objects map into values and blocks.


OCaml Representation
any int, char stored directly as a value, shifted left by 1 bit, with LSB = 1
(), [], false stored as OCaml int 0 (native int 1)
true stored as OCaml int 1 (native int 3)
variant type t = Foo | Bar | Baz (no parameters) stored as OCaml int 0, 1, 2
variant type t = Foo | Bar of int The variants with no parameters are stored as OCaml int 0, 1, 2, etc. counting from the leftmost, counting just the variants that have no parameters.
The variants with parameters are stored as blocks, with tag 0, 1, 2, etc. counting from leftmost, counting just the variants with parameters. The parameters are stored as words in the block itself.
Note there is a limit around 240 variants with parameters that applies to each type, but no limit on the number of variants without parameters you can have. This limit arises because of the size of the tag byte and the fact that some of the high numbered tags are reserved.
list [1; 2; 3] This is represented as 1 :: 2 :: 3 :: [] where [] is a value OCaml int 0, and h :: t is a block with tag 0 and two parameters. This representation is exactly the same as if list was a variant.
tuples, struct and array These are all represented identically, as a simple array of values. The tag is 0. The only difference is that an array can be allocated with variable size, but structs and tuples always have a fixed size.
struct or array where every element is a float These are treated as a special case, as described above. The tag has the special value Double_array_tag (254) so that the garbage collector knows how to deal with these.

Note this exception does not apply to tuples that contain floats. Beware anyone who would declare a vector as (1.0, 2.0).
any string Strings are byte arrays in OCaml, but they have quite a clever representation to make it very efficient to get their length, and at the same time make them directly compatible with C strings. The tag is set to String_tag (252).

For full details of blocks, read the comments in the <caml/mlvalues.h> C header file that comes with OCaml.

8 Comments

Filed under Uncategorized

A beginners guide to OCaml internals

In this 6 part series, I’m going to introduce the internals of the OCaml programming language (tutorial and other references here). This isn’t going to be very comprehensive or in-depth. It’s more of a digest of the readily available information that I found by reading the manual, header files, and some of the compiler code. It’s definitely pitched at beginners, not experts.

By the end of it, you should be able to look at small pieces of OCaml and work out in your head how they get translated into machine code.

I’m definitely not a great expert on the internals of OCaml. This means a couple of things: (a) I’ve probably made some mistakes (post a comment if you see any). (b) I might “reveal too much” of the specifics of the current implementation: anything could change in a future version of OCaml, although the internals described here have been pretty stable for a long time.

Without further ado …

Part 1: Values

In a running OCaml program, a value is either an “integer-like thing” or a pointer.

What’s an “integer-like thing”? Any OCaml int or char, or some things which are stored like integers for speed, which include the constants true and false, the empty list [], unit (), and some (but not all) variants.

For reasons that I’ll explain later, OCaml stores values which are integer-like and values which are pointers differently.

A value containing a pointer is just stored as the pointer. Because addresses are always word-aligned, the bottom 2 bits of every pointer are always 0 0 (the bottom 3 bits are 0 0 0 for a 64 bit machine). OCaml therefore stores integers by setting the bottom bit to 1 and shifting the integer over 1 place:

+----------------------------------------+---+---+
| pointer                                | 0 | 0 |
+----------------------------------------+---+---+

+--------------------------------------------+---+
| integer (31 or 63 bits)                    | 1 |
+--------------------------------------------+---+

You can quickly tell if a value is an integer-like thing (ie. is the bottom bit 1?).

OCaml gives you some C macros and OCaml functions you can use on values. The C macros are Is_long (is it an integer?) and Is_block (is it a pointer?) in the header file . The OCaml functions are is_int and is_block (for pointers) in the “infamous” Obj module.

# let what_is_it foo =
    let foo_repr = Obj.repr foo in
    if Obj.is_int foo_repr then "integer-like" else "pointer" ;;
val what_is_it : 'a -> string = 
# what_is_it 42 ;;
- : string = "integer-like"
# what_is_it () ;;
- : string = "integer-like"
# what_is_it [1;2;3] ;;
- : string = "pointer"
# type bar = Bar | Baz of int ;;
type bar = Bar | Baz of int
# what_is_it Bar ;;
- : string = "integer-like"
# what_is_it (Baz 5) ;;
- : string = "pointer"

Are ints slow? After all it seems like you have to do a lot of bit shifting to do any arithmetic with them.

It turns out that they are a little bit slower, but not by very much. The OCaml compiler can perform most arithmetic operations in between one and four machine instructions, and usually just one or two, making it the same or fractionally slower than a straight native integer. (Of course, there is a more serious penalty in the cases where you really need the full width of a 32 or 64 bit integer, for example in crypto or compression algorithms).

Increment addl $2, %eax
Add a constant addl (constant*2), %eax
Add two values lea -1(%eax, %ebx), %eax
Subtract subl %ebx, %eax
incl %eax

That these are equivalent isn’t totally obvious, but this post explains why.

That concludes part 1. Tune in tomorrow for part 2, or if you’re reading this in the future, use this handy table of contents:

Update

This guide is discussed on reddit here. Also on Hacker News here.

13 Comments

Filed under Uncategorized