Eric and I are writing a Linux NBD client library. There were lots of requirements but the central one for this post is it has to be a library callable from programs written in C and other programming languages (Python, OCaml and Rust being important), and we don’t control those programs so they may be single or multithreaded, or may use non-blocking main loops like gio and glib.
An NBD command involves sending a request over a socket to a remote server and receiving a reply. You can also have multiple requests “in flight” and the reply can be received in multiple parts. On top of this the “fixed newstyle” NBD protocol has a complex multi-step initial handshake. Complicating it further we might be using a TLS transport which has its own handshake.
It’s complicated and we mustn’t block the caller.
There are a few ways to deal with this in my experience — one is to ignore the problem and insist that the main program uses a thread for each NBD connection, but that pushes complexity onto someone else. Another way is to use some variation of coroutines or call/cc — if we get to a place where we would block then we save the stack, return to the caller, and have some way to restore the stack later. However this doesn’t necessarily work well with non-C programming languages. It likely won’t work with either OCaml or Ruby’s garbage collectors since they both involve stack walking to find GC roots. I’d generally want to avoid “tricksy” stuff in a library.
The final way that I know about is to implement a state machine. However large state machines are hellishly difficult to write. Our state machine has 75 states (so far — it’s nowhere near finished). So we need a lot of help.
I came up with a slightly nicer way to write state machines.
The first idea is that states in a large state machine could be grouped. You can consider each group like a mini state machine — it has its own namespace, lives in a single file (example), and may only be entered via a single
START state (so you don’t need to consider what happens if another part of the state machine jumps into the middle of the group).
Secondly groups can be hierarchical. This lets us organise the groups logically, so for example there is a group for “fixed newstyle handshaking” and within that there are separate groups for negotiating each newstyle option. States can refer to each other using either relative or absolute paths in this hierarchy.
Thirdly all states and transitions are defined and checked in a single place, allowing us to enforce rules about what transitions are permitted.
Fourthly the final C code that implements the state machine is generated (mostly). This lets us generate helper functions to (eg) turn state transitions into debug messages, or return whether the connection is in a mode where it’s expecting to read or write from the socket (making it easier to integrate with main loops).
The final code looks like this and generates currently 173K lines of C (although as it’s mostly large switch statements it compiles down to a reasonably small size).
Has anyone else implemented a large state machine in a similar way?