Which foreign function interface is the best?

I’ve written libguestfs language bindings for Perl, Python, Ruby, Java, OCaml, PHP, Haskell, Erlang and C#. But which of these is the best? Which is the easiest? What makes this hard? Grubbing around in the internals of a language reveals mistakes made by the language designers, but what are the worst mistakes?

Note: There is source that goes with this. Download libguestfs-1.13.13.tar.gz and look in the respective directories.

The best

It’s going to be a controversial choice, but in my opinion: C#. You just add some simple annotations to your functions and structs, and you can call into shared libraries (or “DllImport”s as Microsoft insisted on calling them) directly. It’s just about as easy as directly calling C and that is no simple achievement considering how the underlying runtime of C# is very different from C.

Example: a C struct:

[StructLayout (LayoutKind.Sequential)]
public class _int_bool {
  int i;
  int b;
}
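
For readers who have not used C#, roughly the same declarative style can be sketched with Python's ctypes, which also describes C structs and functions through annotations rather than C glue code. This is only an analogue and is not part of the libguestfs bindings: the `IntBool` class mirrors the `_int_bool` struct above, the helper name `c_abs` is made up for illustration, and looking up `abs` through `ctypes.CDLL(None)` assumes a Unix-like platform.

```python
import ctypes


class IntBool(ctypes.Structure):
    # Same layout as the C struct: two ints in declaration order.
    _fields_ = [("i", ctypes.c_int),
                ("b", ctypes.c_int)]


def c_abs(n):
    """Call abs(3) from the C library already loaded into this process."""
    libc = ctypes.CDLL(None)            # assumption: Unix-like platform
    libc.abs.argtypes = [ctypes.c_int]  # declare the C signature
    libc.abs.restype = ctypes.c_int
    return libc.abs(n)
```

As with DllImport, nothing here is compiled: the struct layout and the function signature are declared in the host language and the runtime does the marshalling.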

The worst

There are two languages in the doghouse: Haskell and PHP. PHP first, because its method of binding is just very broken. For example, 64-bit types aren’t possible on a 32-bit platform. It requires a very complex autoconf setup. And the quality of the implementation is very poor, verging on broken; it makes me wonder if the rest of PHP can be this bad.

Haskell: even though I’m an experienced functional programmer and have done a fair bit of Haskell programming in the past, the FFI is deeply strange and very poorly documented. I simply could not work out how to return anything other than integers from my functions. You end up with bindings that look like this:

write_file h path content size = do
  r <- withCString path $ \path -> withCString content $ \content -> withForeignPtr h (\p -> c_write_file p path content (fromIntegral size))
  if (r == -1)
    then do
      err <- last_error h
      fail err
    else return ()

The middle tier

There’s not a lot to choose between OCaml, Ruby, Java and Erlang. For all of them: you write bindings in C, there’s good documentation, it’s a bit tedious but basically mechanical, and in 3 out of 4 you’re dealing with a reasonable garbage collector so you have to be aware of GC issues.

Erlang is slightly peculiar because the method I chose (out of many possible) is to write an external process that talks to the Erlang runtime over stdin/stdout. But I can’t fault their documentation, and the rest of it is sensible.
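
To make the external-process idea concrete, here is a minimal sketch of the kind of message framing such a process needs on its stdin/stdout. It assumes a 4-byte big-endian length prefix, as produced by Erlang's `{packet, 4}` port option. The actual wire format used by the libguestfs binding, and the encoding of the payload as Erlang terms, are not described in this post, so the payload is treated as opaque bytes. The function names are made up for illustration.

```python
import io
import struct


def write_packet(stream, payload):
    """Write one message: 4-byte big-endian length, then the payload."""
    stream.write(struct.pack(">I", len(payload)))
    stream.write(payload)
    stream.flush()


def read_packet(stream):
    """Read one message, or return None on a clean end of input."""
    header = stream.read(4)
    if len(header) == 0:
        return None
    if len(header) < 4:
        raise EOFError("truncated length header")
    (length,) = struct.unpack(">I", header)
    payload = stream.read(length)
    if len(payload) < length:
        raise EOFError("truncated payload")
    return payload
```

In a real external process the streams would be `sys.stdin.buffer` and `sys.stdout.buffer`; an in-memory `io.BytesIO` is enough to try the framing out.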

Example: Here is a function binding in OCaml, but with mechanical changes this could be Ruby, Java or Erlang too:

CAMLprim value
ocaml_guestfs_add_drive_ro (value gv, value filenamev)
{
  CAMLparam2 (gv, filenamev);
  CAMLlocal1 (rv);

  guestfs_h *g = Guestfs_val (gv);
  if (g == NULL)
    ocaml_guestfs_raise_closed ("add_drive_ro");

  char *filename = guestfs_safe_strdup (g, String_val (filenamev));
  int r;

  caml_enter_blocking_section ();
  r = guestfs_add_drive_ro (g, filename);
  caml_leave_blocking_section ();
  free (filename);
  if (r == -1)
    ocaml_guestfs_raise_error (g, "add_drive_ro");

  rv = Val_unit;
  CAMLreturn (rv);
}

The ugly

Perl: Get reading. You’d better start with perlxs, because Perl uses its own language (C with bizarre macros on top), so your code looks like this:

SV *
is_config (g)
      guestfs_h *g;
PREINIT:
      int r;
CODE:
      r = guestfs_is_config (g);
      if (r == -1)
        croak ("%s", guestfs_last_error (g));
      RETVAL = newSViv (r);
OUTPUT:
      RETVAL

After that, get familiar with perlguts. Perl has only 3 structures and you’ll be using them a lot. There are some brilliant things about Perl which shouldn’t be overlooked, including POD which libguestfs uses to make effortless manual pages.

Python: Best described as half-arsed. Rather like the language itself.

Python, Ruby, Erlang: If your language depends on “int”, “long”, “long long” without defining what those mean, so that their sizes differ with your C compiler and platform, then you’ve made a big mistake that will unfortunately dog you throughout the runtime, the FFIs and the language itself. It’s better either to define them precisely (like Java) or to just use int32 and int64 (like OCaml).
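
The platform dependence is easy to observe. The following sketch uses Python's ctypes to report the sizes of the C types on whatever machine it runs on; the function name `c_int_widths` is made up for illustration. Only the fixed-width types are guaranteed to give the same answer everywhere.

```python
import ctypes


def c_int_widths():
    """Return the size in bytes of several C integer types on this platform."""
    return {
        "int": ctypes.sizeof(ctypes.c_int),
        "long": ctypes.sizeof(ctypes.c_long),          # 4 or 8 on common ABIs
        "long long": ctypes.sizeof(ctypes.c_longlong),
        "int32_t": ctypes.sizeof(ctypes.c_int32),      # always 4
        "int64_t": ctypes.sizeof(ctypes.c_int64),      # always 8
    }
```

A binding that maps a C `long` onto a host-language integer has to cope with every row that can vary; one that sticks to `int32_t` and `int64_t` does not.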

And finally, reference counting (Perl, Python). It’s tremendously easy to make mistakes that are fiendishly difficult to track down. It’s a poor way to do GC and it indicates to me that the language designer didn’t know any better.
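
One typical mistake can be shown with a toy model. This is not Perl's or Python's real C API: `Handle`, `store_buggy` and `store_correct` are hypothetical names, and a flag stands in for a real call to free(). The bug is storing a pointer without taking a reference to it, which nothing flags until the original owner drops its own reference.

```python
class Handle:
    """Toy stand-in for a manually reference-counted C object."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1       # the creator owns one reference
        self.freed = False

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # real C code would call free() here


def store_buggy(cache, h):
    # Bug: keeps the pointer without taking its own reference.
    cache["h"] = h


def store_correct(cache, h):
    # The cache takes its own reference before keeping the pointer.
    h.incref()
    cache["h"] = h
```

With `store_buggy`, the cached entry is already freed once the creator calls `decref()`, although the cache still points at it. With `store_correct` the object survives until the cache also releases its reference. In real binding code the buggy version usually appears to work until memory is reused, which is why such mistakes are so hard to track down.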

18 responses to “Which foreign function interface is the best?”

  1. Daniel Svensson

    When it comes to Python, you should really have a look at the Cython project over at http://cython.org/. The bare C bindings are not that user-friendly. Also, if the library you bind is ref-counted, binding it to Python fits very well. I do however agree that refcounting is flawed, as the benefit of freeing data predictably is defeated when you get a VM stall from freeing a ton of data at the moment it goes out of scope. Although as long as the Azul Systems JVM is proprietary, I’d say all VMs are flawed.

  2. Oh joy, I also had my fun with the Haskell FFI. In my case, I was trying to make the Grammatical Framework (GF) runtime in Haskell accessible to C code, i.e. I was using it in the opposite direction from you. Now the good news is that, unlike for many other FFIs (which are only unidirectional), that direction is actually supported. The bad news is that it’s even more poorly documented and painful, and that there are some implementation peculiarities: In particular, the most popular compiler (GHC) requires you to call some initialization function under #ifdef if you’re calling any GHC-compiled code from a main function in C, and there’s no telling whether less popular compilers won’t require some other non-standard code to actually work. (If an initialization function is needed in practice, why isn’t it part of the FFI spec?)

    Another annoyance is that a StablePtr (which is pretty much the only sane way to return a Haskell object to C code) cannot officially be NULL: (castPtrToStablePtr nullPtr) works in practice (at least with GHC), but the Haskell FFI spec explicitly warns that this is not guaranteed to work. Yet returning NULL is the best way to report an error, or to return a Maybe with Nothing in it. It’s possible to use Ptr () instead of StablePtr a in all the interfaces, which will lead to the same API on the C side, but that’s not going to make the code more readable. I went for the (castPtrToStablePtr nullPtr) hack and hoped for the best.

  3. 1. Is C#’s DllImport better than Python’s ctypes?
    http://docs.python.org/library/ctypes.html

    2. Re: GC and FFI, although reference counting is tedious, many C programmers are not used to interfacing with automatic garbage collection and if the FFI designer doesn’t take special care to make interacting with GC “hard to get wrong” you can get bugs like:

  4. Try node.js with a callback-based API like libflac, I bet you’ll put it at the bottom right away…

  5. Adrien

    (I shouldn’t rely only on the cwn, it makes me lag a bit …)

    In OCaml, you can automate the process with cowboy (I know, bad name) which I wrote and use for glib-based stuff (it uses gobject-introspection in-code data but not runtime at all). It’s no magical solution but I gave up on magical solutions because you need a quite good knowledge of the library API anyway (like knowing how it handles memory).

  6. dan

    I personally love the LuaJIT ffi: http://luajit.org/ext_ffi.html

    local ffi = require("ffi")
    ffi.cdef[[
    int printf(const char *fmt, ...);
    ]]
    ffi.C.printf("Hello %s!", "world")

    The bit between the [[ and ]] is basically a C header file. LuaJIT parses it and does whatever it needs to convert your types. You can use structs or whatever you need. You can even use the FFI to instantiate C structs (which can have a performance benefit over Lua types).

    • rich

      How do I tell it whether or not strings are nullable?

      • dan

        It just uses C declarations, so you can only tell it what you can tell C. AFAIK LuaJIT will always pass in null terminated C strings. Outside of that, I’m not sure if you can give it additional information, nor have I ever needed to.

        Details on what you can and can’t do, what declarations are supported, how types are converted and so on are found here: http://luajit.org/ext_ffi_semantics.html

      • rich

        That sucks a lot. Not being able to associate buffer pointer with size. Not being able to tell whether a string can be NULL or not. Not being able to pass int ranges, preconditions, etc.

      • dan

        Honestly, I have never had an occasion where I needed to. Maybe I’m just too used to programming in C and C++ that I’m used to not having the language protect me from these things. If I need it, I can easily wrap the FFI calls in Lua functions which validate the arguments before passing them in.

        I’ve also not seen these features in any other FFIs I’ve used, though it’s certainly possible I just overlooked them (CPython’s C API, Python’s ctypes, Java JNA: https://github.com/twall/jna/blob/master/www/Mappings.md and Java JNI). C declarations won’t tell you these details on their own anyway.

        Quoting from the document I linked:

        The FFI library has been designed as a low-level library. The goal is to interface with C code and C data types with a minimum of overhead. This means you can do anything you can do from C: access all memory, overwrite anything in memory, call machine code at any memory address and so on.

        Again: the FFI library is a low-level library. This implies it needs to be used with care, but its flexibility and performance often outweigh this concern.

  7. dan

    As it was mentioned, one thing I really like about the LuaJIT FFI is that you can attach a finalizer to an object created through the FFI, which can be used to, eg, call free() when Lua garbage collects a native object. Eg, from the docs:

    local p = ffi.gc(ffi.C.malloc(n), ffi.C.free)

    p = nil -- Last reference to p is gone.
    -- GC will eventually run finalizer: ffi.C.free(p)

    Of course, ffi.C.malloc and ffi.C.free can be replaced with any function, native or Lua.

  8. Pingback: My rant about Haskell | Richard WM Jones

  9. Paulo

    Perl a bad language for FFI? And what about Inline::C and Inline::CPP and Inline::”Whatsoever do you want”? These packages are very, very good wrappers on the ugly XS extensions… and aren’t they FFI?

    • rich

      Let’s take a deep breath.

      Why does Perl have two (or more) FFIs? That’s already a failure. If Inline::CPP is good enough for everyone, it should be the one true FFI. Likely it’s not, either because it can’t do quite everything that XS can (for example, it doesn’t do any mapping from non-basic types, which immediately means it’s hard to use for libguestfs bindings), or because performance sucks the first time, when it has to call out to the C compiler, or because the tooling around it doesn’t exist (like makefiles, distro packaging etc).

      So solve all that and make Inline::C / Inline::CPP be the one true packaging solution, then we’ll talk.

  10. Stephen M.

    I’ve noticed announcements for new libguestfs bindings; e.g. Lua, Go. Would you please update this post with your current FFI opinions? Thanks.

    • rich

      Go is horrible. We disabled it in the Fedora libguestfs package since forever because it just doesn’t work well with Linux distros. The main problem (apart from Go being an awful language in general) is that it constantly wants to recompile the whole world.

      Lua was basically weird, but once I understood the stack architecture it wasn’t a problem. I highly doubt anyone uses the Lua libguestfs bindings.

      The ones I really want to do are Rust bindings. I actually started on them, but the borrow checker is hard to understand, particularly as I’m not learning the language but immediately jumping into writing bindings for the language (so I have to interact with the borrow checker in difficult and undocumented ways). One day I’ll get around to it …
