Tag Archives: FFI

Which foreign function interface is the best?

I’ve written libguestfs language bindings for Perl, Python, Ruby, Java, OCaml, PHP, Haskell, Erlang and C#. But which of these is the best? Which is the easiest? What makes this hard? Grubbing around in the internals of a language reveals mistakes made by the language designers, but what are the worst mistakes?

Note: There is source that goes with this. Download libguestfs-1.13.13.tar.gz and look in the respective directories.

The best

It’s going to be a controversial choice, but in my opinion: C#. You just add some simple annotations to your functions and structs, and you can call into shared libraries (or “DllImport”s as Microsoft insisted on calling them) directly. It’s just about as easy as directly calling C and that is no simple achievement considering how the underlying runtime of C# is very different from C.

Example: a C struct:

[StructLayout (LayoutKind.Sequential)]
public class _int_bool {
  int i;
  int b;
}

The worst

There are two languages in the doghouse: Haskell and PHP. PHP first because their method of binding is just very broken. For example, 64 bit types aren’t possible on a 32 bit platform. It requires a very complex autoconf setup. And the quality of their implementation is very poor verging on broken — it makes me wonder if the rest of PHP can be this bad.

Haskell: even though I’m an experienced functional programmer and have done a fair bit of Haskell programming in the past, the FFI is deeply strange and very poorly documented. I simply could not work out how to return anything other than integers from my functions. You end up with bindings that look like this:

write_file h path content size = do
  r <- withCString path $ \path -> withCString content $ \content -> withForeignPtr h (\p -> c_write_file p path content (fromIntegral size))
  if (r == -1)
    then do
      err <- last_error h
      fail err
    else return ()

The middle tier

There’s not a lot to choose between OCaml, Ruby, Java and Erlang. For all of them: you write bindings in C, there’s good documentation, it’s a bit tedious but basically mechanical, and in 3 out of 4 you’re dealing with a reasonable garbage collector so you have to be aware of GC issues.

Erlang is slightly peculiar because the method I chose (out of many possible) is to write an external process that talks to the Erlang over stdin/stdout. But I can’t fault their documentation, and the rest of it is sensible.

Example: Here is a function binding in OCaml, but with mechanical changes this could be Ruby, Java or Erlang too:

CAMLprim value
ocaml_guestfs_add_drive_ro (value gv, value filenamev)
{
  CAMLparam2 (gv, filenamev);
  CAMLlocal1 (rv);

  guestfs_h *g = Guestfs_val (gv);
  if (g == NULL)
    ocaml_guestfs_raise_closed ("add_drive_ro");

  char *filename = guestfs_safe_strdup (g, String_val (filenamev));
  int r;

  caml_enter_blocking_section ();
  r = guestfs_add_drive_ro (g, filename);
  caml_leave_blocking_section ();
  free (filename);
  if (r == -1)
    ocaml_guestfs_raise_error (g, "add_drive_ro");

  rv = Val_unit;
  CAMLreturn (rv);
}

The ugly

Perl: Get reading. You’d better start with perlxs because Perl uses its own language — C with bizarre macros on top so your code looks like this:

SV *
is_config (g)
      guestfs_h *g;
PREINIT:
      int r;
   CODE:
      r = guestfs_is_config (g);
      if (r == -1)
        croak ("%s", guestfs_last_error (g));
      RETVAL = newSViv (r);
 OUTPUT:
      RETVAL

After that, get familiar with perlguts. Perl has only 3 structures and you’ll be using them a lot. There are some brilliant things about Perl which shouldn’t be overlooked, including POD which libguestfs uses to make effortless manual pages.

Python: Best described as half arsed. Rather like the language itself.

Python, Ruby, Erlang: If your language depends on “int”, “long”, “long long” without defining what those mean, and differing based on your C compiler and platform, then you’ve made a big mistake that will unfortunately dog you throughout the runtime, FFIs and the language itself. It’s better either to define them precisely (like Java) or to just use int32 and int64 (like OCaml).

And finally, reference counting (Perl, Python). It’s tremendously easy to make mistakes that are fiendishly difficult to track down. It’s a poor way to do GC and it indicates to me that the language designer didn’t know any better.

Advertisements

18 Comments

Filed under Uncategorized