September 21, 2011 · 9:15 pm

Which foreign function interface is the best?

I’ve written libguestfs language bindings for Perl, Python, Ruby, Java, OCaml, PHP, Haskell, Erlang and C#. But which of these is the best? Which is the easiest? What makes this hard? Grubbing around in the internals of a language reveals mistakes made by the language designers, but what are the worst mistakes?

Note: There is source that goes with this. Download libguestfs-1.13.13.tar.gz and look in the respective directories.

The best

It’s going to be a controversial choice, but in my opinion: C#. You just add some simple annotations to your functions and structs, and you can call into shared libraries (or “DllImport”s as Microsoft insisted on calling them) directly. It’s just about as easy as directly calling C, and that is no small achievement considering how different the underlying runtime of C# is from C.

Example: a C struct, declared on the C# side:

[StructLayout (LayoutKind.Sequential)]
public class _int_bool {
  int i;
  int b;
}
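
For comparison, the same declare-and-call style can be sketched with Python’s ctypes, which comes up in the comments. This is an illustration only, not part of the libguestfs bindings: the IntBool class is a hypothetical mirror of the two-int struct above, and the call goes to libc’s strlen rather than to libguestfs. It assumes a POSIX system where ctypes.CDLL(None) exposes the C library.

```python
import ctypes

# Hypothetical mirror of the two-int struct above; the name IntBool is
# invented for this sketch.
class IntBool(ctypes.Structure):
    _fields_ = [("i", ctypes.c_int), ("b", ctypes.c_int)]

# Assumption: POSIX. CDLL(None) opens the running process, which already
# has the C library loaded.
libc = ctypes.CDLL(None)

# Declare the C prototype: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

pair = IntBool(i=42, b=1)
length = libc.strlen(b"libguestfs")
print(pair.i, pair.b, length)
```

As with DllImport, the only binding code is the declaration of the struct layout and the function prototype; no C glue has to be compiled.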

The worst

There are two languages in the doghouse: Haskell and PHP. PHP first because their method of binding is just very broken. For example, 64 bit types aren’t possible on a 32 bit platform. It requires a very complex autoconf setup. And the quality of their implementation is very poor verging on broken — it makes me wonder if the rest of PHP can be this bad.

Haskell: even though I’m an experienced functional programmer and have done a fair bit of Haskell programming in the past, the FFI is deeply strange and very poorly documented. I simply could not work out how to return anything other than integers from my functions. You end up with bindings that look like this:

write_file h path content size = do
  r <- withCString path $ \path -> withCString content $ \content -> withForeignPtr h (\p -> c_write_file p path content (fromIntegral size))
  if (r == -1)
    then do
      err <- last_error h
      fail err
    else return ()

The middle tier

There’s not a lot to choose between OCaml, Ruby, Java and Erlang. For all of them: you write bindings in C, there’s good documentation, it’s a bit tedious but basically mechanical, and in 3 out of 4 you’re dealing with a reasonable garbage collector so you have to be aware of GC issues.

Erlang is slightly peculiar because the method I chose (out of many possible) is to write an external process that talks to the Erlang runtime over stdin/stdout. But I can’t fault their documentation, and the rest of it is sensible.
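
To give a feel for the external-process approach, here is a minimal sketch of the framing such a process has to handle. It assumes the port is opened with Erlang’s {packet, 4} option, which prefixes each message with a 4-byte big-endian length. Whether the libguestfs binding uses exactly this framing is not stated here, and the payload (normally an encoded Erlang term) is treated as opaque bytes. The function names are invented for the sketch.

```python
import io
import struct

def read_packet(stream):
    """Read one length-prefixed packet; return None at end of stream."""
    header = stream.read(4)
    if len(header) < 4:
        return None
    (length,) = struct.unpack(">I", header)  # 4-byte big-endian length
    payload = stream.read(length)
    if len(payload) < length:
        return None  # truncated packet
    return payload

def write_packet(stream, payload):
    """Write one packet with a 4-byte big-endian length prefix."""
    stream.write(struct.pack(">I", len(payload)) + payload)
    stream.flush()

def serve(inp, out, handle):
    """Answer every request packet on inp with handle(request) on out."""
    while True:
        request = read_packet(inp)
        if request is None:
            break
        write_packet(out, handle(request))

# Demo: in-memory streams stand in for stdin/stdout.
requests = io.BytesIO()
write_packet(requests, b"ping")
write_packet(requests, b"hello")
requests.seek(0)

replies = io.BytesIO()
serve(requests, replies, lambda req: req.upper())
replies.seek(0)
first = read_packet(replies)
second = read_packet(replies)
```

In a real external process, inp and out would be sys.stdin.buffer and sys.stdout.buffer, and handle would decode the request, call the library and encode the result.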

Example: Here is a function binding in OCaml, but with mechanical changes this could be Ruby, Java or Erlang too:

CAMLprim value
ocaml_guestfs_add_drive_ro (value gv, value filenamev)
{
  CAMLparam2 (gv, filenamev);
  CAMLlocal1 (rv);

  guestfs_h *g = Guestfs_val (gv);
  if (g == NULL)
    ocaml_guestfs_raise_closed ("add_drive_ro");

  char *filename = guestfs_safe_strdup (g, String_val (filenamev));
  int r;

  caml_enter_blocking_section ();
  r = guestfs_add_drive_ro (g, filename);
  caml_leave_blocking_section ();
  free (filename);
  if (r == -1)
    ocaml_guestfs_raise_error (g, "add_drive_ro");

  rv = Val_unit;
  CAMLreturn (rv);
}

The ugly

Perl: Get reading. You’d better start with perlxs because Perl uses its own language — C with bizarre macros on top so your code looks like this:

SV *
is_config (g)
      guestfs_h *g;
   PREINIT:
      int r;
   CODE:
      r = guestfs_is_config (g);
      if (r == -1)
        croak ("%s", guestfs_last_error (g));
      RETVAL = newSViv (r);
   OUTPUT:
      RETVAL

After that, get familiar with perlguts. Perl has only 3 structures (SV, AV and HV: scalars, arrays and hashes) and you’ll be using them a lot. There are some brilliant things about Perl which shouldn’t be overlooked, including POD, which libguestfs uses to make effortless manual pages.

Python: Best described as half arsed. Rather like the language itself.

Python, Ruby, Erlang: If your language depends on “int”, “long”, “long long” without defining what those mean, and differing based on your C compiler and platform, then you’ve made a big mistake that will unfortunately dog you throughout the runtime, FFIs and the language itself. It’s better either to define them precisely (like Java) or to just use int32 and int64 (like OCaml).
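
The difference is easy to see from Python’s ctypes, which exposes both kinds of type. This is a small sketch rather than anything from the bindings: the first table reports whatever the local C compiler and ABI decided, while the second is fixed by definition.

```python
import ctypes

# Widths of the platform-dependent C types, as decided by the C compiler
# and ABI this interpreter was built with. These can differ by platform.
platform_sizes = {
    "int": ctypes.sizeof(ctypes.c_int),
    "long": ctypes.sizeof(ctypes.c_long),
    "long long": ctypes.sizeof(ctypes.c_longlong),
}

# Widths of the explicitly sized types: the same everywhere.
fixed_sizes = {
    "int32": ctypes.sizeof(ctypes.c_int32),
    "int64": ctypes.sizeof(ctypes.c_int64),
}

for name, size in {**platform_sizes, **fixed_sizes}.items():
    print(f"{name:>9}: {size} bytes")
```

On 64-bit Linux, long is typically 8 bytes, while on 64-bit Windows it is 4, so a binding that maps a 64-bit C value to “long” is only right on some platforms.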

And finally, reference counting (Perl, Python). It’s tremendously easy to make mistakes that are fiendishly difficult to track down. It’s a poor way to do GC and it indicates to me that the language designer didn’t know any better.
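
Here is a sketch of the kind of mistake I mean, reproduced from Python itself so that it fits in a few lines. It is CPython-specific and uses ctypes.pythonapi to call Py_IncRef and Py_DecRef, the exported function forms of the macros a C extension would use. The Resource class and the survives_release helper are invented for the illustration.

```python
import ctypes
import gc
import weakref

# CPython only: bind the C API reference-count functions.
py_incref = ctypes.pythonapi.Py_IncRef
py_incref.argtypes = [ctypes.py_object]
py_incref.restype = None
py_decref = ctypes.pythonapi.Py_DecRef
py_decref.argtypes = [ctypes.py_object]
py_decref.restype = None

class Resource:
    """Hypothetical object that a binding hands to C code."""

def survives_release(balanced):
    """Return True if the object is still alive after its last Python
    reference is dropped, i.e. if it has leaked."""
    obj = Resource()
    probe = weakref.ref(obj)   # lets us observe the object without owning it
    py_incref(obj)             # the binding takes a reference ...
    if balanced:
        py_decref(obj)         # ... and a correct binding gives it back
    del obj                    # drop the only Python-level reference
    gc.collect()               # a full collection does not help either
    return probe() is not None

leaks_when_unbalanced = survives_release(balanced=False)
leaks_when_balanced = survives_release(balanced=True)
print(leaks_when_unbalanced, leaks_when_balanced)
```

Nothing fails at the point of the missing decrement. The object simply never goes away, which is what makes this class of bug so hard to trace back to its cause.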

Tagged as c++, erlang, FFI, haskell, java, libguestfs, ocaml, perl, php, programming languages, python, ruby

18 responses to “Which foreign function interface is the best?”

Ferry Huberts (@fhuberts)

September 21, 2011 at 9:42 pm

Actually, Java is pretty good at this too if you use JNA, see http://en.wikipedia.org/wiki/Java_Native_Access and https://github.com/twall/jna.

- rich
  
  September 21, 2011 at 9:57 pm
  
  Yes, this looks like it gets why C# is so good. Thanks for pointing it out.
  
Daniel Svensson

September 21, 2011 at 10:04 pm

When it comes to Python, you should really have a look at the Cython project over at http://cython.org/. The bare C bindings are not that user friendly. Also, if the library you bind is ref-counted, binding it to Python fits very well. I do however agree that refcounting is flawed: the benefit of freeing data predictably is defeated when you get a VM stall from freeing a ton of data at the moment it goes out of scope. Although as long as the Azul Systems JVM is proprietary, I’d say all VMs are flawed.

Kevin Kofler

September 22, 2011 at 5:13 am

Oh joy, I also had my fun with the Haskell FFI. In my case, I was trying to make the Grammatical Framework (GF) runtime in Haskell accessible to C code, i.e. I was using it in the opposite direction as you. Now the good news is that, unlike for many other FFIs (which are only unidirectional), that direction is actually supported. The bad news is that it’s even more poorly documented and painful, and that there are some implementation peculiarities: In particular, the most popular compiler (GHC) requires you to call some initialization function under #ifdef if you’re calling any GHC-compiled code from a main function in C, and there’s no telling whether less popular compilers won’t require some other non-standard code to actually work. (If an initialization function is needed in practice, why isn’t it part of the FFI spec?) Another annoyance is that a StablePtr (which is pretty much the only sane way to return a Haskell object to C code) cannot officially be NULL: (castPtrToStablePtr nullPtr) works in practice (at least with GHC), but the Haskell FFI spec explicitly warns that this is not guaranteed to work. Yet returning NULL is the best way to report an error, or to return a Maybe with Nothing in it. It’s possible to use Ptr () instead of StablePtr a in all the interfaces, which will lead to the same API on the C side, but that’s not going to make the code more readable. I went for the (castPtrToStablePtr nullPtr) hack and hoped for the best.

Scott Tsai (@scottttw)

September 22, 2011 at 5:18 am

1. Is C#’s DllImport better than Python’s ctypes?
http://docs.python.org/library/ctypes.html

2. Re: GC and FFI, although reference counting is tedious, many C programmers are not used to interfacing with automatic garbage collection and if the FFI designer doesn’t take special care to make interacting with GC “hard to get wrong” you can get bugs like:

The Broken Promises of MRI/REE/YARV – Now you have something to blame for unexplicable ruby behavior.
by u/cronus42 in programming

Romain Beauxis

September 22, 2011 at 8:57 am

Try node.js with a callback-based API like libflac, I bet you’ll put it at the bottom right away…

Adrien

September 27, 2011 at 11:29 am

(I shouldn’t rely only on the cwn, it makes me lag a bit …)

In OCaml, you can automate the process with cowboy (I know, bad name) which I wrote and use for glib-based stuff (it uses gobject-introspection in-code data but not runtime at all). It’s no magical solution but I gave up on magical solutions because you need a quite good knowledge of the library API anyway (like knowing how it handles memory).

dan

November 8, 2011 at 11:53 pm

I personally love the LuaJIT ffi: http://luajit.org/ext_ffi.html

local ffi = require("ffi")
ffi.cdef[[
int printf(const char *fmt, ...);
]]
ffi.C.printf("Hello %s!", "world")

The bit between the [[ and ]] is basically a C header file. LuaJIT parses it and does whatever it needs to convert your types. You can use structs or whatever you need; you can even use the FFI to instantiate C structs (which can have a performance benefit over Lua types).

- rich
  
  November 9, 2011 at 10:17 am
  
  How do I tell it whether or not strings are nullable?
  
  - dan
    
    November 9, 2011 at 1:12 pm
    
    It just uses C declarations, so you can only tell it what you can tell C. AFAIK LuaJIT will always pass in null terminated C strings. Outside of that, I’m not sure if you can give it additional information, nor have I ever needed to.
    
    Details on what you can and can’t do, what declarations are supported, how types are converted and so on are found here: http://luajit.org/ext_ffi_semantics.html
  - rich
    
    November 9, 2011 at 1:14 pm
    
    That sucks a lot. Not being able to associate buffer pointer with size. Not being able to tell whether a string can be NULL or not. Not being able to pass int ranges, preconditions, etc.
  - dan
    
    November 9, 2011 at 1:35 pm
    
    Honestly, I have never had an occasion where I needed to. Maybe I’ve programmed in C and C++ for so long that I’m used to not having the language protect me from these things. If I need it, I can easily wrap the FFI calls in Lua functions which validate the arguments before passing them in.
    
    I’ve also not seen these features in any other FFIs I’ve used (CPython’s C API, Python’s ctypes, Java JNA: https://github.com/twall/jna/blob/master/www/Mappings.md and Java JNI), though it’s certainly possible I just overlooked them. C declarations won’t tell you these details on their own anyway.
    
    Quoting from the document I linked:
    
    The FFI library has been designed as a low-level library. The goal is to interface with C code and C data types with a minimum of overhead. This means you can do anything you can do from C: access all memory, overwrite anything in memory, call machine code at any memory address and so on.
    
    Again: the FFI library is a low-level library. This implies it needs to be used with care, but its flexibility and performance often outweigh this concern.
dan

November 9, 2011 at 1:44 pm

As was mentioned, one thing I really like about the LuaJIT FFI is that you can attach a finalizer to an object created through the FFI, which can be used to, e.g., call free() when Lua garbage collects a native object. For example, from the docs:

local p = ffi.gc(ffi.C.malloc(n), ffi.C.free)
...
p = nil -- Last reference to p is gone.
-- GC will eventually run finalizer: ffi.C.free(p)

Of course, ffi.C.malloc and ffi.C.free can be replaced with any function, native or Lua.
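
The same finalizer pattern can be sketched outside Lua. The following is a rough Python equivalent using ctypes and weakref.finalize, assuming a POSIX system where ctypes.CDLL(None) exposes malloc and free. NativeBuffer is a hypothetical wrapper invented for the example, and the immediate finalization it demonstrates relies on CPython’s reference counting.

```python
import ctypes
import weakref

# Assumption: POSIX. CDLL(None) exposes the C library of this process.
libc = ctypes.CDLL(None)
libc.malloc.argtypes = [ctypes.c_size_t]
libc.malloc.restype = ctypes.c_void_p
libc.free.argtypes = [ctypes.c_void_p]
libc.free.restype = None

class NativeBuffer:
    """Hypothetical owner of a malloc'd block of native memory."""

    def __init__(self, size):
        self.address = libc.malloc(size)
        if not self.address:
            raise MemoryError("malloc failed")
        # Register free(address) to run when this object is collected.
        # Only the raw address is passed, never self, so the finalizer
        # does not keep the object alive.
        self._finalizer = weakref.finalize(self, libc.free, self.address)

buf = NativeBuffer(64)
finalizer = buf._finalizer
alive_before = finalizer.alive   # finalizer is pending
del buf                          # last reference to the buffer is gone
alive_after = finalizer.alive    # finalizer has run: free() was called
print(alive_before, alive_after)
```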

Pingback: My rant about Haskell | Richard WM Jones
Paulo

March 18, 2013 at 11:01 am

Perl a bad language for FFI? And what about Inline::C and Inline::CPP and Inline::“Whatsoever do you want”? These packages are very, very good wrappers on the ugly XS extensions… and aren’t they FFI?

- rich
  
  March 18, 2013 at 12:43 pm
  
  Let’s take a deep breath.
  
  Why does Perl have two (or more) FFIs? That’s already a failure. If Inline::CPP is good enough for everyone, it should be the one true FFI. Likely it’s not, either because it can’t do quite everything that XS can (for example, it doesn’t do any mapping from non-basic types, which immediately means it’s hard to use for libguestfs bindings), or because performance sucks since it has to call out to the C compiler the first time, or because the tooling around it doesn’t exist (like makefiles, distro packaging etc).
  
  So solve all that and make Inline::C / Inline::CPP be the one true packaging solution, then we’ll talk.
  
Stephen M.

April 5, 2017 at 4:11 am

I’ve noticed announcements for new libguestfs bindings; e.g. Lua, Go. Would you please update this post with your current FFI opinions? Thanks.

- rich
  
  April 5, 2017 at 9:15 am
  
  Go is horrible. We disabled it in the Fedora libguestfs package since forever because it just doesn’t work well with Linux distros. The main problem (apart from Go being an awful language in general) is that it constantly wants to recompile the whole world.
  
  Lua was basically weird, but once I understood the stack architecture it wasn’t a problem. I highly doubt anyone uses the Lua libguestfs bindings.
  
  The ones I really want to do are Rust bindings. I actually started on them, but the borrow checker is hard to understand, particularly as I’m not learning the language but immediately jumping into writing bindings for the language (so I have to interact with the borrow checker in difficult and undocumented ways). One day I’ll get around to it …
  