Half-baked ideas: ELF VM

For more half-baked ideas, see the ideas tag.

dlopen and LD_PRELOAD are crude tools.
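
For comparison, here is a minimal sketch of the LD_PRELOAD approach applied to chmod(). The wrapper defines its own chmod(), looks up the real one with dlsym(RTLD_NEXT, ...), logs the call and forwards it. The chmod_calls counter and the log message are illustrative additions, not part of any real tool:

```c
#define _GNU_SOURCE /* for RTLD_NEXT; must come before any #include */
#include <dlfcn.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Illustrative counter: how many chmod() calls this wrapper has seen. */
int chmod_calls = 0;

/*
 * Interposed chmod(). When this file is built as a shared object and
 * loaded with LD_PRELOAD, the dynamic linker binds the program's chmod()
 * references to this definition before the one in libc.
 */
int chmod(const char *path, mode_t mode)
{
    static int (*real_chmod)(const char *, mode_t);

    if (!real_chmod)
        real_chmod = (int (*)(const char *, mode_t))dlsym(RTLD_NEXT, "chmod");
    if (!real_chmod) {
        errno = ENOSYS; /* could not find the next chmod() in the chain */
        return -1;
    }
    chmod_calls++;
    fprintf(stderr, "intercepted chmod(\"%s\", %o)\n", path, (unsigned)mode);
    return real_chmod(path, mode);
}
```

To use it, build the file as a shared object (gcc -shared -fPIC ..., adding -ldl on older glibc) and run the target program with LD_PRELOAD pointing at the result. Even then it only catches calls that reach the dynamic linker under exactly the name chmod. A program that uses another entry point (fchmodat(), say) or issues the system call directly slips past it, and the wrapper runs in the program's own address space with no isolation at all.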

It should be possible to load any ELF binary or library into a program, introspect it to find out what functions it calls, individually intercept or replace those function calls, and then run it in a controlled, isolated environment where it cannot harm the host program.
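
A first step toward that introspection might look like the sketch below. It assumes a well-formed 64-bit ELF file in the host's byte order that still has section headers. It treats every undefined STT_FUNC symbol in .dynsym as a function the binary expects some shared library to provide. elf_list_imports() and import_cb are made-up names for illustration:

```c
#include <elf.h>
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical callback type: invoked once per imported function name. */
typedef void (*import_cb)(const char *name, void *ctx);

/* Walk every SHT_DYNSYM section and report undefined function symbols. */
static int scan_dynsym(const unsigned char *base, size_t size,
                       import_cb cb, void *ctx)
{
    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    const Elf64_Shdr *sh = (const Elf64_Shdr *)(base + eh->e_shoff);
    int count = 0;

    for (size_t i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_DYNSYM || sh[i].sh_link >= eh->e_shnum)
            continue;
        /* sh_link of .dynsym is the index of its string table (.dynstr). */
        const Elf64_Shdr *str = &sh[sh[i].sh_link];
        if (sh[i].sh_offset + sh[i].sh_size > size ||
            str->sh_offset + str->sh_size > size)
            continue;
        const Elf64_Sym *syms = (const Elf64_Sym *)(base + sh[i].sh_offset);
        const char *strtab = (const char *)(base + str->sh_offset);
        size_t nsyms = sh[i].sh_size / sizeof(Elf64_Sym);

        for (size_t j = 0; j < nsyms; j++) {
            /* Imported functions are undefined here and typed STT_FUNC. */
            if (syms[j].st_shndx == SHN_UNDEF &&
                ELF64_ST_TYPE(syms[j].st_info) == STT_FUNC &&
                syms[j].st_name < str->sh_size) {
                count++;
                if (cb)
                    cb(strtab + syms[j].st_name, ctx);
            }
        }
    }
    return count;
}

/* Returns the number of imported functions found, or -1 on error. */
int elf_list_imports(const char *path, import_cb cb, void *ctx)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;
    struct stat st;
    if (fstat(fd, &st) == -1 || (size_t)st.st_size < sizeof(Elf64_Ehdr)) {
        close(fd);
        return -1;
    }
    size_t size = (size_t)st.st_size;
    unsigned char *base = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);
    if (base == MAP_FAILED)
        return -1;

    const Elf64_Ehdr *eh = (const Elf64_Ehdr *)base;
    int count = -1;
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) == 0 &&
        eh->e_ident[EI_CLASS] == ELFCLASS64 &&
        eh->e_shentsize == sizeof(Elf64_Shdr) &&
        eh->e_shoff < size &&
        eh->e_shoff + (size_t)eh->e_shnum * sizeof(Elf64_Shdr) <= size)
        count = scan_dynsym(base, size, cb, ctx);

    munmap(base, size);
    return count;
}
```

Pointed at /proc/self/exe, for instance, it reports the libc functions the current program imports. Like ld.so itself, it only sees calls resolved through the dynamic symbol table. Functions linked statically into the binary do not show up.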

I’m going to call this idea “ELF VM”. I’m using “VM” here in the “JVM” sense (not the “KVM” sense).

One particular use I have for this is to run programs using an alternate API: for example, replacing POSIX API calls like open/read/write/chmod/… with libguestfs API calls. One would load the target binary into the ELF VM, introspect it to find out which functions it uses, and then replace them, or give an error if we don’t know how to replace them. Finally, one would run the binary, but safely and controllably: it wouldn’t be able to overwrite the host program, and you could control how much CPU time it uses.
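
The “replace them or give an error” step could start as a simple lookup over the imported names. In this sketch the lists of replaceable and pass-through functions are hypothetical and far too short, no libguestfs calls are made, and classify_import() and vet_imports() are invented names:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum import_policy {
    POLICY_REPLACE,     /* reroute to an ELF VM (e.g. libguestfs-backed) version */
    POLICY_PASSTHROUGH, /* assumed harmless: let the real library handle it */
    POLICY_REJECT       /* unknown: refuse to run the binary */
};

/* Illustrative lists only; a real policy would be much longer. */
static const char *const replaced[] = { "open", "read", "write", "close", "chmod" };
static const char *const passthrough[] = { "malloc", "free", "memcpy", "strlen" };

static bool in_list(const char *name, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(name, list[i]) == 0)
            return true;
    return false;
}

/* Decide what the ELF VM should do with one imported function. */
enum import_policy classify_import(const char *name)
{
    if (in_list(name, replaced, sizeof replaced / sizeof replaced[0]))
        return POLICY_REPLACE;
    if (in_list(name, passthrough, sizeof passthrough / sizeof passthrough[0]))
        return POLICY_PASSTHROUGH;
    return POLICY_REJECT;
}

/* Count imports with no known policy and report the first one found. */
size_t vet_imports(const char *const *imports, size_t n, const char **first_bad)
{
    size_t rejected = 0;
    if (first_bad)
        *first_bad = NULL;
    for (size_t i = 0; i < n; i++) {
        if (classify_import(imports[i]) == POLICY_REJECT) {
            if (rejected == 0 && first_bad)
                *first_bad = imports[i];
            rejected++;
        }
    }
    return rejected;
}
```

The names would come from the binary’s dynamic symbol table. Any nonzero return from vet_imports() means the ELF VM should report an error instead of letting an unhandled call reach the real system.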

Valgrind already runs binaries in a VM/emulator like this, although not (AFAIK) in a way that would let me take that work and apply it in other areas.

11 responses to “Half-baked ideas: ELF VM”

  1. Conrad

    As I understand it, valgrind actually runs a full x86 emulator… syscalls are wrapped in order to update internal state (memory read/written) for use-before-write, write-out-of-bounds, and other program issue analyses.

  2. TooTea

    The main problem here is that this would only work for fully dynamically linked binaries. You can’t interpose symbols local (static) to the binary, only those handled by ld.so.

    For example, if you had the read() function from libc linked statically into the binary, from an outside perspective all you would see is that the binary is doing the read syscall directly. This could lead to epic failures if half of the application was using your special interposed implementations and the rest was doing the real syscalls.

    Of course, for special purposes you can just put up a big fat warning saying “Don’t ever try to apply this tool to programs that aren’t fully dynamically linked”.

    As Conrad already mentioned, Valgrind runs a full virtual CPU (and that’s where its performance impact comes from), otherwise it wouldn’t be able to track memory access.

    • rich

      Fair enough, but let’s assume as you say (with almost no loss of generality) that the binary and any libraries it needs are dynamically linked. Do you know of a shared library, widely used on a common Linux distro, that actually does direct syscalls bypassing glibc?

      Tracking memory access is indeed tricky. I was hoping that, because we are always running same-arch-on-same-arch (i.e. x86-64 on x86-64), we could get away with some sort of page marking + code inspection as in VMware. If we can’t do that, perhaps we can have two run “levels” in ELF VM: one where we run the binary directly and it could do Bad Stuff, and another where we care about memory protection.

  3. Paul D.

    This reminds me of something I read recently about the Linux kernel. Newer 3.x kernels have a way to write a small program, in a bytecode similar to the one used for filtering network traffic, that intercepts syscalls. Sadly I don’t remember the name of this new kernel feature. It seems to do mostly the same thing, but in a non-virtual way.

    • rich

      seccomp, while an interesting technology in its own right, doesn’t particularly help us here because we want to replace calls to libraries at any level, not just syscalls.

  4. Frank Ch. Eigler

    See also DynInst.

  5. Frank Ch. Eigler

    Sorry about their website. We’re using it in systemtap as an experimental pure-userspace backend. It is a library that reverse-engineers a binary / shlib, lets your code call into it, injects instrumentation wherever you want, and otherwise lets it run fast.

  6. You have most probably already discovered the power of kcachegrind for visualizing callgrind profiling output?
    http://kcachegrind.sourceforge.net/html/Home.html
