Half-baked ideas: feedback-directed fuzz testing of filesystems

For more half-baked ideas, see my ideas tag

Fuzz-testing (a.k.a random testing) is an automatic method of testing where you feed in random data and try to make the program crash.

You might, for example, write random bytes to a disk, then ask Linux to mount it. This is not very sophisticated since Linux is unlikely to find a filesystem signature in pure random bytes, so the test will almost always fail to test anything. A better approach is to take an existing disk image and write random bits to parts of it. This is what Steve Grubb’s fsfuzzer does, but it’s still not the state of the art, although Steve found several bugs with it.

Suppose we think of a filesystem as just an array of integers:

4, 5, 13, 0, 2, ...

The code in the kernel to read this filesystem might start off:

if ((fs[0] & 3) != 0) {
  printk ("invalid filesystem");
  return -EINVAL;
}

If you generate random filesystems, then only 1 in 4 filesystems will get past this first test in the code, so only 1 in 4 of your random tests is testing anything beyond the first statement.

A better approach is to use feedback from the kernel code to evolve your fuzz tests. You score each fuzz test based on how far into the kernel code it gets, then you use that score to evolve your tests using standard genetic algorithm techniques. The idea is that your fuzz tests evolve to test more and more of the kernel code you are interested in testing, instead of just randomly falling down at the first hurdle.

(This technique is well-known in research as feedback-directed fuzz testing, feedback-directed random testing, or evolutionary fuzz testing. As far as I can tell no one is using it on Linux.)

Here’s the half-baked idea: Use systemtap probes as a way to score the fuzz-tests.

We write a systemtap module which inserts probes at every possible point in the file we are interested in testing, say, fs/minix/*.c. These probes just print a simple “I am here” type message.

When we run our fuzz test, we score it according to how much output it produces (by measuring “dmesg” before and after the test). Hitting probe points scores 1. Causing a kernel oops scores a lot more. You’d probably want to do this in a VM, rather than on your host kernel …

Starting with real, but small filesystems, we randomly fuzz them to generate our initial test cases, then evolve those according to how much they score. The test cases are individually tested using the same method as Steve’s fsfuzzer — ie. mounting the filesystem and running through a few simple system calls (read the directory, read files, read extended attrs ..)

Test cases which hit the equivalent of the 1-in-4 problem above are quickly evolved out, and the test cases we are left with hopefully explore and test more of the code.

Leave a comment

Filed under Uncategorized

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.