If you want to simulate how your filesystem behaves with a bad drive underneath, you have a few options: the kernel dm-flakey device, writing a bash nbdkit plugin, kernel fault injection, or a few others. We didn’t have that facility in nbdkit, however, so last week I started the “evil filter”.
The evil filter can add data corruption to an existing nbdkit plugin. Types of corruption include “cosmic rays” (i.e. random bit flips), but more realistically it can simulate stuck bits. Stuck bits are the only failure mode I can remember seeing in real disks and RAM.
One challenge with writing a filter like this is making the stuck bits persistent across accesses without maintaining a large bitmap in the filter to keep track of their locations. The solution is fairly elegant: split the underlying disk into blocks. When we read from a block, we reconstruct the stuck bits within that block from a fixed seed (calculated from a global PRNG seed + the block’s offset) and iterate across the block, incrementing by random intervals. The intervals are derived from the block’s seed, so they are the same each time they are calculated. We size the blocks so that each one will have about 100 corrupted bits, so this reconstruction doesn’t take very long. Nothing is stored except one global PRNG seed.
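To make the idea concrete, here is a rough Python sketch of the reconstruction. It is not the filter's actual code: the names (`stuck_bits`, `corrupt_read`), the mean interval, the uniform interval distribution and the use of Python's `random.Random` are all illustrative assumptions. Only the overall scheme (per-block seed from global seed + block offset, random steps across the block, roughly 100 stuck bits per block) comes from the description above.

```python
import random

# Assumed numbers for illustration only.
MEAN_INTERVAL_BITS = 1000               # roughly one stuck bit per 1000 bits
BLOCK_BITS = 100 * MEAN_INTERVAL_BITS   # block sized for about 100 stuck bits
BLOCK_BYTES = BLOCK_BITS // 8


def stuck_bits(global_seed, block_offset):
    """Rebuild the stuck bits of one block: [(bit index in block, stuck value)].

    The per-block PRNG is seeded from the global seed plus the block's byte
    offset, so the same list comes back every time the block is visited.
    """
    rng = random.Random(global_seed + block_offset)
    bits = []
    pos = rng.randint(0, 2 * MEAN_INTERVAL_BITS - 1)
    while pos < BLOCK_BITS:
        bits.append((pos, rng.getrandbits(1)))   # bit is stuck at 0 or at 1
        # Step forward by a random interval whose mean is MEAN_INTERVAL_BITS.
        pos += rng.randint(1, 2 * MEAN_INTERVAL_BITS - 1)
    return bits


def corrupt_read(data, offset, global_seed):
    """Return `data` (read from byte `offset`) with the stuck bits applied."""
    out = bytearray(data)
    if not out:
        return bytes(out)
    first = offset // BLOCK_BYTES
    last = (offset + len(out) - 1) // BLOCK_BYTES
    for block in range(first, last + 1):
        block_offset = block * BLOCK_BYTES
        for bit, value in stuck_bits(global_seed, block_offset):
            i = block_offset + bit // 8 - offset   # index into this read
            if 0 <= i < len(out):
                mask = 1 << (bit % 8)
                out[i] = (out[i] | mask) if value else (out[i] & ~mask & 0xFF)
    return bytes(out)


# Two reads of the same region see identical corruption; nothing is stored.
zeros = bytes(2 * BLOCK_BYTES)
first_read = corrupt_read(zeros, 0, global_seed=1234)
second_read = corrupt_read(zeros, 0, global_seed=1234)
```

Because every read regenerates the stuck bits for each block it touches, a small read in the middle of a block costs the same walk as reading the whole block, which is why keeping the per-block count low (about 100) matters.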
The filter isn’t upstream yet but hopefully it can be another way to test filesystems and distributed storage in future.
Cool! Have you found any interesting bugs with this new filter?
Not upstream yet!