If you saw my posting from two days ago you’ll know I’m working on visualizing what happens on block devices when you perform various operations. Last time we covered basics like partitioning a disk, creating a filesystem, creating files, and fstrim.
This time I’ve tied together 5 of the nbdcanvas widgets into a bigger Tcl application that can show what’s happening on a RAID 5 disk set. As with the last posting there’s a video followed by commentary on what happens at each step.
- 00:00: I start guestfish connected to all 5 nbdkit servers. Also of note: I’ve added `raid456.devices_handle_discard_safely=1` to the appliance kernel command line, which is required for discards to work through MD RAID devices (I didn’t know that before yesterday).
- 00:02: When the appliance starts up, the black flashes show the kernel probing for possible partitions or filesystems. The disks are blank so nothing is found.
- 00:16: As in the previous post I’m partitioning the disks using GPT. Each ends up with a partition table at the start and end of the disk (two red blocks of pixels).
- 00:51: Now I use `mdadm --create` (via the guestfish `md-create` command) to make a RAID 5 array across the first 4 disks. The 4th disk is the parity disk: you can see disks 1 through 3 being scanned and the parity information being written to the 4th disk. The 5th disk is a hot spare. Notice how the scanning continues after the `mdadm` command has returned. In real arrays this can go on for hours or days.
- 01:11: I create a filesystem. The first action that `mkfs` performs is discarding previous data (indicated by light purple). Notice that the parity data is also discarded, which surprised me, but does make sense.
- 01:27: The RAID array is mounted and I unpack a tarball into it.
- 01:40: I delete the files and run `fstrim`, which discards the underlying blocks again.
- 01:48: Now I’m going to inject errors at the block layer into the 3rd disk. The Error checkbox in the Tcl widget simply creates a file. We’re using the nbdkit error filter, which watches for the named file and, once it has been created, starts injecting errors into any read or write operation. Almost immediately the RAID array notices the damage and starts rebuilding onto the hot spare. Notice the black flashes where it reads the working disks (including the old parity disk) to construct the redundant information on the spare.
- 01:55: While reconstruction is under way, the RAID array can be used normally.
- 02:14: Examining `/proc/mdstat` shows that the third disk has been marked failed.
- 02:24: Now I’m going to inject errors into the 4th disk as well. This RAID array can survive this, operating in a “degraded state”, but there is no more redundancy.
- 02:46: Finally we can examine the kernel messages which show that the RAID array is continuing on 3 devices.
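
The rebuild onto the hot spare at 01:48 works because RAID 5 keeps XOR parity for every stripe, so any one missing block can be recomputed from the others. The following is only a toy sketch of that idea using shell arithmetic on three made-up byte values; it is not what the MD driver actually runs, and the variable names are invented for illustration.

```shell
#!/bin/sh
# Toy model of one RAID 5 stripe: three data bytes plus one parity byte.
# The byte values are arbitrary and exist only for this illustration.
d1=$((0x5a))
d2=$((0x3c))
d3=$((0xf0))

# The parity block is the XOR of all the data blocks in the stripe.
parity=$(( d1 ^ d2 ^ d3 ))

# Pretend the disk holding d3 has failed. XORing the surviving data
# with the parity gives back the missing byte.
rebuilt=$(( d1 ^ d2 ^ parity ))

printf 'parity=0x%02x rebuilt=0x%02x original=0x%02x\n' \
    "$parity" "$rebuilt" "$d3"
```

The same identity holds whichever single block is lost, which is why the array has to read every surviving disk while it reconstructs onto the spare.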
In case you want to reproduce the results yourself, the full command to run nbdkit (repeated 5 times) is:
    $ rm /tmp/sock1 /tmp/error1
    $ ./nbdkit -fv -U /tmp/sock1 \
        --filter=error --filter=log --filter=delay \
        memory size=$((64*1024*1024)) \
        logfile=/tmp/log1 \
        rdelay=40ms wdelay=40ms \
        error-rate=100% error-file=/tmp/error1
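
Since the command has to be repeated with the number changed for each of the 5 disks, a small helper can generate the variants. This is a minimal sketch, assuming the only thing that differs between servers is the number in the socket, log and error file names; `nbdkit_command` is a name made up here, and the loop is a dry run that only prints the command lines.

```shell
#!/bin/sh
# Print the nbdkit command line for disk number $1, using the same
# filters and parameters as the single-disk command.
# (nbdkit_command is a hypothetical helper, not part of nbdkit.)
nbdkit_command() {
    i=$1
    printf '%s ' ./nbdkit -fv -U "/tmp/sock$i" \
        --filter=error --filter=log --filter=delay \
        memory "size=$((64*1024*1024))" \
        "logfile=/tmp/log$i" \
        rdelay=40ms wdelay=40ms \
        error-rate=100% "error-file=/tmp/error$i"
    printf '\n'
}

# Dry run: show the five command lines without starting any server.
for i in 1 2 3 4 5; do
    nbdkit_command "$i"
done
```

Because `-f` keeps nbdkit in the foreground, each printed command would need its own terminal (or to be backgrounded), after removing any stale socket and error files as in the first line of the original command.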
And the nbdraid viewing program:
    $ ./nbdraid.tcl 5 $((64*1024*1024)) /tmp/log%d /tmp/error%d
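
The Error checkbox used at 01:48 only creates the file named by `error-file`, so the same fault can be triggered by hand without the viewer. Below is a rough sketch of that; the function names and the `ERROR_DIR` variable are invented for this example, and it assumes the `/tmp/errorN` naming used above. As far as I understand the error filter, deleting the trigger file stops the injection again.

```shell
#!/bin/sh
# Directory holding the trigger files. /tmp matches the error-file=
# paths used when starting nbdkit; ERROR_DIR is a made-up variable.
ERROR_DIR=${ERROR_DIR:-/tmp}

# Start injecting errors into disk $1 by creating its trigger file,
# which is all the Error checkbox does.
inject_errors() {
    touch "$ERROR_DIR/error$1"
}

# Stop injecting errors into disk $1 by removing the trigger file.
clear_errors() {
    rm -f "$ERROR_DIR/error$1"
}
```

For example, `inject_errors 3` would fail the third disk as in the video. Note that clearing the file only stops new errors at the nbdkit level; the RAID array will already have marked the disk as failed.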