Ever wondered what is really happening when you write to a disk? Which blocks does the filesystem write to, and so on? With our flexible, plugin-based NBD server, nbdkit, and a little Tcl/Tk program I wrote, you can now visualise this.
… which shows me opening a blank disk, partitioning it, creating an ext4 filesystem and writing some files.
There’s a lot going on in this video, which I’ll explain below. But first, note that each pixel corresponds to a 4K block on disk: the total disk size is 64M, which is 128×128 pixels, so each row is half a megabyte. Red pixels are writes, black flashing pixels are reads, light purple pixels are trim requests, and white pixels are zero requests.
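The arithmetic behind the grid is easy to check: 64M split into 4K blocks gives 16384 blocks, which is 128×128, and 128 blocks of 4K make one 512K row. A minimal Python sketch of mapping a request's byte offset and length onto pixel coordinates might look like this. It assumes block n is drawn row by row from the top-left; that is a guess about the layout, not code taken from nbdview.tcl.

```python
BLOCK_SIZE = 4096               # bytes represented by one pixel
DISK_SIZE = 64 * 1024 * 1024    # 64M disk
WIDTH = 128                     # pixels per row


def pixels_for_request(offset: int, count: int) -> list[tuple[int, int]]:
    """Return the (x, y) pixels covered by `count` bytes starting at `offset`.

    Assumption: block n is drawn at column n % WIDTH, row n // WIDTH.
    """
    if count <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + count - 1) // BLOCK_SIZE   # block holding the final byte
    return [(b % WIDTH, b // WIDTH) for b in range(first, last + 1)]


nblocks = DISK_SIZE // BLOCK_SIZE
print(nblocks, nblocks // WIDTH)                      # 16384 128
print(WIDTH * BLOCK_SIZE)                             # 524288
print(pixels_for_request(0, 512))                     # [(0, 0)]
print(pixels_for_request(4095, 2))                    # [(0, 0), (1, 0)]
print(pixels_for_request(DISK_SIZE - 4096, 4096))     # [(127, 127)]
```

A request that straddles a 4K boundary lights up every pixel it touches, so even a tiny unaligned write can colour two pixels.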
nbdkit was run with the following command line:
$ nbdkit -fv \
    --filter=log \
    --filter=delay \
    memory size=$((64*1024*1024)) \
    logfile=/tmp/log \
    rdelay=40ms wdelay=40ms
This means that we’re using the memory plugin to create a throwaway blank disk of 64M. In front of this plugin we place two filters. The delay filter delays all reads and writes by 40ms, which makes it easier to see what’s going on. The other filter is the log filter, which records all requests in a log file (/tmp/log).
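The layering can be pictured as a chain of objects, each of which handles a request and passes it on. The following Python toy is only an analogy, not nbdkit's filter API: the class names, the handle() method and the block-granular requests are all hypothetical. In nbdkit the filter given first on the command line sits closest to the client, so here a request goes through log, then delay, and finally reaches a toy "memory plugin" that stores only blocks holding data.

```python
import time

BLOCK = 4096  # bytes per block in this toy


class MemoryDisk:
    """Toy sparse disk: blocks never written (or trimmed) read back as zeroes."""

    def __init__(self, nblocks: int):
        self.nblocks = nblocks
        self.blocks: dict[int, bytes] = {}

    def handle(self, op: str, block: int, data: bytes = b"") -> bytes:
        if not 0 <= block < self.nblocks:
            raise ValueError(f"block {block} out of range")
        if op == "read":
            return self.blocks.get(block, bytes(BLOCK))
        if op == "write":
            # Pad or truncate so every stored block is exactly BLOCK bytes.
            self.blocks[block] = data.ljust(BLOCK, b"\0")[:BLOCK]
        elif op in ("trim", "zero"):
            # Forgetting the block keeps the store sparse.
            self.blocks.pop(block, None)
        else:
            raise ValueError(f"unknown op {op!r}")
        return b""


class DelayFilter:
    """Sleep before forwarding reads and writes (cf. rdelay/wdelay)."""

    def __init__(self, nxt, rdelay: float, wdelay: float):
        self.nxt, self.rdelay, self.wdelay = nxt, rdelay, wdelay

    def handle(self, op: str, block: int, data: bytes = b"") -> bytes:
        if op == "read":
            time.sleep(self.rdelay)
        elif op == "write":
            time.sleep(self.wdelay)
        return self.nxt.handle(op, block, data)


class LogFilter:
    """Record each request, then forward it unchanged."""

    def __init__(self, nxt, log: list):
        self.nxt, self.log = nxt, log

    def handle(self, op: str, block: int, data: bytes = b"") -> bytes:
        self.log.append((op, block))
        return self.nxt.handle(op, block, data)


# Client -> log -> delay -> memory, mirroring --filter=log --filter=delay.
requests: list = []
disk = LogFilter(DelayFilter(MemoryDisk(16384), rdelay=0.04, wdelay=0.04), requests)
disk.handle("write", 0, b"hello")
disk.handle("trim", 0)
assert disk.handle("read", 0) == bytes(BLOCK)   # trimmed block reads as zeroes
print(requests)  # [('write', 0), ('trim', 0), ('read', 0)]
```

In this toy, trimming or zeroing a block simply forgets it, so the block reads back as zeroes and takes no memory.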
The log file is what the second command reads asynchronously to generate the graphical image:
$ ./nbdview.tcl /tmp/log $((64*1024*1024))
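Reading the log asynchronously essentially means tailing a file that nbdkit keeps appending to. Here is a sketch of that idea in Python, assuming the viewer polls on a timer. The class and method names are hypothetical, and since the log filter's line format isn't described here, the sketch only returns raw lines rather than parsing them.

```python
import os
import tempfile


class LogFollower:
    """Return complete lines appended to a growing file since the last poll."""

    def __init__(self, path: str):
        self.path = path
        self.pos = 0          # bytes of the file already consumed
        self.partial = b""    # trailing bytes still waiting for their newline

    def poll(self) -> list[str]:
        try:
            with open(self.path, "rb") as f:
                f.seek(self.pos)
                chunk = f.read()
        except FileNotFoundError:
            return []         # the writer may not have created the file yet
        self.pos += len(chunk)
        *lines, self.partial = (self.partial + chunk).split(b"\n")
        return [line.decode("utf-8", "replace") for line in lines]


# Demonstration with a temporary file standing in for /tmp/log.
path = os.path.join(tempfile.mkdtemp(), "log")
follower = LogFollower(path)
print(follower.poll())            # []
with open(path, "a") as w:
    w.write("first line\nsecond li")
print(follower.poll())            # ['first line']
with open(path, "a") as w:
    w.write("ne\n")
print(follower.poll())            # ['second line']
```

In a Tk program the poll step would typically be rescheduled with a timer, redrawing the affected pixels for each new line; buffering the partial line avoids acting on a half-written request.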
- 00:07: I start guestfish connected to the NBD server. This boots a Linux appliance, and you can see from the flashes of black how the Linux kernel probes the disk every which way to try to detect any kind of partition or filesystem signature. (Recall that I’m intentionally delaying all read requests, which is why the appliance boot and probing seem to take so long. In reality these probes happen almost instantaneously.) Of course the disk is all zeroes at this point, so nothing is found.
- 00:23: I partition the disk using GPT. The partitioning is done under the hood by GNU parted, and as you can see there is a considerable amount of probing, first by parted and then by the kernel, which is still looking for filesystem signatures. Eventually we end up with two blocks of red (written) data, because GPT creates a primary partition table at the beginning of the disk and a secondary (backup) one at the end.
- 00:36: I create an ext4 filesystem inside the partition. After even more probing by mkfs, the first major operation is to trim/discard all data on the disk (shown by the disk filling up with light purple). Then mkfs writes a large block of data in the middle of the disk, which I believe is the journal, followed by four dots, which I believe could be backup superblocks.
- 00:48: Interestingly, filesystem creation has not finished. ext4 (like other modern filesystems) defers a lot of work to the kernel, and this becomes obvious when I mount the disk. Notice that a few seconds after the mount (around 00:59) the kernel starts zeroing parts of the disk. I believe this is the inode table and block free bitmap for the first block group. For larger disks this lazy initialization could go on for a long time.
- 01:05: I unpack a tarball into the filesystem. As expected, the operation finishes almost instantaneously, and nothing is actually written to disk. However, issuing an explicit sync at 01:11 causes the files and directories to be written, filling first the data blocks and then the inodes and block free bitmap. (Is there a reason these are written last, or is it just coincidence? And does the Linux page cache retain the order in which the filesystem wrote the blocks?)
- 01:18: I delete the directory tree I just created. As you’d expect, nothing is written to disk, and even after a sync nothing much changes.
- 01:26: In contrast, when I fstrim the filesystem, all the now-deleted data blocks are discarded (light purple). This is the same principle that virt-sparsify --in-place uses to make a disk image sparse.
- 01:32: Finally, after unmounting the filesystem, I issue a blkdiscard command, which throws the whole thing away. Even after this, Linux probes the partition to see whether a filesystem signature could somehow be present.
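Why does the 00:23 partitioning step leave exactly two red patches? A small worked calculation shows where GPT puts its tables. It assumes 512-byte sectors and the common default of 128 partition entries of 128 bytes each (32 sectors), neither of which is stated above, and it ignores any other sectors parted may touch. It maps the protective MBR, primary header and primary entries at the start of the disk, and the backup entries and backup header at the end, onto the 4K pixels:

```python
SECTOR = 512                          # assumed logical sector size
BLOCK = 4096                          # bytes per pixel
DISK_SIZE = 64 * 1024 * 1024
WIDTH = 128                           # pixels per row
ENTRY_SECTORS = 128 * 128 // SECTOR   # 128 entries x 128 bytes = 32 sectors


def blocks_for_sectors(first: int, last: int) -> range:
    """4K blocks (pixels) covering sectors first..last inclusive."""
    return range(first * SECTOR // BLOCK, ((last + 1) * SECTOR - 1) // BLOCK + 1)


total = DISK_SIZE // SECTOR                                    # 131072 sectors
# LBA 0: protective MBR; LBA 1: primary header; LBA 2-33: primary entries.
primary = blocks_for_sectors(0, 1 + ENTRY_SECTORS)
# Backup entries occupy the 32 sectors before the backup header in the last LBA.
backup = blocks_for_sectors(total - 1 - ENTRY_SECTORS, total - 1)

print(list(primary))   # [0, 1, 2, 3, 4]
print(list(backup))    # [16379, 16380, 16381, 16382, 16383]
print([(b % WIDTH, b // WIDTH) for b in backup])
# [(123, 127), (124, 127), (125, 127), (126, 127), (127, 127)]
```

In the 128-pixel-wide image these are the first five pixels of the top row and the last five pixels of the bottom row, that is, the beginning and end of the disk.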
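The guess at 00:36 that the four dots are backup superblocks can be sanity-checked against ext4's sparse_super rule, under which only block groups 0 and 1 and groups that are powers of 3, 5 or 7 carry a copy of the superblock. The sketch below assumes a filesystem of about 63 MiB (the actual partition size isn't given above) and ignores details such as the first data block on 1K-block filesystems and the trimming of a too-small last group. Whether mkfs picked 1 KiB or 4 KiB blocks here is also an assumption; mke2fs's "small" filesystem type has historically used 1 KiB blocks.

```python
def has_backup_superblock(group: int) -> bool:
    """sparse_super rule: groups 0 and 1, plus powers of 3, 5 and 7."""
    if group <= 1:
        return True
    for base in (3, 5, 7):
        power = base
        while power < group:
            power *= base
        if power == group:
            return True
    return False


def backup_superblock_groups(fs_bytes: int, block_size: int) -> list[int]:
    """Block groups holding a backup superblock (group 0 holds the primary)."""
    blocks_per_group = 8 * block_size           # one bitmap block: 8 bits per byte
    nblocks = fs_bytes // block_size
    ngroups = -(-nblocks // blocks_per_group)   # ceiling division
    return [g for g in range(1, ngroups) if has_backup_superblock(g)]


FS_BYTES = 63 * 1024 * 1024   # assumed filesystem size after GPT overhead
print(backup_superblock_groups(FS_BYTES, 1024))  # [1, 3, 5, 7]
print(backup_superblock_groups(FS_BYTES, 4096))  # []
```

With 1 KiB blocks a block group covers 8 MiB, giving eight groups with backups in exactly four of them. With 4 KiB blocks the whole filesystem fits in a single group and there would be no backups at all, so the four dots would need another explanation.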