What happens to filesystems and programs when the disk is slow? You can test this using nbdkit and the delay filter. This command creates a 4G virtual disk in memory and injects a 2 second delay into every read operation:
$ nbdkit --filter=delay memory size=4G rdelay=2
You can loopback mount this as a device on the host:
# modprobe nbd
# nbd-client -b 512 localhost /dev/nbd0
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 4096MB
Partitioning and formatting is really slow!
# sgdisk -n 1 /dev/nbd0
Creating new GPT entries in memory.
... sits here for about 10 seconds ...
The operation has completed successfully.
# mkfs.ext4 /dev/nbd0p1
mke2fs 1.44.3 (10-July-2018)
Actually I killed it and decided to restart the test with a smaller delay. Since the memory plugin was rewritten to use a sparse array, we’re serializing all requests as an easy way to lock the sparse array data structure. This doesn’t matter normally because requests to the memory plugin are extremely fast, but once you inject delays this means that every request into nbdkit is serialized. Thus for example two reads issued in parallel at the same time by the kernel are delayed by 2+2 = 4 seconds instead of 2 seconds in total.
However shutting down the NBD connection reveals likely kernel bugs in the NBD driver:
[74176.112087] block nbd0: NBD_DISCONNECT
[74176.112148] block nbd0: Disconnected due to user request.
[74176.112151] block nbd0: shutting down sockets
[74176.112183] print_req_error: I/O error, dev nbd0, sector 6144
[74176.112252] print_req_error: I/O error, dev nbd0, sector 6144
[74176.112257] Buffer I/O error on dev nbd0p1, logical block 4096, async page read
[74176.112260] Buffer I/O error on dev nbd0p1, logical block 4097, async page read
[74176.112263] Buffer I/O error on dev nbd0p1, logical block 4098, async page read
[74176.112265] Buffer I/O error on dev nbd0p1, logical block 4099, async page read
[74176.112267] Buffer I/O error on dev nbd0p1, logical block 4100, async page read
[74176.112269] Buffer I/O error on dev nbd0p1, logical block 4101, async page read
[74176.112271] Buffer I/O error on dev nbd0p1, logical block 4102, async page read
[74176.112274] Buffer I/O error on dev nbd0p1, logical block 4103, async page read
Note nbdkit did not return any I/O errors, but the connection was closed with in-flight delayed requests. Well at least our testing is finding bugs!
I tried again with a 500ms delay and using the file plugin which is fully parallel:
$ rm -f temp
$ truncate -s 4G temp
$ nbdkit --filter=delay file file=temp rdelay=500ms
I was able to partition and create a filesystem more easily on this because of the shorter delay and the fact that parallel kernel requests are delayed “in parallel” [same steps as above], and then mount it on a temp directory:
# mount /dev/nbd0p1 /tmp/mnt
The effect is rather strange, like using an NFS mount from a remote server. Initial file reads are slow, and then they are fast (as they are cached in memory). If you drop Linux caches:
# echo 3 > /proc/sys/vm/drop_caches
then everything becomes slow again.
Confident that parallel requests were being delayed in parallel, I also increased the delay back up to 2 seconds (still using the file plugin). This is like swimming in treacle or what I imagine it would be like to mount an NFS filesystem from the other side of the world over a 56K modem.
I wasn’t able to find any further bugs, but this should be useful for someone who wants to test this kind of thing.