Tag Archives: disk images

nbdkit for loopback pt 7: a slow disk

What happens to filesystems and programs when the disk is slow? You can test this using nbdkit and the delay filter. This command creates a 4G virtual disk in memory and injects a 2 second delay into every read operation:

$ nbdkit --filter=delay memory size=4G rdelay=2

You can loopback mount this as a device on the host:

# modprobe nbd
# nbd-client -b 512 localhost /dev/nbd0
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 4096MB
Connected /dev/nbd0

Partitioning and formatting is really slow!

# sgdisk -n 1 /dev/nbd0
Creating new GPT entries in memory.
... sits here for about 10 seconds ...
The operation has completed successfully.
# mkfs.ext4 /dev/nbd0p1
mke2fs 1.44.3 (10-July-2018)
waiting ...

Actually I killed it and decided to restart the test with a smaller delay. Since the memory plugin was rewritten to use a sparse array, we’re serializing all requests as an easy way to lock the sparse array data structure. This doesn’t matter normally because requests to the memory plugin are extremely fast, but once you inject delays this means that every request into nbdkit is serialized. Thus for example two reads issued in parallel at the same time by the kernel are delayed by 2+2 = 4 seconds instead of 2 seconds in total.

However shutting down the NBD connection reveals likely kernel bugs in the NBD driver:

[74176.112087] block nbd0: NBD_DISCONNECT
[74176.112148] block nbd0: Disconnected due to user request.
[74176.112151] block nbd0: shutting down sockets
[74176.112183] print_req_error: I/O error, dev nbd0, sector 6144
[74176.112252] print_req_error: I/O error, dev nbd0, sector 6144
[74176.112257] Buffer I/O error on dev nbd0p1, logical block 4096, async page read
[74176.112260] Buffer I/O error on dev nbd0p1, logical block 4097, async page read
[74176.112263] Buffer I/O error on dev nbd0p1, logical block 4098, async page read
[74176.112265] Buffer I/O error on dev nbd0p1, logical block 4099, async page read
[74176.112267] Buffer I/O error on dev nbd0p1, logical block 4100, async page read
[74176.112269] Buffer I/O error on dev nbd0p1, logical block 4101, async page read
[74176.112271] Buffer I/O error on dev nbd0p1, logical block 4102, async page read
[74176.112274] Buffer I/O error on dev nbd0p1, logical block 4103, async page read

Note nbdkit did not return any I/O errors, but the connection was closed with in-flight delayed requests. Well at least our testing is finding bugs!

I tried again with a 500ms delay and using the file plugin which is fully parallel:

$ rm -f temp
$ truncate -s 4G temp
$ nbdkit --filter=delay file file=temp rdelay=500ms

I was able to partition and create a filesystem more easily on this because of the shorter delay and the fact that parallel kernel requests are delayed “in parallel” [same steps as above], and then mount it on a temp directory:

# mount /dev/nbd0p1 /tmp/mnt

The effect is rather strange, like using an NFS mount from a remote server. Initial file reads are slow, and then they are fast (as they are cached in memory). If you drop Linux caches:

# echo 3 > /proc/sys/vm/drop_caches

then everything becomes slow again.

Confident that parallel requests were being delayed in parallel, I also increased the delay back up to 2 seconds (still using the file plugin). This is like swimming in treacle or what I imagine it would be like to mount an NFS filesystem from the other side of the world over a 56K modem.

I wasn’t able to find any further bugs, but this should be useful for someone who wants to test this kind of thing.

Advertisements

Leave a comment

Filed under Uncategorized

Mapping files to disk

Wouldn’t it be cool if you could watch a virtual machine booting, and at the same time see what files it is accessing on disk:

reading /dev/sda1 master boot record
reading /dev/sda1 /grub2/i386-pc/boot.img
reading /dev/sda1 /grub2/i386-pc/ext2.mod
reading /dev/sda1 /vmlinuz
...

You can already observe what disk blocks it is accessing pretty easily. There are several methods, but a quick one would be to use nbdkit’s file plugin with the -f -v flags (foreground and verbose). The problem is how to map disk blocks to the files and other interesting objects that exist in the disk image.

How do you map between files and disk blocks? For simple filesystems like ext4 you can use the FIBMAP ioctl, and perhaps adjust the answer by adding the offset of the start of the partition. However as you get further into the boot process you’ll probably encounter complexities like LVM. There may not even be a 1-1 mapping since RAID means that multiple blocks can store a single file block, and tail packing and deduplication mean that a block can belong to multiple files. And of course there are things other than plain files: directories, swap partitions, master boot records, and boot loaders, that live in and between filesystems.

To solve this I have written a tool called virt-bmap. It takes a disk image and outputs a block map. To do this it uses libguestfs (patched) to control an nbdkit instance, reading each file and recording what blocks in the disk image are accessed. (It sounds complicated, but virt-bmap wraps it up in a simple command line tool.) The beauty of this is that the kernel takes care of the mapping for us, and it works no matter how many layers of filesystem/LVM/RAID are between the file and the underlying device. This doesn’t quite solve the “RAID problem” since the RAID layers in Linux are free to only read a single copy of the file, but is generally accurate for everything else.

$ virt-bmap fedora-20.img
virt-bmap: examining /dev/sda1 ...
virt-bmap: examining /dev/sda2 ...
virt-bmap: examining /dev/sda3 ...
virt-bmap: examining filesystem on /dev/sda1 (ext4) ...
virt-bmap: examining filesystem on /dev/sda3 (ext4) ...
virt-bmap: writing /home/rjones/d/virt-bmap/bmap
virt-bmap: successfully examined 3 partitions, 0 logical volumes,
           2 filesystems, 3346 directories, 20585 files
virt-bmap: output written to /home/rjones/d/virt-bmap/bmap

The output bmap file is a straightforward map from disk byte offset to file / files / object occupying that space:

1 541000 541400 d /dev/sda1 /
1 541400 544400 d /dev/sda1 /lost+found
1 941000 941400 f /dev/sda1 /.vmlinuz-3.11.10-301.fc20.x86_64.hmac
1 941400 961800 f /dev/sda1 /config-3.11.10-301.fc20.x86_64
1 961800 995400 f /dev/sda1 /initrd-plymouth.img
1 b00400 ef1c00 f /dev/sda1 /grub2/themes/system/background.png
1 f00400 12f1c00 f /dev/sda1 /grub2/themes/system/fireworks.png
1 1300400 1590400 f /dev/sda1 /System.map-3.11.10-301.fc20.x86_64

[The 1 that appears in the first column means “first disk”. Unfortunately virt-bmap can only map single disk virtual machines at present.]

The second part of this, which I’m still writing, will be another nbdkit plugin which takes these maps and produces a nice log of accesses as the machine boots.

3 Comments

Filed under Uncategorized

New tool: virt-builder

New in libguestfs 1.24 will be a simple tool called virt-builder. This builds virtual machines of various free operating systems quickly and securely:

$ virt-builder fedora-19 --size 20G --install nmap
[     0.0] Downloading: http://libguestfs.org/download/builder/fedora-19.xz
[     2.0] Uncompressing: http://libguestfs.org/download/builder/fedora-19.xz
[    25.0] Running virt-resize to expand the disk to 20.0G
[    74.0] Opening the new disk
[    78.0] Random root password: RCuMKJ4NPak0ptJQ [did you mean to use --root-password?]
[    78.0] Installing packages: nmap
[    93.0] Finishing off

Some notable features:

  • Fast: As you can see above, once it has downloaded and cached the template first time, it can churn out new guests in around 90 seconds.
  • Install packages.
  • Set the hostname.
  • Generate a random seed for the guest.
  • Upload files.
  • Set passwords, create user accounts.
  • Run custom scripts.
  • Install firstboot scripts.
  • Fetch packages from private repos and ISOs.
  • Secure: Everything is assembled in a container (using SELinux if available).
  • Guest templates are PGP-signed.
  • No root or privileged access needed at all (no setuid, no sudo).
  • Fully scriptable.
  • Can be used in locked-down no-network scenarios.
  • Can use UML as a backend (good for use in a cloud).

13 Comments

Filed under Uncategorized

Tip: Pack files into a new disk image

Note: This requires libguestfs ≥ 1.4

If files/ is a directory containing some files, you can create a new disk image populated with those files using the command below (virt-make-fs can also be used for this):

$ guestfish -N fs:ext2:400M -m /dev/sda1 tar-in <(tar -C files/ -cf - .) /

Explanation:

  1. guestfish -N fs:ext2:400M creates a prepared disk image which is 400MB in size with a single partition formatted as ext2. The new disk image is called test1.img.
  2. -m /dev/sda1 mounts the new filesystem when guestfish starts up.
  3. tar-in [...] / uploads and unpacks the tar file into the root directory of the mounted disk image.
  4. <(tar -C files/ -cf - .) is a bash process substitution which runs the tar command and supplies the output (ie. the tar file) as the input on the command line.

This is a quick way to create “appliances” using febootstrap and libguestfs, although you should note that I don’t think these appliances would really work, I just use them for testing our virtualization management tools, like the ability to detect and manage disk images:

$ febootstrap -i fedora-release -i bash -i setup -i coreutils fedora-13 f13
$ echo '/dev/sda1 / ext2 defaults 1 1' > fstab
$ febootstrap-install f13 fstab /etc 0644 root.root
$ febootstrap-minimize f13
$ guestfish -N fs:ext2:40M -m /dev/sda1 tar-in <(tar -C f13 -cf - .) /
$ guestfish --ro -i -a test1.img

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for a list of commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Fedora release 13 (Goddard)
/dev/vda1 mounted on /
><fs> cat /etc/fstab
/dev/sda1 / ext2 defaults 1 1

6 Comments

Filed under Uncategorized