Tag Archives: libguestfs

NBD graphical viewer – RAID 5 edition

If you saw my posting from two days ago you’ll know I’m working on visualizing what happens on block devices when you perform various operations. Last time we covered basics like partitioning a disk, creating a filesystem, creating files, and fstrim.

This time I’ve tied together 5 of the nbdcanvas widgets into a bigger Tcl application that can show what’s happening on a RAID 5 disk set. As with the last posting there’s a video followed by commentary on what happens at each step.

[Video: raid0066]

  • 00:00: I start guestfish connected to all 5 nbdkit servers. Also of note, I’ve added raid456.devices_handle_discard_safely=1 to the appliance kernel command line, which is required for discards to work through MD RAID devices (I didn’t know that before yesterday).
  • 00:02: When the appliance starts up, the black flashes show the kernel probing for possible partitions or filesystems. The disks are blank so nothing is found.
  • 00:16: As in the previous post I’m partitioning the disks using GPT. Each ends up with a partition table at the start and end of the disk (two red blocks of pixels).
  • 00:51: Now I use mdadm --create (via the guestfish md-create command) to make a RAID 5 array across the first 4 disks. The 4th disk is the parity disk — you can see disks 1 through 3 being scanned and the parity information being written to the 4th disk. The 5th disk is a hot spare. Notice how the scanning continues after the mdadm command has returned. In real arrays this can go on for hours or days.
  • 01:11: I create a filesystem. The first action that mkfs performs is discarding previous data (indicated by light purple). Notice that the parity data is also discarded, which surprised me, but does make sense.
  • 01:27: The RAID array is mounted and I unpack a tarball into it.
  • 01:40: I delete the files and fstrim, which discards the underlying blocks again.
  • 01:48: Now I’m going to inject errors at the block layer into the 3rd disk. The Error checkbox in the Tcl widget simply creates a file. We’re using the nbdkit error filter, which monitors for the named file and, once it appears, starts injecting errors into every read and write operation. Almost immediately the RAID array notices the damage and starts rebuilding onto the hot spare. Notice the black flashes where it reads the working disks (including the old parity disk) to construct the redundant information on the spare.
  • 01:55: While reconstruction is under way, the RAID array can be used normally.
  • 02:14: Examining /proc/mdstat shows that the third disk has been marked failed.
  • 02:24: Now I’m going to inject errors into the 4th disk as well. This RAID array can survive this, operating in a “degraded state”, but there is no more redundancy.
  • 02:46: Finally we can examine the kernel messages which show that the RAID array is continuing on 3 devices.
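The reconstruction at 01:48 works because RAID 5 redundancy is nothing more than XOR parity across the data chunks in each stripe: lose any one chunk and it can be recomputed from the others. A minimal Python sketch of the arithmetic (not how MD actually implements it, but the same math):

```python
# RAID 5 parity is a bytewise XOR across the data chunks in each stripe.
def xor_blocks(blocks):
    """XOR equal-sized byte blocks together."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

# Three data chunks, one per data disk in a 4-disk RAID 5 stripe.
data = [b"chunk on disk 1!", b"chunk on disk 2!", b"chunk on disk 3!"]
parity = xor_blocks(data)

# Disk 2 fails: rebuild its chunk from the survivors plus parity,
# which is exactly what MD does when reconstructing onto the hot spare.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
```

The same property explains why any single disk in the set, parity included, can be lost and rebuilt.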

In case you want to reproduce the results yourself, the full command to run nbdkit (repeated 5 times) is:

$ rm /tmp/sock1 /tmp/error1
$ ./nbdkit -fv -U /tmp/sock1 \
    --filter=error --filter=log --filter=delay \
    memory size=$((64*1024*1024)) \
    logfile=/tmp/log1 \
    rdelay=40ms wdelay=40ms \
    error-rate=100% error-file=/tmp/error1

And the nbdraid viewing program:

$ ./nbdraid.tcl 5 $((64*1024*1024)) /tmp/log%d /tmp/error%d


Partitioning a 7 exabyte disk

In the latest nbdkit (and at the time of writing you will need nbdkit from git) you can type this magical incantation:

nbdkit data data="
       @0x1c0 2 0 0xee 0xfe 0xff 0xff 0x01 0  0 0 0xff 0xff 0xff 0xff
       @0x1fe 0x55 0xaa
       @0x200 0x45 0x46 0x49 0x20 0x50 0x41 0x52 0x54
                     0 0 1 0 0x5c 0 0 0
              0x9b 0xe5 0x6a 0xc5 0 0 0 0  1 0 0 0 0 0 0 0
              0xff 0xff 0xff 0xff 0xff 0xff 0x37 0  0x22 0 0 0 0 0 0 0
              0xde 0xff 0xff 0xff 0xff 0xff 0x37 0
                     0x72 0xb6 0x9e 0x0c 0x6b 0x76 0xb0 0x4f
              0xb3 0x94 0xb2 0xf1 0x61 0xec 0xdd 0x3c  2 0 0 0 0 0 0 0
              0x80 0 0 0 0x80 0 0 0  0x79 0x8a 0xd0 0x7e 0 0 0 0
       @0x400 0xaf 0x3d 0xc6 0x0f 0x83 0x84 0x72 0x47
                     0x8e 0x79 0x3d 0x69 0xd8 0x47 0x7d 0xe4
              0xd5 0x19 0x46 0x95 0xe3 0x82 0xa8 0x4c
                     0x95 0x82 0x7a 0xbe 0x1c 0xfc 0x62 0x90
              0x80 0 0 0 0 0 0 0  0x80 0xff 0xff 0xff 0xff 0xff 0x37 0
              0 0 0 0 0 0 0 0  0x70 0 0x31 0 0 0 0 0
       @0x6fffffffffffbe00
              0xaf 0x3d 0xc6 0x0f 0x83 0x84 0x72 0x47
                     0x8e 0x79 0x3d 0x69 0xd8 0x47 0x7d 0xe4
              0xd5 0x19 0x46 0x95 0xe3 0x82 0xa8 0x4c
                     0x95 0x82 0x7a 0xbe 0x1c 0xfc 0x62 0x90
              0x80 0 0 0 0 0 0 0  0x80 0xff 0xff 0xff 0xff 0xff 0x37 0
              0 0 0 0 0 0 0 0  0x70 0 0x31 0 0 0 0 0
       @0x6ffffffffffffe00
              0x45 0x46 0x49 0x20 0x50 0x41 0x52 0x54
                     0 0 1 0 0x5c 0 0 0
              0x6c 0x76 0xa1 0xa0 0 0 0 0
                     0xff 0xff 0xff 0xff 0xff 0xff 0x37 0
              1 0 0 0 0 0 0 0  0x22 0 0 0 0 0 0 0
              0xde 0xff 0xff 0xff 0xff 0xff 0x37 0
                     0x72 0xb6 0x9e 0x0c 0x6b 0x76 0xb0 0x4f
              0xb3 0x94 0xb2 0xf1 0x61 0xec 0xdd 0x3c
                     0xdf 0xff 0xff 0xff 0xff 0xff 0x37 0
              0x80 0 0 0 0x80 0 0 0  0x79 0x8a 0xd0 0x7e 0 0 0 0
" size=7E

When nbdkit starts up you can connect to it in a few ways. If you have a qemu virtual machine running an installed operating system, attach a second NBD drive. On the command line that would look like this:

$ qemu-system-x86_64 ... -drive file=nbd:localhost:10809,if=virtio

Or you could use guestfish:

$ guestfish --format=raw -a nbd://localhost
><fs> run

What this creates is a 7 exabyte disk with a single, empty GPT partition.

7 exabytes is a lot. It’s 8,070,450,532,247,928,832 bytes, or about 7 billion gigabytes. In fact, even with ever-increasing storage capacities in hard disk drives, it’ll be a very long time before we get exabyte drives.

Peculiar things happen when you try to use this disk in Linux. For sure the kernel has no problem finding the partition, creating a /dev/sda1 device, and returning the right size. Ext4 has a maximum filesystem size of merely 1 exabyte, so it won’t even try to make a filesystem, and on my laptop trying to write an XFS filesystem on the partition just caused qemu to grind away at 200% CPU, making no apparent progress even after many minutes.

Why not throw your own favourite disk analysis tools at this image and see what they make of it?

Finally, how did I create the magic command line above?

I used the nbdkit memory plugin to make an empty 7 EB disk. Note this requires a recent version of the plugin which was rewritten with support for sparse arrays.

$ nbdkit memory size=7E

Then I could connect to it with guestfish to create the GPT partition:

$ guestfish --format=raw -a nbd://localhost
><fs> run
><fs> part-disk /dev/sda gpt

GPT uses a partition table at the beginning and end of the disk. So – still in guestfish – I could sample what the partitioning tool had written to both ends of the disk:

><fs> pread-device /dev/sda 1M 0 | cat > start
><fs> pread-device /dev/sda 1M 8070450532246880256 | cat > end
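The big number in the second command is simply the disk size minus 1 MiB, i.e. the offset where the final megabyte of the disk begins:

```python
DISK_SIZE = 7 * 2**60          # size=7E in bytes
MIB = 2**20

# Offset of the final 1 MiB, matching the second pread-device call.
end_offset = DISK_SIZE - MIB
assert DISK_SIZE == 8070450532247928832
assert end_offset == 8070450532246880256
```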

I then used hexdump plus manual inspection of its output to write the long data string:

$ hexdump -C start
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001c0  02 00 ee fe ff ff 01 00  00 00 ff ff ff ff 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
...

translates to …

@0x1c0 2 0 0xee 0xfe 0xff 0xff 0x01 0  0 0 0xff 0xff 0xff 0xff
@0x1fe 0x55 0xaa
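That translation can be mechanized. Here’s a rough Python helper (my own sketch, not part of nbdkit) that turns raw bytes into the data plugin’s “@offset byte byte …” style, skipping runs of zeroes since the plugin zero-fills any gaps:

```python
def to_data_string(buf, gap=16):
    """Convert raw bytes to nbdkit data-plugin style syntax:
    '@offset' followed by byte values, omitting zero-filled gaps
    of at least `gap` bytes (the plugin fills gaps with zeroes)."""
    out, i, n = [], 0, len(buf)
    while i < n:
        if buf[i] == 0:
            i += 1
            continue
        j = i + 1
        # Extend the run until we hit `gap` consecutive zero bytes.
        while j < n and any(buf[j:j + gap]):
            j += 1
        run = bytes(buf[i:j]).rstrip(b"\x00")
        vals = " ".join("0" if b == 0 else f"0x{b:02x}" for b in run)
        out.append(f"@0x{i:x} {vals}")
        i += len(run)
    return "\n".join(out)

# The protective MBR sampled above: partition entry at 0x1c0,
# boot signature 0x55 0xaa at 0x1fe.
mbr = bytearray(512)
mbr[0x1c0:0x1d0] = bytes.fromhex("0200eefeffff01000000ffffffff0000")
mbr[0x1fe:0x200] = b"\x55\xaa"
print(to_data_string(mbr))
# @0x1c0 0x02 0 0xee 0xfe 0xff 0xff 0x01 0 0 0 0xff 0xff 0xff 0xff
# @0x1fe 0x55 0xaa
```

Running it on the sampled “start” file would reproduce the @0x1c0 and @0x1fe lines (modulo writing small values in decimal, as I did by hand).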


Dockerfile for running libguestfs, virt-tools and virt-v2v

FROM fedora
RUN dnf install -y libguestfs libguestfs-tools-c virt-v2v \
                   libvirt-daemon libvirt-daemon-config-network

# https://bugzilla.redhat.com/show_bug.cgi?id=1045069
RUN useradd -ms /bin/bash v2v
USER v2v
WORKDIR /home/v2v

# This is required for virt-v2v because neither systemd nor
# root libvirtd runs, and therefore there is no virbr0, and
# therefore virt-v2v cannot set up the network through libvirt.
ENV LIBGUESTFS_BACKEND direct


libguestfs for RHEL 7.5 preview

As usual I’ve placed the proposed RHEL 7.5 libguestfs packages in a public repository so you can try them out.

Thanks to Pino Toscano for doing the packaging work.


Great new changes coming to nbdkit

Eric Blake has been doing some great stuff for nbdkit, the flexible plugin-based NBD server.

  • Full parallel request handling.
    You’ve always been able to tell nbdkit that your plugin can handle multiple requests in parallel from a single client, but until now that didn’t actually do anything (only parallel requests from multiple clients worked).
  • An NBD forwarding plugin, so if you have another NBD server which doesn’t support a feature like encryption or new-style protocol, then you can front that server with nbdkit which does.

As well as that, he’s fixed lots of small bugs with NBD compliance, so hopefully we’re now much closer to the protocol spec (we always check that we interoperate with qemu’s NBD client, but it’s nice to know that we’re also complying with the spec). He also fixed a potential DoS where nbdkit would try to handle very large writes, which would delay a thread in the server indefinitely.


Also this week, I wrote an nbdkit plugin for handling the weird Xen XVA file format. The whole thread is worth reading because three people came up with three different solutions to this problem.


Fedora 27 virt-builder images

Fedora 27 has just been released, and I’ve just uploaded virt-builder images so you can try it right away:

$ virt-builder -l | grep fedora-27
fedora-27                aarch64    Fedora® 27 Server (aarch64)
fedora-27                armv7l     Fedora® 27 Server (armv7l)
fedora-27                i686       Fedora® 27 Server (i686)
fedora-27                ppc64      Fedora® 27 Server (ppc64)
fedora-27                ppc64le    Fedora® 27 Server (ppc64le)
fedora-27                x86_64     Fedora® 27 Server
$ virt-builder fedora-27 \
      --root-password password:123456 \
      --install emacs \
      --selinux-relabel \
      --size 30G
$ qemu-system-x86_64 \
      -machine accel=kvm:tcg \
      -cpu host -m 2048 \
      -drive file=fedora-27.img,format=raw,if=virtio &


Tip: Changing the qemu product name in libguestfs

20:30 < koike> Hi. Is it possible to configure the dmi codes for libguestfs? I mean, I am running cloud-init inside a libguestfs session (through python-guestfs) in GCE, the problem is that cloud-init reads /sys/class/dmi/id/product_name to determine if the machine is a GCE machine, but the value it read is Standard PC (i440FX + PIIX, 1996) instead of the expected Google Compute Engine so cloud-init fails.

The answer is yes, using the guestfs_config API that lets you set arbitrary qemu parameters:

g.config('-smbios',
         'type=1,product=Google Compute Engine')
