Tag Archives: disk image

New in nbdkit: Create a virtual floppy disk

nbdkit is our flexible, plug-in based Network Block Device server.

While I was visiting the KVM Forum last week, one of the most respected members of the QEMU development team mentioned to me that he wanted to think about deprecating QEMU’s VVFAT driver. This QEMU driver is a bit of an oddity — it lets you point QEMU to a directory of files, and inside the guest it will see a virtual floppy containing those files:

$ qemu -drive file=fat:/some/directory

That’s not the odd thing. The odd thing is that it also lets you make the drive writable, and the VVFAT driver then turns those writes back into modifications to the host filesystem (remember that these are writes happening to raw FAT32 data structures, so the driver has to infer what is happening at the filesystem level just from seeing the writes). This is both amazing and crazy (and also buggy).

Anyway I have implemented the read-only part of this in nbdkit. I didn’t implement the write stuff because that’s very ambitious, although if you were going to implement that, doing it in nbdkit would be better than qemu since the only thing that can crash is nbdkit, not the whole hypervisor.

Usage is very simple:

$ nbdkit floppy /some/directory

This gives you an NBD source which you can connect straight to a qemu virtual machine:

$ qemu -drive file=nbd:localhost:10809

or examine with guestfish:

$ guestfish --ro --format=raw -a nbd://localhost -m /dev/sda1
Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: ‘help’ for help on commands
      ‘man’ to read the manual
      ‘quit’ to quit the shell

><fs> ll /
total 2420
drwxr-xr-x 14 root root  16384 Jan  1  1970 .
drwxr-xr-x 19 root root   4096 Oct 28 10:07 ..
-rwxr-xr-x  1 root root     40 Sep 17 21:23 .dir-locals.el
-rwxr-xr-x  1 root root    879 Oct 27 21:10 .gdb_history
drwxr-xr-x  8 root root  16384 Oct 28 10:05 .git
-rwxr-xr-x  1 root root   1383 Sep 17 21:23 .gitignore
-rwxr-xr-x  1 root root   1453 Sep 17 21:23 LICENSE
-rwxr-xr-x  1 root root  34182 Oct 28 10:04 Makefile
-rwxr-xr-x  1 root root   2568 Oct 27 22:17 Makefile.am
-rwxr-xr-x  1 root root  32085 Oct 27 22:18 Makefile.in
-rwxr-xr-x  1 root root    620 Sep 17 21:23 OTHER_PLUGINS
-rwxr-xr-x  1 root root   4628 Oct 16 22:36 README
-rwxr-xr-x  1 root root   4007 Sep 17 21:23 TODO
-rwxr-xr-x  1 root root  54733 Oct 27 22:18 aclocal.m4
drwxr-xr-x  2 root root  16384 Oct 27 22:18 autom4te.cache
drwxr-xr-x  2 root root  16384 Oct 28 10:04 bash
drwxr-xr-x  5 root root  16384 Oct 27 18:07 common
[etc]

Previously … create ISO images on the fly in nbdkit

Filed under Uncategorized

nbdkit for loopback pt 6: giant file-backed disks for testing

In part 1 and part 5 of this series I created some giant disks with a virtual size of 2^63-1 bytes (8 exabytes). However these were stored in memory using nbdkit-memory-plugin so you could never allocate more space in these disks than available RAM plus swap.

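This is a problem when testing some filesystems because the filesystem overhead (the space used to store superblocks, inode tables, block free maps and so on) can be 1% or more, and 1% of an 8 exabyte disk is around 80 petabytes, far more than any realistic amount of RAM plus swap.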
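To get a feel for the numbers, here is a rough sketch using plain shell arithmetic. The 1% figure is the overhead estimate from the paragraph above; the variable names are only illustrative.

```shell
# Virtual disk size: 2^63-1 bytes, written with shifts so that every
# intermediate value stays within signed 64-bit shell arithmetic.
size=$(( ((1 << 62) - 1) * 2 + 1 ))

# 1% filesystem overhead, in bytes and in whole pebibytes (2^50 bytes).
overhead=$(( size / 100 ))
overhead_pib=$(( overhead / (1 << 50) ))

echo "disk size:   $size bytes"
echo "1% overhead: $overhead bytes (about $overhead_pib PiB)"
```

Anything in the tens of pebibytes is clearly out of reach for a RAM-backed disk.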

The solution to this is to back the virtual disks using a sparse file instead. XFS lets you create sparse files up to 2^63-1 bytes and you can serve them using nbdkit-file-plugin instead:

$ rm -f temp
$ truncate -s $(( 2**63 - 1 )) temp
$ stat -c %s temp
9223372036854775807
$ nbdkit file file=temp

nbdkit-file-plugin recently got a lot of updates to ensure it always maintains sparseness where possible and supports efficient zeroing, so make sure you’re using nbdkit ≥ 1.6.
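If you want to see what "sparse" means in practice, this small sketch compares the apparent size of a sparse file with the space actually allocated for it. It assumes GNU coreutils (truncate and stat -c, as used above) and deliberately uses a 1 GiB file in a scratch directory so that it works on any filesystem; the file name is arbitrary.

```shell
# Create a 1 GiB sparse file in a temporary directory.
dir=$(mktemp -d)
truncate -s 1G "$dir/sparse.img"

# Apparent size in bytes (%s) versus allocated space:
# number of allocated blocks (%b) times the size of each block (%B).
apparent=$(stat -c %s "$dir/sparse.img")
allocated=$(( $(stat -c %b "$dir/sparse.img") * $(stat -c %B "$dir/sparse.img") ))

echo "apparent size: $apparent bytes"
echo "allocated:     $allocated bytes"

rm -rf "$dir"
```

On most filesystems the allocated figure should be zero or close to it until something writes to the file.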

Now you can serve this in the ordinary way and you should be able to allocate as much space as is available on the host filesystem:

# nbd-client -b 512 localhost /dev/nbd0
Negotiation: ..size = 8796093022207MB
Connected /dev/nbd0
# blockdev --getsize64 /dev/nbd0
9223372036854774784
# sgdisk -n 1 /dev/nbd0
# gdisk -l /dev/nbd0
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048  18014398509481948   8.0 EiB     8300
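As a sanity check on the gdisk output, the partition size can be worked out from the start and end sectors shown above. This is only shell arithmetic on the numbers from the listing, assuming 512-byte sectors.

```shell
# Start and end sectors of partition 1, copied from the gdisk listing.
start=2048
end=18014398509481948
sector_size=512

# The end sector is inclusive, hence the +1.
part_bytes=$(( (end - start + 1) * sector_size ))

# How far short of the full 2^63-1 byte file the partition falls.
shortfall=$(( 9223372036854775807 - part_bytes ))

echo "partition size: $part_bytes bytes"
echo "shortfall:      $shortfall bytes"
```

The partition comes out roughly 1 MiB smaller than the backing file, which is why gdisk still rounds it to 8.0 EiB.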

This command will still probably fail unless you have a lot of patience and a huge amount of space on your host:

# mkfs.xfs -K /dev/nbd0p1

Partitioning a 7 exabyte disk

In the latest nbdkit (and at the time of writing you will need nbdkit from git) you can type this magical incantation:

nbdkit data data="
       @0x1c0 2 0 0xee 0xfe 0xff 0xff 0x01 0  0 0 0xff 0xff 0xff 0xff
       @0x1fe 0x55 0xaa
       @0x200 0x45 0x46 0x49 0x20 0x50 0x41 0x52 0x54
                     0 0 1 0 0x5c 0 0 0
              0x9b 0xe5 0x6a 0xc5 0 0 0 0  1 0 0 0 0 0 0 0
              0xff 0xff 0xff 0xff 0xff 0xff 0x37 0  0x22 0 0 0 0 0 0 0
              0xde 0xff 0xff 0xff 0xff 0xff 0x37 0
                     0x72 0xb6 0x9e 0x0c 0x6b 0x76 0xb0 0x4f
              0xb3 0x94 0xb2 0xf1 0x61 0xec 0xdd 0x3c  2 0 0 0 0 0 0 0
              0x80 0 0 0 0x80 0 0 0  0x79 0x8a 0xd0 0x7e 0 0 0 0
       @0x400 0xaf 0x3d 0xc6 0x0f 0x83 0x84 0x72 0x47
                     0x8e 0x79 0x3d 0x69 0xd8 0x47 0x7d 0xe4
              0xd5 0x19 0x46 0x95 0xe3 0x82 0xa8 0x4c
                     0x95 0x82 0x7a 0xbe 0x1c 0xfc 0x62 0x90
              0x80 0 0 0 0 0 0 0  0x80 0xff 0xff 0xff 0xff 0xff 0x37 0
              0 0 0 0 0 0 0 0  0x70 0 0x31 0 0 0 0 0
       @0x6fffffffffffbe00
              0xaf 0x3d 0xc6 0x0f 0x83 0x84 0x72 0x47
                     0x8e 0x79 0x3d 0x69 0xd8 0x47 0x7d 0xe4
              0xd5 0x19 0x46 0x95 0xe3 0x82 0xa8 0x4c
                     0x95 0x82 0x7a 0xbe 0x1c 0xfc 0x62 0x90
              0x80 0 0 0 0 0 0 0  0x80 0xff 0xff 0xff 0xff 0xff 0x37 0
              0 0 0 0 0 0 0 0  0x70 0 0x31 0 0 0 0 0
       @0x6ffffffffffffe00
              0x45 0x46 0x49 0x20 0x50 0x41 0x52 0x54
                     0 0 1 0 0x5c 0 0 0
              0x6c 0x76 0xa1 0xa0 0 0 0 0
                     0xff 0xff 0xff 0xff 0xff 0xff 0x37 0
              1 0 0 0 0 0 0 0  0x22 0 0 0 0 0 0 0
              0xde 0xff 0xff 0xff 0xff 0xff 0x37 0
                     0x72 0xb6 0x9e 0x0c 0x6b 0x76 0xb0 0x4f
              0xb3 0x94 0xb2 0xf1 0x61 0xec 0xdd 0x3c
                     0xdf 0xff 0xff 0xff 0xff 0xff 0x37 0
              0x80 0 0 0 0x80 0 0 0  0x79 0x8a 0xd0 0x7e 0 0 0 0
" size=7E

When nbdkit starts up you can connect to it in a few ways. If you have a qemu virtual machine running an installed operating system, attach a second NBD drive. On the command line that would look like this:

$ qemu-system-x86_64 ... -drive file=nbd:localhost:10809,if=virtio

Or you could use guestfish:

$ guestfish --format=raw -a nbd://localhost
><fs> run

What this creates is a 7 exabyte disk with a single, empty GPT partition.

7 exabytes is a lot. It’s 8,070,450,532,247,928,832 bytes, or about 7 billion gigabytes. In fact, even with ever-increasing storage capacities in hard disk drives, it’ll be a very long time before we get exabyte drives.
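The byte count can be reproduced with shell arithmetic, taking size=7E to mean 7 × 2^60 bytes (which matches the figure quoted above). The sector count assumes 512-byte sectors.

```shell
# 7E = 7 * 2^60 bytes.
size=$(( 7 * (1 << 60) ))

# The same size in gibibytes (2^30 bytes) and in 512-byte sectors.
gib=$(( size / (1 << 30) ))
sectors=$(( size / 512 ))

echo "$size bytes"
echo "$gib GiB"
echo "$sectors sectors"
```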

Peculiar things happen when you try to use this disk in Linux. For sure the kernel has no problem finding the partition, creating a /dev/sda1 device, and returning the right size. Ext4 has a maximum filesystem size of merely 1 exabyte so it won’t even try to make a filesystem, and on my laptop trying to write an XFS filesystem on the partition just caused qemu to grind away at 200% CPU making no apparent progress even after many minutes.

Why not throw your own favourite disk analysis tools at this image and see what they make of it?

Finally, how did I create the magic command line above?

I used the nbdkit memory plugin to make an empty 7 EB disk. Note this requires a recent version of the plugin which was rewritten with support for sparse arrays.

$ nbdkit memory size=7E

Then I could connect to it with guestfish to create the GPT partition:

$ guestfish --format=raw -a nbd://localhost
><fs> run
><fs> part-disk /dev/sda gpt

GPT uses a partition table at the beginning and end of the disk. So – still in guestfish – I could sample what the partitioning tool had written to both ends of the disk:

><fs> pread-device /dev/sda 1M 0 | cat > start
><fs> pread-device /dev/sda 1M 8070450532246880256 | cat > end
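The large offset in the second command is just the disk size minus 1M. A quick sketch of that arithmetic follows, together with the two offsets near the end of the disk that appear in the data string. It assumes 512-byte sectors and the usual GPT layout, in which the backup header sits in the last sector with 32 sectors of partition entries immediately before it.

```shell
# Disk size: 7 * 2^60 bytes.
size=$(( 7 * (1 << 60) ))

# Offset of the final 1M of the disk, as passed to pread-device.
tail_offset=$(( size - (1 << 20) ))

# Backup GPT header: the last 512-byte sector of the disk.
backup_header=$(printf '0x%x' $(( size - 512 )))

# Backup partition entries: 32 sectors before the backup header,
# that is, 33 sectors from the end of the disk.
backup_entries=$(printf '0x%x' $(( size - 33 * 512 )))

echo "tail offset:    $tail_offset"
echo "backup header:  $backup_header"
echo "backup entries: $backup_entries"
```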

I then used hexdump + manual inspection of the hexdump output to write the long data string:

$ hexdump -C start
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001c0  02 00 ee fe ff ff 01 00  00 00 ff ff ff ff 00 00  |................|
000001d0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
000001f0  00 00 00 00 00 00 00 00  00 00 00 00 00 00 55 aa  |..............U.|
...

translates to …

@0x1c0 2 0 0xee 0xfe 0xff 0xff 0x01 0  0 0 0xff 0xff 0xff 0xff
@0x1fe 0x55 0xaa
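I did the translation by hand, but it could probably be automated. The following is a rough sketch, not what I actually used: the function name to_nbdkit_data is made up, it emits whole 16-byte rows rather than trimming zeroes, and I haven’t checked its output against nbdkit itself. It prints every row of a file that contains a non-zero byte in the @offset byte byte ... form, and the demo builds a 512-byte sample holding the same bytes as the hexdump above.

```shell
# Print each 16-byte row that contains a non-zero byte as
# "@0xOFFSET 0xNN 0xNN ...".  od -v prints every row (no "*" lines),
# so the row number gives the offset.
to_nbdkit_data() {
    od -An -v -tx1 "$1" | awk '
        {
            row = ""; nonzero = 0
            for (i = 1; i <= NF; i++) {
                row = row " 0x" $i
                if ($i != "00") nonzero = 1
            }
            if (nonzero) printf "@0x%x%s\n", (NR - 1) * 16, row
        }'
}

# Demo: a 512-byte sector with the protective MBR bytes from the hexdump
# (octal escapes are used so this works with any POSIX printf).
sample=$(mktemp)
dd if=/dev/zero of="$sample" bs=512 count=1 2>/dev/null
printf '\002\000\356\376\377\377\001\000\000\000\377\377\377\377' |
    dd of="$sample" bs=1 seek=448 conv=notrunc 2>/dev/null   # 448 = 0x1c0
printf '\125\252' |
    dd of="$sample" bs=1 seek=510 conv=notrunc 2>/dev/null   # 510 = 0x1fe

data=$(to_nbdkit_data "$sample")
echo "$data"
rm -f "$sample"
```

The second row starts at 0x1f0 rather than 0x1fe, so it carries fourteen leading zero bytes, but it describes the same data.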

Tip: guestmount (FUSE mount) every filesystem in a disk image

Maxim asks an interesting question: if you’ve got a disk image, how do you mount every filesystem in it onto your host? Like this:

$ ./fs-mount.pl rhel-5.11.img /tmp/fs &
$ cd /tmp/fs
/tmp/fs$ ls
dev
/tmp/fs$ cd dev
/tmp/fs/dev$ ls
sda1  sda2  sda3
/tmp/fs/dev$ cd sda2
/tmp/fs/dev/sda2$ ls
bin   dev  home  lib64       media  mnt  proc  sbin     srv  tmp  var
boot  etc  lib   lost+found  misc   opt  root  selinux  sys  usr
...
$ cd /tmp
$ guestunmount /tmp/fs

The answer is this surprisingly short Perl script.

#!/usr/bin/perl

use warnings;
use strict;

use Sys::Guestfs;

die "usage: $0 disk1 [disk2 ...] mountpoint\n" if @ARGV <= 1;

my $mp = pop;

my $g = Sys::Guestfs->new ();
foreach (@ARGV) {
    $g->add_drive ($_);
}
$g->launch ();

# Examine the filesystems.
my %fses = $g->list_filesystems ();

# Create the mountpoint directories (in the libguestfs namespace)
# and mount the filesystems on them.
foreach my $fs (sort keys %fses) {
    # mkmountpoint is really the same as mkdir.  Unfortunately there
    # is no 'mkdir -p' equivalent, so we have to do this instead:
    my @components = split ("/", $fs);
    for (my $i = 1; $i < @components; ++$i) {
        my $dir = "/" . join ("/", @components[1 .. $i]);
        eval { $g->mkmountpoint ($dir) }
    }

    # Don't fail if the filesystem can't be mounted, eg. it's swap.
    eval { $g->mount ($fs, $fs) }
}

# Export the filesystem on the host.
$g->mount_local ($mp);
$g->mount_local_run ();

# Close nicely since we mounted everything writable.
$g->shutdown ();
$g->close ();
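The only slightly fiddly part of the script is the inner loop that stands in for ‘mkdir -p’. Here is the same idea as a standalone shell sketch (path_prefixes is a made-up helper, not part of libguestfs): for a device path it prints each parent directory in turn, which is the sequence of names the script passes to mkmountpoint.

```shell
# Print every leading directory prefix of an absolute path, one per line,
# e.g. /dev/sda1 gives /dev and then /dev/sda1.
path_prefixes() {
    rest=${1#/}          # strip the leading slash
    prefix=
    while [ -n "$rest" ]; do
        component=${rest%%/*}            # first remaining component
        prefix="$prefix/$component"
        printf '%s\n' "$prefix"
        case $rest in
            */*) rest=${rest#*/} ;;      # drop the component just used
            *)   rest= ;;                # that was the last one
        esac
    done
}

path_prefixes /dev/sda1
```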

Importing KVM guests to oVirt or RHEV

One of the tools I maintain is virt-v2v. It’s a program to import guests from foreign hypervisors like VMware and Xen, to KVM. It only does conversions to KVM, not the other way. And a feature I intentionally removed in RHEL 7 was importing KVM → KVM.

Why would you want to “import” KVM → KVM? Well, no reason actually. In fact it’s one of those really bad ideas for V2V. However it used to have a useful purpose: oVirt/RHEV can’t import a plain disk image, but virt-v2v knows how to import things to oVirt, so people used virt-v2v as a backdoor for this missing feature.

Removing this virt-v2v feature has caused a lot of moaning, but I’m adamant it’s a very bad idea to use virt-v2v as a way to import disk images. Virt-v2v does all sorts of complex filesystem and Windows Registry manipulations, which you don’t want and don’t need if your guest already runs on KVM. Worst case, you could even end up breaking your guest.

However I have now written a replacement script that does the job: http://git.annexia.org/?p=import-to-ovirt.git

If your guest is a disk image that already runs on KVM, then you can use this script to import the guest. You’ll need to clone the git repo, read the README file, and then read the tool’s man page. It’s pretty straightforward.

There are a few shortcomings with this script to be aware of:

  1. The guest must have virtio drivers installed already, and must be able to boot off virtio-blk (default) or virtio-scsi. For virtio-scsi, you’ll need to flip the checkbox in the ‘Advanced’ section of the guest parameters in the oVirt UI.
  2. It should be possible to import guests that don’t have virtio drivers installed, but can use IDE. This is a missing feature (patches welcome).
  3. No network card is added to the guest, so it probably won’t have network when it boots. It should be possible to add a network card through the UI, but really this is something that needs to be fixed in the script (patches welcome).
  4. It doesn’t handle all the random packaging formats that guests come in, like OVA. You’ll have to extract these first and import just the disk image.
  5. It’s not in any way supported or endorsed by Red Hat.

Tip: compress raw disk images using qcow2

$ qemu-img convert -c -f raw -O qcow2 win.img winq.img
$ ls -lh win*
-rw-r--r--. 1 root   root    10G May 18 14:34 win.img
-rw-r--r--. 1 rjones rjones 6.5G May 18 14:59 winq.img

Of course the degree of compression you get depends on the amount of zeroed free space in the image, and the amount by which qcow2 is able to compress the other blocks containing data.

qcow2 uses zlib for compression, so the compression won’t be that spectacular. It’s better to keep the filesystems “sparse” in the first place, by ensuring unused disk blocks are zeroed.

For ext2/3 filesystems, Fedora ships a utility called zerofree, which you can either run inside the guest, or run offline from guestfish. This turns unused filesystem blocks into zeroes, which will make outside compression, e.g. with qcow2, much more efficient. For other filesystems, the usual trick is to create a large file of all zeroes until you fill up the free space, then delete it.
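To illustrate why zeroed blocks matter so much, this sketch compresses 1 MiB of zeroes and 1 MiB of random data and compares the results. It uses gzip as a stand-in, on the assumption that it behaves much like zlib here since both use DEFLATE; it does not involve qcow2 itself.

```shell
# Compressed size of 1 MiB of zeroes versus 1 MiB of random data.
zero_size=$(( $(dd if=/dev/zero bs=1024 count=1024 2>/dev/null | gzip -c | wc -c) ))
rand_size=$(( $(dd if=/dev/urandom bs=1024 count=1024 2>/dev/null | gzip -c | wc -c) ))

echo "zeroes: $zero_size bytes after compression"
echo "random: $rand_size bytes after compression"
```

The zeroes shrink to almost nothing, while the random data barely compresses at all. Stale data left behind in freed filesystem blocks behaves more like the second case.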

qcow2 files are completely interchangeable with raw disk images:

$ virt-df -h win.img
Filesystem                                Size       Used  Available  Use%
win.img:/dev/vda1                       100.0M      24.1M      75.9M   25%
win.img:/dev/vda2                         9.9G       7.4G       2.5G   75%
$ virt-df -h winq.img
Filesystem                                Size       Used  Available  Use%
winq.img:/dev/vda1                      100.0M      24.1M      75.9M   25%
winq.img:/dev/vda2                        9.9G       7.4G       2.5G   75%

virt-rescue

Virt-rescue is a new tool proposed for inclusion in libguestfs. It lets you get a rescue shell on your virtual machine, good for making quick, interactive, unstructured fixes:

$ virt-rescue F11x64

Welcome to virt-rescue, the libguestfs rescue shell.

Note: The contents of / are the rescue appliance.
You have to mount the guest's partitions under /sysroot
before you will be able to examine them.

><rescue> /sbin/e2fsck /dev/vg_f11x64/lv_root
[...]
><rescue> mount /dev/vg_f11x64/lv_root /sysroot
><rescue> ls /sysroot/
bin   dev  home  lib64       media  opt   root  selinux  sys  usr
boot  etc  lib   lost+found  mnt    proc  sbin  srv      tmp  var
><rescue> sync
><rescue> umount /sysroot
><rescue> exit

Of course we encourage you to continue using libguestfs and guestfish for making properly structured changes through a stable, programmable API!
