Tag Archives: qemu

libguestfs and qemu ssh driver

The qemu ssh block device is now up to version 7 … although sadly not upstream yet.

Nevertheless, by applying this patch to libguestfs you can access remote disks over ssh:

$ export LIBGUESTFS_QEMU=~/d/qemu/qemu.wrapper
$ export LIBGUESTFS_BACKEND=direct
$ ./run ./fish/guestfish

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> add /tmp/f17x64.img readonly:true format:raw \
        protocol:ssh server:onuma
><fs> run
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
><fs> inspect-os
/dev/vg_f17x64/lv_root
><fs> inspect-get-product-name /dev/vg_f17x64/lv_root
Fedora release 17 (Beefy Miracle)
><fs> list-filesystems
/dev/sda1: ext4
/dev/vg_f17x64/lv_root: ext4
/dev/vg_f17x64/lv_swap: swap
><fs> mount /dev/vg_f17x64/lv_root /
><fs> cat /etc/redhat-release
Fedora release 17 (Beefy Miracle)

Everything just works as if this were a local disk.

There are a couple of minor caveats (the major caveat being that none of this is upstream yet): firstly, you have to have ssh-agent set up; secondly, the remote host must be in your known_hosts file (if it isn’t, run ssh remotehost once first to add it).
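If you don’t have either of those set up already, something like this works (assuming your key is in the usual place under ~/.ssh; onuma is the example remote host from the session above):

$ eval "$(ssh-agent)"     # start an agent for this shell
$ ssh-add                 # load your default key(s) into the agent
$ ssh onuma true          # connect once so the host key lands in ~/.ssh/known_hosts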


QEMU ssh block device

I wrote a small patch (intro, patch) which adds a Secure Shell (ssh) block device to qemu. With this patch you could access a remote disk image or device by doing:

qemu -drive file=ssh://host/path/to/file,if=virtio,cache=none

QEMU ssh’es into “host” and opens /path/to/file. For the initial version of this patch you will need to set up ssh-agent access to the remote server.

The motivation behind this patch is to allow libguestfs to access remote disks using ssh the same way we already do with NBD. Secure Shell is ubiquitous, so for the majority of users libguestfs-over-qemu/ssh would let them use disks remotely with zero configuration.


Accessing NBD disks using libguestfs

It’s always been possible, but clumsy, to access Network Block Device (NBD) disks from libguestfs, but starting in libguestfs 1.22 we hope to make this (and Gluster, Ceph and Sheepdog access) much simpler.

The first change is upstream in libguestfs 1.21.21. You can add an NBD disk directly.

To show this using guestfish, I’ll start an NBD server. This could be started on another machine, but to make things simple I’ll start this server on the same machine:

$ qemu-nbd f18x64.img -t

f18x64.img is the disk image that I want to export. The -t option makes the qemu-nbd server persistent (ie. it doesn’t just exit after serving the first client).
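As an aside, qemu-nbd can also export a host block device rather than a file, and the -r option makes the export read-only (a sketch; the device path is just an example):

$ qemu-nbd -t -r /dev/vg_guests/f18x64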

Now we can connect to this server using guestfish as follows:

$ guestfish

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> add-drive "" format:raw protocol:nbd server:localhost
><fs> run

The empty "" (quotes) are the export name. Since qemu-nbd doesn’t support export names, we can leave this empty. The main change is to specify the protocol (nbd) and the server that libguestfs should connect to (localhost, but a remote host would also work). I haven’t specified a port number here because both the client and server are using the standard NBD port (10809), but you could use server:localhost:NNN to use a different port number if needed.
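For example, to serve the image on a non-standard port and connect to it (a sketch, using an arbitrary port number of 20000):

$ qemu-nbd -t -p 20000 f18x64.img

and then in guestfish:

><fs> add-drive "" format:raw protocol:nbd server:localhost:20000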

Ordinary guestfish commands just work:

><fs> list-filesystems
/dev/sda1: ext4
/dev/fedora/root: ext4
/dev/fedora/swap: swap
><fs> inspect-os
/dev/fedora/root
><fs> inspect-get-product-name /dev/fedora/root
Fedora release 18 (Spherical Cow)

The next steps are to:

  1. Add support for more technologies. At least: Gluster, Ceph, Sheepdog and iSCSI, since those are all supported by qemu so we can easily leverage that support in libguestfs.
  2. Change the guestfish -a option to make adding remote drives possible from the command line.

The obvious [but not yet implemented] way to change the -a option is to allow a URI to be specified. For example:

$ guestfish -a nbd://localhost/exportname

where the elements of the URI like protocol, transport, server, port number and export name translate naturally into parameters of the add-drive API.
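For comparison, the URI above would map onto the current add-drive parameters roughly like this (a sketch only; as noted earlier, qemu-nbd itself doesn’t support export names, so in practice the export name would stay empty):

><fs> add-drive "exportname" format:raw protocol:nbd server:localhost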


Accessing Ceph (rbd), sheepdog, etc using libguestfs

Thanks to infernix who contributed this tip on how to use libguestfs to access Ceph (and in theory, sheepdog, gluster, iscsi and more) devices.

If you apply this small patch to libguestfs you can use these distributed filesystems straight away by doing:

$ guestfish
><fs> set-attach-method appliance
><fs> add-drive /dev/null
><fs> config -set drive.hd0.file=rbd:pool/volume
><fs> run

… followed by usual guestfish commands.

This is a temporary hack until we properly model Ceph (etc.) through the libguestfs stable API. Nevertheless it works, as follows:

  1. The add-drive /dev/null command adds a dummy drive that libguestfs knows about.
  2. Implicitly this means that libguestfs adds a -drive option when it runs qemu.
  3. The custom qemu -set drive.hd0.file=... parameter modifies the preceding -drive option added by libguestfs so that the file is changed from /dev/null to whatever you want. In this case, to a Ceph rbd:... path.
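Concretely, the qemu command line ends up looking roughly like this (a sketch only; the exact -drive flags libguestfs generates vary between versions, but the drive it adds is addressable as hd0):

qemu-kvm \
    -drive file=/dev/null,id=hd0,cache=none \
    -set drive.hd0.file=rbd:pool/volume

Because the -set option is applied to the already-defined drive options, qemu ends up opening the rbd: path instead of /dev/null.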


Multiple libguestfs appliances in parallel, part 4

[Part 1, part 2, part 3.]

Finally I modified the test to do some representative work: We now load a real Windows XP guest, inspect it (a heavyweight operation), and mount and stat each filesystem. I won’t reproduce the entire test program again because only the test subroutine has changed:

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/tmp/winxp.img");
    $g->launch ();

    # Inspect the guest (ignore the result).
    $g->inspect_os ();

    # Approximate what virt-df does.
    my %fses = $g->list_filesystems ();
    foreach (keys %fses) {
        my $mounted = 0;
        eval { $g->mount_ro ($_, "/"); $mounted = 1; };
        if ($mounted) {
            $g->statvfs ("/");
            $g->umount_all ();
        }
    }

    return $g;
}

Even with all that work going on, I was able to inspect more than 1 disk per second on my laptop, and run 60 threads in parallel with good performance and scalability:

[graph of results: number of parallel appliances vs. total elapsed time]


Multiple libguestfs appliances in parallel, part 3

A problem encountered in part 2 was that I couldn’t measure the maximum number of parallel libguestfs appliances that can be run at the same time. There are two reasons for that. The simpler one is that libvirt has a limit of 20 connections, which is easily overcome by setting LIBGUESTFS_ATTACH_METHOD=appliance to eliminate libvirt and run qemu directly. The harder one is that by the time the last appliances in the test are starting to launch, earlier ones have already shut down and their threads have exited.

What is needed is for the test to work in two phases: In the first phase we start up all the threads and launch all the appliances. Only when this is complete do we enter the second phase where we shut down all the appliances.

The easiest way to do this is by modifying the test to use a barrier (or in fact to implement a barrier using the condition primitives). See the modified test script below.

With the modified test script I was able to run ≥ 110 and < 120 parallel appliances in ~ 13 GB of free RAM, or around 120 MB / appliance, still with excellent performance and nearly linear scalability:

[graph of results: number of parallel appliances vs. total elapsed time]


#!/usr/bin/perl -w

use strict;
use threads qw(yield);
use threads::shared qw(cond_broadcast cond_wait lock);
use Sys::Guestfs;
use Time::HiRes qw(time);

my $nr_threads_launching :shared;

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/dev/null");
    $g->launch ();
    return $g;
}

# Get everything into cache.
test (); test (); test ();

sub thread {
    my $g = test ();

    {
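        # Barrier: each thread decrements the shared counter of launching
        # threads, then waits until every thread has finished launching
        # before any of them shuts down its appliance.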
        lock ($nr_threads_launching);
        $nr_threads_launching--;
        cond_broadcast ($nr_threads_launching);
        cond_wait ($nr_threads_launching) until $nr_threads_launching == 0;
    }

    $g->close ();
}

# Test increasing numbers of threads until it fails.
for (my $nr_threads = 10; $nr_threads < 1000; $nr_threads += 10) {
    my $start_t = time ();

    $nr_threads_launching = $nr_threads;

    my @threads;
    foreach (1..$nr_threads) {
        push @threads, threads->create (\&thread)
    }
    foreach (@threads) {
        $_->join ();
        if (my $err = $_->error ()) {
            die "launch failed with $nr_threads threads: $err"
        }
    }

    my $end_t = time ();
    printf ("%d %.2f\n", $nr_threads, $end_t - $start_t);
}


Multiple libguestfs appliances in parallel, part 2

One problem with the previous test is that I hit a limit of 20 parallel appliances and mistakenly thought that I’d hit a memory limit. In fact libvirt out of the box limits the number of client connections to 20. You can adjust libvirt’s limit by editing /etc/libvirt/libvirtd.conf, but easier for us is to simply eliminate libvirt from the equation by doing:

export LIBGUESTFS_ATTACH_METHOD=appliance

which causes libguestfs to run qemu directly. In my first test I reached 48 parallel launches before I killed the program (because that’s a lot of parallelism and there seemed no end in sight). Scalability of the libguestfs / qemu combination was excellent again:

[graph of results: number of parallel appliances vs. total elapsed time]
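If you would rather raise libvirt’s limit instead of bypassing it, the setting to change is (I believe) max_clients in /etc/libvirt/libvirtd.conf, followed by a restart of the libvirtd service:

# /etc/libvirt/libvirtd.conf
max_clients = 100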

But there’s more! (In the next part …)


Multiple libguestfs appliances in parallel, part 1

I wrote the Perl script below to find out how many libguestfs appliances we can start in parallel. The results are surprising (-ly good):

[graph of results: number of parallel appliances vs. total elapsed time]

What’s happening here is that we’re booting up a KVM guest with 500 MB of memory, booting the Linux kernel, booting a minimal userspace, then shutting the whole lot down. And then doing that in parallel with 1, 2, .. 20 threads.

[Note: Hardware is my Lenovo x230 laptop with an Intel Core(TM) i7-3520M CPU @ 2.90GHz, 2 cores with 4 threads, 16 GB of RAM with approx. 13 GB free. Software is: Fedora 18 with libguestfs 1.20.2, libvirt 1.0.2 (from Rawhide), qemu 1.4.0 (from Rawhide)]

The test fails at 21 threads. I originally thought this was because there wasn’t enough free memory, which would have implied each qemu instance was allocating around 660 MB of RAM, but that is wrong: it failed because libvirt out of the box limits the maximum number of clients to 20. See the next part in this series.

Up to 4 parallel launches, you can clearly see the effect of better utilization of the parallelism of the CPU — the total elapsed time hardly moves, even though we’re doing up to 4 times more work.


#!/usr/bin/perl -w

use strict;
use threads;
use Sys::Guestfs;
use Time::HiRes qw(time);

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/dev/null");
    $g->launch ();
}

# Get everything into cache.
test (); test (); test ();

# Test increasing numbers of threads until it fails.
for my $nr_threads (1..100) {
    my $start_t = time ();
    my @threads;
    foreach (1..$nr_threads) {
        push @threads, threads->create (\&test)
    }
    foreach (@threads) {
        $_->join ();
        if (my $err = $_->error ()) {
            die "launch failed with nr_threads = $nr_threads: $err"
        }
    }
    my $end_t = time ();
    printf ("%d %.2f\n", $nr_threads, $end_t - $start_t);
}


What is the overhead of qemu/KVM?

To clarify, the question is about memory overhead: how many guests can you cram onto a single host, memory being the typical limiting factor when you virtualize?

This was the question someone asked at work today. I don’t know the answer either, but the small program I wrote (below) aims to find out. If you believe the numbers below from qemu 1.2.2 running on Fedora 18, then the overhead is around 150 MB per qemu process that cannot be shared, plus around 200 MB per host (that is, shared between all qemu processes).

guest size 256 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 404.20 MB
Shared writable memory: 0.03 MB

guest size 512 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 643.76 MB
Shared writable memory: 0.03 MB

guest size 1024 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 1172.38 MB
Shared writable memory: 0.03 MB

guest size 2048 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 2237.16 MB
Shared writable memory: 0.03 MB

guest size 4096 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 4245.13 MB
Shared writable memory: 0.03 MB

The number to pay attention to is “Anonymous memory”, since that is what cannot be shared between guests (except if you have KSM and your guests are such that KSM can be effective). Subtracting the guest RAM size from the anonymous figure gives the per-process overhead: for example, at 1024 MB the anonymous memory is 1172.38 MB, about 148 MB more than the guest RAM itself, and the other sizes show a similar gap of roughly 130-190 MB, which is where the ~150 MB per-process estimate above comes from.

There are some known shortcomings with my testing methodology that I summarise below. You may be able to see others.

  1. We’re testing a libguestfs appliance. A libguestfs appliance does not have the full range of normal qemu devices that a real guest would have, and so the overhead of a real guest is likely to be higher. The main difference is probably lack of a video device (so no video RAM is evident).
  2. This uses virtio-scsi. Real guests use IDE, virtio-blk, etc which may have quite different characteristics.
  3. This guest has one user network device (ie. SLIRP) which could be quite different from a real network device.
  4. During the test, the guest only runs for a few seconds. A normal, long-running guest would experience qemu memory growth or even memory leaks. You could fix this relatively easily by adding some libguestfs busy-work after the launch.
  5. The guest does not do any significant writes, so during the test qemu won’t be storing any cached or in-flight data blocks.
  6. It only accounts for memory used by qemu in userspace, not memory used by the host kernel on behalf of qemu.
  7. The effectiveness or otherwise of KSM is not tested. It’s likely that KSM depends heavily on your workload, so it wouldn’t be fair to publish any KSM figures.
  8. The script uses /proc/PID/maps but it would be better to use smaps so that we can see how much of the file-backed copy-on-write segments have actually been copied. Currently the script overestimates these by assuming that (eg) all the data pages from a library would be dirtied by qemu.
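As a rough sketch of the smaps approach from item 8 (assuming a kernel that exposes Private_Dirty fields in /proc/PID/smaps; $QEMU_PID here just stands for the qemu process ID, which the script gets from get-pid), summing the pages the process has actually dirtied would look something like:

$ awk '/^Private_Dirty:/ { sum += $2 }
       END { printf "Private dirty: %.2f MB\n", sum / 1024 }' /proc/$QEMU_PID/smaps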

Another interesting question would be whether qemu is getting better or worse over time.

#!/usr/bin/perl -w

# Estimate memory usage of qemu-kvm at different guest RAM sizes.
# By Richard W.M. Jones <rjones@redhat.com>

use strict;
use Sys::Guestfs;
no warnings "portable"; # 64 bit platform required.

# Loop over different guest RAM sizes.
my $mbytes;
for $mbytes (256, 512, 1024, 2048, 4096) {
    print "guest size ", $mbytes, " MB:\n";

    my $g = Sys::Guestfs->new;

    # Ensure we're using the direct qemu launch backend, otherwise
    # libvirt stops us from finding the qemu PID.
    $g->set_attach_method ("appliance");

    # Set guest memory size.
    $g->set_memsize ($mbytes);

    # Enable user networking just to be more like a "real" guest.
    $g->set_network (1);

    # Launch guest with one dummy disk.
    $g->add_drive ("/dev/null");
    $g->launch ();

    # Get process ID of qemu.
    my $pid = $g->get_pid ();
    die unless $pid > 0;

    # Read the memory maps of the guest.
    open MAPS, "/proc/$pid/maps" or die "cannot open memory map of pid $pid";
    my @maps = <MAPS>;
    close MAPS;

    # Kill qemu.
    $g->close ();

    # Parse the memory maps.
    my $shared_file_backed = 0;
    my $anonymous = 0;
    my $shared_writable = 0;

    my $map;
    foreach $map (@maps) {
        chomp $map;

        if ($map =~ m/
                     ^([0-9a-f]+)-([0-9a-f]+) \s
                     (....) \s
                     [0-9a-f]+ \s ..:.. \s (\d+) \s+ (\S+)?
                    /x) {
            my ($start, $end) = (hex $1, hex $2);
            my $size = $end - $start;
            my $mode = $3;
            my $inode = $4;
            my $filename = $5; # could also be "[heap]", "[vdso]", etc.

            # Shared file-backed text: r-xp, r--p, etc. with a file backing.
            if ($inode != 0 &&
                ($mode eq "r-xp" || $mode eq "r--p" || $mode eq "---p")) {
                $shared_file_backed += $size;
            }

            # Anonymous memory: rw-p.
            elsif ($mode eq "rw-p") {
                $anonymous += $size;
            }

            # Writable and shared.  Not sure what this is ...
            elsif ($mode eq "rw-s") {
                $shared_writable += $size;
            }

            # Ignore [vdso], [vsyscall].
            elsif (defined $filename &&
                   ($filename eq "[vdso]" || $filename eq "[vsyscall]")) {
            }

            # Ignore ---p with no file.  What's this?
            elsif ($inode == 0 && $mode eq "---p") {
            }

            # Ignore kvm-vcpu.
            elsif (defined $filename && $filename eq "anon_inode:kvm-vcpu") {
            }

            else {
                warn "warning: could not parse '$map'\n";
            }
        }
        else {
            die "incorrect maps format: '$map'";
        }
    }

    printf("Shared memory backed by a file: %.2f MB\n",
           $shared_file_backed / 1024.0 / 1024.0);
    printf("Anonymous memory (eg. malloc, COW, stack), not shared: %.2f MB\n",
           $anonymous / 1024.0 / 1024.0);
    printf("Shared writable memory: %.2f MB\n",
           $shared_writable / 1024.0 / 1024.0);

    print "\n";
}



Installing Ubuntu 12.04 PowerPC on qemu

Fedora gave up building on ppc as a primary architecture a while back [edit: see comments], but Ubuntu has a working ppc build. This is useful for testing software because it’s a big endian architecture, and therefore breaks some assumptions made by software that has only seen an Intel (little endian) architecture.

Fortunately it’s very simple to install Ubuntu/ppc as a qemu guest. Here is how I did it:

  1. Download mini.iso from here.
  2. Compile qemu from git so you have a qemu-system-ppc binary with a working BIOS (it’s easy; see the build sketch after this list).
  3. Create a virtual hard disk: truncate -s 10G disk.img
  4. Boot the ISO: ./qemu-system-ppc -m 1024 -hda disk.img -cdrom mini.iso -boot d
  5. At the first prompt, type install and go through the installation.
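A minimal qemu build for this target might look like the following (a sketch; the git URL may have changed since this was written):

$ git clone git://git.qemu.org/qemu.git
$ cd qemu
$ ./configure --target-list=ppc-softmmu
$ make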

At the end of the installation, it won’t install a boot loader, so the guest won’t be bootable without an external kernel and initrd. This is easy to arrange:

$ guestfish --ro -a disk.img -m /dev/sda2 \
    download /vmlinux vmlinux : \
    download /initrd.img initrd.img

With the external files vmlinux and initrd.img you can now boot your guest:

$ ./qemu-system-ppc -m 1024 \
    -hda disk.img \
    -kernel vmlinux -initrd initrd.img \
    -append "ro root=/dev/sda3"
