Tag Archives: kvm

Multiple libguestfs appliances in parallel, part 4

[Part 1, part 2, part 3.]

Finally, I modified the test to do some representative work: we now load a real Windows XP guest, inspect it (a heavyweight operation), then mount and stat each filesystem. I won’t reproduce the entire test program again because only the test subroutine has changed:

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/tmp/winxp.img");
    $g->launch ();

    # Inspect the guest (ignore the result).
    $g->inspect_os ();

    # Approximate what virt-df does.
    my %fses = $g->list_filesystems ();
    foreach (keys %fses) {
        my $mounted = 0;
        eval { $g->mount_ro ($_, "/"); $mounted = 1; };
        if ($mounted) {
            $g->statvfs ("/");
            $g->umount_all ();
        }
    }

    return $g;
}

Even with all that work going on, I was able to inspect more than 1 disk per second on my laptop, and run 60 threads in parallel with good performance and scalability:

[Graph of the test results; image not preserved.]

Filed under Uncategorized

Multiple libguestfs appliances in parallel, part 1

I wrote the Perl script below to find out how many libguestfs appliances we can start in parallel. The results are surprising (-ly good):

[Graph: total elapsed time against number of parallel threads; image not preserved.]

What’s happening here is that we’re booting up a KVM guest with 500 MB of memory, booting the Linux kernel, booting a minimal userspace, then shutting the whole lot down. We then do that in parallel with 1, 2, ... 20 threads.

[Note: Hardware is my Lenovo x230 laptop with an Intel Core(TM) i7-3520M CPU @ 2.90GHz, 2 cores with 4 threads, 16 GB of RAM with approx. 13 GB free. Software is: Fedora 18 with libguestfs 1.20.2, libvirt 1.0.2 (from Rawhide), qemu 1.4.0 (from Rawhide)]

I originally wrote that the test fails at 21 threads because there isn’t enough free memory, which would mean each qemu instance allocates around 660 MB of RAM. That is wrong: it failed because libvirt out of the box limits the maximum number of clients to 20. See the next part in this series.

Up to 4 parallel launches, you can clearly see the effect of better utilization of the parallelism of the CPU — the total elapsed time hardly moves, even though we’re doing up to 4 times more work.


#!/usr/bin/perl -w

use strict;
use threads;
use Sys::Guestfs;
use Time::HiRes qw(time);

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/dev/null");
    $g->launch ();
}

# Get everything into cache.
test (); test (); test ();

# Test increasing numbers of threads until it fails.
for my $nr_threads (1..100) {
    my $start_t = time ();
    my @threads;
    foreach (1..$nr_threads) {
        push @threads, threads->create (\&test)
    }
    foreach (@threads) {
        $_->join ();
        if (my $err = $_->error ()) {
            die "launch failed with nr_threads = $nr_threads: $err"
        }
    }
    my $end_t = time ();
    printf ("%d %.2f\n", $nr_threads, $end_t - $start_t);
}


What is the overhead of qemu/KVM?

To clarify: what is the memory overhead, or how many guests can you cram onto a single host, memory being the typical limiting factor when you virtualize?

This was the question someone asked at work today. I don’t know the answer either, but the small program I wrote (below) aims to find out. If you believe the numbers below from qemu 1.2.2 running on Fedora 18, then the overhead is around 150 MB per qemu process that cannot be shared, plus around 200 MB per host (that is, shared between all qemu processes).

guest size 256 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 404.20 MB
Shared writable memory: 0.03 MB

guest size 512 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 643.76 MB
Shared writable memory: 0.03 MB

guest size 1024 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 1172.38 MB
Shared writable memory: 0.03 MB

guest size 2048 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 2237.16 MB
Shared writable memory: 0.03 MB

guest size 4096 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 4245.13 MB
Shared writable memory: 0.03 MB

The number to pay attention to is “Anonymous memory” since that is what cannot be shared between guests (except if you have KSM and your guests are such that KSM can be effective).
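
As a rough check of the "around 150 MB" figure, the short sketch below subtracts each guest's RAM size from the anonymous memory reported above. It assumes that the guest's RAM is counted inside the anonymous figure, so the difference approximates the unshared qemu overhead. The `overhead` helper is a name made up for this illustration.

```perl
#!/usr/bin/perl -w

use strict;

# Figures copied from the results above:
# guest RAM size (MB) => anonymous, unshared memory (MB).
my %anonymous = (
    256  => 404.20,
    512  => 643.76,
    1024 => 1172.38,
    2048 => 2237.16,
    4096 => 4245.13,
);

# Assumption: per-process overhead = anonymous memory minus guest RAM.
sub overhead {
    my ($guest_mb) = @_;
    return $anonymous{$guest_mb} - $guest_mb;
}

foreach my $mbytes (sort { $a <=> $b } keys %anonymous) {
    printf ("guest %4d MB: overhead %.2f MB\n", $mbytes, overhead ($mbytes));
}
```

On these figures the difference ranges from roughly 132 MB to 189 MB, which is consistent with the estimate in the text but shows it is not constant across guest sizes.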

There are some known shortcomings with my testing methodology that I summarise below. You may be able to see others.

  1. We’re testing a libguestfs appliance. A libguestfs appliance does not have the full range of normal qemu devices that a real guest would have, and so the overhead of a real guest is likely to be higher. The main difference is probably lack of a video device (so no video RAM is evident).
  2. This uses virtio-scsi. Real guests use IDE, virtio-blk, etc which may have quite different characteristics.
  3. This guest has one user network device (ie. SLIRP) which could be quite different from a real network device.
  4. During the test, the guest only runs for a few seconds. A normal, long-running guest would experience qemu memory growth or even memory leaks. You could fix this relatively easily by adding some libguestfs busy-work after the launch.
  5. The guest does not do any significant writes, so during the test qemu won’t be storing any cached or in-flight data blocks.
  6. It only accounts for memory used by qemu in userspace, not memory used by the host kernel on behalf of qemu.
  7. The effectiveness or otherwise of KSM is not tested. It’s likely that KSM depends heavily on your workload, so it wouldn’t be fair to publish any KSM figures.
  8. The script uses /proc/PID/maps but it would be better to use smaps so that we can see how much of the file-backed copy-on-write segments have actually been copied. Currently the script overestimates these by assuming that (eg) all the data pages from a library would be dirtied by qemu.
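
The last point could be approached along the following lines. This is only a sketch, not a replacement for the script: it sums a few of the per-mapping counters that Linux reports in /proc/PID/smaps (such as Private_Dirty), which say how many pages have actually been touched or copied. The `sum_smaps` helper is a name invented here, and the example reads the smaps file of the Perl process itself, where a real test would use the qemu PID.

```perl
#!/usr/bin/perl -w

use strict;

# Sum selected counters from /proc/PID/smaps-style lines.
# Returns a hash ref of field name => total in kB.
sub sum_smaps {
    my @lines = @_;
    my %total = map { $_ => 0 }
        qw(Rss Shared_Clean Shared_Dirty Private_Clean Private_Dirty);
    foreach my $line (@lines) {
        # Counter lines look like "Private_Dirty:        12 kB".
        # Mapping header lines and other fields are skipped.
        if ($line =~ /^(\w+):\s+(\d+)\s+kB/ && exists $total{$1}) {
            $total{$1} += $2;
        }
    }
    return \%total;
}

# Example (Linux only): inspect this process rather than qemu.
if (open my $fh, "<", "/proc/$$/smaps") {
    my $totals = sum_smaps (<$fh>);
    close $fh;
    printf ("%-14s %8d kB\n", $_, $totals->{$_}) foreach sort keys %$totals;
}
```

Private_Dirty is probably the closest match to "memory that cannot be shared", but which counters to report is a judgment call that this sketch does not settle.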

Another interesting question would be whether qemu is getting better or worse over time.

#!/usr/bin/perl -w

# Estimate memory usage of qemu-kvm at different guest RAM sizes.
# By Richard W.M. Jones <rjones@redhat.com>

use strict;
use Sys::Guestfs;
no warnings "portable"; # 64 bit platform required.

# Loop over different guest RAM sizes.
my $mbytes;
for $mbytes (256, 512, 1024, 2048, 4096) {
    print "guest size ", $mbytes, " MB:\n";

    my $g = Sys::Guestfs->new;

    # Ensure we're using the direct qemu launch backend, otherwise
    # libvirt stops us from finding the qemu PID.
    $g->set_attach_method ("appliance");

    # Set guest memory size.
    $g->set_memsize ($mbytes);

    # Enable user networking just to be more like a "real" guest.
    $g->set_network (1);

    # Launch guest with one dummy disk.
    $g->add_drive ("/dev/null");
    $g->launch ();

    # Get process ID of qemu.
    my $pid = $g->get_pid ();
    die unless $pid > 0;

    # Read the memory maps of the guest.
    open my $maps_fh, "<", "/proc/$pid/maps"
        or die "cannot open memory map of pid $pid: $!";
    my @maps = <$maps_fh>;
    close $maps_fh;

    # Kill qemu.
    $g->close ();

    # Parse the memory maps.
    my $shared_file_backed = 0;
    my $anonymous = 0;
    my $shared_writable = 0;

    my $map;
    foreach $map (@maps) {
        chomp $map;

        if ($map =~ m/
                     ^([0-9a-f]+)-([0-9a-f]+) \s
                     (....) \s
                     [0-9a-f]+ \s ..:.. \s (\d+) \s+ (\S+)?
                    /x) {
            my ($start, $end) = (hex $1, hex $2);
            my $size = $end - $start;
            my $mode = $3;
            my $inode = $4;
            my $filename = $5; # could also be "[heap]", "[vdso]", etc.

            # Shared file-backed text: r-xp, r--p, etc. with a file backing.
            if ($inode != 0 &&
                ($mode eq "r-xp" || $mode eq "r--p" || $mode eq "---p")) {
                $shared_file_backed += $size;
            }

            # Anonymous memory: rw-p.
            elsif ($mode eq "rw-p") {
                $anonymous += $size;
            }

            # Writable and shared.  Not sure what this is ...
            elsif ($mode eq "rw-s") {
                $shared_writable += $size;
            }

            # Ignore [vdso], [vsyscall].
            elsif (defined $filename &&
                   ($filename eq "[vdso]" || $filename eq "[vsyscall]")) {
            }

            # Ignore ---p with no file.  What's this?
            elsif ($inode == 0 && $mode eq "---p") {
            }

            # Ignore kvm-vcpu.
            elsif (defined $filename && $filename eq "anon_inode:kvm-vcpu") {
            }

            else {
                warn "warning: could not parse '$map'\n";
            }
        }
        else {
            die "incorrect maps format: '$map'";
        }
    }

    printf("Shared memory backed by a file: %.2f MB\n",
           $shared_file_backed / 1024.0 / 1024.0);
    printf("Anonymous memory (eg. malloc, COW, stack), not shared: %.2f MB\n",
           $anonymous / 1024.0 / 1024.0);
    printf("Shared writable memory: %.2f MB\n",
           $shared_writable / 1024.0 / 1024.0);

    print "\n";
}



Handout for my talk at KVM Forum

The handout is here (PDF). The talk itself can be downloaded from this git repository.

For more information about libguestfs, there is copious documentation on the website.


KVM Forum Barcelona next week

I am giving a short talk about libguestfs at the Linux Foundation KVM Forum in Barcelona next week (full schedule here).


FreeDOS 1.1 in KVM

FreeDOS 1.1 running in KVM with 4 MB (sic) of virtual RAM:

This is more than just a silly experiment. Being able to run very small VMs (and this is by far the smallest real VM I have been able to run) allows us to test the scalability of KVM to hundreds or thousands of guests using standard hardware.

It has revealed a couple of bugs in libguestfs too …


Why can’t you live migrate from newer to older versions of qemu/KVM?

I answered a question on a mailing list about live migration versus copying guests between different versions of KVM on RHEL. The complainant observed that you can’t live migrate from RHEL 6.2 to RHEL 6.1. But you can shut down a guest, copy it from RHEL 6.2 to 6.1, and boot it.

Why is there this difference? It comes down to how live migration is implemented.

Live migration is completely different from shutting down and copying a guest. During live migration we must send the complete state of system RAM, virtual CPUs, and all virtual devices, over to the remote side. In qemu this is done by sending “VMState” structures over the wire, one struct for each device that the guest is using. These structures are mostly a memory dump, but so that you don’t need byte-for-byte compatible versions of qemu when live migrating, each struct is preceded by a version ID.

The receiving qemu checks that it can handle that version of the struct. In some (but not all) cases, qemu knows how to “upgrade”, say, a version 1 struct into a version 2 struct. Downgrades are never possible, and some upgrades are also rejected (eg. if version 2 is a complete rewrite over version 1, then it’s possible for a device to refuse to deal with version 1 structs at all).

Downgrades are not possible, and that’s the basic reason why live migration doesn’t work from a newer to an older version of qemu.
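
The version check can be pictured with a toy model like the one below. It is not qemu's code. The field names `version_id` and `minimum_version_id` are borrowed from qemu's VMState descriptions, but the device names, the table contents and the `can_load` helper are all hypothetical and exist only for this illustration.

```perl
#!/usr/bin/perl -w

use strict;

# Hypothetical table for the *receiving* qemu: for each device, the
# newest struct version it understands and the oldest it can upgrade.
my %receiver = (
    "device-a" => { version_id => 2, minimum_version_id => 1 },
    # Rewritten in version 2, so version 1 structs are refused.
    "device-b" => { version_id => 2, minimum_version_id => 2 },
);

# Return true if the receiver can load a struct of this version.
sub can_load {
    my ($device, $incoming) = @_;
    my $d = $receiver{$device} or return 0;            # unknown device
    return 0 if $incoming > $d->{version_id};          # would be a downgrade
    return 0 if $incoming < $d->{minimum_version_id};  # too old to upgrade
    return 1;
}

foreach my $case (["device-a", 1], ["device-a", 3], ["device-b", 1]) {
    my ($device, $version) = @$case;
    printf ("%s version %d: %s\n", $device, $version,
            can_load ($device, $version) ? "accepted" : "rejected");
}
```

In this model a newer sender corresponds to the "device-a version 3" case: the older receiver has no way to interpret the struct, so the migration is refused.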

Why does copying work? When a VM is shut down, there is no RAM, vCPU or device state. All the state that remains is the contents of the hard disk. If the hard disk is booted on an older qemu, then the kernel, during boot, will test the available CPUs, devices, etc and adjust itself, exactly the same as if you took a physical hard disk and transplanted it between real machines.

Indirectly related to all this is the qemu machine type. If you created guests on RHEL 6.0, then you may notice the libvirt XML contains:

<type arch='x86_64' machine='rhel6.0.0'>hvm</type>

This machine type stays with the guest even when you update the host.

The machine type controls what devices and PCI slots we present to the guest at boot, and it’s mainly there so that Windows doesn’t try to reactivate itself when you upgrade your host. The newer qemu presents the old devices and PCI assignments, so Windows doesn’t “notice” the updated hypervisor.

For Linux guests this is usually not a problem you have to worry about and you can go ahead and change the machine type at will.


CVE-2011-4127: privilege escalation from qemu / KVM guests

Paolo Bonzini discovered that you can issue SCSI ioctls to virtio devices which are passed down to the host.

The very unfortunate part about this is it easily allows guests to read and write parts of host devices that they are not supposed to. For example, if a guest was confined to host device /dev/sda3, it could read or write other partitions or the boot sector on /dev/sda.

In your guest, try this command which reads the host boot sector:

sg_dd if=/dev/vda blk_sgio=1 bs=512 count=1 of=output

Swap the if and of arguments around to exploit the host.

Here’s Paolo’s write-up on LKML.

Here is the libguestfs mitigation patch, and here is the libvirt mitigation patch.


KVM Forum 2011 videos

Some of them are on YouTube here:

http://www.youtube.com/playlist?list=PL7C0F52E2227156B3

Open formats here:

http://www.montanalinux.org/video-kvm-forums-2011.html

(Thanks Dan Berrange)


KVM Forum 2011 slides are up

The slides are available here. Videos will be available later.
