Tag Archives: kvm

KVM working on the Cubietruck

I managed to get KVM working on the Cubietruck last week. It’s not exactly simple, but this post describes in overview how to do it.

(1) You will need a Cubietruck, a CP2102 serial cable, a micro SDHC card, a card reader for your host computer, and a network patch cable (the board supports wifi but it doesn’t work with the newer kernel we’ll be using). Optional: 2.5″ SATA HDD or SSD.

(2) Start with Hans De Goede’s AllWinner remix of Fedora 19, and get that working. It’s important to read his README file carefully.

(3) Build this upstream kernel with this configuration:

make oldconfig
make menuconfig

In menuconfig, enable Large Page Address Extension (LPAE), and then enable KVM in the Virtualization menu.

LOADADDR=0x40008000 make uImage dtbs
make modules
sudo cp arch/arm/boot/uImage /boot/uImage.sunxi-test
sudo cp arch/arm/boot/dts/sun7i-a20-cubietruck.dtb /boot/sun7i-a20-cubietruck.dtb.sunxi-test
sudo make modules_install
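
If you want to double-check the configuration non-interactively, the relevant symbols should end up enabled in .config. A minimal check (the symbol names come from the mainline Kconfig, so the exact set may vary between kernel versions):

grep -E '^CONFIG_(ARM_LPAE|VIRTUALIZATION|KVM)=' .config

All three of CONFIG_ARM_LPAE, CONFIG_VIRTUALIZATION and CONFIG_KVM should be set to y.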

Reboot, interrupt u-boot (using the serial console), and type the following commands to load the new kernel:

setenv bootargs console=ttyS0,115200 loglevel=9 earlyprintk ro rootwait root=/dev/mmcblk0p3
ext2load mmc 0 0x46000000 uImage.sunxi-test
ext2load mmc 0 0x4b000000 sun7i-a20-cubietruck.dtb.sunxi-test
env set fdt_high ffffffff
bootm 0x46000000 - 0x4b000000

(4) Build this modified u-boot which supports Hyp mode.

make cubietruck_config
make
sudo dd if=u-boot-sunxi-with-spl.bin of=/dev/YOURSDCARD bs=1024 seek=8

Reboot again, use the commands above to boot into the upstream kernel, and if everything worked you should see:

Brought up 2 CPUs
SMP: Total of 2 processors activated.
CPU: All CPU(s) started in HYP mode.
CPU: Virtualization extensions available.

Also /dev/kvm should exist.
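
To avoid retyping the boot commands on every boot, you can usually store them in the u-boot environment. This is only a sketch, and whether saveenv works depends on where this particular u-boot build keeps its environment:

setenv bootargs console=ttyS0,115200 loglevel=9 earlyprintk ro rootwait root=/dev/mmcblk0p3
setenv bootcmd 'ext2load mmc 0 0x46000000 uImage.sunxi-test; ext2load mmc 0 0x4b000000 sun7i-a20-cubietruck.dtb.sunxi-test; env set fdt_high ffffffff; bootm 0x46000000 - 0x4b000000'
saveenv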

(5) Hack QEMU to create Cortex-A7 CPUs using this one-line patch.

Then you should be able to create VMs using libvirt. Note if using libguestfs you will need to use the direct backend (LIBGUESTFS_BACKEND=direct) because of this libvirt bug.
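
For example, a quick smoke test with guestfish (the disk image name here is made up):

LIBGUESTFS_BACKEND=direct guestfish --ro -a /var/tmp/test.img run : list-filesystems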

Creating a cloud-init config disk for non-cloud boots

There are lots of cloud disk images floating around. They are designed to run in clouds where there is a boot-time network service called cloud-init available that provides initial configuration. If that’s not present, or you’re just trying to boot these images in KVM/libvirt directly without any cloud, then things can go wrong.

Luckily it’s fairly easy to create a config disk (aka “seed disk”) which you attach to the guest and then let cloud-init in the guest get its configuration from there. No cloud, or even network, required.

I’m going to use a tool called virt-make-fs to make the config disk, as it’s easy to use and doesn’t require root. There are other tools around, eg. make-seed-disk, which does a similar job. (NB: You might hit this bug in virt-make-fs, which should be fixed in the latest version).

I’m also using a cloud image downloaded from the Fedora project, but any cloud image should work.

First I create my cloud-init metadata. This consists of two files. meta-data contains host and network configuration:

instance-id: iid-123456
local-hostname: cloudy

user-data contains other custom configuration (note #cloud-config is not a comment, it’s a directive to tell cloud-init the format of the file):

#cloud-config
password: 123456
runcmd:
 - [ useradd, -m, -p, "", rjones ]
 - [ chage, -d, 0, rjones ]

(The idea behind this split is probably not obvious, but apparently it’s because the meta-data is meant to be supplied by the Cloud, and the user-data is meant to be supplied by the Cloud’s customer. In this case, no cloud, so we’re going to supply both!)

I put these two files into a directory, and run virt-make-fs to create the config disk:

$ ls
meta-data  user-data
$ virt-make-fs --type=msdos --label=cidata . /tmp/seed.img
$ virt-filesystems -a /tmp/seed.img --all --long -h
Name      Type        VFS   Label   MBR  Size  Parent
/dev/sda  filesystem  vfat  cidata  -    286K  -
/dev/sda  device      -     -       -    286K  -

Now I need to pass some kernel options when booting the Fedora cloud image, and the only way to do that is if I boot from an external kernel & initrd. This is not as complicated as it sounds, and virt-builder has an option to get the kernel and initrd that I’m going to need:

$ virt-builder --get-kernel Fedora-cloud.raw
download: /boot/vmlinuz-3.9.5-301.fc19.x86_64 -> ./vmlinuz-3.9.5-301.fc19.x86_64
download: /boot/initramfs-3.9.5-301.fc19.x86_64.img -> ./initramfs-3.9.5-301.fc19.x86_64.img

Finally I’m going to boot the guest using KVM (you could also use libvirt with a little extra effort):

$ qemu-kvm -m 1024 \
    -drive file=Fedora-cloud.raw,if=virtio \
    -drive file=seed.img,if=virtio \
    -kernel ./vmlinuz-3.9.5-301.fc19.x86_64 \
    -initrd ./initramfs-3.9.5-301.fc19.x86_64.img \
    -append 'root=/dev/vda1 ro ds=nocloud-net'

You’ll be able to log in either as fedora/123456 or rjones (no password), and you should see that the hostname has been set to cloudy.
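
If you would rather go through libvirt, something along these lines ought to work with virt-install (a sketch, untested here; the option names come from the virt-install man page and the files are the same ones used above):

$ virt-install --name cloudy --ram 1024 --import \
    --disk path=Fedora-cloud.raw,format=raw,bus=virtio \
    --disk path=seed.img,format=raw,bus=virtio \
    --boot kernel=./vmlinuz-3.9.5-301.fc19.x86_64,initrd=./initramfs-3.9.5-301.fc19.x86_64.img,kernel_args="root=/dev/vda1 ro ds=nocloud-net" \
    --graphics none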

Performance of User-Mode Linux as a libguestfs backend

As of libguestfs 1.23.16, the User-Mode Linux backend is now a supported feature upstream, meaning that at least it gets tested fully for each release.

I did some performance tests on the User-Mode Linux backend compared to the ordinary KVM-based appliance and the results are quite interesting.

The first test is to run the C API test suite using UML and KVM on baremetal. All times are in seconds, averaged over a few runs:

tests/c-api (baremetal) — UML: 630 — KVM: 332

UML is roughly half the speed, but do remember that the test is very system-call intensive, which is one of the worst cases for UML.
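
For reference, which backend a run like this uses can be selected with the LIBGUESTFS_BACKEND environment variable. Roughly (a sketch: the test lives in the tests/c-api directory of the libguestfs source tree, and the UML backend also has to be pointed at a User-Mode Linux 'linux' binary, which I'm not showing here):

$ export LIBGUESTFS_BACKEND=uml    # or leave it unset for the ordinary KVM appliance
$ time make -C tests/c-api check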

The same test again, but performed inside a KVM virtual machine (on the same hardware):

tests/c-api (virtualized) — UML: 334 — KVM: 961

The results of this are so surprising I went back and retested everything several times, but this is completely reproducible. UML runs the C API test suite about twice as fast virtualized as on baremetal.

KVM (no surprise) runs several times slower. Inside the VM there is no hardware virtualization, and so qemu-kvm has to fall back on TCG software emulation of everything.

One conclusion you might draw from this is that UML could be a better choice of backend if you want to use libguestfs inside a VM (eg. in the cloud). As always, you should measure your own workload.


The second test is of start-up times. If you want to use libguestfs to process a lot of disk images, this matters.

start-up (baremetal) — UML: 3.9 — KVM: 3.7
start-up (virtualized) — UML: 3.0 — KVM: 8-11

The start-up time of KVM virtualized was unstable, but appeared to be around 3 times slower than on baremetal. UML performs about the same in both cases.

A couple of conclusions that I take from this:

(1) Most of the time is now spent initializing the appliance, searching for LVM and RAID and so on. The choice of hypervisor makes no difference. This is never going to go away, even if libguestfs was rewritten to use (eg) containers, or if libguestfs linked directly to kernel code. It just takes this time for this kernel & userspace LVM/MD/filesystem code to initialize.

(2) The overhead of starting a KVM VM is not any different from starting a big Linux application. This is no surprise for people who have used KVM for a long time, but it’s counter-intuitive for most people who think that VMs “must” be heavyweight compared to ordinary processes.


The third test is of uploading data from the host into a disk image. I created a 1 GB disk image containing an ext2 filesystem, and I timed how long it took to upload 500 MB of data to a file on this filesystem.

upload (baremetal) — UML: 147 — KVM: 16
upload (virtualized) — UML: 149 — KVM: 73

KVM is predictably much slower when no hardware virtualization is available, by a factor of about 4.5 times.

UML is overall far slower than KVM, but it is at least consistent.

To work out why UML is so much slower, I wanted to know whether the bottleneck was the emulated serial port that we push the data through, or slow writes to the disk, so I carried out some extra tests:

upload-no-write (baremetal) — UML: 141 — KVM: 11
upload-no-write (virtualized) — UML: 140 — KVM: 20
write-no-upload (baremetal) — UML: 7 — KVM: 13
write-no-upload (virtualized) — UML: 9 — KVM: 25
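
I'm not reproducing the actual test scripts, but the decomposition can be approximated with guestfish along these lines (a sketch: it assumes the 500 MB test file is /var/tmp/data, the disk image is /var/tmp/test.img with ext2 directly on /dev/sda, and the backend is switched with LIBGUESTFS_BACKEND as above):

$ time guestfish -a /var/tmp/test.img run \
    : mount /dev/sda / : upload /var/tmp/data /data       # upload: serial transfer + disk write
$ time guestfish -a /var/tmp/test.img run \
    : upload /var/tmp/data /dev/null                      # upload-no-write: transfer only
$ time guestfish -a /var/tmp/test.img run \
    : mount /dev/sda / : fill 0 500000000 /data           # write-no-upload: appliance generates the data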

My conclusion is that the UML emulated serial device is over 10 times slower than KVM’s virtio-serial. This is a problem, but at least it’s a well-defined one, and there is an existing example (virtio-serial) showing that it’s possible to do much better.

Finally, notice that UML appears faster than KVM at writes.

In fact what’s happening is a difference in caching modes: For safety, libguestfs forces KVM to bypass the host disk cache. This ensures that modifications made to disk images remain consistent even if there is a sudden power failure.

The UML backend currently uses the host cache, so the writes weren’t hitting the disk before the test finished (this is in fact a bug in UML since libguestfs performs an fsync inside the appliance, which UML does not honour).

As always with benchmarks, the moral is to take everything with a pinch of salt and measure your workloads!

Nested virtualization (not) enabled

Interesting thing I learned a few days ago:

kvm: Nested Virtualization enabled

does not always mean that nested virtualization is being used.

If you use qemu’s software emulation (more often known as TCG) then it emulates a generic-looking AMD CPU with SVM (AMD’s virtualization feature).

AMD virtualization easily supports nesting (unlike Intel’s VT, which is a massive PITA to nest), and when the KVM module is loaded, it notices the “AMD” host CPU with SVM and willingly enables nested virt. There’s actually a little bit of benefit to this, because it avoids a second layer of TCG being needed if you did run an L2 guest in there (although it’s still going to be slow).
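
A quick way to see this from inside such a TCG guest (both paths are standard Linux locations):

$ grep -c svm /proc/cpuinfo                    # non-zero: the emulated CPU advertises SVM
$ cat /sys/module/kvm_amd/parameters/nested    # 1 means nested virt was enabled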

The boring truth: full virtualization and containerization both have their place

There is a lot of “containers are the future!” talk going around, and plenty of misinformed commentary with it.

The truth here is more boring. Both full-fat virtualization (KVM), and containers (LXC) are the future, and which you use will depend on what you want to do. You’ll probably use both. You’re probably using both now, but you don’t know it.

The first thing to say is that full virt and containers are used for different things, although there is a little bit of overlap. Containers are only useful when the kernel of all your VMs is identical. That means you can run 1000 copies of Fedora 18 as containers (probably not something you could do with full virt), but you can’t run Windows or FreeBSD, and possibly not even Debian, in one of those containers. Nor can your users install their own kernels.

The second important point is that containers in Linux are not secure at all. It’s moderately easy to escape from a container and take control of the host or other containers. I also doubt they’ll ever be secure because any local kernel exploit in the enormous syscall API is a host exploit when you’re using containers. This is in contrast with full virt on Fedora and RHEL where the qemu process is protected by multiple layers: process isolation, Unix permissions, SELinux, seccomp, and, yes, a container.

So containers are useless? Not at all. If you need to run lots and lots of VMs and you don’t care about multiple kernels or security, containers are just about the only game in town (although it turns out that KVM is no slouch these days).

Also containers (or the cgroups technology behind them) are being used in very innovative places. They are used in systemd for resource control, in libvirt to isolate qemu, and for sandboxing.

Multiple libguestfs appliances in parallel, part 4

[Part 1, part 2, part 3.]

Finally I modified the test to do some representative work: We now load a real Windows XP guest, inspect it (a heavyweight operation), and mount and stat each filesystem. I won’t reproduce the entire test program again because only the test subroutine has changed:

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/tmp/winxp.img");
    $g->launch ();

    # Inspect the guest (ignore the result).
    $g->inspect_os ();

    # Approximate what virt-df does.
    my %fses = $g->list_filesystems ();
    foreach (keys %fses) {
        my $mounted = 0;
        eval { $g->mount_ro ($_, "/"); $mounted = 1; };
        if ($mounted) {
            $g->statvfs ("/");
            $g->umount_all ();
        }
    }

    return $g;
}

Even with all that work going on, I was able to inspect more than 1 disk per second on my laptop, and run 60 threads in parallel with good performance and scalability:

[graph: total run time against number of parallel threads, showing good scalability up to 60 threads]

Multiple libguestfs appliances in parallel, part 1

I wrote the Perl script below to find out how many libguestfs appliances we can start in parallel. The results are surprising (-ly good):

[graph: total elapsed time against number of parallel appliance launches]

What’s happening here is that we’re booting up a KVM guest with 500 MB of memory, booting the Linux kernel, booting a minimal userspace, then shutting the whole lot down. And then doing that in parallel with 1, 2, .. 20 threads.

[Note: Hardware is my Lenovo x230 laptop with an Intel Core(TM) i7-3520M CPU @ 2.90GHz, 2 cores with 4 threads, 16 GB of RAM with approx. 13 GB free. Software is: Fedora 18 with libguestfs 1.20.2, libvirt 1.0.2 (from Rawhide), qemu 1.4.0 (from Rawhide)]

The test fails at 21 threads. I first assumed this was because there isn’t enough free memory, each qemu instance allocating around 660 MB of RAM, but that turned out to be wrong: it failed because libvirt out of the box limits the maximum number of clients to 20. See the next part in this series.

Up to 4 parallel launches, you can clearly see the effect of better utilization of the parallelism of the CPU — the total elapsed time hardly moves, even though we’re doing up to 4 times more work.


#!/usr/bin/perl -w

use strict;
use threads;
use Sys::Guestfs;
use Time::HiRes qw(time);

sub test {
    my $g = Sys::Guestfs->new;
    $g->add_drive_ro ("/dev/null");
    $g->launch ();
}

# Get everything into cache.
test (); test (); test ();

# Test increasing numbers of threads until it fails.
for my $nr_threads (1..100) {
    my $start_t = time ();
    my @threads;
    foreach (1..$nr_threads) {
        push @threads, threads->create (\&test)
    }
    foreach (@threads) {
        $_->join ();
        if (my $err = $_->error ()) {
            die "launch failed with nr_threads = $nr_threads: $err"
        }
    }
    my $end_t = time ();
    printf ("%d %.2f\n", $nr_threads, $end_t - $start_t);
}
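
The output is just “threads elapsed-seconds” pairs, so it is easy to plot; for example (the script filename is made up):

$ perl parallel-launch.pl > results.dat
$ gnuplot -e 'set xlabel "threads"; set ylabel "seconds"; plot "results.dat" with linespoints; pause -1'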

What is the overhead of qemu/KVM?

To clarify: what is the memory overhead? Or, how many guests can you cram onto a single host, memory being the typical limiting factor when you virtualize?

This was the question someone asked at work today. I don’t know the answer either, but the small program I wrote (below) aims to find out. If you believe the numbers below from qemu 1.2.2 running on Fedora 18, then the overhead is around 150 MB per qemu process that cannot be shared, plus around 200 MB per host (that is, shared between all qemu processes).

guest size 256 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 404.20 MB
Shared writable memory: 0.03 MB

guest size 512 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 643.76 MB
Shared writable memory: 0.03 MB

guest size 1024 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 1172.38 MB
Shared writable memory: 0.03 MB

guest size 2048 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 2237.16 MB
Shared writable memory: 0.03 MB

guest size 4096 MB:
Shared memory backed by a file: 201.41 MB
Anonymous memory (eg. malloc, COW, stack), not shared: 4245.13 MB
Shared writable memory: 0.03 MB

The number to pay attention to is “Anonymous memory” since that is what cannot be shared between guests (except if you have KSM and your guests are such that KSM can be effective).
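
As a back-of-the-envelope example of how you might use these numbers (the ~150 MB per-guest and ~200 MB shared figures are the rough estimates from above, and the 16 GB of host RAM is an assumption):

$ echo $(( (16 * 1024 - 200) / (1024 + 150) ))
13

In other words, roughly 13 idle 1 GB guests, before taking KSM, swap or host kernel overhead into account.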

There are some known shortcomings with my testing methodology that I summarise below. You may be able to see others.

  1. We’re testing a libguestfs appliance. A libguestfs appliance does not have the full range of normal qemu devices that a real guest would have, and so the overhead of a real guest is likely to be higher. The main difference is probably lack of a video device (so no video RAM is evident).
  2. This uses virtio-scsi. Real guests use IDE, virtio-blk, etc which may have quite different characteristics.
  3. This guest has one user network device (ie. SLIRP) which could be quite different from a real network device.
  4. During the test, the guest only runs for a few seconds. A normal, long-running guest would experience qemu memory growth or even memory leaks. You could fix this relatively easily by adding some libguestfs busy-work after the launch.
  5. The guest does not do any significant writes, so during the test qemu won’t be storing any cached or in-flight data blocks.
  6. It only accounts for memory used by qemu in userspace, not memory used by the host kernel on behalf of qemu.
  7. The effectiveness or otherwise of KSM is not tested. It’s likely that KSM depends heavily on your workload, so it wouldn’t be fair to publish any KSM figures.
  8. The script uses /proc/PID/maps but it would be better to use smaps so that we can see how much of the file-backed copy-on-write segments have actually been copied. Currently the script overestimates these by assuming that (eg) all the data pages from a library would be dirtied by qemu.
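
On the last point, here is a sketch of what reading smaps instead would look like: Private_Dirty is the part of each mapping that has actually been copied (the PID is hypothetical):

$ awk '/^Private_Dirty:/ { sum += $2 } END { print sum, "kB" }' /proc/$PID/smaps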

Another interesting question would be whether qemu is getting better or worse over time.

#!/usr/bin/perl -w

# Estimate memory usage of qemu-kvm at different guest RAM sizes.
# By Richard W.M. Jones <rjones@redhat.com>

use strict;
use Sys::Guestfs;
no warnings "portable"; # 64 bit platform required.

# Loop over different guest RAM sizes.
my $mbytes;
for $mbytes (256, 512, 1024, 2048, 4096) {
    print "guest size ", $mbytes, " MB:\n";

    my $g = Sys::Guestfs->new;

    # Ensure we're using the direct qemu launch backend, otherwise
    # libvirt stops us from finding the qemu PID.
    $g->set_attach_method ("appliance");

    # Set guest memory size.
    $g->set_memsize ($mbytes);

    # Enable user networking just to be more like a "real" guest.
    $g->set_network (1);

    # Launch guest with one dummy disk.
    $g->add_drive ("/dev/null");
    $g->launch ();

    # Get process ID of qemu.
    my $pid = $g->get_pid ();
    die unless $pid > 0;

    # Read the memory maps of the guest.
    open MAPS, "/proc/$pid/maps" or die "cannot open memory map of pid $pid";
    my @maps = <MAPS>;
    close MAPS;

    # Kill qemu.
    $g->close ();

    # Parse the memory maps.
    my $shared_file_backed = 0;
    my $anonymous = 0;
    my $shared_writable = 0;

    my $map;
    foreach $map (@maps) {
        chomp $map;

        if ($map =~ m/
                     ^([0-9a-f]+)-([0-9a-f]+) \s
                     (....) \s
                     [0-9a-f]+ \s ..:.. \s (\d+) \s+ (\S+)?
                    /x) {
            my ($start, $end) = (hex $1, hex $2);
            my $size = $end - $start;
            my $mode = $3;
            my $inode = $4;
            my $filename = $5; # could also be "[heap]", "[vdso]", etc.

            # Shared file-backed text: r-xp, r--p, etc. with a file backing.
            if ($inode != 0 &&
                ($mode eq "r-xp" || $mode eq "r--p" || $mode eq "---p")) {
                $shared_file_backed += $size;
            }

            # Anonymous memory: rw-p.
            elsif ($mode eq "rw-p") {
                $anonymous += $size;
            }

            # Writable and shared.  Not sure what this is ...
            elsif ($mode eq "rw-s") {
                $shared_writable += $size;
            }

            # Ignore [vdso], [vsyscall].
            elsif (defined $filename &&
                   ($filename eq "[vdso]" || $filename eq "[vsyscall]")) {
            }

            # Ignore ---p with no file.  What's this?
            elsif ($inode == 0 && $mode eq "---p") {
            }

            # Ignore kvm-vcpu.
            elsif (defined $filename && $filename eq "anon_inode:kvm-vcpu") {
            }

            else {
                warn "warning: could not parse '$map'\n";
            }
        }
        else {
            die "incorrect maps format: '$map'";
        }
    }

    printf("Shared memory backed by a file: %.2f MB\n",
           $shared_file_backed / 1024.0 / 1024.0);
    printf("Anonymous memory (eg. malloc, COW, stack), not shared: %.2f MB\n",
           $anonymous / 1024.0 / 1024.0);
    printf("Shared writable memory: %.2f MB\n",
           $shared_writable / 1024.0 / 1024.0);

    print "\n";
}


Handout for my talk at KVM Forum

The handout is here (PDF). The talk itself can be downloaded from this git repository.

For more information about libguestfs, there is copious documentation on the website.

KVM Forum Barcelona next week

I am giving a short talk about libguestfs at the Linux Foundation KVM Forum in Barcelona next week (full schedule here).
