Tag Archives: libvirt

Libguestfs appliance boot in under 600ms

$ ./run ./utils/boot-benchmark/boot-benchmark
Warming up the libguestfs cache ...
Running the tests ...

test version: libguestfs 1.33.28
 test passes: 10
host version: Linux moo.home.annexia.org 4.4.4-301.fc23.x86_64 #1 SMP Fri Mar 4 17:42:42 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
    host CPU: Intel(R) Core(TM) i7-5600U CPU @ 2.60GHz
     backend: direct               [to change set $LIBGUESTFS_BACKEND]
        qemu: /home/rjones/d/qemu/x86_64-softmmu/qemu-system-x86_64 [to change set $LIBGUESTFS_HV]
qemu version: QEMU emulator version 2.5.94, Copyright (c) 2003-2008 Fabrice Bellard
         smp: 1                    [to change use --smp option]
     memsize: 500                  [to change use --memsize option]
      append:                      [to change use --append option]

Result: 575.9ms ±5.3ms

There are various tricks here:

  1. I’m using the (still!) not-upstream qemu DMA patches.
  2. I’ve compiled my own very minimal guest Linux kernel.
  3. I’m using my nearly upstream "crypto: Add a flag allowing the self-tests to be disabled at runtime." patch.
  4. I’ve got two sets of non-upstream libguestfs patches (1, 2).
  5. I am not using libvirt, but if you do, make sure you use the very latest version since it contains an important performance patch.
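
All of the knobs shown in the benchmark output above can be changed without rebuilding anything. For example, a sketch of re-running it against a different hypervisor binary (the path is only an example):

$ LIBGUESTFS_BACKEND=direct \
  LIBGUESTFS_HV=/path/to/qemu-system-x86_64 \
  ./run ./utils/boot-benchmark/boot-benchmark --smp 1 --memsize 500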


virt-builder: Fedora 21 ppc64 and ppc64le images

virt-builder now has Fedora 21 ppc64 and ppc64le images available, and you can run these under emulation on an x86-64 host. Here’s how to do it:

$ virt-builder --arch ppc64 fedora-21 \
    -o fedora-21-ppc64.img

or:

$ virt-builder --arch ppc64le fedora-21 \
    -o fedora-21-ppc64le.img

To boot them:

$ qemu-system-ppc64 -M pseries -cpu POWER8 -m 4096 \
    -drive file=fedora-21-ppc64[le].img \
    -serial stdio

Oddly the boot messages will appear on the GUI, but the login prompt will only appear on the serial console. (Fixed)

Libvirt also has support, so with a sufficiently new toolchain you can use:

$ virt-install --import --name=guestname \
    --ram=4096 --vcpus=1 \
    --os-type=linux --os-variant=fedora21 \
    --arch=ppc64[le] --machine pseries \
    --disk=fedora-21-ppc64[le].img,format=raw
$ virsh start guestname

It’s quite fun to play with Big Iron, even in an emulator that runs at about 1/1000th the speed of the real thing. I know about the speed difference because we have real POWER8 machines at Red Hat, and they really are the fastest computers alive, by a significant multiple. Of course, they also cost a fortune and use huge amounts of power.

Some random observations:

  1. The virt-builder --size parameter cannot resize the ppc64 guest filesystem correctly, because Anaconda uses an extended partition. The workaround is either to add a second disk or to create another extended partition in the extra space. (Fixed)
  2. The disks are ibmvscsi model (not virtio or ide). This is the default, but it is something to think about if you edit or create the libvirt XML manually (see the sketch after this list).
  3. Somehow the same CPU/machine model works for both Big Endian and Little Endian guests. I couldn’t work out at first how the guest type is auto-detected; it turns out the endianness switch is done by the guest kernel.
  4. libguestfs inspection is broken for ppc64le.
  5. Because TCG (qemu software emulation) is single-threaded, only use a single vCPU. If you use more, it’ll actually slow the thing down.
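
On the second point, if you do write the domain XML by hand, the spapr VSCSI disk stanzas look roughly like this (a sketch only; the image path is an example):

<controller type='scsi' index='0' model='ibmvscsi'/>
<disk type='file' device='disk'>
  <driver name='qemu' type='raw'/>
  <source file='/var/lib/libvirt/images/fedora-21-ppc64.img'/>
  <target dev='sda' bus='scsi'/>
</disk>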

Thanks to Maros Zatko for working out the virt-install command line and for implementing the virt-builder script that builds the images.


Tip: Wake up a guest from screen blank

A few years ago Dan Berrange added a way to send fake keyboard events to libvirt guests. You can use this to inject a single press of the Left Shift key to wake a guest up from screen blank. Very useful if you need to take a screenshot!

$ virsh send-key guest KEY_LEFTSHIFT
$ sleep 1
$ virsh screenshot guest /tmp/screenshot.ppm

Update: A word of warning though. If you try this for Windows guests you’ll hit this message:

[screenshot of a Windows Server 2003 R2 guest]

The solution is to hit other keys randomly. Grrr.
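
A sketch of that blunt instrument (KEY_LEFTCTRL and KEY_ESC are just examples of keys that are usually harmless):

$ for k in KEY_LEFTSHIFT KEY_LEFTCTRL KEY_ESC; do
    virsh send-key guest $k
    sleep 1
  done
$ virsh screenshot guest /tmp/screenshot.ppm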


Mini Cloud/Cluster v2.0

Last year I wrote and rewrote a little command line tool for managing my virtualization cluster.

Of course I could use OpenStack RDO but OpenStack is a vast box of somewhat working bits and pieces. I think for a small cluster like mine you can get the essential functionality of OpenStack a lot more simply — in 1300 lines of code as it turns out.

The first thing that small cluster management software doesn’t need is any permanent daemon running on the nodes. The reason is that we already have sshd (for secure management access) and libvirtd (to manage the guests) out of the box. That’s quite sufficient to manage all the state we care about. My Mini Cloud/Cluster software just goes out and queries each node for that information whenever it needs it (in parallel of course). Nodes that are switched off are handled by ignoring them.
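
You can get a feel for the idea with nothing more than a shell loop (a sketch, assuming root ssh access to nodes named ham0-ham3; mclu does the same thing properly, in parallel and with error handling):

$ for node in ham0 ham1 ham2 ham3; do
    virsh -c qemu+ssh://root@$node/system list --all &
  done; wait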

The second thing is that for a small cloud we can toss features that aren’t needed at all: multi-user/multi-tenant, failover, VLANs, a nice GUI.

The old mclu (Mini Cluster) v1.0 was written in Python and used Ansible to query nodes. If you’re not familiar with Ansible, it’s basically parallel ssh on steroids. This was convenient to get the implementation working, but I ended up rewriting this essential feature of Ansible in ~ 60 lines of code.

The huge downside of Python is that even such a small program has loads of hidden bugs, because there’s no static checking at all. The rewrite (in OCaml) is 1,300 lines of code, so a fraction larger, but I have far higher confidence that it is mostly bug free.

I also changed around the way the software works to make it more “cloud like” (hence the name change from “Mini Cluster” to “Mini Cloud”). Guests are now created from templates using virt-builder, and are stateless “cattle” (although you can mix in “pets” and mclu will manage those perfectly well, because all it’s doing is issuing remote libvirt-over-ssh commands).

$ mclu status
ham0                     on
                           total: 8pcpus 15.2G
                            used: 8vcpus 8.0G by 2 guest(s)
                            free: 6.2G
ham1                     on
                           total: 8pcpus 15.2G
                            free: 14.2G
ham2                     on
                           total: 8pcpus 30.9G
                            free: 29.9G
ham3                     off

You can grab mclu v2.0 from the git repository.


libguestfs now works on 64 bit ARM

[photo of the 64 bit ARM server]

Pictured above is my 64 bit ARM server. It’s under NDA so I cannot tell you who supplied it or even show you a proper photo.

However it runs Fedora 21 & Rawhide:

Linux arm64.home.annexia.org 3.16.0-0.rc6.git1.1.efirtcfix1.fc22.aarch64 #1 SMP Wed Jul 23 12:15:58 BST 2014 aarch64 aarch64 aarch64 GNU/Linux

libvirt and libguestfs run fine, with full KVM acceleration, although right now you have to use qemu from git as the Rawhide version of qemu is not new enough.
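
A quick way to verify libguestfs on a new architecture like this is libguestfs-test-tool, which boots the appliance and prints a pile of diagnostics; on success it should finish with “TEST FINISHED OK”:

$ libguestfs-test-tool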

Also OCaml 4.02.0 beta works (after we found and fixed a few bugs in the arm64 native code generator last week).


Setting up virtlockd on NFS

virtlockd is a lock manager implementation for libvirt. It’s designed to prevent you from starting two virtual machines (eg. on different nodes in your cluster) which are backed by the same writable disk image, something which can cause disk corruption. It uses plain fcntl-based file locking, so it is ideal for use when you are using NFS to share your disk images.

Since documentation is rather lacking, this post summarises how to set up virtlockd. I am using NFS to share /var/lib/libvirt/images across all the nodes in my virtualization cluster.

Firstly, it is not clear from the documentation, but virtlockd runs alongside libvirtd on every node. The reason for this is so that libvirtd can be killed without dropping all the locks, which would leave all your VMs unprotected. (You can restart virtlockd independently when it is safe to do so.) I guess the other reason is that POSIX file locking is so fscking crazy unless you use it from an independent process.

Another thing which is not clear from the documentation: virtlockd doesn’t listen on any TCP ports, so you don’t need to open up the firewall. The local libvirtd and virtlockd processes communicate over a private Unix domain socket and virtlockd doesn’t need to communicate with anything else.

There are two ways that virtlockd can work: It can either lock the images directly (this is contrary to what the current documentation says, but Dan told me this so it must be true).

Or you can set up a separate lock file directory, where virtlockd will create zero-sized lock files. This lock file directory must be shared with all nodes over NFS. The lock directory is only needed if you’re not using disk image files (eg. you’re using iSCSI LUNs or something). The reason is that you can’t lock things like devices using fcntl. If you want to go down this route, apart from setting up the shared lock directory somewhere, exporting it from your NFS server, and mounting it on all nodes, you will also have to edit /etc/libvirt/qemu-lockd.conf. The comments are fairly self-explanatory.
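
For example, the shared lock directory route is configured along these lines (a sketch; the paths are only examples):

--- /etc/libvirt/qemu-lockd.conf ---
# Put zero-sized lock files in a shared directory instead of
# locking the disks directly:
file_lockspace_dir = "/var/lib/libvirt/lockd/files"
# The equivalent directory used for SCSI (eg. iSCSI LUN) disks:
scsi_lockspace_dir = "/var/lib/libvirt/lockd/scsi"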

However I’m using image files, so I’m going to opt for locking the files directly. This is easy to set up because there’s hardly any configuration at all: as long as virtlockd is running, it will just lock the image files. All you have to do is make sure the virtlockd service is installed on every node (it is socket-activated, so you don’t need to enable it), and tell libvirt’s qemu driver to use it:

--- /etc/libvirt/qemu.conf ---
lock_manager = "lockd"
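
After changing qemu.conf, restart libvirtd on each node. To check the locking works, try starting the same guest on two nodes at once (testguest here is a hypothetical guest whose image lives on the shared NFS mount); the second attempt should fail with a “resource busy” lock error, although the exact wording varies by version:

ham0# systemctl restart libvirtd
ham0# virsh start testguest
ham1# virsh start testguest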


Caseless virtualization cluster: remote libvirt

Now to the question of how to manage the VMs on my virtualization cluster.

I don’t have a good answer yet, but two things are true:

  1. libvirt will be used to manage the VMs
  2. ssh is used for remote logins

It’s simple to set up ssh to allow remote logins as root using ssh-agent:

ham3$ sudo bash
ham3# cd /root
ham3# mkdir .ssh
ham3# cp /mnt/scratch/authorized_keys .ssh/

From a remote host, remote virsh commands now work:

$ virsh -c qemu+ssh://root@ham3/system list
 Id    Name                           State
----------------------------------------------------

Using libvirt URI aliases (thanks Kashyap) I can set up some aliases to make this quite easy:

$ cat .config/libvirt/libvirt.conf
uri_aliases = [
  "ham0=qemu+ssh://root@ham0/system",
  "ham1=qemu+ssh://root@ham1/system",
  "ham2=qemu+ssh://root@ham2/system",
  "ham3=qemu+ssh://root@ham3/system",
]
$ virsh -c ham0 list
 Id    Name                           State
----------------------------------------------------

However my bash history contains a lot of commands like these which don’t make me happy:

$ for i in 0 1 2 3 ; do ./bin/wol-ham$i; done
$ for i in 0 1 2 3 ; do virsh -c ham$i list; done


KVM working on the Cubietruck

[photo of the Cubietruck]

I managed to get KVM working on the Cubietruck last week. It’s not exactly simple, but this post describes in overview how to do it.

(1) You will need a Cubietruck, a CP2102 serial cable, a micro SDHC card, a card reader for your host computer, and a network patch cable (the board supports wifi but it doesn’t work with the newer kernel we’ll be using). Optional: 2.5″ SATA HDD or SSD.

(2) Start with Hans De Goede’s AllWinner remix of Fedora 19, and get that working. It’s important to read his README file carefully.

(3) Build this upstream kernel with this configuration:

make oldconfig
make menuconfig

In menuconfig, enable Large Page Address Extension (LPAE), and then enable KVM in the Virtualization menu.

LOADADDR=0x40008000 make uImage dtbs
make modules
sudo cp arch/arm/boot/uImage /boot/uImage.sunxi-test
sudo cp arch/arm/boot/dts/sun7i-a20-cubietruck.dtb /boot/sun7i-a20-cubietruck.dtb.sunxi-test
sudo make modules_install

Reboot, interrupt u-boot (using the serial console), and type the following commands to load the new kernel:

setenv bootargs console=ttyS0,115200 loglevel=9 earlyprintk ro rootwait root=/dev/mmcblk0p3
ext2load mmc 0 0x46000000 uImage.sunxi-test
ext2load mmc 0 0x4b000000 sun7i-a20-cubietruck.dtb.sunxi-test
env set fdt_high ffffffff
bootm 0x46000000 - 0x4b000000

(4) Build this modified u-boot which supports Hyp mode.

make cubietruck_config
make
sudo dd if=u-boot-sunxi-with-spl.bin of=/dev/YOURSDCARD bs=1024 seek=8

Reboot again, use the commands above to boot into the upstream kernel, and if everything worked you should see:

Brought up 2 CPUs
SMP: Total of 2 processors activated.
CPU: All CPU(s) started in HYP mode.
CPU: Virtualization extensions available.

Also /dev/kvm should exist.

(5) Hack QEMU to create Cortex-A7 CPUs using this one-line patch.

Edit: dgilmore tells me this is no longer necessary. Instead make sure you use the qemu -cpu host option.

Then you should be able to create VMs using libvirt. Note if using libguestfs you will need to use the direct backend (LIBGUESTFS_BACKEND=direct) because of this libvirt bug.
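
For example, a minimal smoke test of libguestfs using the direct backend (the disk image name is just an example):

$ export LIBGUESTFS_BACKEND=direct
$ guestfish -a test.img run : list-filesystems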


Creating a cloud-init config disk for non-cloud boots

There are lots of cloud disk images floating around. They are designed to run in clouds, where a boot-time service in the guest called cloud-init fetches initial configuration over the network. If that’s not present, or you’re just trying to boot these images in KVM/libvirt directly without any cloud, then things can go wrong.

Luckily it’s fairly easy to create a config disk (aka “seed disk”) which you attach to the guest and then let cloud-init in the guest get its configuration from there. No cloud, or even network, required.

I’m going to use a tool called virt-make-fs to make the config disk, as it’s easy to use and doesn’t require root. There are other tools around, eg. make-seed-disk which do a similar job. (NB: You might hit this bug in virt-make-fs, which should be fixed in the latest version).

I’m also using a cloud image downloaded from the Fedora project, but any cloud image should work.

First I create my cloud-init metadata. This consists of two files. meta-data contains host and network configuration:

instance-id: iid-123456
local-hostname: cloudy

user-data contains other custom configuration (note #cloud-config is not a comment, it’s a directive to tell cloud-init the format of the file):

#cloud-config
password: 123456
runcmd:
 - [ useradd, -m, -p, "", rjones ]
 - [ chage, -d, 0, rjones ]

(The idea behind this split is probably not obvious, but apparently it’s because the meta-data is meant to be supplied by the Cloud, and the user-data is meant to be supplied by the Cloud’s customer. In this case, no cloud, so we’re going to supply both!)

I put these two files into a directory, and run virt-make-fs to create the config disk:

$ ls
meta-data  user-data
$ virt-make-fs --type=msdos --label=cidata . /tmp/seed.img
$ virt-filesystems -a /tmp/seed.img --all --long -h
Name      Type        VFS   Label   MBR  Size  Parent
/dev/sda  filesystem  vfat  cidata  -    286K  -
/dev/sda  device      -     -       -    286K  -

Now I need to pass some kernel options when booting the Fedora cloud image, and the only way to do that is if I boot from an external kernel & initrd. This is not as complicated as it sounds, and virt-builder has an option to get the kernel and initrd that I’m going to need:

$ virt-builder --get-kernel Fedora-cloud.raw
download: /boot/vmlinuz-3.9.5-301.fc19.x86_64 -> ./vmlinuz-3.9.5-301.fc19.x86_64
download: /boot/initramfs-3.9.5-301.fc19.x86_64.img -> ./initramfs-3.9.5-301.fc19.x86_64.img

Finally I’m going to boot the guest using KVM (you could also use libvirt with a little extra effort):

$ qemu-kvm -m 1024 \
    -drive file=Fedora-cloud.raw,if=virtio \
    -drive file=/tmp/seed.img,if=virtio \
    -kernel ./vmlinuz-3.9.5-301.fc19.x86_64 \
    -initrd ./initramfs-3.9.5-301.fc19.x86_64.img \
    -append 'root=/dev/vda1 ro ds=nocloud-net'

You’ll be able to log in either as fedora/123456 or rjones (no password), and you should see that the hostname has been set to cloudy.
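
For the libvirt route, something along these lines should do it (a sketch using virt-install, mirroring the -kernel/-initrd/-append options above):

$ virt-install --import --name cloudy --ram 1024 \
    --disk Fedora-cloud.raw,format=raw,bus=virtio \
    --disk /tmp/seed.img,format=raw,bus=virtio \
    --boot kernel=./vmlinuz-3.9.5-301.fc19.x86_64,initrd=./initramfs-3.9.5-301.fc19.x86_64.img,kernel_args="root=/dev/vda1 ro ds=nocloud-net"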


New tool: virt-builder

New in libguestfs 1.24 will be a simple tool called virt-builder. This builds virtual machines of various free operating systems quickly and securely:

$ virt-builder fedora-19 --size 20G --install nmap
[     0.0] Downloading: http://libguestfs.org/download/builder/fedora-19.xz
[     2.0] Uncompressing: http://libguestfs.org/download/builder/fedora-19.xz
[    25.0] Running virt-resize to expand the disk to 20.0G
[    74.0] Opening the new disk
[    78.0] Random root password: RCuMKJ4NPak0ptJQ [did you mean to use --root-password?]
[    78.0] Installing packages: nmap
[    93.0] Finishing off

Some notable features:

  • Fast: As you can see above, once it has downloaded and cached the template the first time, it can churn out new guests in around 90 seconds.
  • Install packages.
  • Set the hostname.
  • Generate a random seed for the guest.
  • Upload files.
  • Set passwords, create user accounts.
  • Run custom scripts.
  • Install firstboot scripts.
  • Fetch packages from private repos and ISOs.
  • Secure: Everything is assembled in a container (using SELinux if available).
  • Guest templates are PGP-signed.
  • No root or privileged access needed at all (no setuid, no sudo).
  • Fully scriptable.
  • Can be used in locked-down no-network scenarios.
  • Can use UML as a backend (good for use in a cloud).
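
Several of these features can be combined in a single run; a sketch (the hostname, uploaded file, and user are only examples):

$ virt-builder fedora-19 \
    --hostname test.example.com \
    --root-password password:123456 \
    --upload motd:/etc/motd \
    --firstboot-command 'useradd -m rjones'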
