Tag Archives: xen

Great new changes coming to nbdkit

Eric Blake has been doing some great stuff for nbdkit, the flexible plugin-based NBD server.

  • Full parallel request handling.
    You’ve always been able to tell nbdkit that your plugin can handle multiple requests in parallel from a single client, but until now that didn’t actually do anything (only parallel requests from multiple clients worked).
  • An NBD forwarding plugin, so if you have another NBD server which doesn’t support a feature like encryption or new-style protocol, then you can front that server with nbdkit which does.

As well as that he’s fixed lots of small bugs with NBD compliance so hopefully we’re now much closer to the protocol spec (we always check that we interoperate with qemu’s nbd client, but it’s nice to know that we’re also complying with the spec). He also fixed a potential DoS where nbdkit would try to handle very large writes which would delay a thread in the server indefinitely.


Also this week, I wrote an nbdkit plugin for handling the weird Xen XVA file format. The whole thread is worth reading because 3 people came up with 3 unique solutions to this problem.

1 Comment

Filed under Uncategorized

New in libguestfs 1.27.34 – virt-v2v and virt-p2v

There haven’t been too many updates around here for a while, and that’s for a very good reason: I’ve been “heads down” writing the new versions of virt-v2v and virt-p2v, our tools for converting VMware and Xen virtual machines, or physical machines, to run on KVM.

The new virt-v2v [manual page] can slurp in a guest from a local disk image, local Xen, VMware vCenter, or (soon) an OVA file — convert it to run on KVM — and write it out to RHEV-M, OpenStack Glance, local libvirt or as a plain disk image.

It’s easy to use too. Unlike the old virt-v2v there are no hairy configuration files to edit or complicated preparations. You simply do:

$ virt-v2v -i disk xen_disk.img -o local -os /tmp

That command (which doesn’t need root, naturally) takes the Xen disk image, which could be any supported Windows or Enterprise Linux distro, converts it to run on KVM (eg. installing virtio drivers, adjusting dozens of configuration files), and writes it out to /tmp.

To connect to a VMware vCenter server, change the -i options to:

$ virt-v2v -ic vpx://vcenter/Datacenter/esxi "esx guest name" [-o ...]

To output the converted disk image to OpenStack glance, change the -o options to:

$ virt-v2v [-i ...] -o glance [-on glance_image_name]

Coming up: The new technology we’ve used to make virt-v2v much faster.

20 Comments

Filed under Uncategorized

Tip: Run Xen as a KVM guest

(Thanks to Dan for pointing out that this is possible)

You can run legacy Xen hypervisor as a KVM guest, which is useful for testing and development. Since the Xen HV won’t have access to the hardware (in particular, to hardware virt) you can only run Xen paravirt guests this way, which in reality means only Linux XenPV guests. [It is supposed to be possible, if you have AMD hardware supporting nested SVN, to get Xen fullvirt guests going, but I did not try this.]

Preparation:

  1. Install a RHEL 5 guest first. This will later be changed into a “Xen host” guest.
  2. You will definitely appreciate having libguestfs around since you can do virt-edit RHEL5Xen /boot/grub/grub.conf to adjust the boot configuration iteratively.

Boot the RHEL 5 guest ordinarily, and install the appropriate kernel-xen and xen packages, and edit grub.conf to enable the Xen hypervisor:

default=0
[...]
title Red Hat Enterprise Linux Server (2.6.18-194.17.1.el5xen)
        root (hd0,0)
        kernel /xen.gz-2.6.18-194.17.1.el5 noapic
        module /vmlinuz-2.6.18-194.17.1.el5xen ro root=/dev/VolGroup00/LogVol00
        module /initrd-2.6.18-194.17.1.el5xen.img

I added the noapic option, and removed rhgb quiet so I could see what was going on.

Furthermore you need to switch the guest to using emulated devices (IDE, rtl8139) instead of virtio, if it’s not already. Shut down the guest and do:

# virsh edit RHEL5Xen

and change <target dev=’vda’ bus=’virtio’/> to <target dev=’sda’ bus=’ide’/> and remove any <address> element in the same <disk> block. Similarly if there is a network card, make it an emulated rtl8139 instead of virtio.

If during boot you see the error:

request_module: runaway loop modprobe binfmt-464c

then this means you’ve got the wrong kernel-xen package installed, usually that you’ve got the 32 bit kernel-xen.i686 with a 64 bit userspace.

As I said before, I really appreciated guestfish / virt-edit to let me interactively edit the grub configuration and go back and forwards between Xen and ordinary kernel until I got it right.

But finally it did boot, and it was running the Xen hypervisor and dom0:

# /etc/init.d/xend start
Starting xend:                                             [  OK  ]
# /usr/sbin/xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0      864     1 r-----     48.3
# uname -a
Linux rhel5xenx64.home.annexia.org 2.6.18-194.17.1.el5xen #1 SMP Mon Sep 20 07:20:39 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux

Update #1

If you have libvirt running inside this guest, it will try to register a bridged network with the address 192.168.122.x which is the exact same address as your host, and that causes confusion. Typical symptoms are that you can’t ssh into the Xen guest from the host, and that networking inside the Xen guest seems broken in strange ways. The solution is simple. In the Xen guest type:

# virsh net-edit default

and change “122” everywhere to something else, eg. “123”.

Then restart the default network:

# virsh net-destroy default
# virsh net-start default

and everything will work again.

Update #2

Because this KVM guest is running a Xen hypervisor and guests, you need to give it a bit more memory. I bumped mine up to 2G, allowing me to comfortably install and run one (nested) guest:

# /usr/sbin/xm list
Name                                      ID Mem(MiB) VCPUs State   Time(s)
Domain-0                                   0     1224     1 r-----    103.8
RHEL5xenguest                              1      767     1 -b----     33.0

4 Comments

Filed under Uncategorized

Tip: Configure guest with filesystem directly on a host device

Xen lets you assign a host block device as a guest partition, synthesizing the partition table for you. So for example, the host /dev/VG/LV might appear in the guest as /dev/xvda1 with a filesystem directly on it. This means from the host you just see a filesystem, which you can create directly, mount etc. Some people like this and others like me have guests already configured like this.

It is possible to configure a libvirt/KVM host this way, and this post describes how. It is somewhat more manual than using virt-install. In fact I will assume that you either already have guests configured this way, or you know how to prepare a guest filesystem (eg. using debootstrap) directly onto a device like this.

There are two points about such a configuration on KVM.

Number one: the kernel and initrd of the guest live “outside” the guest, on the host. This can be beneficial, eg. if you want all your guests to have a common kernel which you will manage, compile and upgrade centrally. Number two is that KVM cannot synthesize the partition table like Xen, so inside the guest it’s going to see a filesystem on /dev/vda directly, with no partition table. Linux will work just fine. Windows wouldn’t, but that doesn’t matter because Windows cannot be configured like this anyway.

Decide where to put your external kernel and initrd on the host, and then configure libvirt like this:

<domain type='kvm'>
  <name>guest</name>
  ...
  <os>
    <type arch='x86_64' machine='pc'>hvm</type>
    <kernel>/usr/local/etc/virt/vmlinuz</kernel>
    <initrd>/usr/local/etc/virt/initrd</initrd>
    <cmdline>ro root=/dev/vda</cmdline>
    <boot dev='hd'/>
  </os>
  ...
  <devices>
    <disk type='block' device='disk'>
      <source dev='/dev/VG/LV'/>
      <target dev='vda' bus='virtio'/>
    </disk>

The kernel, initrd, cmdline elements are what libvirt calls “direct kernel boot”.

Secondly of course you need your guest filesystem directly on the device or logical volume. Its fstab should be set up accordingly, plus any other configuration files it needs. (guestfish can of course safely create, view and edit guest filesystems which are configured in this manner).

In fact I use several production guests configured like this, for historical reasons (they came from a Xen server). However I find in general this a more clumsy way to organize guests. It might scale better at very high-end configurations, if you wrote a lot of custom tools, but the vast majority of users don’t have systems on such a scale.

There is also a security issue: although you can mount a guest filesystem directly on the host this way, it’s likely that you shouldn’t. Only this week I found a kernel-crasher, possibly exploitable, in the minix filesystem driver. Use libguestfs to put a safe barrier between your guests and your host.

4 Comments

Filed under Uncategorized

Tip: Install a device driver in a Windows VM

Previously we looked at how to install a service in a Windows VM. You can use that technique or the RunOnce tip to install some device drivers too.

But what if Windows needs the device driver in order to boot? This is the problem we faced with converting old Xen and VMWare guests to use KVM. You can’t install viostor (the virtio disk driver) which KVM needs either on the source Xen/VMWare hypervisors (because those don’t use the virtio standard) or on the destination KVM hypervisor (because Windows needs to be able to see the disk first in order to be able to boot).

Nevertheless we can modify the Windows VM off line using libguestfs to install the virtio device driver and allow it to boot.

(Note: virt-v2v will do this for you. This article is for those interested in how it works).

There are three different aspects to installing a device driver in Windows. Two of these are Windows Registry changes, and one is to install the .SYS file (the device driver itself).

So first we make the two Registry changes. Device drivers are a bit like services under Windows, so the first change looks like installing a service in a Windows guest. The second Registry change adds viostor to the “critical device database”, a map of PCI addresses to device drivers used by Windows at boot time:

# virt-win-reg --merge Windows7x64

;
; Add the viostor service
;

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\viostor]
"Group"="SCSI miniport"
"ImagePath"=hex(2):73,00,79,00,73,00,74,00,65,00,6d,00,33,00,32,00,5c,00,64,\
  00,72,00,69,00,76,00,65,00,72,00,73,00,5c,00,76,00,69,00,6f,00,73,00,74,00,6f,\
  00,72,00,2e,00,73,00,79,00,73,00,00,00
"ErrorControl"=dword:00000001
"Start"=dword:00000000
"Type"=dword:00000001
"Tag"=dword:00000040

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\viostor\Parameters]
"BusType"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\viostor\Parameters\MaxTransferSize]
"ParamDesc"="Maximum Transfer Size"
"type"="enum"
"default"="0"

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\viostor\Parameters\MaxTransferSize\enum]
"0"="64  KB"
"1"="128 KB"
"2"="256 KB"

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\viostor\Parameters\PnpInterface]
"5"=dword:00000001

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Services\viostor\Enum]
"0"="PCI\\VEN_1AF4&DEV_1001&SUBSYS_00021AF4&REV_00\\3&13c0b0c5&2&20"
"Count"=dword:00000001
"NextInstance"=dword:00000001

;
; Add viostor to the critical device database
;

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\CriticalDeviceDatabase\PCI#VEN_1AF4&DEV_1001&SUBSYS_00000000]
"ClassGUID"="{4D36E97B-E325-11CE-BFC1-08002BE10318}"
"Service"="viostor"

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\CriticalDeviceDatabase\PCI#VEN_1AF4&DEV_1001&SUBSYS_00020000]
"ClassGUID"="{4D36E97B-E325-11CE-BFC1-08002BE10318}"
"Service"="viostor"

[HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\CriticalDeviceDatabase\PCI#VEN_1AF4&DEV_1001&SUBSYS_00021AF4]
"ClassGUID"="{4D36E97B-E325-11CE-BFC1-08002BE10318}"
"Service"="viostor"

Comparatively speaking, the second step of uploading viostor.sys to the right place in the image is simple:

# guestfish -i Windows7x64
><fs> upload viostor.sys /Windows/System32/drivers/viostor.sys

After that, the Windows guest can be booted on KVM using virtio. In virt-v2v we then reinstall the viostor driver (along with other drivers like the virtio network driver) so that we can be sure they are all installed correctly.

12 Comments

Filed under Uncategorized

Create a partitioned device from a collection of filesystems

Xen has a feature where it can export virtual partitions directly to virtual machines. You can configure a Xen VM like this example:

disk = ['phy:raidvg/devroot,hda1,w','phy:raidvg/devswap,hda2,w']

Notice that host device /dev/raidvg/devroot is mapped to a partition inside the guest (/dev/hda1), and on the host this device directly contains a filesystem:

host# file - < /dev/raidvg/devroot
/dev/stdin: Linux rev 1.0 ext3 filesystem data, UUID=... (needs journal recovery)

Inside the guest, it sees /dev/hda1, /dev/hda2, but no /dev/hda device or partition table.

This is actually a nice feature of Xen because resizing filesystems directly is much easier than resizing a partitioned block device. You can just make the host device bigger (lvresize -L sizeG /dev/raidvg/devroot), reboot the guest so it sees the increased device size, then resize the filesystem (resize2fs — this can even be done live if you want to make the filesystem bigger).

Imagine if we’d been dealing with a KVM partitioned block device instead:

+-+---------------------+------------+
|M| hda1                | hda2       |
|B| (root filesystem)   | (swap)     |
|R|                     |            |
+-+---------------------+------------+

Resizing this is much more painful. You first have to extend the host block device:

+-+---------------------+------------+-------+
|M| hda1                | hda2       | space |
|B| (root filesystem)   | (swap)     |       |
|R|                     |            |       |
+-+---------------------+------------+-------+

Now what do you do? Easiest is probably to create a third (hda3) partition in that extra space. If you didn’t have the foresight to use LVM, then this means your root filesystem cannot be extended — you can only create another extra filesystem (say for /var) and copy files over. This is very inflexible.

Instead you could recalculate the MBR and move (ie. copy block by block) hda2 up. (Imagine it wasn’t swap space since you can just throw that away and recreate it, but some valuable files). Recalculating the MBR is generally error-prone because partitions have strange limitations and alignment requirements.

One day I intend to write a program to do these kinds of complex resizing operations …

Anyhow, this wasn’t even what this rambling blog entry was about. It is a companion to last week’s tip about extracting filesystems from disk images. Can we do the opposite, ie. create a partitioned device from a collection of Xen filesystems?

Answer, yes we can, with guestfish.

I’m starting in fact with the filesystem and swap devices copied from my Xen server, and I need to know their exact sizes in 1024-byte-blocks first:

$ ls --block-size=1024 -l devroot devswap
-rw-rw-r--. 1 rjones rjones 3145728 2010-03-17 14:18 devroot
-rw-rw-r--. 1 rjones rjones 1048576 2010-03-17 14:19 devswap

I’m going to put this into a 5G disk image, giving me space to expand the root filesystem to fit. Inexplicably I’ve decided to keep the swap partition content even though in reality I would just throw it away and recreate the swap partition (imagine there’s some important filesystem content in there instead). I want devswap to precisely fit at the end of the new disk image.

Let’s create the disk image and find out how big it is in sectors:

$ rm -f disk.img
$ truncate -s 5G disk.img
$ guestfish -a disk.img -a devroot -a devswap
><fs> run
><fs> blockdev-getsz /dev/vda
10485760  # size in 512 byte sectors

Now I need to do some back of the envelope calculations to work out how I will size and place each partition. (This is a huge pain in the neck — I had to do several runs to get the numbers to come out right …)

><fs> part-init /dev/vda mbr
# numbers below are in units of 512 byte sectors:
><fs> part-add /dev/vda primary 64 8388607
><fs> part-add /dev/vda primary 8388608 -1
><fs> sfdisk-l /dev/vda

Disk /dev/vda: 10402 cylinders, 16 heads, 63 sectors/track
Units = cylinders of 516096 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/vda1          0+   8322-   8322-   4194272   83  Linux
/dev/vda2       8322+  10402-   2081-   1048576   83  Linux
/dev/vda3          0       -       0          0    0  Empty
/dev/vda4          0       -       0          0    0  Empty

Notice the number of (1024-byte) blocks for devswap is exactly the correct size: 1048576.

The sfdisk-l command is also telling me that my partitions aren’t aligned on “cylinders” which I don’t care about. But the swap partition should be aligned for the underlying device because sector 8388608 == 8192 * 1024.

Once the hard bit is out of the way, I can now copy across my filesystems. Notice I added devroot and devswap as devices (the -a option to guestfish). They appear in the guest as /dev/vdb and /dev/vdc respectively and I can just dd them to the right places:

><fs> dd /dev/vdb /dev/vda1
><fs> dd /dev/vdc /dev/vda2

and resize the root filesystem to fit the space available:

><fs> e2fsck-f /dev/vda1
><fs> resize2fs /dev/vda1

Now I have a single partitioned device, suitable for use with KVM (mind you, not bootable because it still contains a Xen paravirt kernel):

$ virt-list-filesystems -al disk.img
/dev/sda1 ext3
/dev/sda2 swap

As you can see there is much scope for automation …

1 Comment

Filed under Uncategorized

virt-df on Xen

Virt-df works on Xen so you can display free disk space in a Xen VM.

# virt-df -h 2>&1 | grep -v '/dev/kqemu'
Domain-0 seems to have no disk devices
Filesystem                                Size       Used  Available  Use%
RHEL39FV64:/dev/hda1                     98.7M      13.5M      80.1M 18.8%
RHEL39FV64:/dev/hda2                      6.8G       1.8G       4.7G 31.0%
RHEL39FV64:/dev/hdb1                     30.4M       1.0M      27.8M  8.5%
RHEL39FV32:/dev/hda1                     98.7M      14.5M      79.1M 19.9%
RHEL39FV32:/dev/hda2                      6.8G       1.5G       4.9G 27.8%
RHEL39FV32:/dev/hdb1                     30.4M       1.0M      27.8M  8.5%
RHEL53PV64:/dev/VolGroup00/LogVol00       6.6G       2.1G       4.2G 37.3%
RHEL53PV64:/dev/hda1                     98.7M      13.1M      80.5M 18.5%
RHEL48PV64:/dev/VolGroup00/LogVol00       6.7G       2.1G       4.3G 36.6%
RHEL48PV64:/dev/hda1                     98.7M       9.2M      84.4M 14.5%
RHEL52PV32:/dev/VolGroup00/LogVol00       6.6G       2.5G       3.7G 43.5%
RHEL52PV32:/dev/hda1                     98.7M      12.7M      80.9M 18.0%
RHEL53PV32:/dev/VolGroup00/LogVol00       6.6G       1.9G       4.4G 34.1%
RHEL53PV32:/dev/hda1                     98.7M      12.8M      80.8M 18.2%
RHEL52FV32:/dev/VolGroup00/LogVol00       6.7G       2.6G       3.8G 43.5%
RHEL52FV32:/dev/hda1                     98.7M      14.5M      79.1M 19.8%
RHEL52FV32:/dev/hdb1                     30.3M       1.4M      27.4M  9.7%
RHEL48PV32:/dev/VolGroup00/LogVol00       6.7G       1.8G       4.6G 31.7%
RHEL48PV32:/dev/hda1                     98.7M       8.6M      85.0M 13.9%
RHEL47FV32:/dev/VolGroup00/LogVol00       6.7G       1.9G       4.5G 32.6%
RHEL47FV32:/dev/hda1                     98.7M      13.9M      79.7M 19.2%
RHEL47PV32:/dev/VolGroup00/LogVol00       6.7G       1.8G       4.6G 31.3%
RHEL47PV32:/dev/hda1                     98.7M       8.6M      85.0M 13.9%

It’s a bit noisy giving a warning about opening /dev/kqemu for each guest. I’ve grepped out the warnings above. The “Domain-0 seems to have no disk devices” warning is bug 538041.

Leave a comment

Filed under Uncategorized