Tag Archives: qemu

nbdkit: Write plugins in Perl

nbdkit is a liberally licensed NBD server that lets you serve “unconventional” sources of disk images and make them available to qemu, libguestfs, etc.

In the latest version of nbdkit, you can now write your plugins in Perl [example].

Coming soon: The ability to write plugins in Python too.
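For a flavour of what a plugin looks like, here is a sketch of a tiny RAM-disk plugin in the promised Python style. The callback names (open, get_size, pread, pwrite) are an assumption, modelled on the Perl plugin interface, since the Python support isn’t released yet; check the nbdkit documentation for the real API.

```python
# Hypothetical nbdkit Python plugin serving a 1 MB RAM disk.
# Callback names are assumed by analogy with the Perl plugin API.

disk = bytearray(1024 * 1024)  # the whole "disk image" lives in memory

def open(readonly):
    # Return any object to serve as the per-connection handle.
    return 1

def get_size(h):
    return len(disk)

def pread(h, count, offset):
    return bytes(disk[offset:offset + count])

def pwrite(h, buf, offset):
    disk[offset:offset + len(buf)] = buf
```

nbdkit would load a file like this and serve the buffer as an NBD export that qemu or guestfish could connect to.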


New in libguestfs: Allow cache mode to be selected

libguestfs used to default to the qemu cache=none caching mode. By allowing other qemu caching modes to be used, you can get a considerable speed-up of up to 25% in the best case.

qemu has a pretty confusing range of caching modes. It’s possible to select writeback (the default), writethrough, none, directsync or unsafe. Know what all those mean? Congratulations, you are Kevin Wolf.

It helps to understand there are many places where your data could be cached. Some are: in the guest’s RAM, in the host’s RAM, or in the small cache which is part of the circuitry on the disk drive. And there’s a conflict of interests too. For your precious novel, you’d hope that when you hit “Save” (and even before) it has actually been written in magnetic patterns on the spinning disk. But the vast majority of data that is written (eg. during a large compile) is not that important, can easily be reconstructed, and is best left in RAM for speedy access.

Qemu emulates disk drives that in some way resemble the real thing, and disk drives have a range of ways that the operating system tells the drive the importance of data and when it needs to be persisted. For example a guest using a (real or emulated) SCSI drive sets the Force Unit Access bit to force the data to be written to physical media before being acknowledged. Unfortunately not all guests know about the FUA flag or its equivalents, and even those that do sometimes don’t issue these flush commands in the right place. For example in very old RHEL, filesystems would issue the flush commands correctly, but if the filesystem used LVM underneath, LVM would not pass these commands through to the hardware.

So now to the qemu cache modes:

  • writeback lets writes be cached in the host cache. This is only safe provided the guest issues flush commands correctly (which translate into fdatasync on the host). If your guest is old and/or doesn’t do this, then you need:
  • writethrough is the same as writeback, but all writes are flushed to disk as well because qemu can never know when a write is important. This is of course dead slow.
  • none uses O_DIRECT on the host side, bypassing the host cache entirely. This is not especially useful, and in particular it means you’re not using the potentially large host RAM to cache things.
  • directsync is like none + writethrough (and like writethrough it’s only applicable if your guest doesn’t send flush commands properly).
  • Finally unsafe is the bad boy of caching modes. Flush commands are ignored. That’s fast, but your data is toast if the machine crashes, even if you thought you did a sync.
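To make the list concrete: a guest flush command ends up as fdatasync(2) on the disk image on the host. A minimal, purely illustrative sketch (plain host-side Python, nothing libguestfs-specific) of the difference between a cached write and an explicitly persisted one:

```python
import os
import tempfile

# A plain write() lands in the host page cache -- this is what
# cache=writeback gives you. The os.fdatasync() call is the host-side
# equivalent of the guest's flush command; cache=unsafe behaves as if
# this line were simply skipped.
fd, path = tempfile.mkstemp()
os.write(fd, b"precious novel")
os.fdatasync(fd)   # forced to stable storage (modulo the drive's own cache)
os.close(fd)
```

If the machine crashes after the fdatasync returns, the data survives; if it crashes before, you are relying on the kernel having written it back on its own schedule.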

Libguestfs ≥ 1.23.20 does not offer all of these choices. For a start, libguestfs always uses a brand new kernel, so we can assume that flush commands work, and that means we can ignore writethrough and directsync.

none (ie. O_DIRECT), which is what libguestfs has always used up to now, has proven to be an endless source of pain, particularly for people who are already suffering under tmpfs. It also has the big disadvantage of bypassing the host RAM for caching.

That leaves writeback and unsafe, which are the two cachemode choices you now have when using the libguestfs add_drive API. writeback is the new, safe default. Flush commands are obeyed, so as long as you’re using a journalled filesystem or issuing guestfs_sync calls, your data will be safe. And there’s a small performance benefit because we are using the host RAM to cache writes.

cachemode=unsafe is the dangerous new choice you have. For scratch disks, testing, temporary disks, basically anything where you either don’t care about the data, or can easily reconstruct the data, this will give you a nice performance boost (I measured about 25%).
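So the decision boils down to: is this disk disposable? A trivial helper expressing that rule of thumb (the helper itself is just an illustration, not part of libguestfs; the add_drive_opts call in the comment shows how the cachemode parameter would be passed from the Python bindings):

```python
def choose_cachemode(disposable):
    # Not a libguestfs API -- just the rule of thumb from the text:
    # "unsafe" only for scratch/temporary disks whose contents you can
    # afford to lose or reconstruct, "writeback" for everything else.
    return "unsafe" if disposable else "writeback"

# With the libguestfs Python bindings this would be used as e.g.:
#   g.add_drive_opts("/var/tmp/scratch.img",
#                    cachemode=choose_cachemode(disposable=True))
```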



Booting Fedora 19 ppc64 netinst under qemu on x86-64

My notes on getting the Fedora 19 ppc64 netinst image to boot under qemu on an x86-64 machine.

Note: I’ve no idea if this is a good way, or a recommended way, but it worked for me.

1. Prerequisites:

I’m using Fedora 19 on the host. Note that the qemu 1.4 packaged in Fedora does not work. I’m not using libvirt to manage the guest, although it’d be nice to get this working one day.

2. Compile qemu-system-ppc64 from upstream git.

3. Create an empty hard disk to store the guest:

# lvcreate -L 16G -n f20ppc64 /dev/fedora

or use truncate or qemu-img create.
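The truncate route just makes a sparse file, which you can equally well do from a script. A quick sketch (the path is an example):

```python
import os
import tempfile

# Equivalent of `truncate -s 16G f20ppc64.img`: a 16 GiB sparse file
# that consumes almost no real disk space until the guest writes to it.
path = os.path.join(tempfile.mkdtemp(), "f20ppc64.img")
with open(path, "wb") as f:
    f.truncate(16 * 1024**3)
```

Pass the resulting file to qemu with -hda instead of the logical volume.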

4. Boot the netinst ISO using this qemu command line:

$ ./ppc64-softmmu/qemu-system-ppc64 \
    -cpu POWER7 \
    -machine pseries \
    -m 2048 \
    -hda /dev/fedora/f20ppc64 \
    -cdrom Fedora-19-ppc64-netinst.iso \
    -netdev user,id=usernet,net= \
    -device virtio-net-pci,netdev=usernet

5. You should get to the yaboot prompt.

There seems to be a rendering bug with graphics (X) in the qemu console. Anaconda was obviously running, but nothing was being drawn in X, making it impossible to start the install. Oddly, the exact same thing happened with VNC. Therefore I used a text-mode install:

boot: linux text

6. That should boot into the textual Anaconda installer.

If it gets stuck at returning from prom_init (and you should wait a minute or two to ensure it’s really stuck) then the problem is broken qemu, or you’re using the wrong CPU/machine type, or you’re trying to use a 64 bit kernel on 32 bit qemu.

QEMU tip: Use [Ctrl] [Alt] 2 to switch to the monitor. Use the monitor command sendkey ctrl-alt-f1 to send keycodes to the guest. Use [Ctrl] [Alt] 1 to switch back to the guest console.

tmux tip: Use [Ctrl] b [1-5] to switch between tmux windows.


Nested virtualization (not) enabled

Interesting thing I learned a few days ago:

kvm: Nested Virtualization enabled

does not always mean that nested virtualization is being used.

If you use qemu’s software emulation (more often known as TCG) then it emulates a generic-looking AMD CPU with SVM (AMD’s virtualization feature).

AMD virtualization easily supports nesting (unlike Intel’s VT, which is a massive PITA to nest), and when the KVM module is loaded, it notices the “AMD” host CPU with SVM and willingly enables nested virt. There’s actually a little bit of benefit to this, because it avoids a second layer of TCG being needed if you did run an L2 guest in there (although it’s still going to be slow).


Using libguestfs to open an ISO on a public website

The new curl support added to libguestfs 1.22 lets you open any ISO remotely from a public web site or FTP server:

$ export LIBGUESTFS_BACKEND=direct
$ guestfish --ro -i --format=raw \
    -a http://releases.ubuntu.com/precise/ubuntu-12.04.2-desktop-amd64.iso

Operating system: Ubuntu 12.04.2 LTS "Precise Pangolin" - Release amd64 (20130213)
/dev/sda1 mounted on /

><fs> ll /
total 2506
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 .
drwxr-xr-x 23 1000 1000    4096 May 28 13:55 ..
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 .disk
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 EFI
-r--r--r--  1 root root     236 Feb 13 22:21 README.diskdefines
-r--r--r--  1 root root     134 Feb 13 22:20 autorun.inf
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 boot
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 casper
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 dists
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 install
dr-xr-xr-x  1 root root   18432 Feb 13 22:21 isolinux
-r--r--r--  1 root root   16443 Feb 13 22:21 md5sum.txt
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 pics
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 pool
dr-xr-xr-x  1 root root    2048 Feb 13 22:21 preseed
lr-xr-xr-x  1 root root       1 Feb 13 22:21 ubuntu -> .
-r--r--r--  1 root root 2504624 Feb  8 22:58 wubi.exe

Of course it is slow as hell and not nice to the web host. It makes lots of byte-range requests, downloading a few KB with each request, which is pretty much the worst case for a web server to handle.
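Each of those reads is an HTTP GET with a Range header. For illustration, here is what one such request looks like when built with Python’s urllib (constructing the request only; nothing is sent):

```python
import urllib.request

# A 2 KB read at offset 32 KB of the ISO turns into a byte-range
# request like this one; a single guestfish session issues thousands
# of them, which is what makes life hard for the web server.
url = "http://releases.ubuntu.com/precise/ubuntu-12.04.2-desktop-amd64.iso"
req = urllib.request.Request(url, headers={"Range": "bytes=32768-34815"})
```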

Note also that Fedora’s curl package was broken, so I compiled my own from upstream git.



qemu 1.5.0 released, with ssh block device support

qemu 1.5.0 has been released, featuring ssh support so you can access remote disks over ssh, including from libguestfs.

Here’s how to use this from guestfish:

$ export LIBGUESTFS_BACKEND=direct
$ guestfish --ro -a ssh://onuma/mnt/scratch/winxp.img -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Microsoft Windows XP
/dev/sda1 mounted on /

><fs> ll /
total 1573209
drwxrwxrwx  1 root root       4096 Apr 16  2012 .
drwxr-xr-x 23 1000 1000       4096 May 20 19:47 ..
-rwxrwxrwx  1 root root          0 Oct 11  2011 AUTOEXEC.BAT
-rwxrwxrwx  1 root root          0 Oct 11  2011 CONFIG.SYS
drwxrwxrwx  1 root root       4096 Oct 11  2011 Documents and Settings
-rwxrwxrwx  1 root root          0 Oct 11  2011 IO.SYS
-rwxrwxrwx  1 root root          0 Oct 11  2011 MSDOS.SYS
-rwxrwxrwx  1 root root      47564 Apr 14  2008 NTDETECT.COM
drwxrwxrwx  1 root root       4096 Oct 11  2011 Program Files
drwxrwxrwx  1 root root       4096 Oct 11  2011 System Volume Information
drwxrwxrwx  1 root root      28672 Oct 11  2011 WINDOWS
-rwxrwxrwx  1 root root        211 Oct 11  2011 boot.ini
-rwxrwxrwx  1 root root     250048 Apr 14  2008 ntldr
-rwxrwxrwx  1 root root 1610612736 Oct 11  2011 pagefile.sys


Testing exabyte-sized filesystems using qcow2 and guestfish

You can use qcow2 backing files as a convenient way to test what happens when you try to create exabyte-sized filesystems. Just to remind you, 1 exabyte is a million terabytes, or a pile of ordinary hard disks stacked 8 miles high.

There is a bug in qemu that prevents you from creating very large disks unless you adjust the cluster_size option (thanks Kevin Wolf):

$ qemu-img create -f qcow2 huge.qcow2 \
      $((1024*1024))T -o cluster_size=2M
Formatting 'huge.qcow2', fmt=qcow2 size=1152921504606846976 encryption=off cluster_size=2097152 lazy_refcounts=off 
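The size qemu-img prints is easy to check: $((1024*1024))T is 2^20 TiB, i.e. 2^60 bytes, exactly one exbibyte:

```python
# 1024*1024 tebibytes = 2**20 * 2**40 bytes = 2**60 bytes, which is
# the size=1152921504606846976 reported by qemu-img above.
size = 1024 * 1024 * 1024**4
clusters = size // (2 * 1024**2)   # number of 2 MiB qcow2 clusters
```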

After that you can just attach the disk to guestfish and start playing with huge filesystems.

[I should note that virt-rescue is probably a better choice of tool here, especially for people who need to experiment with unusual filesystem or LVM options]

$ guestfish -a huge.qcow2

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> run
><fs> blockdev-getsize64 /dev/sda
><fs> part-disk /dev/sda gpt

Ext4 (according to Wikipedia) is supposed to support 1 exabyte disks, but I couldn’t get that to work, possibly because there was not enough RAM:

><fs> mkfs ext4 /dev/sda1
libguestfs: error: mkfs: ext4: /dev/sda1: mke2fs 1.42.5 (29-Jul-2012)
/dev/sda1: Not enough space to build proposed filesystem while setting up superblock

XFS could create a filesystem, but I didn’t let it run to completion because it would need about 5 petabytes to store the filesystem metadata:

><fs> mkfs xfs /dev/sda1
[ disks churn for many minutes while qcow2 file grows
and grows and grows ... ]

LVM2 PVs are possible, but creating a VG requires us to adjust the extent size:

><fs> pvcreate /dev/sda1
><fs> vgcreate VG /dev/sda1
libguestfs: error: vgcreate:   PV /dev/sda1 too large for extent size 4.00 MiB.
  Format-specific setup of physical volume '/dev/sda1' failed.
  Unable to add physical volume '/dev/sda1' to volume group 'VG'.
><fs> debug sh "vgcreate -s 1G VG /dev/sda1"
  Volume group "VG" successfully created
><fs> lvcreate LV VG 1000000000
><fs> lvs-full
[0] = {
  lv_name: LV
  lv_size: 1048576536870912
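The extent-size failure is mostly arithmetic: with the default 4 MiB extents, a 1 EiB PV needs 2^38 extents, evidently more than the lvm2 format will accept per PV (that being the precise reason is my assumption; the division itself is below), while vgcreate -s 1G brings the count down to 2^30:

```python
EiB = 2**60
extents_4M = EiB // (4 * 2**20)   # 2**38 extents at the default 4 MiB size
extents_1G = EiB // 2**30         # 2**30 extents after `vgcreate -s 1G`
```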

