Tag Archives: qemu

Tip: Use gdbserver to debug qemu running under libguestfs

If qemu crashes or fails when run under libguestfs, it can be a bit hard to debug things. However, a small qemu wrapper and gdbserver can help.

Create a file called qemu-wrapper, make it executable (chmod +x), and put this in it:

#!/bin/bash -

# Only wrap the real appliance run in gdbserver; let libguestfs's
# feature probes (-help, -version, -device ?) run qemu directly.
if ! echo "$@" | grep -sqE -- '-help|-version|-device \?' ; then
  gdbserver="gdbserver :1234"
fi

exec $gdbserver /usr/bin/qemu-system-x86_64 "$@"

Set your environment variables so libguestfs will use the qemu wrapper instead of running qemu directly:

$ export LIBGUESTFS_BACKEND=direct
$ export LIBGUESTFS_HV=/path/to/qemu-wrapper

Now we run guestfish or another virt tool as normal:

$ guestfish -a /dev/null -v -x run

When qemu starts up, gdbserver will run and halt the process, printing:

Listening on port 1234

At this point you can connect gdb:

$ gdb
(gdb) file /usr/bin/qemu-system-x86_64
(gdb) target remote tcp::1234
set breakpoints etc here
(gdb) cont


Notes on getting VMware ESXi to run under KVM

This is mostly adapted from this long thread on the VMware community site.

I got VMware ESXi 5.5.0 running on upstream KVM today.

First I had to disable the “VMware backdoor”. This is a magic I/O port emulated by qemu: when ESXi runs, it detects the port and tries to use it to query the machine (instead of using CPUID and so on). Unfortunately qemu’s emulation of the VMware backdoor is very half-assed, and there’s no way to disable it except to patch qemu:

diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index eaf3e61..ca1c422 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -204,7 +204,7 @@ static void pc_init1(QEMUMachineInitArgs *args,
     pc_vga_init(isa_bus, pci_enabled ? pci_bus : NULL);
     /* init basic PC hardware */
-    pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy, xen_enabled(),
+    pc_basic_device_init(isa_bus, gsi, &rtc_state, &floppy, 1,
     pc_nic_init(isa_bus, pci_bus);

It would be nice if this were configurable in qemu, and it is now being fixed upstream.
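
If your qemu is new enough to have the vmport machine property (which, as far as I know, is what the upstream fix turned into), you should be able to skip the patch and disable the backdoor from the command line, something like:

$ qemu-system-x86_64 -machine pc,vmport=off [...]

Check your qemu version first; older builds will reject the option.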

Secondly I had to turn off MSR emulation. This is, unfortunately, a machine-wide setting:

# echo 1 > /sys/module/kvm/parameters/ignore_msrs
# cat /sys/module/kvm/parameters/ignore_msrs

Thirdly I had to give the ESXi virtual machine an IDE disk and an e1000 or vmxnet3 network card. Note also that ESXi requires ≥ 2 vCPUs and at least 2 GB of RAM.
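
To make that concrete, here is a rough sketch of a matching qemu invocation. The disk image path, the ISO file name and the -cpu host choice are my own placeholders rather than details from the setup above, so adjust to taste (and swap e1000 for vmxnet3 if you prefer that NIC model):

$ qemu-system-x86_64 \
    -enable-kvm -cpu host \
    -smp 2 -m 4096 \
    -drive file=/var/tmp/esxi.img,format=raw,if=ide \
    -netdev user,id=net0 \
    -device e1000,netdev=net0 \
    -cdrom VMware-VMvisor-Installer-5.5.0.x86_64.iso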

[Screenshots: ESXi 5.5.0 running under KVM]



Caseless virtualization cluster, part 4

AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)

But how well does it perform? Not too badly as it happens.

I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Since libguestfs creates a short-lived KVM appliance, it benefits from hardware virt acceleration when available. And since libguestfs ≥ 1.26 there is a new option (LIBGUESTFS_BACKEND_SETTINGS=force_tcg, used below) that forces software emulation, so you can easily test the effect with and without hardware acceleration.

L1 performance

Let’s start on the host (L0), measuring L1 performance. Note that you have to run the commands shown at least twice, both because supermin will build and cache the appliance first time and because it’s a fairer test of hardware acceleration if everything is cached in memory.

This AMD hardware turns out to be pretty good:

$ time guestfish -a /dev/null run
real	0m2.585s

(2.6 seconds is the time taken to launch a virtual machine, all its userspace and a daemon, then shut it down. I’m using libvirt to manage the appliance).

Forcing software emulation (disabling hardware acceleration):

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.995s

L2 performance

Inside the L1 Fedora guest, we run the same tests. Note this is testing L2 performance (the libguestfs appliance running on top of an L1 guest), ie. nested virt:

$ time guestfish -a /dev/null run
real	0m5.750s

Forcing software emulation:

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.949s


These are just some simple tests. I’ll be doing something more comprehensive later. However:

  1. First level hardware virtualization performance on these AMD chips is excellent.
  2. Nested virt runs at roughly 45% of non-nested speed in this test, ie. about 2.2 times slower (see the quick calculation after this list).
  3. TCG performance is slower as expected, and the gap shows that hardware virt really is being used and is beneficial even in the nested case.
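
For reference, here is the quick calculation behind points 2 and 3, using the timings above:

$ echo "scale=2; 5.750 / 2.585" | bc     # L2 vs L1, both with hardware virt
2.22
$ echo "scale=2; 9.949 / 5.750" | bc     # forced TCG vs nested hardware virt, inside L1
1.73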

Other data

The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8320 Eight-Core Processor
stepping	: 0
microcode	: 0x6000822
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips	: 7031.39
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron 63xx class CPU
stepping	: 0
microcode	: 0x1000065
cpu MHz		: 3515.548
cache size	: 512 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips	: 7031.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:


As part of the discussion in the comments about whether this has 4 or 8 physical cores, here is the lstopo output:

[lstopo output image]

nbdkit: Write plugins in Perl

nbdkit is a liberally licensed NBD server that lets you serve “unconventional” sources of disk images and make them available to qemu, libguestfs, etc.

In the latest version of nbdkit, you can now write your plugins in Perl [example].

Coming soon: The ability to write plugins in Python too.
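
If you want to try it, the overall shape is roughly this, with a made-up plugin path; check the nbdkit-perl-plugin man page for the exact parameters in your version:

$ nbdkit perl script=/path/to/myplugin.pl
$ guestfish --ro --format=raw -a nbd://localhost

The first command serves the disk over NBD on the standard port; the second is one way to poke at it from libguestfs.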


New in libguestfs: Allow cache mode to be selected

libguestfs used to default to qemu’s cache=none caching mode. By allowing other qemu caching modes to be used, you can get a considerable speed-up, up to 25% in the best case.

qemu has a pretty confusing range of caching modes. It’s possible to select writeback (the default), writethrough, none, directsync or unsafe. Know what all those mean? Congratulations, you are Kevin Wolf.

It helps to understand there are many places where your data could be cached. Some are: in the guest’s RAM, in the host’s RAM, or in the small cache which is part of the circuitry on the disk drive. And there’s a conflict of interests too. For your precious novel, you’d hope that when you hit “Save” (and even before) it has actually been written in magnetic patterns on the spinning disk. But the vast majority of data that is written (eg. during a large compile) is not that important, can easily be reconstructed, and is best left in RAM for speedy access.

Qemu emulates disk drives that in some way resemble the real thing, and disk drives offer a range of ways for the operating system to tell the drive how important a piece of data is and when it must be persisted. For example a guest using a (real or emulated) SCSI drive sets the Force Unit Access bit to force the data to be written to physical media before being acknowledged. Unfortunately not all guests know about the FUA flag or its equivalents, and even those that do don’t always issue these flush commands in the right place. For example in very old RHEL, filesystems would issue the flush commands correctly, but if the filesystem sat on top of LVM, LVM would not pass those commands through to the hardware.

So now to the qemu cache modes:

  • writeback lets writes be cached in the host cache. This is only safe provided the guest issues flush commands correctly (which translate into fdatasync on the host). If your guest is old and/or doesn’t do this, then you need:
  • writethrough is the same as writeback, but all writes are flushed to disk as well because qemu can never know when a write is important. This is of course dead slow.
  • none uses O_DIRECT on the host side, bypassing the host cache entirely. This is not especially useful, and in particular it means you’re not using the potentially large host RAM to cache things.
  • directsync is like none + writethrough (and like writethrough it’s only applicable if your guest doesn’t send flush commands properly).
  • Finally unsafe is the bad boy of caching modes. Flush commands are ignored. That’s fast, but your data is toast if the machine crashes, even if you thought you did a sync.
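
All of these are selected with the cache= parameter of qemu’s -drive option, for example (file names are just illustrative):

$ qemu-system-x86_64 [...] -drive file=disk.img,format=raw,cache=writeback
$ qemu-system-x86_64 [...] -drive file=scratch.img,format=raw,cache=unsafe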

Libguestfs ≥ 1.23.20 does not offer all this choice. For a start, libguestfs always uses a brand new kernel, so we can assume that flush commands work, and that means we can ignore writethrough and directsync.

none (ie. O_DIRECT), which is what libguestfs has always used up to now, has proven to be an endless source of pain, particularly for people who are already suffering under tmpfs. It also has the big disadvantage of bypassing the host RAM for caching.

That leaves writeback and unsafe, which are the two cachemode choices you now have when using the libguestfs add_drive API. writeback is the new, safe default. Flush commands are obeyed, so as long as you’re using a journalled filesystem or issue guestfs_sync calls, your data will be safe. And there’s a small performance benefit because we are using the host RAM to cache writes.

cachemode=unsafe is the dangerous new choice you have. For scratch disks, testing, temporary disks, basically anything where you either don’t care about the data, or can easily reconstruct the data, this will give you a nice performance boost (I measured about 25%).
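
From guestfish, selecting the cache mode looks something like this (the disk path is just an example):

$ guestfish
><fs> add-drive /var/tmp/scratch.img cachemode:unsafe
><fs> run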



Booting Fedora 19 ppc64 netinst under qemu on x86-64

My notes on getting the Fedora 19 ppc64 netinst image to boot under qemu on an x86-64 machine.

Note: I’ve no idea if this is a good way, or a recommended way, but it worked for me.

1. Prerequisites:

I’m using Fedora 19 on the host. Note qemu-1.4 in Fedora does not work. I’m not using libvirt to manage the guest, although it’d be nice to get this working one day.

2. Compile qemu-system-ppc64 from upstream git.
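
The build itself is the usual qemu routine, roughly as follows (the upstream repository URL may have moved since this was written):

$ git clone git://git.qemu.org/qemu.git
$ cd qemu
$ ./configure --target-list=ppc64-softmmu
$ make -j8

This produces the ./ppc64-softmmu/qemu-system-ppc64 binary used in step 4.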

3. Create an empty hard disk to store the guest:

# lvcreate -L 16G -n f20ppc64 /dev/fedora

or use truncate or qemu-img create.
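
For example, either of these creates an equivalent sparse 16 GB image; if you go this route, point -hda in the next step at the file instead of the LV:

$ truncate -s 16G f20ppc64.img
$ qemu-img create -f raw f20ppc64.img 16G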

4. Boot the netinst ISO using this qemu command line:

$ ./ppc64-softmmu/qemu-system-ppc64 \
    -cpu POWER7 \
    -machine pseries \
    -m 2048 \
    -hda /dev/fedora/f20ppc64 \
    -cdrom Fedora-19-ppc64-netinst.iso \
    -netdev user,id=usernet,net= \
    -device virtio-net-pci,netdev=usernet

5. You should get to the yaboot prompt.

There seems to be a rendering bug with graphics (X) in the qemu console. Anaconda was obviously running, but no drawing was happening in X, making it impossible to start the install. Oddly the exact same thing happened with VNC. Therefore I used a text-mode install:

boot: linux text

6. That should boot into the textual Anaconda installer.

If it gets stuck at returning from prom_init (and you should wait a minute or two to ensure it’s really stuck) then the problem is broken qemu, or you’re using the wrong CPU/machine type, or you’re trying to use a 64 bit kernel on 32 bit qemu.

QEMU tip: Use [Ctrl] [Alt] 2 to switch to the monitor. Use the monitor command sendkey ctrl-alt-f1 to send keycodes to the guest. Use [Ctrl] [Alt] 1 to switch back to the guest console.

tmux tip: Use [Ctrl] b [1-5] to switch between tmux windows.


Nested virtualization (not) enabled

Interesting thing I learned a few days ago:

kvm: Nested Virtualization enabled

does not always mean that nested virtualization is being used.

If you use qemu’s software emulation (more often known as TCG) then it emulates a generic-looking AMD CPU with SVM (AMD’s virtualization feature).

AMD virtualization easily supports nesting (unlike Intel’s VT, which is a massive PITA to nest), and when the KVM (kvm_amd) module is loaded inside such a guest, it notices the “AMD” host CPU with SVM and willingly enables nested virt. There’s actually a little bit of benefit to this, because it avoids a second layer of TCG being needed if you did run an L2 guest in there (although it’s still going to be slow).
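
You can see this for yourself inside a TCG guest with something like the following (exact output will vary with the guest kernel):

$ grep -wo svm /proc/cpuinfo | head -1
svm
$ dmesg | grep -i 'nested virtualization'
kvm: Nested Virtualization enabled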
