Caseless virtualization cluster, part 4

AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)

But how well does it perform? Not too badly, as it happens.

I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Since libguestfs creates a short-lived KVM appliance each time it runs, it benefits from hardware virt acceleration when available. And since libguestfs ≥ 1.26 there is a new backend setting that lets you force software emulation, so you can easily test the effect with and without hardware acceleration.
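
As an aside: before any of this works, nested SVM has to be enabled in the L0 kernel, and the svm CPU flag has to be visible inside the L1 guest. A quick sketch of how to check both (exact output will vary with your kernel):

$ cat /sys/module/kvm_amd/parameters/nested    # on the host (L0): 1 = enabled
1
$ sudo modprobe -r kvm_amd && sudo modprobe kvm_amd nested=1   # only if it was 0
$ grep -c svm /proc/cpuinfo                    # inside L1: non-zero means the
1                                              # svm flag was passed through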

L1 performance

Let’s start on the host (L0), measuring L1 performance. Note that you have to run the commands shown at least twice: first because supermin builds and caches the appliance on the first run, and second because it’s a fairer test of hardware acceleration if everything is already cached in memory.
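
For example, something like this (just a sketch) throws away the first run and times the warm ones:

$ guestfish -a /dev/null run        # discarded: builds and caches the appliance
$ for i in 1 2 3; do time guestfish -a /dev/null run; done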

This AMD hardware turns out to be pretty good:

$ time guestfish -a /dev/null run
real	0m2.585s

(2.6 seconds is the time taken to launch a virtual machine, boot all of its userspace plus a daemon, and then shut it down. I’m using libvirt to manage the appliance.)
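
You can confirm which backend libguestfs is using, and override it, through the LIBGUESTFS_BACKEND environment variable. A quick sketch:

$ guestfish get-backend                            # “libvirt” in my case
libvirt
$ LIBGUESTFS_BACKEND=direct guestfish get-backend  # run qemu directly instead
direct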

Forcing software emulation (disabling hardware acceleration):

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.995s

L2 performance

Inside the L1 Fedora guest, we run the same tests. Note this is testing L2 performance (the libguestfs appliance running on top of an L1 guest), i.e. nested virt:

$ time guestfish -a /dev/null run
real	0m5.750s

Forcing software emulation:

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.949s

Conclusions

These are just some simple tests. I’ll be doing something more comprehensive later. However:

  1. First level hardware virtualization performance on these AMD chips is excellent.
  2. Nested virt runs at about 45% of non-nested speed (2.6 s vs 5.8 s; see the quick check after this list).
  3. TCG performance is slower, as expected, and is essentially the same at both levels (roughly 10 s). The fact that the nested, hardware-accelerated run still clearly beats TCG shows that hardware virt is being used and is beneficial even in the nested case.
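
The 45% figure in point 2 is nothing more than the ratio of the two wall-clock times:

$ echo 'scale=3; 2.585 / 5.750' | bc
.449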

Other data

The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8320 Eight-Core Processor
stepping	: 0
microcode	: 0x6000822
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips	: 7031.39
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron 63xx class CPU
stepping	: 0
microcode	: 0x1000065
cpu MHz		: 3515.548
cache size	: 512 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips	: 7031.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Update

As part of the discussion in the comments about whether this CPU has 4 or 8 physical cores, here is the lstopo output:

[lstopo diagram showing the host’s CPU topology]
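
If you want to reproduce this, lstopo comes from the hwloc project and can render straight to an image file, and lscpu prints the same topology as text. A sketch (package names assume Fedora):

$ sudo yum install hwloc hwloc-gui     # lstopo is part of hwloc
$ lstopo topology.png                  # output format is chosen from the extension
$ lscpu | egrep 'Thread|Core|Socket'   # text view of the same topology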

9 responses to “Caseless virtualization cluster, part 4”

  1. problemchild68

    What did you use for the L0 host, RHEL or Fedora 20?

    • rich

      Fedora 20. There are a few reasons for this:

      1. RHEL 6 might not have hardware enablement for this. Actually I don’t know, but didn’t especially want to find out the hard way.
      2. RHEL 7 might not enable/support/have the latest nested KVM. We’re going to enable it, but it’s not clear if the relatively “old” (but very heavily patched) qemu would have all the necessary AMD nested virt backports.
      3. Fedora 20 gives me a simple, tested path to running upstream qemu 2.0 in the immediate future.

      However this is not my final decision. By using virt-builder and interchangeable USB keys it’s trivial to switch layers to run another OS or hypervisor. I might have to do that if I need to run a v2v hypervisor on bare metal. And then by switching the USB keys back I can easily go back to another OS/HV.
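
      (A sketch of what that swap looks like; /dev/sdX below is a placeholder for the USB key’s device name, so double-check it with lsblk before writing anything:)

      $ lsblk                                  # identify the USB key first!
      $ sudo virt-builder fedora-20 --output /dev/sdX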

  2. Do you happen to know why /proc/cpuinfo in L0 says that the CPU has 4 cores?

    
    siblings	: 8
    cpu cores	: 4
    
  3. Seriously

    Hello, I don’t know if this is right or not, but it seems to me that the L1 guest having access to only 1 vCPU might be part of the slowdown as compared to the 8 (real) CPUs the L0 host gets to use. This difference would not appear when comparing software emulation as most (if not all) software emulators use only 1 thread for CPU emulation (even when emulating SMP).

    Would you perhaps mind running your test again with the same number of vCPUs available to L1 as (real) CPUs the L0 host is running on?
    I think it would shed more light on the efficiency/performance of nested virtualization under AMD-V.

    • rich

      It causes the host to hard reboot. Not sure if that answers your question 😦

      At some point I’ll plug in a monitor and find out what the panic/problem is. At least the host does reboot and I don’t have to go over there and push a button.

  4. Rich, in parts 1 and 2 you mentioned that you weren’t sure if the motherboard would drive the CPUs at 100%. I’ve recently purchased the same motherboard (mere coincidence) and would like to know if your testing has shown a limitation in driving the CPUs.
    Thanks,
    John

  5. Pingback: Super-nested KVM | Richard WM Jones
