AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)
But how well does it perform? Not too badly as it happens.
I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Since libguestfs creates a short-lived KVM appliance, it benefits from hardware virt acceleration when available. And since libguestfs ≥ 1.26, there is a new option that lets you force software emulation, so you can easily test the effect with and without hardware acceleration.
L1 performance
Let’s start on the host (L0), measuring L1 performance. Note that you have to run the commands shown at least twice, both because supermin will build and cache the appliance first time and because it’s a fairer test of hardware acceleration if everything is cached in memory.
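The warm-up-then-measure step can be sketched as a simple loop (assuming guestfish is on the path; only the later, warm timings are meaningful):

```shell
# Run the benchmark at least twice: the first run lets supermin build
# and cache the appliance, so discard it and keep the warm timings.
for i in 1 2; do
    time guestfish -a /dev/null run
done
```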
This AMD hardware turns out to be pretty good:
$ time guestfish -a /dev/null run

real    0m2.585s
(2.6 seconds is the time taken to launch a virtual machine, all its userspace and a daemon, then shut it down. I’m using libvirt to manage the appliance).
Forcing software emulation (disabling hardware acceleration):
$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run

real    0m9.995s
L2 performance
Inside the L1 Fedora guest, we run the same tests. Note this is testing L2 performance (the libguestfs appliance running on top of an L1 guest), ie. nested virt:
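Before running this it is worth confirming that nested hardware virt is actually exposed to the L1 guest. On AMD that shows up as the svm flag in the guest's /proc/cpuinfo (a quick check, assuming a Linux L1 guest):

```shell
# Inside the L1 guest: count cpuinfo lines advertising AMD-V (svm).
# If this prints 0, the L2 appliance will fall back to TCG.
grep -cw svm /proc/cpuinfo
```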
$ time guestfish -a /dev/null run

real    0m5.750s
Forcing software emulation:
$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run

real    0m9.949s
Conclusions
These are just some simple tests. I’ll be doing something more comprehensive later. However:
- First level hardware virtualization performance on these AMD chips is excellent.
- Nested virt runs at roughly 45% of non-nested speed (2.585s vs 5.750s for the hardware-accelerated runs).
- TCG performance is slower as expected, but shows that hardware virt is being used and is beneficial even in the nested case.
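The ratio above falls straight out of the two warm, hardware-accelerated timings; a one-liner to check the arithmetic:

```shell
# L1 took 2.585s, L2 took 5.750s: express L2 as a fraction of L1 speed.
awk 'BEGIN { printf "nested runs at %.0f%% of non-nested speed\n",
             2.585 / 5.750 * 100 }'
```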
Other data
The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8320 Eight-Core Processor
stepping        : 0
microcode       : 0x6000822
cpu MHz         : 1400.000
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips        : 7031.39
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD Opteron 63xx class CPU
stepping        : 0
microcode       : 0x1000065
cpu MHz         : 3515.548
cache size      : 512 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips        : 7031.09
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
Update
As part of the discussion in the comments about whether this has 4 or 8 physical cores, here is the lstopo output:
What did you use for the L0 host, RHEL or Fedora 20?
Fedora 20. There are a few reasons for this:
However this is not my final decision. By using virt-builder and interchangeable USB keys it’s trivial to switch layers to run another OS or hypervisor. I might have to do that if I need to run a v2v hypervisor on bare metal. And then by switching the USB keys back I can easily go back to another OS/HV.
Do you happen to know why /proc/cpuinfo in L0 says that the CPU has 4 cores?
According to http://www.cpubenchmark.net/cpu.php?cpu=AMD+FX-8320+Eight-Core : “No of Cores: 4 (2 logical cores per physical)”
I’m pretty sure the CPU has 8 cores. See this die photo and the lstopo output I added to the article.
Edit: OK I’ll revise that slightly. It has shared i-cache and separate data caches, which is a peculiar architecture. ie. It’s not classical/Intel-style multi-threading.
Edit x 2: This explains it pretty well. Since I’m almost exclusively interested in integer performance, it’s pretty good for me, but it would be good to test the actual throughput by pinning some test processes to cores.
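One way to do that pinning experiment would be taskset. This is only a sketch: the workload is a stand-in busy-loop, and the core numbers are assumptions — check the lstopo output for the real module pairing on this chip.

```shell
# A fixed amount of CPU-bound work we can time.
work='i=0; while [ "$i" -lt 5000000 ]; do i=$((i+1)); done'

# Two workers pinned to cores assumed to share one Bulldozer module:
time sh -c "taskset -c 0 sh -c '$work' & taskset -c 1 sh -c '$work' & wait"

# Two workers pinned to cores assumed to be in different modules;
# compare the wall-clock times to see what the shared front-end costs:
time sh -c "taskset -c 0 sh -c '$work' & taskset -c 2 sh -c '$work' & wait"
```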
Hello, I don’t know if this is right or not, but it seems to me that the L1 guest having access to only 1 vCPU might be part of the slowdown as compared to the 8 (real) CPUs the L0 host gets to use. This difference would not appear when comparing software emulation as most (if not all) software emulators use only 1 thread for CPU emulation (even when emulating SMP).
Would you perhaps mind running your test again with the same number of vCPUs available to L1 as (real) CPUs the L0 host is running on?
I think it would shed more light on the efficiency/performance of nested virtualization under AMD-V
It causes the host to hard reboot. Not sure if that answers your question 😦
At some point I’ll plug in a monitor and find out what the panic/problem is. At least the host does reboot and I don’t have to go over there and push a button.
Rich, in part 1 and 2 you mentioned that you weren’t sure if the motherboard would drive the CPUs at 100%. I’ve recently purchased the same motherboard (mere coincidence) and would like to know if your testing has shown a limitation in driving the CPUs.
Thanks,
John
Pingback: Super-nested KVM | Richard WM Jones