AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)
But how well does it perform? Not too badly as it happens.
I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Since libguestfs is creating a short-lived KVM appliance, it benefits from hardware virt acceleration when available. And since libguestfs ≥ 1.26, there is a new option that lets you force software emulation so you can easily test the effect with & without hardware acceleration.
Let’s start on the host (L0), measuring L1 performance. Note that you have to run each command shown at least twice, both because supermin builds and caches the appliance the first time, and because it’s a fairer test of hardware acceleration if everything is cached in memory.
This AMD hardware turns out to be pretty good:
$ time guestfish -a /dev/null run
real    0m2.585s
(2.6 seconds is the time taken to launch a virtual machine, all its userspace and a daemon, then shut it down. I’m using libvirt to manage the appliance.)
Forcing software emulation (disabling hardware acceleration):
$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real    0m9.995s
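Because the first run builds and caches the appliance, a fairer number comes from discarding that run and keeping the best of several. The sketch below is one way to automate that; best_launch_time is a hypothetical helper (not part of libguestfs), and it assumes GNU date (for %N) and awk. Prefixing the command with env is how the force_tcg setting is passed through.

```shell
# Sketch: run a command once untimed (warm-up, so supermin can build and
# cache the appliance), then time RUNS further runs and print the fastest
# wall-clock time in seconds. Assumes GNU date (%N) and awk.
# best_launch_time is a hypothetical helper, not part of libguestfs.
best_launch_time() {
    local runs=$1; shift
    local best="" i=0 start end elapsed
    "$@" >/dev/null 2>&1 || return 1          # warm-up run, not timed
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1 || return 1
        end=$(date +%s.%N)
        elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
        # Keep the smallest elapsed time seen so far.
        if [ -z "$best" ] || awk -v a="$elapsed" -v b="$best" 'BEGIN { exit !(a < b) }'; then
            best=$elapsed
        fi
        i=$((i + 1))
    done
    echo "$best"
}

# Usage (on L0, then again inside the L1 guest):
#   best_launch_time 3 guestfish -a /dev/null run
#   best_launch_time 3 env LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
```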
Inside the L1 Fedora guest, we run the same tests. Note that this is testing L2 performance (the libguestfs appliance running on top of an L1 guest), i.e. nested virt:
$ time guestfish -a /dev/null run
real    0m5.750s
Forcing software emulation:
$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real    0m9.949s
These are just some simple tests. I’ll be doing something more comprehensive later. However:
- First level hardware virtualization performance on these AMD chips is excellent.
- Nested virt is about 45% of non-nested speed (5.75 s versus 2.6 s to launch the appliance).
- TCG (software emulation) is slower, as expected, which shows that hardware virt is being used and is beneficial even in the nested case.
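The ratios in the list can be reproduced from the four launch times. Since speed is inversely proportional to launch time, the nested/non-nested figure is the L1 time divided by the L2 time. summarize_timings is a hypothetical helper; its arguments are the L1 hardware, L1 TCG, L2 hardware and L2 TCG times in seconds.

```shell
# Sketch: derive relative speeds from measured launch times (seconds).
# Arguments: L1 hardware, L1 TCG, L2 (nested) hardware, L2 TCG.
# summarize_timings is a hypothetical helper.
summarize_timings() {
    awk -v l1_hw="$1" -v l1_tcg="$2" -v l2_hw="$3" -v l2_tcg="$4" 'BEGIN {
        # Speed ~ 1 / time, so nested speed relative to non-nested = l1_hw / l2_hw.
        printf "nested/non-nested speed: %.1f%%\n", 100 * l1_hw / l2_hw
        printf "L1 speedup over TCG: %.2fx\n", l1_tcg / l1_hw
        printf "L2 speedup over TCG: %.2fx\n", l2_tcg / l2_hw
    }'
}

summarize_timings 2.585 9.995 5.750 9.949
# nested/non-nested speed: 45.0%
# L1 speedup over TCG: 3.87x
# L2 speedup over TCG: 1.73x
```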
The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD FX(tm)-8320 Eight-Core Processor
stepping        : 0
microcode       : 0x6000822
cpu MHz         : 1400.000
cache size      : 2048 KB
physical id     : 0
siblings        : 8
core id         : 0
cpu cores       : 4
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips        : 7031.39
TLB size        : 1536 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro
The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 21
model           : 2
model name      : AMD Opteron 63xx class CPU
stepping        : 0
microcode       : 0x1000065
cpu MHz         : 3515.548
cache size      : 512 KB
physical id     : 0
siblings        : 1
core id         : 0
cpu cores       : 1
apicid          : 0
initial apicid  : 0
fpu             : yes
fpu_exception   : yes
cpuid level     : 13
wp              : yes
flags           : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips        : 7031.09
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management:
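Two flags in these listings matter for virtualization: svm (AMD-V, which KVM needs) and npt (nested page tables). The host lists both; the L1 guest lists svm, which is what lets KVM run inside it at all, but not npt. The sketch below checks a cpuinfo-style file for given flags; check_cpu_flags is a hypothetical helper that only reads the first "flags" line, and it returns non-zero if any flag is missing.

```shell
# Sketch: report whether each named flag appears on the first "flags"
# line of a cpuinfo-style file. Returns 1 if any flag is missing.
# check_cpu_flags is a hypothetical helper.
check_cpu_flags() {
    local file=$1; shift
    local flags flag missing=0
    # Everything after "flags<whitespace>:" is the space-separated flag list.
    flags=$(awk -F': *' '/^flags[ \t]*:/ { print $2; exit }' "$file")
    for flag in "$@"; do
        # Pad with spaces so "svm" does not match "svm_lock".
        case " $flags " in
            *" $flag "*) echo "$flag: present" ;;
            *) echo "$flag: missing"; missing=1 ;;
        esac
    done
    return "$missing"
}

# Usage: check_cpu_flags /proc/cpuinfo svm npt
```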
As part of the discussion in the comments about whether this has 4 or 8 physical cores, here is the lstopo output:
$ virt-builder fedora-20 --update -o /dev/sdX
[   1.0] Downloading: http://libguestfs.org/download/builder/fedora-20.xz
[   1.0] Planning how to build this image
[   1.0] Uncompressing
[  13.0] Resizing (using virt-resize) to expand the disk to 14.9G
[1497.0] Opening the new disk
[1501.0] Setting a random seed
[1501.0] Updating core packages
[1693.0] Setting passwords
Setting random password of root to pNooenUMhHz8n6iX
[1693.0] Finishing off
Output: /dev/sdc
Output size: 14.9G
Output format: raw
Total usable space: 13.9G
Free space: 12.9G (92%)
One small fact that will save you a lot of hair-pulling: This motherboard will not boot from the blue USB 3.0 ports! You have to put the USB key into one of the regular (black) USB ports.
As you can see from the timings (left-hand column of the virt-builder output above), writing to cheap consumer USB keys is slooowwwww. Even with ordinary hard disks, virt-builder would have built that image in under 60 seconds.
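The bracketed timestamps can be turned into per-step durations to see where the time goes; on the output above, the resize step (13.0 s to 1497.0 s) accounts for most of the build. step_durations is a hypothetical helper, and it assumes the progress lines were saved to a file first (for example with tee). The last step has no end timestamp, so it is not printed.

```shell
# Sketch: convert virt-builder progress lines of the form "[ seconds] message"
# into the duration of each step. Lines without a timestamp are ignored.
# step_durations is a hypothetical helper; it reads files or stdin.
step_durations() {
    awk '
        match($0, /^\[ *[0-9]+\.[0-9]+] /) {
            ts = substr($0, 2, RLENGTH - 3) + 0   # the number between the brackets
            msg = substr($0, RLENGTH + 1)
            if (have_prev)
                printf "%8.1f s  %s\n", ts - prev_ts, prev_msg
            prev_ts = ts; prev_msg = msg; have_prev = 1
        }
    ' "$@"
}

# Usage:
#   virt-builder fedora-20 --update -o /dev/sdX 2>&1 | tee build.log
#   step_durations build.log
```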
The OS has to be modified to avoid writes as far as possible. I’m using NFS for home directories, logging remotely with rsyslog, and exploring where to store the VM disk images. (And, yes, /tmp is on tmpfs here; it makes sense in this application.)
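As a sketch of the kind of configuration this implies, the hypothetical print_diskless_config helper below prints an fstab fragment (NFSv4 home directories, tmpfs /tmp) and an rsyslog forwarding rule. The server names, the /home export path and the file locations are placeholders, not the actual configuration used here; on Fedora, systemd may already mount /tmp as tmpfs.

```shell
# Sketch: print example config fragments for a diskless node.
# print_diskless_config is hypothetical; NFS_SERVER and LOG_HOST are
# placeholders, and the export path /home is an assumption.
print_diskless_config() {
    local nfs_server=$1 log_host=$2
    cat <<EOF
# /etc/fstab: home directories over NFSv4, /tmp in RAM
${nfs_server}:/home  /home  nfs4   defaults            0 0
tmpfs                /tmp   tmpfs  defaults,mode=1777  0 0

# /etc/rsyslog.d/remote.conf: forward all messages to the log host
# ("@@" forwards over TCP; a single "@" would use UDP)
*.* @@${log_host}:514
EOF
}

# Usage: print_diskless_config fileserver loghost
```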
I did some simple experiments using my development machine as an NFS (v4) server. The two machines are connected through a consumer gigabit ethernet switch. The NFS server has 32 GB of RAM and SSDs and of course the virtualization cluster is “diskless” (just a USB key to boot). Performance is pretty good:
Reads: 115 MBytes/sec
Writes: 62 MBytes/sec
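A rough way to repeat a sequential read/write test against an NFS mount is to push a large file through dd. This is a sketch, not necessarily how the figures above were measured: nfs_throughput is a hypothetical helper that assumes GNU dd and GNU date, it reports MiB/s rather than MB/s, and the read pass may be served from the client's page cache unless the file is larger than RAM or caches have been dropped first.

```shell
# Sketch: sequential write then read throughput for a directory (e.g. an
# NFS mount). Size is in MiB (default 1024). Assumes GNU dd and GNU date.
# nfs_throughput is a hypothetical helper.
nfs_throughput() {
    local dir=$1 mib=${2:-1024}
    local file="$dir/throughput-test.$$" start end

    start=$(date +%s.%N)
    # conv=fdatasync: include flushing the data to the server in the timing.
    dd if=/dev/zero of="$file" bs=1M count="$mib" conv=fdatasync status=none || return 1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" -v n="$mib" \
        'BEGIN { d = e - s; if (d <= 0) d = 0.001; printf "write: %.1f MiB/s\n", n / d }'

    start=$(date +%s.%N)
    dd if="$file" of=/dev/null bs=1M status=none || return 1
    end=$(date +%s.%N)
    awk -v s="$start" -v e="$end" -v n="$mib" \
        'BEGIN { d = e - s; if (d <= 0) d = 0.001; printf "read: %.1f MiB/s\n", n / d }'

    rm -f "$file"
}

# Usage: nfs_throughput /mnt/nfs 2048
```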
One final (anticipated) problem with a caseless system is that it generates large amounts of radio frequency interference. You can show this simply by putting a transistor radio next to the machine. I have a plan to build a metal case which should reduce this.
The second layer goes on:
I’ve changed motherboards. As outlined in part 1, I bought a motherboard without onboard graphics, which means I’m waiting for a cheap PCI-Express graphics card before I can turn on the first layer. This time I switched to the cheaper, more compact GIGABYTE GA-78LMT-USB3, which comes with on-board graphics. Thanks again, Karanbir, for this excellent suggestion.
The cost of the second layer is £289.96 (includes VAT and delivery).
I’m going to use the same motherboards etc. for the third and fourth layers, so if you were building this cluster, the total cost would be:
| Item | Quantity | Price |
|---|---|---|
| GIGABYTE GA-78LMT-USB3, AMD FX8320, Crucial Ballistix BLS2C8G3D169DS3CEU 16 GB, Corsair CX 430 | 4 | £1160 (4 × £289.96) |
| Power strip | 1 | £10 (est.) |
| 8 port gigabit ethernet switch, cables | 1 | £40 (est.) |
| Stand-offs or equivalent | 1 | £50 (est.) |
| USB keys for booting | 4 | £40 (est.) |
| TOTAL PRICE OF CLUSTER | | £1300 |
- All prices include tax and delivery.
- The system is diskless so this does not include a fileserver that you will need to provide.
- The cost per core is around £40 (£1300 for 32 cores).
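The total and per-core figures are simple arithmetic on the table. This sketch assumes four identical layers at the £289.96 quoted for the second layer and 8 cores per FX-8320 (the 32-core target); the extras are the estimated rows. cluster_cost is a hypothetical helper, and prices are printed as GBP.

```shell
# Sketch: recompute the cluster total and cost per core from the table.
# Assumes 4 layers at GBP 289.96 each and 8 cores per layer.
# cluster_cost is a hypothetical helper.
cluster_cost() {
    awk 'BEGIN {
        layers = 4; layer_price = 289.96; cores_per_layer = 8
        extras = 10 + 40 + 50 + 40   # power strip, switch, stand-offs, USB keys
        total = layers * layer_price + extras
        printf "total: GBP %.2f\n", total
        printf "per core: GBP %.2f\n", total / (layers * cores_per_layer)
    }'
}

cluster_cost
# total: GBP 1299.84
# per core: GBP 40.62
```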
This is my slightly mad plan to build a 32-core, 64 GB virtualization cluster for as little money as possible.
I bought the first “layer” of this infinitely expandable cluster design to check that all the parts work together (in fact they don’t; see below).
The total cost was £304 (includes sales tax and delivery). The cost per core plus 2 GB of RAM is a very reasonable £38.
I’m planning to run the cluster caseless (or at least, I’m first going to examine the heat and EM radiation by running this first layer caseless, to see if it is feasible). It will also be diskless, using PXE or a cheap USB key to boot, with the OS and guests located on a fast NFS server.
To stack up caseless motherboards in the final cluster, I’m using these aluminium stand-offs. Each stand-off is 1″ high. [Edit: See comments for a cheaper alternative]
Unfortunately, even with 3 inches of stand-offs, the clearance over the processor fan wouldn’t be very much. If I go for 4 spacers (4″), then the total height of the final four-board cluster would be more than a foot!
The second problem is that I’d forgotten that, being AMD, these boards have no integrated graphics [not quite true, see comments]. They appear not to boot without a graphics card. The card will be completely useless in normal operation, just taking up space and power and adding to the price per core.
Another issue is whether I should just purchase one PSU per motherboard, or invest in Y-splitters such as this one. It’s not clear to me that a Y-splitter can power the CPU.
Thanks to Karanbir Singh for suggesting these processors. They are very cheap per core.
My HP Microservers are getting on a bit. I need some high performance, small form factor server(s) for virtualization testing.
The key features for me would be:
- Lots of RAM. They must come with or be upgradable to 16GB, but ideally 32GB.
- Small form factor, like the Intel NUC.
- Lots of cores. I’m loving the Intel Avoton 8-core Atom, but it looks like no one is building systems with these yet.
Suggestions most welcome!
Virt-builder ≥ 1.26 now lets you flexibly edit configuration files before you install packages (1.24 didn’t). So finally you can enable the Fedora updates-testing repository and build a guest with packages from it:
$ virt-builder fedora-20 \
    --edit '/etc/yum.repos.d/fedora-updates-testing.repo: s/enabled=0/enabled=1/' \
    --install git,emacs,yum-utils,net-tools,libguestfs
[   0.0] Downloading: http://libguestfs.org/download/builder/fedora-20.xz
[   1.0] Planning how to build this image
[   1.0] Uncompressing
[  11.0] Opening the new disk
[  16.0] Setting a random seed
[  16.0] Updating core packages
[ 269.0] Editing: /etc/yum.repos.d/fedora-updates-testing.repo
[ 269.0] Installing packages: git emacs yum-utils net-tools libguestfs
[ 349.0] Setting passwords
Setting random password of root to ***
[ 349.0] Finishing off
Output: fedora-20.img
Output size: 4.0G
Output format: raw
Total usable space: 5.2G
Free space: 3.7G (71%)
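virt-builder's --edit applies the Perl expression to each line of the file, so s/enabled=0/enabled=1/ switches on every enabled=0 line, including any -debuginfo and -source sections the repo file may also contain. If you only want the main section, one option is a section-aware edit of a local copy, which could then be copied into the image (for example with virt-builder's --upload option). The sketch below shows that edit with awk; enable_repo_section is a hypothetical helper, and it prints the result without modifying the input file.

```shell
# Sketch: set enabled=1 only inside one [section] of a yum .repo file and
# print the result to stdout. enable_repo_section is a hypothetical helper.
enable_repo_section() {
    awk -v want="[$1]" '
        /^\[/ { in_section = ($0 == want) }              # track the current [section]
        in_section && $0 == "enabled=0" { $0 = "enabled=1" }
        { print }
    ' "$2"
}

# Usage (preview on a local copy):
#   enable_repo_section updates-testing fedora-updates-testing.repo
```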