Tag Archives: AMD

Caseless virtualization cluster, part 5

My caseless virtualization cluster is now complete. 32 cores (arguably), 64 GB of RAM, for about £1300:

[Photo: 20140428_123312]

The power supplies cause a real wiring nightmare! It would be great to have a better solution to delivering power:

[Photos: 20140428_123326, 20140428_123333]

It runs almost silently.

I plan to encase the whole thing in a metal case, firstly to make it more portable, and secondly to reduce the amount of RF given off. You would probably not be able to legally run this in a commercial environment because of EMC regulations. You definitely would not be allowed to sell it.

The next problem is management software. While it’s certainly possible to log in to each of the four individual hosts and run virsh commands, that’s going to get tedious rather quickly.

The problem is that all “solutions” to this are rather heavyweight. I could manage the hosts using Puppet and install OpenStack, but it would probably take longer to set that up than the time saved. There’s a lot of cloud software out there, but not much that nicely manages 4 hosts without requiring huge dependencies. What I really want is a small command line tool that uses libvirt remotely so I don’t have to install anything on the hosts.
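
As a rough sketch of the kind of small tool I have in mind, the following Python builds one remote `virsh` invocation per host using libvirt's `qemu+ssh://` connection URIs, so nothing extra needs installing on the hosts. The host names `node1` to `node4` and the `root` login are assumptions for illustration only, not the real cluster configuration.

```python
import subprocess

# Hypothetical host names; substitute the real ones.
HOSTS = ["node1", "node2", "node3", "node4"]


def virsh_command(host, *args, user="root"):
    """Build a virsh command line that talks to libvirtd on a remote host over SSH."""
    uri = f"qemu+ssh://{user}@{host}/system"
    return ["virsh", "-c", uri, *args]


def run_everywhere(*args, hosts=HOSTS, runner=subprocess.run):
    """Run the same virsh subcommand on every host and collect the results by host name."""
    results = {}
    for host in hosts:
        results[host] = runner(virsh_command(host, *args),
                               capture_output=True, text=True)
    return results
```

With SSH keys set up, `run_everywhere("list", "--all")` would return one result per host, each with the usual `stdout` and `stderr` attributes. The `runner` parameter exists only so the loop can be exercised without real hosts.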

Filed under Uncategorized

Caseless virtualization cluster, part 4

AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)

But how well does it perform? Not too badly as it happens.

I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Since libguestfs creates a short-lived KVM appliance, it benefits from hardware virt acceleration when available. And libguestfs ≥ 1.26 has a new option that lets you force software emulation, so you can easily test the effect with and without hardware acceleration.

L1 performance

Let’s start on the host (L0), measuring L1 performance. Note that you have to run the commands shown at least twice, both because supermin will build and cache the appliance the first time and because it’s a fairer test of hardware acceleration if everything is cached in memory.

This AMD hardware turns out to be pretty good:

$ time guestfish -a /dev/null run
real	0m2.585s

(2.6 seconds is the time taken to launch a virtual machine, all its userspace and a daemon, then shut it down. I’m using libvirt to manage the appliance).

Forcing software emulation (disabling hardware acceleration):

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.995s
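
For repeating these measurements, a small harness along the following lines might help. It is only a sketch: it runs a command several times, optionally with extra environment variables such as the `LIBGUESTFS_BACKEND_SETTINGS=force_tcg` setting used above, and lets you discard the first (cold cache) run. The function names and the `runner` parameter are my own illustrative choices.

```python
import os
import subprocess
import time


def time_command(cmd, runs=3, extra_env=None, runner=subprocess.run):
    """Run cmd `runs` times and return wall-clock durations in seconds, one per run."""
    env = dict(os.environ, **(extra_env or {}))
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        runner(cmd, env=env, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def warm_runs(durations):
    """Drop the first (cold-cache) run, keeping only the warm ones."""
    return durations[1:]


# The command and setting used in this post.
GUESTFISH = ["guestfish", "-a", "/dev/null", "run"]
FORCE_TCG = {"LIBGUESTFS_BACKEND_SETTINGS": "force_tcg"}
```

On a machine with libguestfs installed, `warm_runs(time_command(GUESTFISH))` and `warm_runs(time_command(GUESTFISH, extra_env=FORCE_TCG))` would give the accelerated and emulated timings respectively.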

L2 performance

Inside the L1 Fedora guest, we run the same tests. Note this is testing L2 performance (the libguestfs appliance running on top of an L1 guest), i.e. nested virt:

$ time guestfish -a /dev/null run
real	0m5.750s

Forcing software emulation:

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.949s

Conclusions

These are just some simple tests. I’ll be doing something more comprehensive later. However:

  1. First level hardware virtualization performance on these AMD chips is excellent.
  2. Nested virt is about 45% of non-nested speed (2.6 s versus 5.75 s to launch the appliance).
  3. TCG performance is slower as expected, but shows that hardware virt is being used and is beneficial even in the nested case.
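
The ratios behind these conclusions can be worked out directly from the four `real` times above. This is just arithmetic on the numbers already shown, with variable names chosen for illustration:

```python
# Wall-clock launch times in seconds, copied from the measurements above.
L1_KVM = 2.585   # host (L0) launching the appliance with hardware virt
L1_TCG = 9.995   # same, forced software emulation
L2_KVM = 5.750   # inside the L1 guest, nested hardware virt
L2_TCG = 9.949   # inside the L1 guest, forced software emulation


def relative_speed(baseline_seconds, other_seconds):
    """Speed of `other` relative to `baseline` (1.0 means equally fast)."""
    return baseline_seconds / other_seconds


nested_vs_native = relative_speed(L1_KVM, L2_KVM)  # nested speed relative to non-nested
tcg_slowdown_l1 = L1_TCG / L1_KVM                  # how much slower TCG is on the host
tcg_slowdown_l2 = L2_TCG / L2_KVM                  # how much slower TCG is inside L1

print(f"nested virt speed vs non-nested: {nested_vs_native:.0%}")
print(f"TCG slowdown on the host:        {tcg_slowdown_l1:.1f}x")
print(f"TCG slowdown inside L1:          {tcg_slowdown_l2:.1f}x")
```

On this single launch test, nested virt comes out at a little under half the non-nested speed, and forcing TCG is still clearly slower than nested hardware virt.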

Other data

The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8320 Eight-Core Processor
stepping	: 0
microcode	: 0x6000822
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips	: 7031.39
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron 63xx class CPU
stepping	: 0
microcode	: 0x1000065
cpu MHz		: 3515.548
cache size	: 512 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips	: 7031.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Update

As part of the discussion in the comments about whether this has 4 or 8 physical cores, here is the lstopo output:

[Image: lstopo output]
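
The same ambiguity is visible in the /proc/cpuinfo output earlier in this post, where the kernel reports 8 siblings but only 4 "cpu cores" for the package. A minimal sketch of pulling those two fields out follows; the parsing helper is illustrative, and the stanza is an excerpt of the host output above rather than a fresh measurement.

```python
def parse_cpuinfo(text):
    """Parse one processor stanza of /proc/cpuinfo into a dict of field name -> value."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            fields[name.strip()] = value.strip()
    return fields


# Excerpt of the host stanza shown earlier in this post.
HOST_STANZA = """\
model name\t: AMD FX(tm)-8320 Eight-Core Processor
siblings\t: 8
cpu cores\t: 4
"""

info = parse_cpuinfo(HOST_STANZA)
logical_cpus = int(info["siblings"])      # logical CPUs in the package
reported_cores = int(info["cpu cores"])   # cores the kernel counts in the package
print(f"{logical_cpus} logical CPUs, {reported_cores} reported cores")
```

Whether you count this chip as 4 or 8 cores depends on which of those two numbers you pick, which is why the 32 core figure for the whole cluster is "arguable".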

Caseless virtualization cluster, part 3

virt-builder can be used to build baremetal images on (e.g.) USB keys, so that’s what I’m using to create the bootable keys for my virtualization cluster:

$ virt-builder fedora-20 --update -o /dev/sdX
[   1.0] Downloading: http://libguestfs.org/download/builder/fedora-20.xz
[   1.0] Planning how to build this image
[   1.0] Uncompressing
[  13.0] Resizing (using virt-resize) to expand the disk to 14.9G
[1497.0] Opening the new disk
[1501.0] Setting a random seed
[1501.0] Updating core packages
[1693.0] Setting passwords
Setting random password of root to pNooenUMhHz8n6iX
[1693.0] Finishing off
Output: /dev/sdc
Output size: 14.9G
Output format: raw
Total usable space: 13.9G
Free space: 12.9G (92%)

One small fact that will save you a lot of hair-pulling: This motherboard will not boot from the blue USB 3.0 ports! You have to put the USB key into one of the regular (black) USB ports.

As you can see from the timings (left hand column of the virt-builder output above), writing to cheap consumer USB keys is slooowwwww. Even with ordinary hard disks, virt-builder would have built that image in under 60 seconds.
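
To see where the time went, the gaps between the timestamps in the left hand column can be computed. The following sketch parses the log lines copied from the output above; the regular expression and function name are my own, and the last step has no duration because nothing follows it.

```python
import re

# Timestamped lines copied from the virt-builder output above.
LOG = """\
[   1.0] Downloading: http://libguestfs.org/download/builder/fedora-20.xz
[   1.0] Planning how to build this image
[   1.0] Uncompressing
[  13.0] Resizing (using virt-resize) to expand the disk to 14.9G
[1497.0] Opening the new disk
[1501.0] Setting a random seed
[1501.0] Updating core packages
[1693.0] Setting passwords
[1693.0] Finishing off
"""

STEP = re.compile(r"^\[\s*(\d+(?:\.\d+)?)\]\s+(.*)$")


def step_durations(log):
    """Return (step, seconds) pairs, where seconds is the gap until the next step starts."""
    steps = [(float(m.group(1)), m.group(2))
             for m in map(STEP.match, log.splitlines()) if m]
    return [(name, nxt - start)
            for (start, name), (nxt, _) in zip(steps, steps[1:])]


# Show the two slowest steps.
for name, seconds in sorted(step_durations(LOG), key=lambda s: -s[1])[:2]:
    print(f"{seconds:7.1f}s  {name}")
```

The resize step, which is the one that writes the image to the USB key, accounts for 1484 of the 1693 seconds; updating packages takes another 192.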

The OS has to be modified to avoid writes as far as possible. I’m using NFS for home directories, logging remotely with rsyslog, and exploring where to store the VM disk images. (And, yes, /tmp is on tmpfs here — it makes sense in this application)

I did some simple experiments using my development machine as an NFS (v4) server. The two machines are connected through a consumer gigabit ethernet switch. The NFS server has 32 GB of RAM and SSDs and of course the virtualization cluster is “diskless” (just a USB key to boot). Performance is pretty good:

Reads: 115 MBytes/sec
Writes: 62 MBytes/sec
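
For context, these figures can be compared with the raw line rate of gigabit ethernet. The sketch below assumes "MBytes" means 10^6 bytes (the original measurement may have used 2^20) and ignores all protocol overhead, so treat the percentages as approximate.

```python
GIGABIT_BITS_PER_SEC = 1_000_000_000
# Raw line rate in megabytes per second, ignoring Ethernet/IP/TCP/NFS overhead.
LINE_RATE_MB_PER_SEC = GIGABIT_BITS_PER_SEC / 8 / 1_000_000

MEASURED_MB_PER_SEC = {"read": 115, "write": 62}  # figures from the test above

for op, rate in MEASURED_MB_PER_SEC.items():
    share = rate / LINE_RATE_MB_PER_SEC
    print(f"{op:5s} {rate:4d} MB/s = {share:.0%} of raw gigabit line rate")
```

Under those assumptions, reads are close to saturating the link, while writes reach roughly half of it.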

One final (anticipated) problem with a caseless system is that it generates large amounts of radio frequency interference. You can show this simply by putting a transistor radio next to the machine. I have a plan to build a metal case which should reduce this.

Caseless virtualization cluster, part 2

The second layer goes on:

[Photo: 20140416_104436]

I’ve changed motherboards. As outlined in part 1, I originally bought a motherboard without onboard graphics, which means I’m still waiting for a cheap PCI-Express graphics card before I can turn on the first layer. For this layer I switched to the cheaper, more compact GIGABYTE GA-78LMT-USB3, which comes with on-board graphics. Thanks again Karanbir for this excellent suggestion.

The cost of the second layer is £289.96 (includes VAT and delivery).

I’m going to use the same motherboards etc for the third and fourth layers, so if you were building this cluster the total cost would be:

Part | Qty. | Cost
--- | --- | ---
GIGABYTE GA-78LMT-USB3, AMD FX8320, Crucial Ballistix BLS2C8G3D169DS3CEU 16 GB, Corsair CX 430 | 4 | £1,159.84
Power strip | 1 | £10 (est.)
8 port gigabit ethernet switch, cables | 1 | £40 (est.)
Stand-offs or equivalent | 1 | £50 (est.)
USB keys for booting | 4 | £40 (est.)
TOTAL PRICE OF CLUSTER | | £1300

Notes:

  1. All prices include tax and delivery.
  2. The system is diskless so this does not include a fileserver that you will need to provide.
  3. The cost per core is around £41 (£1300 across 32 cores), or about £36 counting only the motherboard, CPU, RAM and PSU.
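
The arithmetic behind the total can be checked with a few lines of Python. The prices are the ones from the table (including the estimates), and the per-core figure depends on whether the estimated extras are counted or only the four motherboard, CPU, RAM and PSU bundles.

```python
# (description, quantity, total cost in GBP); the estimates are those in the table above.
PARTS = [
    ("Motherboard + CPU + RAM + PSU", 4, 1159.84),
    ("Power strip", 1, 10.00),
    ("8 port gigabit ethernet switch, cables", 1, 40.00),
    ("Stand-offs or equivalent", 1, 50.00),
    ("USB keys for booting", 4, 40.00),
]
CORES = 32  # 4 hosts x 8 cores, as counted in this post

total = sum(cost for _, _, cost in PARTS)
per_layer = PARTS[0][2] / PARTS[0][1]

print(f"total:                   GBP {total:.2f}")
print(f"per layer (bundle only): GBP {per_layer:.2f}")
print(f"per core (everything):   GBP {total / CORES:.2f}")
print(f"per core (bundle only):  GBP {PARTS[0][2] / CORES:.2f}")
```

The total comes to just under £1300, and each layer's bundle matches the £289.96 quoted for the second layer.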

Caseless virtualization cluster, part 1

This is my slightly mad plan to build a 32 core, 64 GB virtualization cluster for as little money as possible.

I bought the first “layer” of this infinitely expandable cluster design to check that all the parts work together (in fact they don’t — see below).

[Photo: 20140413_154143]

In the box: Gigabyte 970A-DS3P AMD 970 motherboard, AMD FX 8320 8 core processor, Crucial Ballistix BLS2C8G3D169DS3CEU 16GB RAM, Corsair Builder Series CX 430W PSU.

The total cost was £304 (includes sales tax and delivery). The cost per core + 2 GB RAM is a very reasonable £38.

I’m planning to run the cluster caseless (or at least, I’m first going to examine the heat and EM-radiation by running this first layer caseless to see if it is feasible). And diskless, using PXE or a cheap USB key to boot, with the OS and guests located on a fast NFS server.

To stack up caseless motherboards in the final cluster, I’m using these aluminium stand-offs. Each stand-off is 1″ high. [Edit: See comments for a cheaper alternative]

[Photos: 20140413_154643, 20140413_154902]

Unfortunately even with 3 inches of stand-offs, the clearance over the processor fan wouldn’t be very much. If I go for 4 spacers (4″) then the total height of the final four board cluster would be more than a foot!

[Photo: 20140413_155728]

The second problem is that I’d forgotten that being AMD there is no integrated graphics [not quite true, see comments]. These boards appear not to boot without a graphics card. The card will be completely useless in normal operation, just taking up space and power and adding to the price per core.

Another issue is whether I should just purchase one PSU per motherboard, or invest in Y-splitters such as this one. It’s not clear to me that a Y-splitter can power the CPU.

Thanks to Karanbir Singh for suggesting these processors. They are very cheap per core.

Nested virtualization (not) enabled

Interesting thing I learned a few days ago:

kvm: Nested Virtualization enabled

does not always mean that nested virtualization is being used.

If you use qemu’s software emulation (more often known as TCG) then it emulates a generic-looking AMD CPU with SVM (AMD’s virtualization feature).

AMD virtualization easily supports nesting (unlike Intel’s VT which is a massive PITA to nest), and when the KVM module is loaded, it notices the “AMD” host CPU with SVM and willingly enables nested virt. There’s actually a little bit of benefit to this because it avoids a second layer of TCG being needed if you did run an L2 guest in there (although it’s still going to be slow).
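
A quick way to see what the kernel itself believes is to look at the `svm` and `hypervisor` CPU flags together with the `nested` parameter of the `kvm_amd` module. The sketch below is an assumption-laden illustration: the helper names are mine, the sysfs path is the one I would expect on a typical Linux system with `kvm_amd` loaded, and the result only reports what the guest sees. It cannot by itself distinguish an emulated (TCG) CPU from a real one, which is exactly the trap described above.

```python
from pathlib import Path

# Module parameter exposed by the kvm_amd driver (assumed location).
NESTED_PARAM = Path("/sys/module/kvm_amd/parameters/nested")


def nested_param_enabled(value):
    """Interpret the text of the `nested` parameter ("1" or "Y" taken to mean enabled)."""
    return value.strip() in {"1", "Y", "y"}


def has_flag(cpuinfo_text, flag):
    """True if `flag` appears in the first `flags` line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip() == "flags":
            return flag in value.split()
    return False


def describe(cpuinfo_text, nested_value):
    """Summarise what the kernel reports about SVM, guest status and the nested setting."""
    return {
        "svm": has_flag(cpuinfo_text, "svm"),
        "running_as_guest": has_flag(cpuinfo_text, "hypervisor"),
        "nested_param": nested_param_enabled(nested_value),
    }
```

On a live system you might call `describe(Path("/proc/cpuinfo").read_text(), NESTED_PARAM.read_text())`. If all three values are true, you are inside a guest whose virtual CPU advertises SVM, and whether real hardware virt sits underneath depends on how the host launched that guest.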
