Tag Archives: virtualization

Caseless virtualization cluster, part 4

AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)

But how well does it perform? Not too badly as it happens.

I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Since libguestfs is creating a short-lived KVM appliance, it benefits from hardware virt acceleration when available. And since libguestfs ≥ 1.26, there is a new option that lets you force software emulation so you can easily test the effect with & without hardware acceleration.

L1 performance

Let’s start on the host (L0), measuring L1 performance. Note that you have to run the commands shown at least twice, both because supermin builds and caches the appliance the first time and because it’s a fairer test of hardware acceleration if everything is cached in memory.

This AMD hardware turns out to be pretty good:

$ time guestfish -a /dev/null run
real	0m2.585s

(2.6 seconds is the time taken to launch a virtual machine, all its userspace and a daemon, then shut it down. I’m using libvirt to manage the appliance.)

Forcing software emulation (disabling hardware acceleration):

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.995s
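Because the first run builds and caches the supermin appliance, a single timing can mislead. A minimal bash sketch that runs a command several times, throws away the first cold run and prints the fastest remaining wall-clock time. The name bench_best is hypothetical, and it assumes bash and GNU date (for the %N nanoseconds format):

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command $1 times, ignore the first (cold-cache)
# run, and print the fastest remaining wall-clock time in seconds.
# Assumes bash and GNU date (for the %N nanoseconds format).
bench_best() {
    local runs=$1; shift
    local best="" i start end elapsed
    for ((i = 0; i < runs; i++)); do
        start=$(date +%s.%N)
        "$@" >/dev/null 2>&1
        end=$(date +%s.%N)
        # The first run builds and caches the appliance, so skip it.
        ((i == 0)) && continue
        elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
        # Keep the smallest elapsed time seen so far.
        if [[ -z $best ]] || awk -v a="$elapsed" -v b="$best" 'BEGIN { exit !(a < b) }'; then
            best=$elapsed
        fi
    done
    echo "$best"
}
```

For example: bench_best 3 guestfish -a /dev/null run, and for the TCG case bench_best 3 env LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run. Taking the minimum rather than the mean filters out noise from other activity on the host, at the cost of hiding run-to-run variance.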

L2 performance

Inside the L1 Fedora guest, we run the same tests. Note this is testing L2 performance (the libguestfs appliance running on top of an L1 guest), ie. nested virt:

$ time guestfish -a /dev/null run
real	0m5.750s

Forcing software emulation:

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.949s


These are just some simple tests. I’ll be doing something more comprehensive later. However:

  1. First level hardware virtualization performance on these AMD chips is excellent.
  2. Nested virt is about 45% of non-nested speed (5.750 s versus 2.585 s to launch the appliance).
  3. TCG is slower, as expected, which shows that hardware virt is being used and is beneficial even in the nested case.
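Speed is the inverse of elapsed time, so a run's speed relative to a baseline is the baseline time divided by the run's time. A tiny bash/awk sketch (speed_pct is a hypothetical name) applied to the timings measured above:

```shell
#!/usr/bin/env bash
# Express a run as a percentage of a baseline's speed. Since speed is
# proportional to 1/time, that is 100 * baseline_time / time.
speed_pct() {
    awk -v base="$1" -v t="$2" 'BEGIN { printf "%.0f\n", 100 * base / t }'
}

speed_pct 2.585 5.750   # L2 (nested) vs. L1: prints 45
speed_pct 2.585 9.995   # TCG on the host vs. L1 hardware virt: prints 26
speed_pct 5.750 9.949   # TCG inside L1 vs. nested hardware virt: prints 58
```

So nested hardware virt runs at a little under half of L1 speed, and inside the L1 guest TCG reaches only about 58% of the nested hardware-virt speed.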

Other data

The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8320 Eight-Core Processor
stepping	: 0
microcode	: 0x6000822
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips	: 7031.39
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron 63xx class CPU
stepping	: 0
microcode	: 0x1000065
cpu MHz		: 3515.548
cache size	: 512 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips	: 7031.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:
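Comparing the two dumps above by eye is tedious. A bash sketch that lists flags the host has but the guest lacks, summarises the kernel's view of the core topology, and checks whether the host's kvm_amd module has nested virtualization enabled. All function names are hypothetical; it assumes bash (for process substitution) and GNU coreutils, and SYSFS_ROOT is an invented override used only for testing:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for comparing /proc/cpuinfo dumps (host vs. guest).

# Print the CPU flags from the first "flags" line of a cpuinfo file, one per
# line and sorted. Every core normally reports the same flag set.
cpu_flags() {
    awk -F': *' '/^flags[[:space:]]*:/ { print $2; exit }' "$1" |
        tr ' ' '\n' | sed '/^$/d' | sort -u
}

# Flags present in the first dump ($1) but missing from the second ($2).
missing_flags() {
    comm -23 <(cpu_flags "$1") <(cpu_flags "$2")
}

# Count logical CPUs and distinct (physical id, core id) pairs, i.e. how
# many cores the kernel thinks there are.
topology_summary() {
    awk -F': *' '
        /^processor[[:space:]]*:/   { cpus++ }
        /^physical id[[:space:]]*:/ { phys = $2 }
        /^core id[[:space:]]*:/     { cores[phys "/" $2] = 1 }
        END {
            n = 0
            for (k in cores) n++
            printf "logical CPUs: %d, distinct cores: %d\n", cpus, n
        }' "$1"
}

# Succeed if kvm_amd has nested virtualization enabled on this host.
# Depending on kernel version the parameter reads "1" or "Y".
nested_enabled() {
    local f=${SYSFS_ROOT:-/sys}/module/kvm_amd/parameters/nested
    [[ -r $f ]] && [[ $(<"$f") == [1Yy] ]]
}
```

Run as missing_flags host-cpuinfo.txt guest-cpuinfo.txt. On the two dumps above, npt is one of the host flags that does not appear in the guest, while svm appears in both, which is what allows the L1 guest to use hardware virtualization for its own guests.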


As part of the discussion in the comments about whether this processor has 4 or 8 physical cores, here is the lstopo output:

[Image: lstopo output for the host]



Filed under Uncategorized

Caseless virtualization cluster, part 3

virt-builder can be used to build baremetal images on (eg) USB keys, so that’s what I’m using to create the bootable keys for my virtualization cluster:

$ virt-builder fedora-20 --update -o /dev/sdX
[   1.0] Downloading: http://libguestfs.org/download/builder/fedora-20.xz
[   1.0] Planning how to build this image
[   1.0] Uncompressing
[  13.0] Resizing (using virt-resize) to expand the disk to 14.9G
[1497.0] Opening the new disk
[1501.0] Setting a random seed
[1501.0] Updating core packages
[1693.0] Setting passwords
Setting random password of root to pNooenUMhHz8n6iX
[1693.0] Finishing off
Output: /dev/sdc
Output size: 14.9G
Output format: raw
Total usable space: 13.9G
Free space: 12.9G (92%)
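virt-builder -o /dev/sdX overwrites whatever device you name, so picking the wrong letter destroys a disk. A small bash guard, offered as a sketch: is_removable is a hypothetical name, and SYSFS_ROOT is an invented override used only for testing. It is only a heuristic, since some USB keys may report themselves as non-removable:

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only if the kernel marks the block device as
# removable (/sys/block/<dev>/removable contains 1). This is a heuristic:
# some USB keys may report 0, and it cannot tell which key is which.
is_removable() {
    local dev=${1#/dev/}
    local f=${SYSFS_ROOT:-/sys}/block/$dev/removable
    [[ -r $f ]] && [[ $(<"$f") == 1 ]]
}
```

For example: is_removable /dev/sdc && virt-builder fedora-20 --update -o /dev/sdc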

One small fact that will save you a lot of hair-pulling: This motherboard will not boot from the blue USB 3.0 ports! You have to put the USB key into one of the regular (black) USB ports.

As you can see from the timings (left-hand column of the virt-builder output above), writing to cheap consumer USB keys is slooowwwww. Even on an ordinary hard disk, virt-builder would have built that image in under 60 seconds.

The OS has to be modified to avoid writes as far as possible. I’m using NFS for home directories, logging remotely with rsyslog, and exploring where to store the VM disk images. (And, yes, /tmp is on tmpfs here — it makes sense in this application)
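The write-avoidance measures above boil down to a few lines of configuration. A bash sketch that appends them to an image's root filesystem mounted at some directory; the function name, host names and mount options are assumptions rather than the configuration actually used here, and it assumes rsyslog reads /etc/rsyslog.d/*.conf, as Fedora's default rsyslog.conf does:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append low-write settings to a mounted root
# filesystem ($1). The log host ($2) and NFS server ($3) are placeholders.
configure_low_write() {
    local root=$1 loghost=$2 nfs=$3
    mkdir -p "$root/etc/rsyslog.d"
    {
        # Keep /tmp in RAM instead of on the USB key.
        printf '%s\n' "tmpfs /tmp tmpfs defaults,mode=1777 0 0"
        # Home directories come from the NFS (v4) server.
        printf '%s\n' "$nfs:/home /home nfs4 defaults 0 0"
    } >> "$root/etc/fstab"
    # Forward all syslog messages; "@@" means TCP, a single "@" means UDP.
    printf '%s\n' "*.* @@$loghost:514" > "$root/etc/rsyslog.d/remote.conf"
}
```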

I did some simple experiments using my development machine as an NFS (v4) server. The two machines are connected through a consumer gigabit ethernet switch. The NFS server has 32 GB of RAM and SSDs, and of course the virtualization cluster is “diskless” (just a USB key to boot). Performance is pretty good:

Reads: 115 MBytes/sec
Writes: 62 MBytes/sec
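One common way to take a comparable write measurement, and to relate the results to gigabit Ethernet's raw 1000 Mbit/s (125 MBytes/sec), is sketched below. This is not necessarily how the figures above were obtained; the function names are hypothetical, MBytes are taken to be 10^6 bytes, and it assumes GNU coreutils dd and GNU date:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for sanity-checking NFS throughput.

# Write $2 MiB of zeros into directory $1 (e.g. an NFS mount), force the
# data out with fdatasync, and print the rate in MBytes/sec (10^6 bytes).
write_mbps() {
    local dir=$1 mib=$2 f start end
    f=$(mktemp "$dir/nfs-bench.XXXXXX") || return 1
    start=$(date +%s.%N)
    dd if=/dev/zero of="$f" bs=1M count="$mib" conv=fdatasync status=none
    end=$(date +%s.%N)
    rm -f "$f"
    awk -v b="$((mib * 1048576))" -v s="$start" -v e="$end" \
        'BEGIN { printf "%.1f\n", b / (e - s) / 1e6 }'
}

# Express MBytes/sec as a percentage of gigabit Ethernet's raw 1000 Mbit/s.
line_rate_pct() {
    awk -v m="$1" 'BEGIN { printf "%.0f\n", m * 8 / 1000 * 100 }'
}

line_rate_pct 115   # reads: prints 92
line_rate_pct 62    # writes: prints 50
```

By that yardstick the read rate is about 92% of the wire's raw bit rate, which leaves little headroom once protocol overhead is counted, while writes reach about half.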

One final (anticipated) problem with a caseless system is that it generates large amounts of radio frequency interference. You can show this simply by putting a transistor radio next to the machine. I have a plan to build a metal case which should reduce this.


Caseless virtualization cluster, part 2

The second layer goes on:


I’ve changed motherboards. As outlined in part 1, the motherboard I bought first has no onboard graphics, so I’m waiting for a cheap PCI-Express graphics card before I can turn on the first layer. For this layer I switched to the cheaper, more compact GIGABYTE GA-78LMT-USB3, which comes with onboard graphics. Thanks again Karanbir for this excellent suggestion.

The cost of the second layer is £289.96 (includes VAT and delivery).

I’m going to use the same motherboards etc for the third and fourth layers, so if you were building this cluster the total cost would be:

Part | Qty. | Cost
One layer: GIGABYTE GA-78LMT-USB3, AMD FX 8320, Crucial Ballistix BLS2C8G3D169DS3CEU 16 GB, Corsair CX 430 | 4 | £1,159.84
Power strip | 1 | £10 (est.)
8 port gigabit ethernet switch, cables | 1 | £40 (est.)
Stand-offs or equivalent | 1 | £50 (est.)
USB keys for booting | 4 | £40 (est.)
Total | | £1,299.84 (est.)


  1. All prices include tax and delivery.
  2. The system is diskless so this does not include a fileserver that you will need to provide.
  3. The cost per core is around £41 (£1,299.84 for 32 cores).
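The arithmetic behind the cost table is easy to reproduce, or to adapt for a different number of layers. A bash/awk sketch; cost_summary is a hypothetical name, and the input repeats the table's figures with the USB keys entered as 4 at £10 each:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "quantity unit_price" pairs (one per line) on
# stdin and print the total and the cost per core for $1 cores.
cost_summary() {
    awk -v cores="$1" '
        { total += $1 * $2 }
        END { printf "total %.2f, per core %.2f\n", total, total / cores }'
}

# Four layers at £289.96, then power strip, switch, stand-offs, USB keys.
printf '%s\n' '4 289.96' '1 10' '1 40' '1 50' '4 10' | cost_summary 32
# prints: total 1299.84, per core 40.62
```

The four layers on their own come to about £36 per core; the estimated extras push the figure to roughly £41.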



Caseless virtualization cluster, part 1

This is my slightly mad plan to build a 32 core, 64 GB virtualization cluster for as little money as possible.

I bought the first “layer” of this infinitely expandable cluster design to check that all the parts work together (in fact they don’t — see below).


In the box: Gigabyte 970A-DS3P AMD 970 motherboard, AMD FX 8320 8 core processor, Crucial Ballistix BLS2C8G3D169DS3CEU 16GB RAM, Corsair Builder Series CX 430W PSU.

The total cost was £304 (includes sales tax and delivery). The cost per core + 2 GB RAM is a very reasonable £38.

I’m planning to run the cluster caseless (or at least, I’m first going to examine the heat and EM-radiation by running this first layer caseless to see if it is feasible). And diskless, using PXE or a cheap USB key to boot, with the OS and guests located on a fast NFS server.

To stack up caseless motherboards in the final cluster, I’m using these aluminium stand-offs. Each stand-off is 1″ high. [Edit: See comments for a cheaper alternative]



Unfortunately even with 3 inches of stand-offs, the clearance over the processor fan wouldn’t be very much. If I go for 4 spacers (4″), then the total height of the final four-board cluster would be more than a foot!


The second problem is that I’d forgotten that, this being AMD, there is no integrated graphics [not quite true, see comments]. These boards appear not to boot without a graphics card, and the card will be completely useless in normal operation, just taking up space and power and adding to the price per core.

Another issue is whether I should just purchase one PSU per motherboard, or invest in Y-splitters such as this one. It’s not clear to me that a Y-splitter can power the CPU.

Thanks to Karanbir Singh for suggesting these processors. They are very cheap per core.



A couple of ARM items …

I was going to title this post something like “ARM – from minuscule to enormous” because it refers to the Cortex M0+ (check out this picture!) and the 64-bit ARM processors. But since they are such radically different beasts, which don’t even share the same instruction set, let’s say this is about two items produced by ARM Holdings.

Firstly, I ordered two fun little Cortex-M0+ based development boards, the Element 14 Freescale Freedom Board (buy here; you have to order two because they are individually too cheap to meet the minimum order value on the Farnell site).

This is very, very different from the other ARM hardware I have, because you can’t run Linux on it (it has only 128 KB of programmable flash and a mere 16 KB of RAM). Nevertheless, it’s a proper 32-bit processor which can run FORTH, for example, or which you can program on the bare metal with ease.


Secondly, Linaro Connect Asia 2014 starts very early tomorrow morning (around 2am UTC, or about 7 hours from now). It looks like it will be streamed as Google Hangouts, and available on YouTube shortly after. There are interesting talks on virtualization, big.LITTLE scheduling and ARMv8 and Red Hat’s own Jon Masters is giving a keynote.


Creating a cloud-init config disk for non-cloud boots

There are lots of cloud disk images floating around. They are designed to run in clouds where there is a boot-time network service called cloud-init available that provides initial configuration. If that’s not present, or you’re just trying to boot these images in KVM/libvirt directly without any cloud, then things can go wrong.

Luckily it’s fairly easy to create a config disk (aka “seed disk”) which you attach to the guest and then let cloud-init in the guest get its configuration from there. No cloud, or even network, required.

I’m going to use a tool called virt-make-fs to make the config disk, as it’s easy to use and doesn’t require root. There are other tools around, eg. make-seed-disk which do a similar job. (NB: You might hit this bug in virt-make-fs, which should be fixed in the latest version).

I’m also using a cloud image downloaded from the Fedora project, but any cloud image should work.

First I create my cloud-init metadata. This consists of two files. meta-data contains host and network configuration:

instance-id: iid-123456
local-hostname: cloudy

user-data contains other custom configuration (note #cloud-config is not a comment, it’s a directive to tell cloud-init the format of the file):

#cloud-config
password: 123456
runcmd:
 - [ useradd, -m, -p, "", rjones ]
 - [ chage, -d, 0, rjones ]

(The idea behind this split is probably not obvious, but apparently it’s because the meta-data is meant to be supplied by the Cloud, and the user-data is meant to be supplied by the Cloud’s customer. In this case, no cloud, so we’re going to supply both!)
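If you build seed disks often, generating the two files from a script saves retyping them. A bash sketch that writes meta-data and user-data, including the #cloud-config directive line, into a directory. The name write_seed_files is hypothetical, it assumes the user commands go under runcmd, and the arguments in the example are simply the values used in this post:

```shell
#!/usr/bin/env bash
# Hypothetical helper: write NoCloud seed files into directory $1.
# Arguments: directory, instance id, host name, default-user password, extra user.
write_seed_files() {
    local dir=$1 iid=$2 host=$3 password=$4 user=$5
    mkdir -p "$dir"
    # meta-data: what a cloud would normally supply.
    cat > "$dir/meta-data" <<EOF
instance-id: $iid
local-hostname: $host
EOF
    # user-data: the first line must be the #cloud-config directive.
    cat > "$dir/user-data" <<EOF
#cloud-config
password: $password
runcmd:
 - [ useradd, -m, -p, "", $user ]
 - [ chage, -d, 0, $user ]
EOF
}
```

For example, write_seed_files seed iid-123456 cloudy 123456 rjones, then run virt-make-fs from inside the seed directory.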

I put these two files into a directory, and run virt-make-fs to create the config disk:

$ ls
meta-data  user-data
$ virt-make-fs --type=msdos --label=cidata . /tmp/seed.img
$ virt-filesystems -a /tmp/seed.img --all --long -h
Name      Type        VFS   Label   MBR  Size  Parent
/dev/sda  filesystem  vfat  cidata  -    286K  -
/dev/sda  device      -     -       -    286K  -

Now I need to pass some kernel options when booting the Fedora cloud image, and the only way to do that is to boot from an external kernel and initrd. This is not as complicated as it sounds, and virt-builder has an option to get the kernel and initrd that I’m going to need:

$ virt-builder --get-kernel Fedora-cloud.raw
download: /boot/vmlinuz-3.9.5-301.fc19.x86_64 -> ./vmlinuz-3.9.5-301.fc19.x86_64
download: /boot/initramfs-3.9.5-301.fc19.x86_64.img -> ./initramfs-3.9.5-301.fc19.x86_64.img

Finally I’m going to boot the guest using KVM (you could also use libvirt with a little extra effort):

$ qemu-kvm -m 1024 \
    -drive file=Fedora-cloud.raw,if=virtio \
    -drive file=/tmp/seed.img,if=virtio \
    -kernel ./vmlinuz-3.9.5-301.fc19.x86_64 \
    -initrd ./initramfs-3.9.5-301.fc19.x86_64.img \
    -append 'root=/dev/vda1 ro ds=nocloud-net'

You’ll be able to log in either as fedora/123456 or rjones (no password), and you should see that the hostname has been set to cloudy.
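The kernel file names change with every image, so hard-coding them gets old quickly. A bash sketch that picks up the vmlinuz-*/initramfs-*.img pair left in the current directory by virt-builder --get-kernel and runs the same qemu-kvm command as above. The name boot_cloud_image and its DRY_RUN variable are hypothetical, and it assumes GNU sort (for -V version ordering):

```shell
#!/usr/bin/env bash
# Hypothetical helper: boot a cloud image ($1) with a seed disk ($2), using the
# newest vmlinuz-* in the current directory and its matching initramfs.
# Set DRY_RUN=1 to print the qemu-kvm command instead of running it.
boot_cloud_image() {
    local image=$1 seed=$2 kernel initrd
    kernel=$(printf '%s\n' vmlinuz-* | sort -V | tail -n 1)
    # If nothing matched, the glob stays literal and the file won't exist.
    [[ -e $kernel ]] || { echo "no vmlinuz-* found" >&2; return 1; }
    initrd=initramfs-${kernel#vmlinuz-}.img
    [[ -e $initrd ]] || { echo "missing $initrd" >&2; return 1; }
    local cmd=(qemu-kvm -m 1024
        -drive "file=$image,if=virtio"
        -drive "file=$seed,if=virtio"
        -kernel "./$kernel"
        -initrd "./$initrd"
        -append "root=/dev/vda1 ro ds=nocloud-net")
    if [[ ${DRY_RUN:-0} == 1 ]]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

For example: DRY_RUN=1 boot_cloud_image Fedora-cloud.raw /tmp/seed.img. Building the command as a bash array keeps each argument intact even if a path contains spaces.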




While the ODROID is on ice until I get some more cables, in the search for a workable, available virt development platform I took a punt on the Cubietruck.


I think this is even less likely to arrive than the last one, because the specs and price are rather too good to be true: Allwinner A20 (Cortex-A7), which supports KVM. 2GB of RAM. A VGA port (all my dreams have come true!). Wifi on board. SATA(!), etc.

The total cost was $89 for the board, $12 shipping, potentially about £15 import duty if they catch it.

