Tag Archives: guestfish

Caseless virtualization cluster, part 4

AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)

But how well does it perform? Not too badly, as it happens.

I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Because libguestfs launches a short-lived KVM appliance, it benefits from hardware virt acceleration when available. And in libguestfs ≥ 1.26 there is a new option that lets you force software emulation, so you can easily test the effect with and without hardware acceleration.

L1 performance

Let’s start on the host (L0), measuring L1 performance. Note that you have to run each command shown at least twice: supermin builds and caches the appliance the first time, and the test of hardware acceleration is fairer once everything is cached in memory.

This AMD hardware turns out to be pretty good:

$ time guestfish -a /dev/null run
real	0m2.585s

(2.6 seconds is the time taken to launch a virtual machine, start its userspace and a daemon, and then shut it all down. I’m using libvirt to manage the appliance.)

Forcing software emulation (disabling hardware acceleration):

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.995s
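Running each command twice by hand is easy to forget, so here is a small timing helper, a sketch rather than a proper benchmark harness. It makes one untimed warm-up run and then prints the wall-clock time of each further run. The name bench is hypothetical, and the sketch assumes GNU date, whose %N format gives nanoseconds.

```shell
#!/bin/sh
# bench RUNS COMMAND [ARGS...]   (hypothetical helper, not part of libguestfs)
# Runs COMMAND once untimed so that supermin has built and cached the
# appliance, then prints the wall-clock seconds taken by each of RUNS runs.
# Assumes GNU date, which supports %N (nanoseconds).
bench() {
    local runs="$1" i=0 start end
    shift
    "$@" >/dev/null 2>&1 || return 1        # warm-up run, not timed
    while [ "$i" -lt "$runs" ]; do
        start=$(date +%s%N)
        "$@" >/dev/null 2>&1 || return 1
        end=$(date +%s%N)
        # Convert nanoseconds to seconds, with millisecond precision.
        awk -v ns="$((end - start))" 'BEGIN { printf "%.3f\n", ns / 1e9 }'
        i=$((i + 1))
    done
}

# Usage (not run here):
#   bench 3 guestfish -a /dev/null run
#   bench 3 env LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
```

Using env for the forced-TCG case keeps the variable scoped to that one command.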

L2 performance

Inside the L1 Fedora guest, we run the same tests. Note that this measures L2 performance (the libguestfs appliance running on top of an L1 guest), i.e. nested virt:

$ time guestfish -a /dev/null run
real	0m5.750s

Forcing software emulation:

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.949s

Conclusions

These are just some simple tests. I’ll be doing something more comprehensive later. However:

  1. First level hardware virtualization performance on these AMD chips is excellent.
  2. Nested virt runs at about 45% of non-nested speed (5.750 s compared with 2.585 s).
  3. TCG is slower, as expected, which shows that hardware virt is being used and is beneficial even in the nested case (9.949 s with TCG compared with 5.750 s with nested hardware virt).
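The figures in the conclusions come from simple ratios of the timings. Since speed is proportional to 1/time, the relative speed of two configurations is the inverse ratio of their run times. This sketch uses awk for the floating-point arithmetic; the function names are made up for illustration.

```shell
#!/bin/sh
# speed_pct FAST_SECS SLOW_SECS   (hypothetical helper)
# Speed of the slower configuration as a percentage of the faster one,
# rounded to the nearest whole percent (speed is proportional to 1/time).
speed_pct() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", 100 * fast / slow }'
}

# times_faster FAST_SECS SLOW_SECS   (hypothetical helper)
# How many times faster the first configuration is, to two decimal places.
times_faster() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

speed_pct 2.585 5.750       # nested vs non-nested hardware virt: prints 45
times_faster 2.585 9.995    # L1 hardware virt vs L1 TCG: prints 3.87
times_faster 5.750 9.949    # L2 hardware virt vs L2 TCG: prints 1.73
```

So even when nested, hardware virt is still roughly 1.7 times faster than pure software emulation on this machine.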

Other data

The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8320 Eight-Core Processor
stepping	: 0
microcode	: 0x6000822
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips	: 7031.39
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron 63xx class CPU
stepping	: 0
microcode	: 0x1000065
cpu MHz		: 3515.548
cache size	: 512 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips	: 7031.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Transactions with guestfish

I was asked a few days ago if libguestfs has a way to apply a group of changes to an image together. The question was really about transaction support — applying a group of changes and then committing them or doing a rollback, with the final image either containing all the changes or none of them.

Although libguestfs doesn’t support this, you can do it using libguestfs and the qemu-img tool together. This post shows you how.

First I use virt-builder to quickly get a test image that I can play with:

$ virt-builder fedora-20

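We create a qcow2 overlay which will store the changes until we decide to commit or roll back. The -F raw option declares the format of the backing file (virt-builder creates raw images by default); recent versions of qemu-img require it:

$ qemu-img create -f qcow2 -b fedora-20.img -F raw overlay.img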

Now open the overlay and make your changes:

$ guestfish -a overlay.img -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Fedora release 20 (Heisenbug)
/dev/sda3 mounted on /
/dev/sda1 mounted on /boot

><fs> write-append /etc/issue.net \
    "THIS IS A CHANGE TO ISSUE.NET\n"
><fs> cat /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET
><fs> exit

The base image (fedora-20.img) is untouched, and the overlay contains the changes we made:

$ virt-cat -a fedora-20.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
$ virt-cat -a overlay.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET

Rollback

Rollback is pretty simple!

$ rm overlay.img

Commit

The more interesting one is how to commit the changes back to the original file. Using qemu-img you just do:

$ qemu-img commit overlay.img
Image committed.
$ rm overlay.img

The changes are now contained in the original image file:

$ virt-cat -a fedora-20.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET

ACID

Have we discovered the ACID properties of disk images? Not quite.

Although the change is atomic (A)[1], the disk image is consistent (C) before and after the change, and the change is durable (D)[2], the final property is not satisfied.

There is no isolation (I). Because conflicts cannot feasibly be resolved at the block layer where qemu-img operates, running this technique in parallel on the same disk image would be guaranteed to corrupt it. The only way to make it work reliably is to serialize every operation on the disk image with a mutex.

[1] The change is only atomic if you don’t look at the backing file during the short time that qemu-img commit runs.

[2] Strictly speaking, you must call sync or fsync after the qemu-img commit in order for the change to be durable.
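The mutex suggested above can be as simple as an advisory file lock. This is a sketch assuming flock(1) from util-linux; with_image_lock and the IMAGE.lock naming are hypothetical conventions that every script touching the image would have to follow. It does nothing to stop programs that ignore the lock.

```shell
#!/bin/sh
# with_image_lock IMAGE COMMAND [ARGS...]   (hypothetical helper)
# Runs COMMAND while holding an exclusive advisory lock on IMAGE.lock, so
# that cooperating overlay/commit cycles on the same image never overlap.
# Assumes flock(1) from util-linux. Returns COMMAND's exit status.
with_image_lock() {
    local image="$1"
    shift
    (
        # Block until this subshell holds the lock on file descriptor 9.
        flock -x 9 || exit 1
        "$@"
    ) 9>"$image.lock"
}

# Usage (not run here): one serialized transaction. changes.fish is a
# hypothetical guestfish script. If any step fails, nothing is committed,
# and the overlay is always removed (a rollback when no commit happened).
#   with_image_lock fedora-20.img sh -c '
#       qemu-img create -f qcow2 -b fedora-20.img -F raw overlay.img &&
#       guestfish -a overlay.img -i < changes.fish &&
#       qemu-img commit overlay.img && sync
#       rm -f overlay.img'
```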

New in nbdkit: Run nbdkit as a captive process

New in nbdkit ≥ 1.1.6: you can run nbdkit as a “captive process” under external programs like qemu or guestfish. This means that nbdkit runs for as long as qemu/guestfish is running, and when they exit it cleans up and exits too.

Here is a rather involved way to boot a Fedora 20 guest:

$ virt-builder fedora-20
$ nbdkit file file=fedora-20.img \
    --run 'qemu-kvm -m 1024 -drive file=$nbd,if=virtio'

The --run parameter is what tells nbdkit to run as a captive under qemu-kvm. $nbd on the qemu command line is substituted automatically with the right nbd: URL for the port or socket that nbdkit listens on. As soon as qemu-kvm exits, nbdkit is killed and cleaned up.
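The single quotes around the --run argument matter: they stop your own shell from expanding $nbd, so the string reaches nbdkit unchanged and $nbd is expanded later, when the command actually runs. The toy model below (not nbdkit’s actual implementation; fake_captive and the example URL are made up) shows the same two-stage expansion using a child shell.

```shell
#!/bin/sh
# fake_captive URL COMMAND   (toy model only, NOT how nbdkit works inside)
# Runs COMMAND in a child shell whose environment has nbd set to URL,
# so a single-quoted $nbd in COMMAND is expanded by the child, not by us.
fake_captive() {
    nbd="$1" sh -c "$2"
}

fake_captive nbd://localhost:10809 'echo "qemu would get: file=$nbd"'
# prints: qemu would get: file=nbd://localhost:10809
```

With double quotes instead, the outer shell would substitute its own (normally empty) $nbd before the command was ever handed over.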

Here is another example using guestfish:

$ nbdkit file file=fedora-20.img \
    --run 'guestfish --format=raw -a $nbd -i'

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Fedora release 20 (Heisenbug)
/dev/sda3 mounted on /
/dev/sda1 mounted on /boot

><fs>

The main use for this is not to run the nbdkit file plugin like this, but in conjunction with Perl and Python plugins, to let people easily open and edit OpenStack Glance/Cinder and other unconventional disk images.

Masking systemd services in a guest

In the previous post I told you how to get cloud-init to work in non-cloud environments.

What if you need to disable cloud-init entirely?

With systemd services and guestfish this is easy:

$ guestfish -a disk.img -i \
     ln-sf /dev/null /etc/systemd/system/cloud-init.service

Why not use this opportunity to get rid of tmp-on-tmpfs at the same time?

$ guestfish -a disk.img -i \
     ln-sf /dev/null /etc/systemd/system/tmp.mount
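Each of those commands launches the libguestfs appliance separately. guestfish can also read a script of commands on standard input, so a small generator lets you mask several units in one session. mask_script is a hypothetical helper that just prints the same ln-sf commands used above.

```shell
#!/bin/sh
# mask_script UNIT...   (hypothetical helper)
# Prints one guestfish command per unit, masking it by symlinking
# /etc/systemd/system/UNIT to /dev/null, exactly as the commands above do.
mask_script() {
    local unit
    for unit in "$@"; do
        printf 'ln-sf /dev/null /etc/systemd/system/%s\n' "$unit"
    done
}

# Usage (not run here): mask both units in a single guestfish session.
#   mask_script cloud-init.service tmp.mount | guestfish -a disk.img -i
```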

Systemd’s design of mapping services to files also makes it easy to list the available services in a guest:

$ virt-ls -a /tmp/fedora-19.img -R /lib/systemd/system
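The recursive listing typically also includes targets, sockets, mounts and .wants/ subdirectories. If you only want service names, a filter like the following can be piped after virt-ls; service_names is a hypothetical helper built from sed and sort.

```shell
#!/bin/sh
# service_names   (hypothetical filter)
# Reads a file listing on stdin and prints the bare names of *.service
# units, sorted with duplicates removed.
service_names() {
    # Strip everything up to the last "/", then keep lines ending in .service.
    sed -n 's|.*/||; /\.service$/p' | sort -u
}

# Usage (not run here):
#   virt-ls -a /tmp/fedora-19.img -R /lib/systemd/system | service_names
```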

(Not) getting Fedora 19 on the ODROID XU

Not buying the eMMC module turned out to be a mistake. You can’t buy them from regular suppliers, and I’m not even sure they come in standard sizes. So I’m using a micro SD card instead.

You can download a Fedora 19 image from the ODROID forums here, but I preferred to start with the official Fedora 19 ARM image:

$ xzcat Fedora-XFCE-armhfp-19-1-sda.raw.xz > /dev/mmcblk0

In the wonderful world of ARM there’s of course no chance that this would just work (and it doesn’t). Instead I copied the /boot files from the forum image:

$ virt-copy-out -a fedora19_armhf_odroidxu_20130927.img /boot .

To make the card bootable, the first partition must be VFAT and must contain the /boot files extracted from the forum image. This is pretty straightforward with guestfish:

$ guestfish -a /dev/mmcblk0
><fs> run
><fs> list-filesystems
/dev/sda1: ext3
/dev/sda2: swap
/dev/sda3: ext4
><fs> mkfs vfat /dev/sda1
><fs> mount /dev/sda1 /
><fs> copy-in /tmp/boot /
><fs> ll /
total 12868
drwxr-xr-x  3 0 0   16384 Oct 19 13:43 .
drwxr-xr-x 20 0 0    4096 Oct 19 13:42 ..
-rwxr-xr-x  1 0 0     169 Oct 19 12:14 .vmlinuz-3.10.10-200.fc19.armv7hl.hmac
-rwxr-xr-x  1 0 0     174 Oct 19 12:14 .vmlinuz-3.10.10-200.fc19.armv7hl.lpae.hmac
-rwxr-xr-x  1 0 0   88811 Oct 19 12:15 config-3.4.5
-rwxr-xr-x  1 0 0 6518534 Oct 19 12:14 initramfs-3.4.5
-rwxr-xr-x  1 0 0 6518598 Oct 19 12:14 uInitrd-3.4.5
drwxr-xr-x  2 0 0    8192 Oct 19 12:41 uboot
><fs> umount-all
><fs> exit

Note that /dev/sda1 inside libguestfs corresponds to the host /dev/mmcblk0p1, and / above is the boot partition. If you prefer, you could make this clearer by using disk labels and filesystem labels.

As a general tip, the large DisplayPort is apparently useless, or at least I couldn’t get it to do anything. (Edit: Apparently you have to edit boot.ini in a manner reminiscent of modelines from x86 circa 1995. Go ARM!)

So you have to have a micro HDMI (type D) connector and be able to plug that into a digital monitor.

The boot process is a bit complex, but it is explained to some degree here. I copied the bootloader from the forum image to the micro SD card I was using like this:

$ xzcat fedora19_armhf_odroidxu_20130927.img.xz |
  dd bs=512 skip=1 count=1263 iflag=fullblock of=bootloader
$ dd if=bootloader bs=512 seek=1 of=/dev/mmcblk0
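These commands copy 1263 sectors starting at sector 1, just after the MBR in sector 0, so the bootloader file should be exactly 1263 × 512 = 646656 bytes. Because dd reading from a pipe can return short reads, it is worth checking the size before writing it to the card. check_sectors is a hypothetical helper for that check.

```shell
#!/bin/sh
# check_sectors FILE SECTORS   (hypothetical helper)
# Succeeds only if FILE exists and is exactly SECTORS 512-byte sectors long.
check_sectors() {
    local file="$1" sectors="$2" actual
    [ -f "$file" ] || return 1
    # tr strips the leading spaces some wc implementations print.
    actual=$(wc -c < "$file" | tr -d ' ')
    [ "$actual" -eq "$((sectors * 512))" ]
}

# Usage (not run here): only write the bootloader if it has the right size.
#   check_sectors bootloader 1263 && dd if=bootloader bs=512 seek=1 of=/dev/mmcblk0
```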

Also you have to flip some seriously tiny DIP switches on the motherboard in order to get it to boot from the SD card.

The result anyway is: (a) Green light (b) Fan spins around (c) No ethernet lights (d) Nothing on any display (well, of course this is ARM so what did I expect?)

Experimental User-Mode Linux backend for libguestfs

I have just pushed an experimental User-Mode Linux (UML) backend for libguestfs ≥ 1.23.15. This means you can now try using UML instead of KVM, which may be more lightweight and/or faster for you.

Update: The User Mode Linux book is available as a free PDF download from the publisher here.

If your distro doesn’t ship UML, you will need to compile UML from source. This was very straightforward and took me only 5 minutes following these instructions.

You will also need to install uml_utilities (specifically we need the uml_mkcow program to work around a bug in UML).

Set LIBGUESTFS_BACKEND=uml and LIBGUESTFS_QEMU to point to the UML “linux” or “vmlinux” program that you compiled. (Note that we’re just reusing the “qemu” variable name for convenience; when using UML, qemu/KVM is not involved).
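Before trying it, a quick pre-flight check can save some confusing failures. The sketch below only tests the two prerequisites mentioned here: LIBGUESTFS_QEMU must name an executable UML kernel, and uml_mkcow must be installed. It assumes uml_mkcow is found via $PATH; uml_preflight is a hypothetical name, and passing the check does not guarantee the experimental backend will work.

```shell
#!/bin/sh
# uml_preflight   (hypothetical helper)
# Checks the prerequisites for LIBGUESTFS_BACKEND=uml described above.
uml_preflight() {
    if [ ! -f "${LIBGUESTFS_QEMU:-}" ] || [ ! -x "${LIBGUESTFS_QEMU:-}" ]; then
        echo "LIBGUESTFS_QEMU must point to the compiled UML linux/vmlinux binary" >&2
        return 1
    fi
    # Assumption: uml_mkcow (from uml_utilities) is looked up on $PATH.
    if ! command -v uml_mkcow >/dev/null 2>&1; then
        echo "uml_mkcow not found: install uml_utilities" >&2
        return 1
    fi
}

# Usage (not run here), after exporting the variables:
#   uml_preflight && guestfish -a /tmp/test1.img
```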

You can try using guestfish or other virt tools as normal (since this is an experimental backend, they may not work quite right …)

$ export LIBGUESTFS_BACKEND=uml
$ export LIBGUESTFS_QEMU=/home/rjones/d/linux/vmlinux
$ guestfish -a /tmp/test1.img

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

><fs> run
><fs> list-filesystems
/dev/ubda1: ext2
><fs> mount /dev/ubda1 /
><fs> ll /
total 17
drwxr-xr-x  3 root root  1024 Aug 11 20:46 .
drwxr-xr-x 23 1000 1000  4096 Aug 11 20:54 ..
-rw-r--r--  1 root root     0 Aug 11 18:31 foobar
-rw-r--r--  1 root root     0 Aug 11 18:35 foobarbar
-rw-r--r--  1 root root     0 Aug 11 20:46 foobarbarbar
drwx------  2 root root 12288 Aug  9 22:47 lost+found

The main restriction of the UML backend is that only raw-format disks are supported: no qcow2, and no NBD or other remote storage.

Journal support in libguestfs

New in libguestfs ≥ 1.23.11 is support for reading the systemd journal from a guest.

The support is rudimentary at the moment. It would be nice to have a guestfish journal command for easy browsing of the journal (somewhat like journalctl), but we’re not there yet.

You can use journalctl from guestfish (this is true even without the journal APIs that I just added), but it involves downloading the whole journal first, so it’s rather slow:

><fs> copy-out /var/log/journal /tmp
><fs> ! journalctl -D /tmp/journal

Recent libguestfs improvements

Nothing earth-shattering …

Support for setting UUIDs on filesystems. In particular, virt-sysprep will now choose random UUIDs for each filesystem in the guest (previously it only did this for LVM2 objects).

There’s a new add-drive-scratch API (and the equivalent scratch command in guestfish), which creates a temporary drive that is automatically discarded when the libguestfs handle is closed.

You can now use:

$ guestfish -N filename=fs

(replacing filename with the name you want) to select an alternate name for the prepared disk image, instead of the old test1.img, test2.img etc.

And lots of bug fixes.

xz plugin for nbdkit

I’ve now written an xz plugin for nbdkit (previous discussion on this blog).

This is useful if you’re building up a library of xz-compressed disk images using virt-sparsify and xz, and you want to access them without having to uncompress them.

I certainly learned a lot about the xz file format and liblzma this weekend …

The xz file format consists of one or more streams (usually just one). Each stream contains zero or more blocks of compressed data, followed by an “index”. As with zip, the important metadata lives at the end: the block index comes after the blocks in each stream, which allows xz files to be written as a stream without needing any reverse seeks.

Crucially, the index records the compressed and uncompressed size of each block, so once you’ve read the index from a file you can work out the offset of each block both in the xz file and in the uncompressed data. To read any uncompressed byte, you find the block containing it, seek to the beginning of that block and decompress from there. Random access!
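The lookup can be sketched with a plain-text stand-in for the index. This is a toy model, not the real binary format or the liblzma API: each line gives one block’s uncompressed and compressed size, in order, and the function sums them to find the block holding a given uncompressed byte, along with where that block starts in the uncompressed data and in the compressed file (ignoring stream headers and block padding). find_block is a hypothetical name.

```shell
#!/bin/sh
# find_block INDEX_FILE OFFSET   (toy model, hypothetical helper)
# INDEX_FILE has one line per block, in order:
#     uncompressed_size compressed_size
# Prints "block_number uncompressed_start compressed_start" for the block
# that contains uncompressed byte OFFSET, or fails if OFFSET is past the end.
find_block() {
    awk -v t="$2" '
        BEGIN { t += 0; u = 0; c = 0 }
        {
            if (t >= u && t < u + $1) {
                # %.0f rather than %d avoids overflow in some older awks
                printf "%d %.0f %.0f\n", NR, u, c
                found = 1
                exit
            }
            u += $1    # running uncompressed offset
            c += $2    # running compressed offset
        }
        END { if (!found) exit 1 }
    ' "$1"
}
```

Decompression then starts at compressed_start, and at most one block’s worth of data has to be decompressed and thrown away before reaching the requested byte.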

Preparing xz files correctly is important in order to be able to get good random access performance with low memory overhead:

$ xz --list /tmp/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1     384  2,120.1 MiB  6,144.0 MiB  0.345  CRC64   /tmp/winxp.img.xz

A file with lots of small blocks like the above (16 MB block size) is relatively easy to seek inside. At most 16 MB of data has to be uncompressed to reach any byte.
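The block count shown by xz --list follows directly from the block size: 6144 MiB of uncompressed data in 16 MiB blocks needs 6144 / 16 = 384 blocks, and the worst case for reaching any byte is decompressing one whole block. A tiny helper (hypothetical name) for checking other sizes, rounding up because the last block may be partial:

```shell
#!/bin/sh
# xz_blocks UNCOMPRESSED_MIB BLOCK_MIB   (hypothetical helper)
# Number of blocks needed, rounding up because the last block may be short.
xz_blocks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

xz_blocks 6144 16    # prints 384, matching the Blocks column above
```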

Perhaps ironically, if your machine has lots of free memory then xz appears to choose a large block size, resulting in some one-block files. Here’s the same file when I originally compressed it for my guest library:

$ xz --list guest-library/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1  2,100.0 MiB  6,144.0 MiB  0.342  CRC64   guest-library/winxp.img.xz

So unfortunately you may need to recompress some of your xz files using the new xz --block-size option:

$ xz --best --block-size=$((16*1024*1024)) winxp.img

Here’s how you use the new nbdkit xz plugin:

$ nbdkit plugins/nbdkit-xz-plugin.so file=winxp.img.xz
$ guestfish --ro -a nbd://localhost -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Microsoft Windows XP
/dev/sda1 mounted on /

><fs> ll /
total 1573209
drwxrwxrwx  1 root root       4096 Apr 16  2012 .
drwxr-xr-x 23 1000 1000       4096 Jun 24 13:57 ..
-rwxrwxrwx  1 root root          0 Oct 11  2011 AUTOEXEC.BAT
-rwxrwxrwx  1 root root          0 Oct 11  2011 CONFIG.SYS
drwxrwxrwx  1 root root       4096 Oct 11  2011 Documents and Settings
-rwxrwxrwx  1 root root          0 Oct 11  2011 IO.SYS
-rwxrwxrwx  1 root root          0 Oct 11  2011 MSDOS.SYS
[...]

qemu 1.5.0 released, with ssh block device support

qemu 1.5.0 has been released, featuring ssh support so you can access remote disks over ssh, including from libguestfs.

Here’s how to use this from guestfish:

$ export LIBGUESTFS_BACKEND=direct
$ guestfish --ro -a ssh://onuma/mnt/scratch/winxp.img -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Microsoft Windows XP
/dev/sda1 mounted on /

><fs> ll /
total 1573209
drwxrwxrwx  1 root root       4096 Apr 16  2012 .
drwxr-xr-x 23 1000 1000       4096 May 20 19:47 ..
-rwxrwxrwx  1 root root          0 Oct 11  2011 AUTOEXEC.BAT
-rwxrwxrwx  1 root root          0 Oct 11  2011 CONFIG.SYS
drwxrwxrwx  1 root root       4096 Oct 11  2011 Documents and Settings
-rwxrwxrwx  1 root root          0 Oct 11  2011 IO.SYS
-rwxrwxrwx  1 root root          0 Oct 11  2011 MSDOS.SYS
-rwxrwxrwx  1 root root      47564 Apr 14  2008 NTDETECT.COM
drwxrwxrwx  1 root root       4096 Oct 11  2011 Program Files
drwxrwxrwx  1 root root       4096 Oct 11  2011 System Volume Information
drwxrwxrwx  1 root root      28672 Oct 11  2011 WINDOWS
-rwxrwxrwx  1 root root        211 Oct 11  2011 boot.ini
-rwxrwxrwx  1 root root     250048 Apr 14  2008 ntldr
-rwxrwxrwx  1 root root 1610612736 Oct 11  2011 pagefile.sys
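The disk argument above is a URL of the form ssh://[user@]host[:port]/path. If you script this, a small helper can assemble it; ssh_disk_url is hypothetical, and it assumes an absolute path on the remote host, as in the example.

```shell
#!/bin/sh
# ssh_disk_url HOST PATH [USER] [PORT]   (hypothetical helper)
# Builds ssh://[user@]host[:port]/path; USER and PORT are optional.
ssh_disk_url() {
    local host="$1" path="$2" user="${3:-}" port="${4:-}"
    local url="ssh://"
    if [ -n "$user" ]; then url="$url$user@"; fi
    url="$url$host"
    if [ -n "$port" ]; then url="$url:$port"; fi
    # Join with exactly one "/" before the absolute remote path.
    printf '%s/%s\n' "$url" "${path#/}"
}

# Usage (not run here):
#   guestfish --ro -a "$(ssh_disk_url onuma /mnt/scratch/winxp.img)" -i
```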
