November 11, 2009

mkfs compared on different filesystems

How long does it take to mkfs a 10GB disk with all the different filesystems out there?

See my test results here using the new guestfish sparse / filesystem support. btrfs is “best” and ext3 comes off “worst”.

As a test this is interesting, but it’s not that relevant for most users — they will be most interested in how well the filesystem performs for their workload, which is not affected by mkfs time and hard to measure in general benchmarks anyway.

Update

In response to Stephen’s comment, I retested this using a memory-backed block device so there is no question about whether the host backing store affects the test:

$ for fs in ext2 ext3 ext4 xfs jfs reiserfs nilfs2 ntfs msdos btrfs hfs hfsplus gfs gfs2
    do guestfish sparse /dev/shm/test.img 10G : run : echo $fs : sfdiskM /dev/sda , : \
        time mkfs $fs /dev/sda1
    done
ext2
elapsed time: 1.45 seconds
ext3
elapsed time: 2.71 seconds
ext4
elapsed time: 2.58 seconds
xfs
elapsed time: 0.13 seconds
jfs
elapsed time: 0.27 seconds
reiserfs
elapsed time: 0.33 seconds
nilfs2
elapsed time: 0.08 seconds
ntfs
elapsed time: 2.07 seconds
msdos
elapsed time: 0.14 seconds
btrfs
elapsed time: 0.07 seconds
hfs
elapsed time: 0.17 seconds
hfsplus
elapsed time: 0.17 seconds
gfs
elapsed time: 0.84 seconds
gfs2
elapsed time: 2.76 seconds
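
To rank these results, the alternating name and elapsed-time lines can be paired up and sorted. This is only a minimal sketch, assuming the output has been saved in exactly the two-line format shown above; rank_mkfs is a hypothetical helper name, not part of guestfish.

```shell
# rank_mkfs: hypothetical helper. Reads "name" / "elapsed time: N seconds"
# line pairs on stdin and prints "seconds name", fastest first.
rank_mkfs() {
    awk '/^elapsed time:/ { print $3, name; next }
         { name = $0 }' | LC_ALL=C sort -n
}

# Example using three of the results above.
printf '%s\n' ext3  'elapsed time: 2.71 seconds' \
              btrfs 'elapsed time: 0.07 seconds' \
              xfs   'elapsed time: 0.13 seconds' | rank_mkfs
```

Run over the full memory-backed list, this would put btrfs (0.07 seconds) first and gfs2 (2.76 seconds) last.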

November 10, 2009

libguestfs launch times

Indulge me while I make a “note to self” about efforts to reduce the time taken by guestfs_launch, which boots up the libguestfs appliance.

Time (s) Operation
2s Create supermin appliance: This has crept up over time from originally taking about 1/5th of a second to around 2s. Needs attention. Fixed: see this note about cpio blocksize and the update below.
2-3s qemu startup: The time is mainly spent reading in the large -kernel and particularly -initrd files specified on the command line. The released qemu code is quite rubbish, but luckily kraxel beat me to fixing the problem with this patch.
3s BIOS waits for keypress: As discussed yesterday, I’ve posted a patch. In the meantime the qemu devs have abandoned the old bochs BIOS for SeaBIOS, which I haven’t tested yet.
3s Kernel boot time: Not very many easy wins here, since the kernel is already pretty efficient. We could try to remove some busy waits and sleeps — for example the kernel waits ¼s for the serial ports, which we don’t use.
1-2s Userspace boot time: This is mainly time spent on udev and partition detection. I have never really understood why the kernel needs to pause so long on partition detection. Not much easy meat here, but improving the speed of udev in itself would be worthwhile.
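
As a rough cross-check, the per-stage estimates in the table can be summed to give a range for the whole launch. This is just a sketch of the arithmetic; sum_list is a hypothetical helper, and the low and high values are read straight off the table.

```shell
# Low and high estimates in seconds for each stage, in table order:
# supermin appliance, qemu startup, BIOS wait, kernel boot, userspace boot.
low="2 2 3 3 1"
high="2 3 3 3 2"

# sum_list: hypothetical helper that adds up its arguments.
sum_list() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo "$total"
}

# $low and $high are deliberately unquoted so that they split into words.
echo "launch time: $(sum_list $low)-$(sum_list $high) seconds"
```

The resulting range brackets the 12 second launch time quoted in the November 9 entry.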

I spotted a nice tool for turning strace output into timelines.

Update — bash globbing

Having solved the cpio problem, the largest bottleneck in creating the supermin appliance becomes globbing.

It’s probably not a well-known fact, but if you do:

ls *.c *.h

then bash reads the directory twice. It treats the two globs on the command line as completely separate entities. This is not so bad for a few globs, but when we make the supermin appliance we need to do over 120 globs, which takes bash about 0.8s to complete, contributing about 10% to the overall launch time. Bash is literally reading the same directory over and over again, 50 or more times.

At the moment it’s not obvious to me how to solve this.
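
One possible direction, sketched here as an assumption and not what libguestfs actually does: expand a single * glob so the directory is read once, then test each name against all the patterns in memory using case. match_once is a hypothetical helper name.

```shell
# match_once DIR PATTERN...: hypothetical helper.
# Reads DIR once via a single glob, then matches every entry against
# each pattern without going back to the filesystem.
match_once() {
    dir=$1
    shift
    for f in "$dir"/*; do
        [ -e "$f" ] || continue          # empty directory: the glob stays literal
        name=${f##*/}
        for pat in "$@"; do
            case $name in
                $pat) printf '%s\n' "$name"; break ;;   # unquoted: used as a pattern
            esac
        done
    done
}

# Rough equivalent of "ls *.c *.h" in the current directory.
match_once . '*.c' '*.h'
```

Unlike ls *.c *.h, this lists matches in directory order and prints each file only once, and like an ordinary glob it skips dotfiles.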

November 9, 2009

Quick win

I got the libguestfs launch time down from 12 seconds to 9 seconds today, 25% faster!

It turns out that the appliance’s BIOS was waiting for 3 seconds for someone to hit [F12] on the imaginary keyboard. A simple patch to bochs BIOS fixes that … This patch benefits everyone using Fedora and virtualization, since all boot times will be reduced by 3 seconds.

This is only the start of the optimizations. I’m pretty certain we can get it down to a 4 or 5 second launch time.

November 4, 2009

Petabytes? Exabytes? Why not

Frankly I’m a bit surprised this works …

$ guestfish 

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help with commands
      'quit' to quit the shell

><fs> sparse /mnt/tmp/test/test.img 1P
><fs> run
><fs> blockdev-getsize64 /dev/vda
1125899906842624

$ guestfish 

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help with commands
      'quit' to quit the shell

><fs> sparse /mnt/tmp/test/test.img 1E
><fs> run
><fs> blockdev-getsize64 /dev/vda
1152921504606846976
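
The sizes reported by blockdev-getsize64 are exact powers of two, which suggests the suffixes are binary units. As a quick check (a sketch only, assuming a shell with 64-bit arithmetic; size_bytes is a hypothetical helper, not a guestfish command):

```shell
# size_bytes N SUFFIX: hypothetical helper that converts a size such as
# "1 P" to bytes, assuming binary units (T = 2^40, P = 2^50, E = 2^60).
size_bytes() {
    n=$1
    case $2 in
        T) shift_by=40 ;;
        P) shift_by=50 ;;
        E) shift_by=60 ;;
        *) echo "unknown suffix: $2" >&2; return 1 ;;
    esac
    echo $(( n << shift_by ))
}

size_bytes 1 P    # compare with the 1P result above
size_bytes 1 E    # compare with the 1E result above
```

Both values match the blockdev-getsize64 output. Note that 1E is already 2^60, so sizes of 8E and above would overflow a signed 64-bit integer in this sketch.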

November 4, 2009

Terabyte virtual disks

This is fun. I added a new command to guestfish which lets you create sparse disk files. This makes it really easy to test out the limits of partitions and Linux filesystems.

Starting modestly, I tried a 1 terabyte disk:

$ guestfish

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help with commands
      'quit' to quit the shell

><fs> sparse /tmp/test.img 1T
><fs> run

The real disk image so far isn’t so big, just 4K according to “du”:

$ ll -h /tmp/test.img
-rw-rw-r-- 1 rjones rjones 1T 2009-11-04 17:52 /tmp/test.img
$ du -h /tmp/test.img
4.0K	/tmp/test.img

Let’s partition it:

><fs> sfdiskM /dev/vda ,

The partition table only uses 1 sector, so the disk image has increased to just 8K. Let’s make an ext2 filesystem on the first partition:

><fs> mkfs ext2 /dev/vda1

This command takes some time, and the sparse disk file has grown to 17 GB, so ext2 has an overhead of approximately 1.7%.

We can mount the filesystem and look at it:

><fs> mount /dev/vda1 /
><fs> df-h
Filesystem            Size  Used Avail Use% Mounted on
/dev/vda1            1008G   72M  957G   1% /sysroot

Can we try this with larger and larger virtual disks? In theory yes, but in practice the 1.7% overhead proves to be a problem. A 10T experiment would require a very real 170GB of local disk space, and where I was hoping to go, 100T and beyond, would be too large for my test machines.
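
The arithmetic behind those figures can be sketched as follows. Both helpers are hypothetical, and the projection assumes the rounded 1.7% overhead scales linearly with disk size, which has only been measured at 1T here.

```shell
# overhead_percent USED_GB DISK_GB: allocated space as a percentage
# of the virtual disk size.
overhead_percent() {
    awk -v used="$1" -v size="$2" 'BEGIN { printf "%.2f\n", used / size * 100 }'
}

# space_needed_gb DISK_TB: projected host space in GB, assuming the rounded
# 1.7% figure holds at any size and taking 1T as 1000 GB to match the text.
space_needed_gb() {
    awk -v tb="$1" 'BEGIN { printf "%.0f\n", tb * 1000 * 0.017 }'
}

overhead_percent 17 1024   # 17 GB allocated on a 1T (1024 GB) disk
space_needed_gb 10         # the 10T experiment
space_needed_gb 100        # the 100T experiment
```

On those assumptions a 100T disk would need around 1.7 TB of real storage just for the ext2 metadata.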

In fact there is another limitation before we reach there. Local sparse files on my host ext4 filesystem are themselves limited to under 16T:

><fs> sparse /tmp/test.img 16T
write: File too large
><fs> sparse /tmp/test.img 15T

Although the appliance does boot with that 15T virtual disk:

><fs> blockdev-getsize64 /dev/vda
16492674416640

Update

I noticed from Wikipedia that XFS has a maximum file size of 8 exabytes – 1 byte. By creating a temporary XFS filesystem on the host, I was able to create a 256TB virtual disk:

><fs> sparse /mnt/tmp/test/test.img 256T
><fs> run
><fs> blockdev-getsize64 /dev/vda
281474976710656

Unfortunately at this point things break down. MBR partitions won’t work on such a huge disk, or at least sfdisk can’t partition it correctly.

I’m not sure what my options are at this point, but at least this is an interesting experiment in hitting limitations.

November 4, 2009

Tip: virt-win-reg: CurrentControlSet in Windows Registry

I was asked today why this command doesn’t work (this Registry key would be visible if you were inside the Windows guest):

$ virt-win-reg Win2003x32 \
  '\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout\DosKeybCodes'
hivexget: \CurrentControlSet: CurrentControlSet: path element not found

It’s because CurrentControlSet (and several other “Current*” keys) are synthetic. They don’t exist in the underlying Registry “hive” (file), but are created by Windows when it is running to refer to the currently selected setting for the local user. (This isn’t quite correct: for the specifics, refer to the Microsoft KB article and this stackoverflow posting.)

Instead you have to refer to one of the possible selections. Usually ControlSet001 will work, so:

$ virt-win-reg Win2003x32 \
  '\HKEY_LOCAL_MACHINE\SYSTEM\ControlSet001\Control\Keyboard Layout\DosKeybCodes'
"00000402"="bg"
"00000404"="ch"
"00000405"="cz"
[etc]
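
If you have a lot of paths copied from inside a running guest, the substitution can be scripted. This is a minimal sketch: resolve_controlset is a hypothetical helper, and it simply assumes ControlSet001 is the right selection, which as noted above is usually but not always true.

```shell
# resolve_controlset KEYPATH: hypothetical helper that rewrites the synthetic
# \CurrentControlSet\ path element to \ControlSet001\ (an assumed default).
resolve_controlset() {
    printf '%s\n' "$1" | sed 's/\\CurrentControlSet\\/\\ControlSet001\\/'
}

resolve_controlset '\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Keyboard Layout\DosKeybCodes'
```

The rewritten path can then be passed to virt-win-reg as in the example above.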

November 3, 2009

guestmount and virt-inspector

I was asked on IRC what the purpose of the $(virt-inspector ...) clause is from the previous example:

$ guestmount $(virt-inspector --ro-fish /dev/vg_trick/Debian5x64) /tmp/rich

Firstly $(...) is the cool modern way to write shell `backquotes`. As well as being cool and modern, it’s also better than using backquotes because you can nest it.
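
A small illustration of the nesting point, unrelated to libguestfs itself; parent_name is a hypothetical helper used only for this example.

```shell
# parent_name PATH: hypothetical helper that prints the name of the
# directory containing PATH.
parent_name() {
    # The inner $(...) runs first and its output becomes the argument
    # of the outer command.
    echo "$(basename "$(dirname "$1")")"
}

# With backquotes the inner pair has to be escaped by hand:
#   `basename \`dirname /dev/vg_trick/Debian5x64\``
parent_name /dev/vg_trick/Debian5x64
```

Each $(...) level also starts a fresh quoting context, so the inner double quotes do not need escaping.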

What does the virt-inspector subcommand do? The output of the virt-inspector command is this, split into multiple lines just to make it easier to see:

--ro
-a /dev/vg_trick/Debian5x64
-m /dev/debian5x64/root:/
-m /dev/sda1:/boot
-m /dev/debian5x64/home:/home
-m /dev/debian5x64/tmp:/tmp
-m /dev/debian5x64/usr:/usr
-m /dev/debian5x64/var:/var

Just for comparison I want to run the guestfish commands list-partitions and lvs on the same virtual machine:

$ guestfish -a /dev/vg_trick/Debian5x64 \
    run : echo Partitions : list-partitions : echo Logical volumes : lvs
Partitions
/dev/sda1
/dev/sda2
Logical volumes
/dev/debian5x64/home
/dev/debian5x64/root
/dev/debian5x64/swap_1
/dev/debian5x64/tmp
/dev/debian5x64/usr
/dev/debian5x64/var

virt-inspector has used libguestfs to examine each mountable partition in the guest, has looked at /etc/fstab and other clues, and has decided on how the Debian guest, if running, would mount those partitions.

It then prints the guestfish-compatible -a and -m options to add the disk and mount the partitions up. guestmount uses a broadly compatible set of options, so it works too.
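
To see what those options mean without launching anything, the -m arguments can be picked apart in plain shell. This is only a sketch over the option format shown above; list_mounts is a hypothetical helper and is not how guestfish or guestmount parse their command lines.

```shell
# list_mounts OPTIONS...: hypothetical helper that prints "device on mountpoint"
# for every "-m device:mountpoint" pair and ignores everything else.
list_mounts() {
    while [ $# -gt 0 ]; do
        case $1 in
            -m) [ $# -ge 2 ] || break      # "-m" with no value: stop
                printf '%s on %s\n' "${2%%:*}" "${2#*:}"
                shift 2 ;;
            *)  shift ;;
        esac
    done
}

list_mounts --ro -a /dev/vg_trick/Debian5x64 \
    -m /dev/debian5x64/root:/ -m /dev/sda1:/boot
```

In practice you would pass $(virt-inspector --ro-fish ...) as the arguments, exactly as guestmount receives them.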

And finally I’d like to note that this also works well with libvirt domain names, so we could equally have written:

$ guestmount $(virt-inspector --ro-fish Debian5x64) /tmp/rich

November 3, 2009

Example: Mount a Debian guest on the host using FUSE and libguestfs

Example — mount my Debian guest on my host Fedora server, using FUSE support which we added to libguestfs today:

$ mkdir /tmp/rich
$ guestmount $(virt-inspector --ro-fish /dev/vg_trick/Debian5x64) /tmp/rich
$ cat /tmp/rich/etc/debian_version
squeeze/sid
$ cat /tmp/rich/etc/hostname
debian5x64
$ ls -l /tmp/rich/etc/apt/
total 19
-rw-r--r-- 1 root root   51 2009-05-14 18:07 apt.conf
drwxr-xr-x 2 root root 1024 2009-08-13 18:10 apt.conf.d
drwxr-xr-x 2 root root 1024 2009-08-06 14:42 preferences.d
-rw------- 1 root root    0 2009-05-14 18:04 secring.gpg
-rw-r--r-- 1 root root  669 2009-05-24 12:02 sources.list
-rw-r--r-- 1 root root    0 2009-05-14 18:04 sources.list~
drwxr-xr-x 2 root root 1024 2009-02-07 21:18 sources.list.d
-rw------- 1 root root 1200 2009-05-14 18:04 trustdb.gpg
-rw------- 1 root root 5801 2009-05-14 18:04 trusted.gpg
-rw------- 1 root root 5801 2009-05-14 18:04 trusted.gpg~
$ nautilus /tmp/rich/home/rjones/d/libguestfs/

As you can see we can browse the Debian guest with nautilus. That’s actually a git checkout of the libguestfs source that we use to verify the build on Debian.

November 3, 2009

Browsing guests using FUSE

What’s interesting about this screenshot is that I’m browsing into a guest filesystem using the GNOME file browser (nautilus), with the guest mounted using FUSE and libguestfs. You can visit directories, open files, and drag files out of the guest (but not drag them into the guest yet — we haven’t enabled file writes at the moment).

(This is /var/log from a Fedora 12 Alpha guest, displayed in a Fedora 11 host. Notice the “Free Space: 5.7 GB” label which accurately shows the amount of free space in the guest filesystem).

Update

FUSE support via the guestmount command is in libguestfs 1.0.77.