Category Archives: Uncategorized

OfficeMaster

Amazon have launched WorkSpaces. Back in around 2000, I nearly launched a similar product called OfficeMaster — “your office on the net”.

Here is a web page and some screenshots I dug up:

[Images: officemaster-web, officemaster-4, officemaster-3, officemaster-2, officemaster-1]

How did this work?

This was before open source high performance virtualization was available, and so what we had at the back end was a small collection of Linux servers, which you would literally log in to using a VNC browser plugin (or a native VNC client for advanced users).

Inside your user account you had a basic X window manager, KDE, StarOffice, GIMP and a web browser. We had a nice graphical welcome screen written as a Tcl/Tk app (and planned to replace it with a Flash demo).

I even patched the kernel to support 32 bit UIDs, expecting enormous numbers of users (or at least, more than 64K users). And I spent some time hardening the distro (RHL-derived) to remove obvious points of exploitation.

IIRC I reckoned that each logged in user would consume around 32MB of RAM, at a time when perhaps 512MB was the most RAM you could physically fit into a server. We planned to aggressively swap out users who disconnected. Some benefit was had because everyone was running the same binaries of KDE, StarOffice and so on, resulting in a fair amount of sharing.

Why did it fail (to launch)?

At about this time I had broadband at home, but that was pretty unusual. Most people were on 56k modems. I did a lot of testing around people’s houses, and it was pretty obvious that running applications over VNC was not going to be very usable. Faster broadband adoption might have saved the idea.

We also felt it was a solution in search of a problem (and I still think the same about this Amazon announcement, and also things like Cloud-based desktop apps). If your company already has PCs running Windows, why would you want to rely on an unreliable remote third party service to do what you could already do on your local machine? You’re not really saving on management costs either because you still have to license and manage your own hardware.

The pricing was also uncertain. You can fit a number of concurrent users on each machine, say 16. Office-type users tend to use their machines at the same time of day, so you can at most oversubscribe by a factor of, say, 2. That machine might have cost you £1000, plus there is a substantial cost of colocation, bandwidth and administration (remember this was before the days of Puppet, so each physical machine had to be tediously installed and managed by hand). I think we looked at charging people £20/month, which would have been a theoretical revenue of ~£7000/machine/year. I’m skipping a lot of detail here: you also needed an NFS server per several machines, a web server, database, spare servers and so on. But that’s quite a lot of money for providing dubious value to the end user. We never found out if the market would have supported that price, and I don’t think we would have made a profit at that subscription level.
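The revenue figure can be checked with a few lines of shell arithmetic. This is only a sketch using the round numbers quoted above (16 concurrent users, oversubscription by 2, £20/month). The variable names are invented for the example, and the result is a ceiling that assumes every slot is sold, so it lands a little above the rounded ~£7000.

```shell
#!/bin/sh
# Back-of-envelope revenue per machine, using the round numbers from the text.
# Variable names are illustrative only.
concurrent_users=16     # users who fit on one machine at once
oversubscription=2      # subscribers sold per concurrent slot
price_per_month=20      # GBP per subscriber per month

subscribers=$((concurrent_users * oversubscription))
revenue_per_year=$((subscribers * price_per_month * 12))

echo "$subscribers subscribers per machine"
echo "GBP $revenue_per_year per machine per year"
```

Colocation, bandwidth, administration and the shared NFS, web and database servers all have to come out of that figure.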

Filed under Uncategorized

New tool: virt-customize

The final big feature of libguestfs 1.26 has arrived. Virt-customize is the customization part of virt-builder, split out into a separate program. This lets you take any virtual machine and install packages, edit configuration files, run scripts, set passwords and so on.

One of the most requested features for virt-builder is the ability to customize templates while keeping a shared backing file, and virt-customize lets you do this.

Here’s how to use virt-customize:

$ virt-customize -a fedora-20.img \
    --update --install gcc
[   0.0] Examining the guest ...
[  37.0] Setting a random seed
[  37.0] Updating core packages
[ 238.0] Installing packages: gcc

virt-inspector has a way to list out the packages installed in a virtual machine disk image, and we can use it to show that gcc was installed:

$ virt-inspector -a fedora-20.img |
    xmlstarlet sel -t -c '//application[name="gcc"]'
<application>
        <name>gcc</name>
        <version>4.8.2</version>
        <release>7.fc20</release>
        <arch>x86_64</arch>
</application>

Analysis of the size of libguestfs dependencies

In libguestfs ≥ 1.26 we are going to start splitting the package up into smaller dependencies. The full libguestfs package has lots of dependencies because it has to be able to process lots of obscure filesystems, so the question is how best to split them up. We could split off, say, XFS support into a subpackage, but how do we know if that will save any space?

Given the set of dependencies, we want to know the incremental cost of adding another dependency.

We can get an exact measure of this by using supermin to build a chroot containing the set of dependencies, and a second chroot containing the set of dependencies + the additional package. Then we simply compare the sizes of the two chroots. The advantage of using supermin is that the exact same script [see end of posting] will work for Fedora and Debian/Ubuntu since supermin hides the complexity of dealing with the different package managers through its package manager abstraction.

The results of this, using the libguestfs appliance dependencies, on Fedora 20, sorted by dependency size, with my comments added:

  1. gdisk adds 25420 KB

    This is a surprising result for first place, since gdisk is a fairly small, unassuming C++ program (only ~11KLoC). My initial thought was it must be something to do with being written in C++, but I tested that and it’s not true. The real problem is that gdisk depends on libicu (a Unicode library) which adds 24.6 MB to the appliance. [Note: this issue has been fixed in Rawhide.]

  2. lvm2 adds 19432 KB

    The default disk layout of many Linux distros uses LVM so this and similar dependencies have to stay in base libguestfs.

  3. binutils adds 16604 KB

    This is a sorry tale. The one file we use from binutils is /usr/bin/strings (33KB). Unfortunately this single binary pulls in a huge dependency (even worse, it’s a development package, and this causes problems on production systems). I don’t really understand why strings is included in binutils.

  4. gfs2-utils adds 9648 KB
  5. zfs-fuse adds 5208 KB

    Split off in the proposed reorganization.

  6. ntfsprogs adds 4572 KB
  7. e2fsprogs adds 4312 KB

    Most Linux distros use ext4, and we want to support Windows out of the box, so these are included in base libguestfs.

  8. xfsprogs adds 3532 KB

    Split off in the proposed reorganization.

  9. iproute adds 3180 KB

    We use /sbin/ip to set up the network card inside the appliance. It’s a shame this “better” replacement for ifconfig is so large.

  10. tar adds 2896 KB
  11. btrfs-progs adds 2800 KB
  12. openssh-clients adds 2428 KB
  13. parted adds 2420 KB
  14. jfsutils adds 1668 KB
  15. genisoimage adds 1644 KB
  16. syslinux-extlinux adds 1420 KB
  17. augeas-libs adds 1404 KB
  18. iputils adds 1128 KB
  19. reiserfs-utils adds 1076 KB
  20. mdadm adds 1032 KB
  21. strace adds 976 KB
  22. lsof adds 972 KB
  23. vim-minimal adds 912 KB
  24. rsync adds 812 KB
  25. libldm adds 616 KB
  26. psmisc adds 592 KB
  27. nilfs-utils adds 520 KB
  28. hfsplus-tools adds 480 KB

The test script used to produce these results:

#!/bin/bash -

# NB: For this program to work, you must have the following
# packages (or as many as possible) installed locally.
pkgs='acl attr augeas-libs bash binutils bsdmainutils btrfs-progs
bzip2 coreutils cpio cryptsetup cryptsetup-luks diffutils dosfstools
e2fsprogs extlinux file findutils gawk gdisk genisoimage gfs2-utils
grep grub grub-pc gzip hfsplus hfsplus-tools hivex iproute iputils
jfsutils kernel kmod less libaugeas0 libcap libcap2 libhivex0 libldm
libpcre3 libselinux libsystemd-id128-0 libsystemd-journal0 libxml2
libyajl2 linux-image lsof lsscsi lvm2 lzop mdadm module-init-tools
mtools nilfs-utils ntfs-3g ntfsprogs openssh-clients parted pcre
procps procps-ng psmisc reiserfs-utils reiserfsprogs rsync scrub sed
strace syslinux syslinux-extlinux systemd sysvinit tar udev ufsutils
util-linux util-linux-ng vim-minimal vim-tiny xfsprogs xz xz-utils
yajl zerofree zfs-fuse'

# These are the packages (from the above list) that we want to test.
testpkgs="$pkgs"

# Helper function to construct an appliance and see how big it is.
function appliance_size
{
    set -e
    supermin --prepare -o /tmp/supermin.d "$@" >&/dev/null
    supermin --build -f chroot -o /tmp/appliance.d \
      /tmp/supermin.d >&/dev/null
    du -s /tmp/appliance.d | awk '{print $1}'
}

# Construct entire appliance to see how big that would be.
totalsize=$(appliance_size $pkgs)

# Remove each package from the list in turn, and find out
# how much extra that package contributes.
for p in $testpkgs; do
    opkgs=
    for o in $pkgs; do
        if [ "$o" != "$p" ]; then opkgs="$opkgs $o"; fi
    done
    size=$(appliance_size $opkgs)
    extra=$((totalsize - size))

    echo "$p adds $extra KB"
done

Transactions with guestfish

I was asked a few days ago if libguestfs has a way to apply a group of changes to an image together. The question was really about transaction support — applying a group of changes and then committing them or doing a rollback, with the final image either containing all the changes or none of them.

Although libguestfs doesn’t support this, you can do it using libguestfs and the qemu-img tool together. This post shows you how.

First I use virt-builder to quickly get a test image that I can play with:

$ virt-builder fedora-20

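We create an overlay which will store the changes until we decide to commit or roll back. (Newer versions of qemu-img insist that the backing file format is given explicitly with -F; virt-builder produces raw images by default.)

$ qemu-img create -f qcow2 -b fedora-20.img -F raw overlay.img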

Now open the overlay and make your changes:

$ guestfish -a overlay.img -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Fedora release 20 (Heisenbug)
/dev/sda3 mounted on /
/dev/sda1 mounted on /boot

><fs> write-append /etc/issue.net \
    "THIS IS A CHANGE TO ISSUE.NET\n"
><fs> cat /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET
><fs> exit

The base image (fedora-20.img) is untouched, and the overlay contains the changes we made:

$ virt-cat -a fedora-20.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
$ virt-cat -a overlay.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET

Rollback

Rollback is pretty simple!

$ rm overlay.img

Commit

The more interesting one is how to commit the changes back to the original file. Using qemu-img you just do:

$ qemu-img commit overlay.img
Image committed.
$ rm overlay.img

The changes are now contained in the original image file:

$ virt-cat -a fedora-20.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET

ACID

Have we discovered the ACID properties of disk images? Not quite.

Although the change is atomic (A) [1], the disk image is consistent (C) before and after the change, and the change is durable (D) [2], the final property is not satisfied.

There is no isolation (I). Because it is infeasible to resolve conflicts at the block layer where qemu-img operates, it would be guaranteed corruption if you tried this technique in parallel on the same disk image. The only way to make it work reliably is to serialize every operation on the disk image with a mutex.

[1] The change is only atomic if you don’t look at the backing file for the short time that qemu-img commit runs.

[2] Strictly speaking, you must call sync or fsync after the qemu-img commit in order for the change to be durable.
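Putting the pieces together, the create/commit/rollback steps and the mutex could be wrapped up roughly like this. It is a sketch under several assumptions that are not from the steps above: the backing image is raw, flock(1) and realpath(1) are installed, and the names QEMU_IMG and with_image_transaction are invented for the example.

```shell
#!/bin/sh
# Sketch of a transactional wrapper around the commit/rollback steps.
# Assumptions: raw backing image, flock(1) and realpath(1) available.
# QEMU_IMG and with_image_transaction are made-up names.

QEMU_IMG=${QEMU_IMG:-qemu-img}

# with_image_transaction IMAGE COMMAND [ARGS...]
# Runs "COMMAND ARGS... OVERLAY". If that succeeds the overlay is
# committed into IMAGE, otherwise the overlay is deleted (rollback).
with_image_transaction ()
{
    image=$(realpath "$1") || return 1
    shift
    overlay="$image.overlay.$$"

    (
        # Serialize all transactions on this image with an exclusive
        # lock, since parallel commits would corrupt the backing file.
        flock -x 9 || exit 1

        "$QEMU_IMG" create -f qcow2 -b "$image" -F raw "$overlay" \
            >/dev/null || exit 1

        status=1
        if "$@" "$overlay" && "$QEMU_IMG" commit "$overlay" >/dev/null
        then
            sync        # flush the committed change to disk
            status=0
        fi

        # Rollback, or cleanup after a commit. Note that a commit which
        # fails part way may already have modified the backing file.
        rm -f "$overlay"
        exit $status
    ) 9>"$image.lock"
}
```

For example, `with_image_transaction fedora-20.img virt-customize --install gcc -a` would run virt-customize against the overlay and only touch fedora-20.img if the install succeeded.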

My 10 minute lightning talk on virt-builder from FOSDEM 2014

My 10 minute lightning talk about virt-builder is available to download now (video).

Since there are a few sound problems early on in the talk, I have also created a subtitles file: Advanced_disk_image_management_with_libguestfs.srt. With VLC you can just drop this file into the same directory as the video file, and VLC will automatically display the subs. With other players you might need to load the subs separately.

New in virt-sparsify: In place sparsification

New in virt-sparsify ≥ 1.25.44, you can now sparsify disk images without copying them, so-called in-place sparsification.

It’s easy to use:

$ virt-sparsify --in-place fedora.img
Trimming /dev/sda1 ...
Clearing Linux swap on /dev/sda2 ...
Trimming /dev/sda3 ...

Sparsify in-place operation completed with no errors.

… and much faster. However it does require very recent kernel and qemu support.
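To see what sparsification achieved, you can compare a file’s apparent size with the space actually allocated to it. This is a small sketch which assumes GNU du (for --apparent-size); the function name image_sizes is invented for the example.

```shell
#!/bin/sh
# image_sizes FILE
# Prints "APPARENT_KB ALLOCATED_KB" for FILE. Assumes GNU du.
image_sizes ()
{
    apparent=$(du -k --apparent-size "$1" | cut -f1)
    allocated=$(du -k "$1" | cut -f1)
    echo "$apparent $allocated"
}
```

Running `image_sizes fedora.img` before and after virt-sparsify --in-place should show the second number dropping while the first stays the same.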

Thanks: Paolo Bonzini, Eric Sandeen & Kevin Wolf for implementing discard support and patiently helping out when we started to test and use it.

BLKDISCARD, BLKZEROOUT, BLKDISCARDZEROES, BLKSECDISCARD

Recent Linux has four ioctls related to discarding blocks on block devices: BLKDISCARD, BLKZEROOUT, BLKDISCARDZEROES and BLKSECDISCARD. As far as I’m aware these are not documented anywhere, but this posting describes what they do and how to use them. For a good all-round introduction to thin provisioning, see Paolo Bonzini’s talk from DevConf (video here).

BLKDISCARD

This is the simplest ioctl. Given a range described as offset and length (both expressed in bytes), this code:

uint64_t range[2] = { offset, length };
ioctl (fd, BLKDISCARD, range);

will tell the underlying block device (fd) that it may discard the blocks which are contained in the given byte range.

The kernel code wants you to pass a range which is aligned to 512 bytes, and there may be further restrictions on the range you can pass which you can find out about by reading /sys/block/disk/discard_alignment, /sys/block/disk/queue/discard_granularity, and /sys/block/disk/queue/discard_max_bytes (where disk is the name of the device, eg. sda).

If discard_max_bytes == 0 then discard isn’t supported at all on this device.
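These checks are easy to script before you attempt a discard. The following is a sketch only: the function names are invented, and the caller passes in the queue directory (normally /sys/block/DISK/queue) so the functions do not guess the device name.

```shell
#!/bin/sh
# Sketch: check discard support from sysfs before calling BLKDISCARD.
# Function names are invented for this example.

# discard_supported QUEUE_DIR
# Succeeds if QUEUE_DIR/discard_max_bytes exists and is non-zero.
discard_supported ()
{
    max=$(cat "$1/discard_max_bytes" 2>/dev/null) || return 1
    [ "${max:-0}" -gt 0 ]
}

# range_is_aligned OFFSET LENGTH [ALIGNMENT]
# Succeeds if both byte values are multiples of ALIGNMENT (default 512).
range_is_aligned ()
{
    align=${3:-512}
    [ $(( $1 % align )) -eq 0 ] && [ $(( $2 % align )) -eq 0 ]
}
```

For example, `discard_supported /sys/block/sda/queue && range_is_aligned 1048576 4096` succeeds only if sda advertises discard support and the range is 512-byte aligned. Pass the value of discard_granularity as the third argument to apply the stricter check.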

Discard is voluntary. The device might ignore it silently. Also what you read back from the discarded blocks might not be zeroes — you might read back stale data or random data (but see below).

BLKZEROOUT

BLKZEROOUT is a bit like BLKDISCARD but it writes zeroes. The code is similar:

uint64_t range[2] = { offset, length };
ioctl (fd, BLKZEROOUT, range);

Again note that offset and length are in bytes, but the kernel wants you to pass a 512-byte aligned range.

As far as I can tell from the implementation, the kernel implements this call itself. There is no help needed from devices, nor any device-specific optimization available.

BLKDISCARDZEROES

I mentioned above that discarded blocks might read back as stale data. However some devices guarantee that discarded blocks read back as zeroes (which means, I assume, that BLKZEROOUT would not be needed on such block devices).

You can find out if the device you are currently using has this guarantee, either by reading the sysfs file /sys/block/disk/queue/discard_zeroes_data, or by using this code:

unsigned int arg;
int discard_zeroes =
    ioctl (fd, BLKDISCARDZEROES, &arg) == 0 && arg;

BLKSECDISCARD

Finally secure discard tells the device that you want to do a secure erase operation on the blocks. Again, pass a byte range (which has the same alignment requirements as BLKDISCARD):

uint64_t range[2] = { offset, length };
ioctl (fd, BLKSECDISCARD, range);

The ioctl will return an error (EOPNOTSUPP) for devices which cannot do secure erase.

Tip: Old supermin, new libguestfs and v.v.

Problem:

You want to compile libguestfs ≥ 1.25.38, but your distro only has old supermin 4.

Solution:

Compile supermin from source. Note: do not install it!

git clone https://github.com/libguestfs/supermin supermin5
cd supermin5
./autogen.sh
make

Create a file called localenv in the libguestfs build directory with the following content:

export SUPERMIN=/path/to/supermin5/src/supermin

and a file localconfigure containing:

source localenv
./configure "$@"

Make it executable:

chmod +x localconfigure

Rebuild libguestfs as normal, except you use ./localconfigure instead of ./configure.


Problem:

You want to compile libguestfs ≤ 1.24, but you’ve installed new supermin 5.

Solution:

Compile supermin 4 from source. Note: do not install it!

git clone -b supermin-4.x \
    https://github.com/libguestfs/supermin supermin4
cd supermin4
./autogen.sh
make

Create a file called localenv in the libguestfs build directory with the following content:

export SUPERMIN=/path/to/supermin4/src/supermin
export SUPERMIN_HELPER=/path/to/supermin4/helper/supermin-helper

and a file localconfigure containing:

source localenv
./configure "$@"

Make it executable:

chmod +x localconfigure

Rebuild libguestfs as normal, except you use ./localconfigure instead of ./configure.

Supermin version 5

Recently, with help from Pino Toscano and Hilko Bengen, I rewrote supermin [git repo] to make it more featureful and robust. Supermin is a clever tool that lets you distribute very tiny (< 100K) Linux appliances, which are reconstituted to full appliances just before they run.

There’s an Adam West-era Batman film where the UN Security Council is dehydrated by The Penguin, leaving little piles of powder. Batman (sorry to reveal the ending here!) saves the day by adding water and rehydrating the UN. Supermin works in much the same way — by observing that you don’t need to store (eg) /bin/bash in the appliance, since /bin/bash already exists on the host. By just storing a pointer to /bin/bash instead, you get the amazing compression ratios. Just add water.

Packages

Supermin 4 (the previous version) stored a list of filenames that would be copied into the full appliance. This was somewhat fragile if the host distro changed, eg. moving files around.

Supermin 5 stores the list of package names, and it resolves the dependencies and filenames using the RPM/dpkg/other package manager database just before reconstituting the full appliance. This solves the fragility problem completely. It also means we are now able to split libguestfs dependencies as described here.

Locking and caching

The other part of supermin which made it difficult to use in practice was locking, or rather the absence of locking. If you wanted to use supermin 4 from multiple threads, or multiple processes, you had the problem that they could race to build the full appliance potentially overwriting each other’s output. So you had to implement some sort of locking in the higher layers. And you also had to work out yourself if the full appliance needed to be rebuilt at all or if you could use a cached copy.
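To give an idea of the burden, here is roughly the kind of wrapper a caller had to write. This is a sketch and not code from supermin or libguestfs: cached_build and its arguments are invented, and it assumes flock(1) is installed and that the build command updates the output directory's timestamp.

```shell
#!/bin/sh
# Sketch of the locking and caching a supermin 4 caller had to provide.
# This is NOT supermin code; cached_build and its arguments are invented.

# cached_build LOCKFILE INPUT_DIR OUTPUT_DIR BUILD_COMMAND [ARGS...]
# Runs BUILD_COMMAND under an exclusive lock, but only if OUTPUT_DIR is
# missing or something under INPUT_DIR is newer than it.
cached_build ()
{
    lock=$1 input=$2 output=$3
    shift 3
    (
        # Only one process at a time may decide on and run the build.
        flock -x 9 || exit 1

        if [ ! -d "$output" ] ||
           [ -n "$(find "$input" -newer "$output" | head -n 1)" ]
        then
            "$@"
        fi
    ) 9>"$lock"
}
```

Every program using supermin 4 needed something along these lines, with its own lock file location and its own idea of when the cache was stale.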

In supermin 5, locking and caching are now managed entirely by supermin itself, and all the caller has to do is pass two simple command line arguments:

supermin --build \
    --if-newer \
    --lock /run/lock/supermin.lock \
    [...]

and supply the location of a lock file. This also works if the application is multithreaded, and if the application wants to build multiple appliances (you supply a different lock file per appliance).

Chroots

Supermin used to be called “febootstrap”. That was a bad, confusing name. However one feature that febootstrap had was the ability to build chroots, which we dropped in supermin. It turns out that lots of people liked building chroots for containers — indeed it is even the recommended way to build RHEL/CentOS 6 chroots for Docker.

Supermin 5 can build chroots again by selecting the --format chroot output format. Here’s how you could build a chroot from scratch:

$ supermin --prepare -o /tmp/supermin.d \
    bash coreutils
$ supermin --build --format chroot \
    -o /tmp/appliance.d /tmp/supermin.d

Tarballs, hostfiles, excludefiles

Supermin has always allowed you to customize the full appliance by specifying extra static files or host-copied files in the supermin appliance. However in supermin 4 you had to use a specially formatted cpio file to do this.

In supermin 5 you just use a regular tarball, eg:

$ tar zcf /tmp/supermin.d/init.tar.gz ./init

Supermin 5 also lets you specify files to be copied from the host. Just create a list of wildcards, one per line:

$ cat > /tmp/supermin.d/hostfiles <<EOF
/usr/share/augeas/lenses/*.aug
EOF

Additionally you can specify a list of files to be excluded from the full appliance, which is useful for dropping documentation and other irrelevant stuff:

$ cat > /tmp/supermin.d/excludefiles <<EOF
-/usr/share/doc/*
-/usr/share/info/*
-/usr/share/man/*
EOF

More information

For more information about supermin, read the supermin(1) manual page online. You can also build it from the git repository, and it is available in Fedora Rawhide and Debian/experimental.

Supermin 5 will be required for libguestfs ≥ 1.26 when it is released shortly.

Two Freescale Freedom Boards (ARM Cortex M0)

The total cost including tax and delivery for two boards was £26, so at £13 each they are pretty reasonable. Of course you don’t get very much, just a few KB of flash and RAM. Enough to run a hand-written assembly program, small C programs, or a FORTH interpreter. The Cortex-M0 is a real 32 bit processor.

March 4, 2014 · 1:42 pm