Tag Archives: minimization

libguestfs: Access and modify virtual machine disk images

Time to reveal what we’re doing with qemu, febootstrap and minimal Fedora images.

A constant problem with virtualization management is how to “see inside” the guest virtual machine. You can measure things like how much CPU it’s using or how much network traffic it generates, but people often ask harder questions. How much disk space is a guest using (not allocated, but actually using)? Did the guest boot up correctly? Can you install virtio drivers in that guest? Can you add a disk to a guest and change its /etc/fstab at the same time? Can you migrate a guest from Xen to KVM, uninstall the Xen PV kernel and add the right KVM drivers?

These operations require us to fish around inside the guest's disk image, perhaps just reading the image and printing out simple stats as we do already with virt-df, but perhaps modifying the guest's disk quite substantially, such as installing drivers or running programs.

Now step aside for a minute and think about virtual machine disk images. From the host, they look like files or partitions, but they contain a partition table, partitions, perhaps LVM, a variety of filesystems like ext3 or NTFS or btrfs, and the whole lot might be wrapped in a tricky container such as qcow2. For virt-df we actually wrote new code that can decode a lot of this, but it's a massive effort to keep up with changes in the formats.
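To give a feel for what "decoding" even the outermost layer involves, here is a minimal sketch in Python that reads a classic MBR partition table from the first sector of a raw image. It is not the virt-df code. The demo sector, the partition sizes and the helper names (parse_mbr, make_demo_sector) are made up for illustration, and it ignores extended partitions, GPT, LVM, qcow2 and everything else that makes the real problem hard.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # four 16-byte entries start here
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of the sector

# A few well-known MBR partition type codes.
PARTITION_TYPES = {0x07: "NTFS/exFAT", 0x82: "Linux swap",
                   0x83: "Linux", 0x8E: "Linux LVM"}

# Entry layout: status, CHS start (3 bytes, skipped), type,
# CHS end (3 bytes, skipped), start LBA, sector count.
ENTRY_FORMAT = "<B3xB3xII"


def parse_mbr(sector):
    """Return (type_name, start_lba, sector_count) for each used MBR entry."""
    if len(sector) < SECTOR_SIZE or sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("not an MBR sector")
    partitions = []
    for i in range(4):
        offset = PARTITION_TABLE_OFFSET + 16 * i
        _status, ptype, start_lba, count = struct.unpack_from(
            ENTRY_FORMAT, sector, offset)
        if ptype == 0:          # type 0 marks an unused slot
            continue
        name = PARTITION_TYPES.get(ptype, "unknown (0x%02x)" % ptype)
        partitions.append((name, start_lba, count))
    return partitions


def make_demo_sector():
    """Build a fake first sector (illustrative values, not a real guest)."""
    sector = bytearray(SECTOR_SIZE)
    entries = [(0x80, 0x83, 2048, 409600), (0x00, 0x8E, 411648, 1685504)]
    for i, (status, ptype, start, count) in enumerate(entries):
        struct.pack_into(ENTRY_FORMAT, sector,
                         PARTITION_TABLE_OFFSET + 16 * i,
                         status, ptype, start, count)
    sector[510:512] = BOOT_SIGNATURE
    return bytes(sector)


if __name__ == "__main__":
    for name, start, count in parse_mbr(make_demo_sector()):
        print("%-10s start=%d sectors=%d" % (name, start, count))
```

And that is only the partition table. Each partition then needs its own filesystem or LVM decoder, which is exactly the part that is painful to keep up to date by hand.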

What we need is to use the Linux kernel code directly.

The way I'm going to do this is to boot a Linux kernel: a small, you might say "minimal", Linux distro with a bit of userspace. The whole thing runs inside qemu, and a small library on the host talks to the userspace inside qemu, giving it instructions like "edit this file", "run this program", "install this device driver".
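The shape of that conversation can be sketched in a few lines of Python. Everything here is hypothetical: the class names (ApplianceDaemon, GuestHandle), the command names, and the JSON encoding are assumptions chosen for illustration, not the libguestfs API or wire format. The "daemon" also runs in the same process against a temporary directory, where the real thing would sit inside qemu behind some channel.

```python
import json
import os
import tempfile


class ApplianceDaemon:
    """Stand-in for the userspace program running inside qemu (hypothetical)."""

    def __init__(self, root):
        self.root = root  # directory standing in for the guest filesystem

    def _path(self, guest_path):
        return os.path.join(self.root, guest_path.lstrip("/"))

    def handle(self, request_bytes):
        """Decode one request, run it, and return an encoded reply."""
        request = json.loads(request_bytes.decode("utf-8"))
        handler = getattr(self, "cmd_" + request["cmd"], None)
        if handler is None:
            reply = {"ok": False, "error": "unknown command"}
        else:
            reply = {"ok": True, "result": handler(**request["args"])}
        return json.dumps(reply).encode("utf-8")

    def cmd_write_file(self, path, content):
        with open(self._path(path), "w") as f:
            f.write(content)
        return len(content)

    def cmd_read_file(self, path):
        with open(self._path(path)) as f:
            return f.read()


class GuestHandle:
    """Stand-in for the host-side library (hypothetical API)."""

    def __init__(self, daemon):
        self.daemon = daemon  # in reality: a channel into the qemu process

    def call(self, cmd, **args):
        request = json.dumps({"cmd": cmd, "args": args}).encode("utf-8")
        reply = json.loads(self.daemon.handle(request).decode("utf-8"))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        guest = GuestHandle(ApplianceDaemon(root))
        guest.call("write_file", path="/fstab",
                   content="/dev/sda1 / ext3 defaults 1 1\n")
        print(guest.call("read_file", path="/fstab"), end="")
```

The point of the split is that the host side only ever serializes requests and decodes replies. All knowledge of filesystems stays with the kernel running on the other side.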

The beauty of this is that Linux already has the drivers to handle numerous different filesystems, LVM, partition types and so on. The qemu process can decode qcow2 and a variety of other containers. We can even run an NFS server inside the guest and mount filesystems on the host. The whole thing can build and run as non-root, because you don't need to be root to run either febootstrap or qemu.

We hide the complexity inside a library, libguestfs.

Nothing much is coded at the moment. I have documented what the API will probably look like, and written a lot of code that I might throw away. However the fundamental components are there, such as febootstrap and rpmdepsize, and I've done some work upstream with making fakechroot, yum and rpm work together.

I have a hard deadline to release both a working libguestfs and some migration tools by April 8th (just 1 week away). Ideas are welcome, but I probably won't be open to contributions before next week, when I should have something working to show.

Filed under Uncategorized

RPM dependency size viewer now available

The interactive RPM dependency size viewer (discussed earlier here and here) has now reached a stable 1.0 version. You can grab source and x86-64 RPMs from my website.

The diagram below shows an example, ocaml-camlp4-devel, a rather large OCaml package which I’m partly responsible for.

If you were to take a fresh Fedora 10 machine and yum install ocaml-camlp4-devel, you'd need a minimum of 404 MB (I mean, including things like glibc and the filesystem and everything else that camlp4 ultimately depends on). Where does the space go?

[Diagram: dependency sizes for ocaml-camlp4-devel]

If you look at the top of the enlarged diagram, you can see that ocaml-camlp4-devel is not a tiny package, consuming 33 MB just for itself. The packages below it, ocaml, ocaml-camlp4, ocaml-runtime and glibc, are ordered left to right from greatest to smallest total size (the package's own size plus all of its dependencies). The width shown, however, is the incremental size: roughly speaking, how much we would save if we got rid of that package.

We can see that ocaml-runtime pulls in util-linux-ng, which has a huge chain of requirements (including python, interestingly). So one area to look at would be whether that dependency can be removed or narrowed down.
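The difference between total and incremental size is easier to see in code. This is a minimal sketch in Python of one reasonable reading of the two measures, not the viewer's actual implementation. The package names, sizes and dependency edges are invented, and the function names are only for illustration.

```python
def closure(deps, root, skip=None):
    """Return the packages reachable from root, never entering `skip`."""
    seen = set()
    stack = [root]
    while stack:
        pkg = stack.pop()
        if pkg in seen or pkg == skip:
            continue
        seen.add(pkg)
        stack.extend(deps.get(pkg, []))
    return seen


def total_size(deps, sizes, pkg):
    """Size of pkg plus everything it ultimately depends on."""
    return sum(sizes[p] for p in closure(deps, pkg))


def incremental_size(deps, sizes, root, pkg):
    """Space saved under `root` if pkg (and whatever only it needs) went away."""
    with_pkg = closure(deps, root)
    without_pkg = closure(deps, root, skip=pkg)
    return sum(sizes[p] for p in with_pkg - without_pkg)


# Invented example data, sizes in MB.
SIZES = {"app": 33, "runtime": 10, "tools": 5, "python": 60, "glibc": 15}
DEPS = {
    "app": ["runtime", "glibc"],
    "runtime": ["tools", "glibc"],
    "tools": ["python", "glibc"],
    "python": ["glibc"],
}

if __name__ == "__main__":
    for pkg in ("runtime", "tools", "glibc"):
        print(pkg,
              "total:", total_size(DEPS, SIZES, pkg),
              "incremental:", incremental_size(DEPS, SIZES, "app", pkg))
```

In this toy graph "runtime" has a total size of 90 MB but dropping it would only save 75 MB, because glibc is still needed by "app" directly. Shared dependencies are why the two numbers differ.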

If you mouse-over a particular package, the colours change to show "parent packages" (above us, so sky blue), and dependent packages (below us, so grassy green):

[Diagram: the same view with one package highlighted]

There are still many ways to improve this viewer. I'm now out of time on this little project, but there is a git repository (kindly supplied by Red Hat / Jim Meyering) so if you want to hack away and supply patches, please send them. If you analyzed your own RPMs, let us know in the comments.


febootstrap “minimal” now 15.9 MB

febootstrap now includes an image minimization tool which can remove some “non-essential” data, such as locales (you can configure exactly what it removes).
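The idea behind a configurable minimizer can be sketched in Python as follows. This is not the febootstrap tool and does not reflect its options. The pattern list and the function names (minimize, tree_size) are assumptions made for illustration; the paths are the usual Fedora locations for locales, the yum cache, documentation and man pages.

```python
import fnmatch
import os
import shutil
import tempfile

# Hypothetical default patterns, relative to the root of the image tree.
DEFAULT_REMOVE = (
    "usr/lib/locale/*",
    "usr/share/locale/*",
    "var/cache/yum/*",
    "usr/share/doc/*",
    "usr/share/man/*",
)


def tree_size(path):
    """Total size in bytes of the regular files under path."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            total += os.lstat(os.path.join(dirpath, name)).st_size
    return total


def minimize(root, patterns=DEFAULT_REMOVE):
    """Delete everything under root matching a pattern; return bytes freed."""
    freed = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        for name in list(dirnames) + filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            if not any(fnmatch.fnmatch(rel, p) for p in patterns):
                continue
            if os.path.isdir(full) and not os.path.islink(full):
                freed += tree_size(full)
                shutil.rmtree(full)
                dirnames.remove(name)  # do not descend into a removed directory
            else:
                freed += os.lstat(full).st_size
                os.remove(full)
    return freed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "usr/lib/locale"))
        archive = os.path.join(root, "usr/lib/locale/locale-archive")
        with open(archive, "wb") as f:
            f.write(b"\0" * 1024)
        print("freed", minimize(root), "bytes")
```

Passing a different tuple of patterns is the "configure exactly what it removes" part. Keeping locales, for instance, would just mean leaving the two locale patterns out.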

The bootable minimal install is now around 16 MB. There’s still room for improvement, for example by removing shared libraries that are never used.
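One way to find candidate libraries is to collect what each binary we intend to keep links against and subtract that from what is installed. The following Python sketch works on the text that ldd prints. The sample output and the library list are illustrative, not taken from a real image. It will also wrongly flag anything loaded at run time with dlopen (NSS modules, for example), so treat its answer as a list of candidates rather than a list of things to delete.

```python
import re

# Matches "libfoo.so.1 => /path/libfoo.so.1 (0x...)" and
# "/lib64/ld-linux-x86-64.so.2 (0x...)"; skips vdso and "not found" lines.
_LDD_LINE = re.compile(r"^\s*(?:\S+\s+=>\s+)?(/\S+)\s+\(0x[0-9a-f]+\)\s*$")


def parse_ldd(output):
    """Return the set of library paths named in one ldd output."""
    paths = set()
    for line in output.splitlines():
        match = _LDD_LINE.match(line)
        if match:
            paths.add(match.group(1))
    return paths


def unused_libraries(installed, ldd_outputs):
    """Installed libraries that none of the given ldd outputs refer to."""
    needed = set()
    for output in ldd_outputs:
        needed |= parse_ldd(output)
    return sorted(set(installed) - needed)


# Illustrative ldd output for a single binary.
SAMPLE_LDD = (
    "\tlinux-vdso.so.1 (0x00007ffd5a1f2000)\n"
    "\tlibc.so.6 => /lib64/libc.so.6 (0x00007f3c1a000000)\n"
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f3c1a400000)\n"
)

# Illustrative list of libraries present in the image.
INSTALLED = [
    "/lib64/libc.so.6",
    "/lib64/ld-linux-x86-64.so.2",
    "/lib64/libcrack.so.2",
]

if __name__ == "__main__":
    for path in unused_libraries(INSTALLED, [SAMPLE_LDD]):
        print("candidate for removal:", path)
```

In practice you would feed it the real ldd output for every binary in the image, and resolve symlinks first so that a library and its versioned alias are not counted as two different files.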

[Image: filelight view of the minimal image]

Even my realistic image, which includes LVM, NTFS, NFS server and rescue-disk utilities, isn’t too bad, weighing in comfortably under 32 MB:

[Image: filelight view of the realistic image]

Previously, “minimal” was 225 MB …


Why not use a minimal distribution?

In a comment on my previous post, Zod asked:

Why do you try to strip down your fedora and not just build a minimal linux distribution with tools like t2?

There’s nothing wrong at all with using one of the many minimal Linux kits around. Using busybox and a replacement libc, it’s possible to squeeze Linux down to a floppy disk, or into the 4 MB of flash on a router. However, there are also reasons to use a mainstream distribution such as Fedora (or Debian) as a basis.

  1. We can write code against glibc, using ordinary compilers and tools.
  2. We can reuse any existing package. They don’t need to be rebuilt or compiled in a special way.
  3. We can reuse existing packaging tools, like rpm and yum.
  4. We get the benefit of all the testing, bug reports and fixes from the distribution. In particular, we won’t get our own “special” class of bugs appearing. If we see a bug, it’s likely the bug would also happen in Fedora, and we can report it back to the Fedora maintainer.
  5. Any minimization benefits we make (both size and boot speed) can be pushed back into Fedora.

I’m aiming for a small, but not super-minimal, image, in the region of 16-48 MB, containing an NFS server, LVM tools, and some tools of my own.


Why “minimal” is 225 MB

As I mentioned in the last post, a “minimal” febootstrap Fedora install clocks in at a staggering 225 MB. When I say minimal, I mean just bash and the simplest command-line tools from coreutils:

$ ls /bin
arch      chgrp  cut   echo   fgrep  ls      mv    rmdir  stty   true
basename  chmod  date  egrep  grep   mkdir   nice  sh     su     uname
bash      chown  dd    env    link   mknod   pwd   sleep  sync   unlink
cat       cp     df    false  ln     mktemp  rm    sort   touch

Where does all the space go?

Thanks to KDE's filelight tool, we can easily visualize the disk usage, in a nice interactive graphical way.

[Image: filelight view of the 225 MB minimal install]

34% of the total disk space (76 MB) is taken up with a single file, /usr/lib/locale/locale-archive. We suspect this is an optional file that contains all locale information and is mapped into every glibc-using process. Since the minimal image I have in mind is non-interactive, there doesn't seem to be much point in having locales at all, and this can be deleted. Obviously if you wanted an interactive, internationalized Fedora, you can't just go and remove this file.

Another 34% is taken up with the yum cache, i.e. the packages that we installed. This just needs to be deleted, and febootstrap should have an option to do this automatically.

6% (15 MB) are the locale files. As explained above, these can go.

3% (8 MB) is, extraordinarily, cracklib. It turns out that coreutils requires pam, which requires cracklib to test the strength of passwords. This is completely useless for us, because the virtual machine image won't even have a login prompt, never mind the ability to change passwords.

A further 5% is documentation, man pages and i18n stuff that we don't care about.

Just removing the above brings the image down to 38 MB. The next step will be to do some much more aggressive minimization, based on analyzing the binaries that we're actually going to use and their dependencies.
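As a sanity check on that figure, the breakdown above can be added up. This short Python sketch uses only the numbers quoted in this post. Where only a percentage was given (the yum cache and the documentation), it is converted to megabytes against the 225 MB total, so the result is approximate.

```python
TOTAL_MB = 225.0

# (what was removed, size in MB). Percent-only items are converted
# from the 225 MB total, so they are estimates.
REMOVED = [
    ("locale-archive", 76.0),
    ("yum cache (34%)", 0.34 * TOTAL_MB),
    ("locale files", 15.0),
    ("cracklib", 8.0),
    ("docs, man pages, i18n (5%)", 0.05 * TOTAL_MB),
]


def remaining_mb(total, removed):
    """Size left after subtracting every removed item from the total."""
    return total - sum(size for _name, size in removed)


if __name__ == "__main__":
    for name, size in REMOVED:
        print("%-28s %6.1f MB" % (name, size))
    print("%-28s %6.1f MB" % ("remaining", remaining_mb(TOTAL_MB, REMOVED)))
```

The remainder comes out at roughly 38 MB, which agrees with the measured size of the trimmed image.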
