Analysis of the size of libguestfs dependencies

In libguestfs ≥ 1.26 we are going to start splitting the package up into smaller dependencies. Since the full libguestfs package has lots of dependencies because it has to be able to process lots of obscure filesystems, the question is how best to split up the dependencies? We could split off, say, XFS support into a subpackage, but how do we know if that will save any space?

Given the set of dependencies, we want to know the incremental cost of adding another dependency.

We can get an exact measure of this by using supermin to build a chroot containing the set of dependencies, and a second chroot containing the set of dependencies + the additional package. Then we simply compare the sizes of the two chroots. The advantage of using supermin is that the exact same script [see end of posting] will work for Fedora and Debian/Ubuntu since supermin hides the complexity of dealing with the different package managers through its package manager abstraction.

The results of this, using the libguestfs appliance dependencies, on Fedora 20, sorted by dependency size, with my comments added:

  1. gdisk adds 25420 KB

    This is a surprising result in first place, since gdisk is a fairly small, unassuming C++ program (only ~11KLoC). My initial thought was it must be something to do with being written in C++, but I tested that and it’s not true. The real problem is that gdisk depends on libicu (a Unicode library) which adds 24.6 MB to the appliance. [Note: this issue has been fixed in Rawhide.]

  2. lvm2 adds 19432 KB

    The default disk layout of many Linux distros uses LVM so this and similar dependencies have to stay in base libguestfs.

  3. binutils adds 16604 KB

    This is a sorry tale. The one file we use from binutils is /usr/bin/strings (33KB). Unfortunately this single binary pulls in a huge dependency (even worse, it’s a development package, and this causes problems on production systems). I don’t really understand why strings is included in binutils.

  4. gfs2-utils adds 9648 KB
  5. zfs-fuse adds 5208 KB

    Split off in the proposed reorganization.

  6. ntfsprogs adds 4572 KB
  7. e2fsprogs adds 4312 KB

    Most Linux distros use ext4, and we want to support Windows out of the box, so these are included in base libguestfs.

  8. xfsprogs adds 3532 KB

    Split off in the proposed reorganization.

  9. iproute adds 3180 KB

    We use /sbin/ip to set up the network card inside the appliance. It’s a shame this “better” replacement for ifconfig is so large.

  10. tar adds 2896 KB
  11. btrfs-progs adds 2800 KB
  12. openssh-clients adds 2428 KB
  13. parted adds 2420 KB
  14. jfsutils adds 1668 KB
  15. genisoimage adds 1644 KB
  16. syslinux-extlinux adds 1420 KB
  17. augeas-libs adds 1404 KB
  18. iputils adds 1128 KB
  19. reiserfs-utils adds 1076 KB
  20. mdadm adds 1032 KB
  21. strace adds 976 KB
  22. lsof adds 972 KB
  23. vim-minimal adds 912 KB
  24. rsync adds 812 KB
  25. libldm adds 616 KB
  26. psmisc adds 592 KB
  27. nilfs-utils adds 520 KB
  28. hfsplus-tools adds 480 KB

The test script used to produce these results:

#!/bin/bash -

# NB: For this program to work, you must have the following
# packages (or as many as possible) installed locally.
pkgs='acl attr augeas-libs bash binutils bsdmainutils btrfs-progs
bzip2 coreutils cpio cryptsetup cryptsetup-luks diffutils dosfstools
e2fsprogs extlinux file findutils gawk gdisk genisoimage gfs2-utils
grep grub grub-pc gzip hfsplus hfsplus-tools hivex iproute iputils
jfsutils kernel kmod less libaugeas0 libcap libcap2 libhivex0 libldm
libpcre3 libselinux libsystemd-id128-0 libsystemd-journal0 libxml2
libyajl2 linux-image lsof lsscsi lvm2 lzop mdadm module-init-tools
mtools nilfs-utils ntfs-3g ntfsprogs openssh-clients parted pcre
procps procps-ng psmisc reiserfs-utils reiserfsprogs rsync scrub sed
strace syslinux syslinux-extlinux systemd sysvinit tar udev ufsutils
util-linux util-linux-ng vim-minimal vim-tiny xfsprogs xz xz-utils
yajl zerofree zfs-fuse'

# These are the packages (from the above list) that we want to test.
testpkgs="$pkgs"

# Helper function to construct an appliance and see how big it is.
function appliance_size
{
    set -e
    supermin --prepare -o /tmp/supermin.d "$@" >&/dev/null
    supermin --build -f chroot -o /tmp/appliance.d \
      /tmp/supermin.d >&/dev/null
    du -s /tmp/appliance.d | awk '{print $1}'
}

# Construct entire appliance to see how big that would be.
totalsize=`appliance_size $pkgs`

# Remove each package from the list in turn, and find out
# how much extra that package contributes.
for p in $testpkgs; do
    opkgs=
    for o in $pkgs; do
        if [ $o != $p ]; then opkgs="$opkgs $o"; fi
    done
    size=`appliance_size $opkgs`
    extra=$(($totalsize - $size))

    echo $p adds $extra KB
done

1 Comment

Filed under Uncategorized

One response to “Analysis of the size of libguestfs dependencies

  1. Pingback: libguestfs 1.26 released | Richard WM Jones

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s