Why “minimal” is 225 MB

As I mentioned in the last post, a “minimal” febootstrap Fedora install clocks in at a staggering 225 MB. By minimal, I mean just bash and the simplest command-line tools from coreutils:

$ ls /bin
arch      chgrp  cut   echo   fgrep  ls      mv    rmdir  stty   true
basename  chmod  date  egrep  grep   mkdir   nice  sh     su     uname
bash      chown  dd    env    link   mknod   pwd   sleep  sync   unlink
cat       cp     df    false  ln     mktemp  rm    sort   touch

Where does all the space go?

Thanks to KDE’s filelight tool, we can easily visualize the disk usage in a nice, interactive, graphical way.

[Image: filelight disk usage chart]

34% of the total disk space (76 MB) is taken up by a single file, /usr/lib/locale/locale-archive. We suspect this is an optional file that contains all the locale information and is mapped into every glibc-using process. Since the minimal image I have in mind is non-interactive, there doesn’t seem to be much point in having locales at all, so this file can be deleted. Obviously if you want an interactive, internationalized Fedora, you can’t just go and remove it.
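If you don’t have filelight to hand, a rough check that one file really dominates can be done from a terminal. This is only a sketch: the function name largest_files and the example path ./fedora-root are made up for illustration, and it assumes a find that supports -exec ... + and a du that supports -k.

```shell
# List the COUNT largest regular files under DIR, sizes in KB, biggest first.
# Usage: largest_files DIR [COUNT]
largest_files() {
    dir=$1
    count=${2:-10}
    # -xdev keeps find on a single filesystem; du -k reports 1 KB units.
    find "$dir" -xdev -type f -exec du -k {} + | sort -rn | head -n "$count"
}

# Example, with a hypothetical path to the febootstrap tree:
#   largest_files ./fedora-root 5
```

The sizes are disk blocks in use, so they can differ slightly from what ls -l reports.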

Another 34% is taken up by the yum cache, i.e. the packages that we installed. This just needs to be deleted, and febootstrap should have an option to do this automatically.

A further 6% (15 MB) is locale files. As explained above, these can go.

3% (8 MB) is, extraordinarily, cracklib. It turns out that coreutils requires pam, which requires cracklib to test the strength of passwords. This is completely useless for us, because the virtual machine image won’t even have a login prompt, never mind the ability to change passwords.

A further 5% is documentation, man pages and i18n stuff that we don’t care about.
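As a sanity check, the figures above can be added up. The sketch below uses the sizes in MB where they are stated and derives the other two (the yum cache and the documentation) from their percentages of 225 MB, so the result is only approximate; the function name space_tally is made up for illustration.

```shell
# Add up the space hogs listed above and print what would be left.
space_tally() {
    awk 'BEGIN {
        total   = 225            # MB, whole image
        archive = 76             # locale-archive (size stated)
        yum     = 0.34 * total   # yum cache: 34%, size derived
        locales = 15             # locale files (size stated)
        crack   = 8              # cracklib (size stated)
        docs    = 0.05 * total   # docs, man pages, i18n: 5%, size derived
        removable = archive + yum + locales + crack + docs
        printf "removable=%.0f remaining=%.0f\n", removable, total - removable
    }'
}

space_tally   # prints: removable=187 remaining=38
```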

Just removing the above brings the image down to 38 MB. The next step will be to do some much more aggressive minimization, based on analyzing the binaries that we’re actually going to use and their dependencies.
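The removal itself can be sketched as a small shell function. Treat it as an illustration under assumptions rather than a finished tool: apart from /usr/lib/locale/locale-archive, the paths are my guesses at the conventional Fedora locations for the items listed above, and the function name minimize_root is made up. It takes the root of the unpacked tree as its argument and refuses to run against / so that the host system is never touched.

```shell
# Remove the space hogs named above from an unpacked root tree.
# Usage: minimize_root ROOT   (ROOT is the tree, never the live system)
minimize_root() {
    if [ -z "${1:-}" ] || [ ! -d "$1" ]; then
        echo "usage: minimize_root ROOT" >&2
        return 1
    fi
    # Resolve to an absolute path and refuse "/" so the host is never touched.
    root=$(cd "$1" && pwd -P)
    if [ -z "$root" ] || [ "$root" = "/" ]; then
        echo "refusing to minimize $1" >&2
        return 1
    fi
    # Only locale-archive is named in the post; the other paths are
    # assumptions based on conventional Fedora locations.
    for path in \
        usr/lib/locale/locale-archive \
        usr/share/locale \
        var/cache/yum \
        usr/share/cracklib \
        usr/share/doc \
        usr/share/man
    do
        rm -rf "$root/$path"
    done
}

# Example, with a hypothetical path to the febootstrap tree:
#   minimize_root ./fedora-root
```

Because nothing here needs root privileges, a function like this can run as part of an ordinary build.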


26 responses to “Why “minimal” is 225 MB”

  1. zod

    Why do you try to strip down your Fedora rather than just build a minimal Linux distribution with tools like t2 (www.t2-project.org)?

  2. rwmj

    That’s a very good question, and something which I want to cover in another posting.

    • sankarshan

      Are you also planning to put up a kickstart with the appropriate %post so that others can check out and tweak the minimalist install?

      • rwmj

        Sankarshan, I guess if you want kickstarts etc then you may want to look at the thincrust appliance OS tools.

        The reason I didn’t go that route is because all of those tools require root privileges, which makes it impossible to run them as part of an ordinary build (eg. ./configure; make or rpmbuild). I discussed this with Bryan Kearney, who is one of the thincrust developers.

        What I’m doing instead is to write a simple script (febootstrap-minimize which will be in the next febootstrap tarball) that performs minimization by doing rm -rf on selected directories. It is effectively the same as the %post script used in thincrust’s appliance builder tool.

        zod, next blog posting explains why I’m not using a minimal Linux kit.

  3. Pingback: Why not use a minimal distribution? « Richard WM Jones

  4. @rwmj : yep, the privilege part is something I dig. So, let’s just wait for your next post (you’ve been more than prolific compared to your standard rate of posts) :)

  5. anon

    > 34% of the total disk space (76 MB) is taken up
    > with a single file, /usr/lib/locale/locale-archive

    This seems like it could be improved. Did you file a bug on bugzilla?

    • MarcH

      A similar bug was already filed, but RedHat seemed not interested in reducing the size of this massive locale-archive file:

      https://bugzilla.redhat.com/show_bug.cgi?id=156477

      Please note that the stripping suggestion found in this bug entry does not seem possible anymore. In Fedora 10 for instance, a massive 77MB locale archive “template” (!) comes pre-built inside the RPM of glibc-common. So with Fedora 10 it seems you can only *add* your own locales, not strip any of the pre-built ones (unless you rebuild the glibc-common RPM).

      Maybe RedHat’s recent interest for appliances could somehow revert their policy on this?

      • rwmj

        OK, thanks for the link to the bug. I wish I had the clout to change people’s minds on this, but sadly I don’t :-(

      • marc

        After some more source code and mailing-list searching I found these two nice livecd-creator tricks:

        %packages --excludedocs --instLangs=en_EN.UTF8

        These two %packages options are passed to RPM at creation time and also persisted into /etc/rpm/macros.imgcreate

        The brand new, undocumented “--instLangs” option saves about 100MB of /usr/share/locale/

        This other trick is reducing (as opposed to destroying) the “locale-archive” file. Saves about 75 uncompressed Mbytes (found in RedHat’s Thincrust ace/resources/cobbler/cobbler.ks)

        %post
        localedef --list-archive | grep -v -i en_US.utf8 | xargs localedef --delete-from-archive
        mv /usr/lib/locale/locale-archive /usr/lib/locale/locale-archive.tmpl
        /usr/sbin/build-locale-archive

      • rwmj

        Hmm for some reason WordPress.com won’t let me reply to your comment above …

        Anyway I just wanted to add, thanks for your research, and I think this is related to what Tom Callaway and Bill Rugolsky say in the comments below.

        At the moment however I’m relying mainly on “rm -rf”-ing the files I don’t need.

      • marc

        The problem with the rm -rf approach is that you never know what it can break or could break in a future revision. The tricks above get the same benefit without the risk so you can sleep better at night.

  6. Tom Callaway

    It would be interesting to see if it were possible to extend rpm with an option to only install locale files that match the system locale. We tag all of the .mo files inside the RPM package to map them to the specific locale, but we still install them…

  7. rwmj

    anon, spot: Thanks for commenting.

    No I didn’t file a bug. It’s not really clear to me that these are bugs. Fedora is normally configured to work well on desktop machines, and let’s face it, 76 MB isn’t much for a desktop machine these days.

    We could argue for a Debian-style approach where you have different levels of RPM dependencies, but the problem with that is you end up with lots of subtly different Fedora systems to test (and consequently a combinatorial explosion in bug reports). The same applies if you have “small system configuration” Fedora.

    spot’s idea of marking locales in RPM is a good one though.

    • Tom Callaway

      The thing is that we _already_ mark translation files by their locale; we’ve been doing it for some time in RPM. This is hidden inside the %find_lang macro, but you can see the visible end result if you look at the gambas2 spec (it was too much for %find_lang to handle):

      http://cvs.fedora.redhat.com/viewvc/rpms/gambas2/devel/gambas2.spec?view=markup

      (Search on Translation files)

      The kicker is that aside from marking them, rpm doesn’t use them (at least not in any way that I am aware of). The trick would be to add an option to rpm to exclude unused locales in the same vein as --excludedocs.

  8. anon

    One big file for everything isn’t very modular.

    Adding a bug will highlight the problem. It doesn’t matter if you are wrong or right, opening a bug will get it in front of the developers’ eyes.

  9. Bill Rugolsky

    I haven’t looked at febootstrap yet, but one should be aware of two RPM options that one can put in /etc/rpm/macros or similar:
    %_excludedocs 1
    %_install_langs %{nil}

    The latter will eliminate most of the locale data, but is only really useful if running in the C locale instead of en_US.UTF-8.

  10. Pingback: febootstrap “minimal” now 15.9 MB « Richard WM Jones

  11. rwmj

    Thanks Bill, very useful.

  12. Maybe this is a dumb question, but what good is a non-interactive Fedora install?

  13. rwmj

    Scott, no such thing as a dumb question! The install I have in mind will be an NFS server. It’ll run inside a virtual machine (qemu, specifically) and just serve files over NFS. No one will even see the virtual machine console.

  14. Pingback: Size of RPM dependencies « Richard WM Jones

  15. Tilman Baumann

    I think if anyone needs a small Fedora, he should do it right.
    Make locales extra packages. Strip out doc packages. Strip out i18n stuff as extra packages.
    In other words, do it right. Just removing files breaks rpm and updatability.

    And if you do it, I think many people will thank you. Because 225 MB really is ridiculous for a minimal system.

    PS: You could even build all packages without i18n support. But this breaks compatibility.

    • rich

      Updatability isn’t our primary concern, because we can rebuild the appliance from scratch in under 3 minutes. However I agree with you that splitting packages to make the minimal install smaller would help Fedora.

  16. And if you do it, I think many people will thank you. Because 225 MB really is ridiculous for a minimal system.

    Agree – this post saved me some work getting rid of the extra locales when I stripped down the OS for my Black Magic ( https://bitbucket.org/darkfader/black-magic ) Xen build

    • rich

      I guess this post is a little out of date because (shock! horror!!) we no longer spend too much time minimizing libguestfs appliances.

      But within the context of the blog posting, 225 MB is a minimal Fedora distro, using a standard kernel, standard glibc, and standard set of tools. Fedora kernel etc are not designed to fit on a floppy disk — if they did it would bring little advantage and many disadvantages for Fedora users. However libguestfs over the past 2 years has helped in many ways:

      • Fedora Rawhide always boots (because we hit it first when it doesn’t).
      • Fedora kernel always works on latest qemu.
      • qemu always boots the kernel very quickly (we see and raise regressions when this slows down).
      • Dozens of bugs in the Rawhide system are found first by libguestfs builds.
      • We aggressively push back against bloat and slowness.
