Tip: Making a disk image sparse

Update: libguestfs ≥ 1.14 includes a new tool called virt-sparsify which can make guests sparse (thin-provisioned).

A sparse file is one where file blocks that would contain all zeroes are omitted from the file, so they don’t take up any space in the filesystem. A sparse virtual disk image is the same sort of thing: blocks that the guest hasn’t written to yet are not stored by the host, and read as all zeroes. Sparse disk images can be implemented using sparse files on the host, or with a format like qcow2 which inherently supports sparse images.

The problem with sparse disk images is that they gradually grow. When a guest writes a block, the host allocates it, and that block may never be freed again, even if the guest later deletes the file or writes all zeroes to the block. [Eventually this problem will be solved by implementing the TRIM command, which lets the host know that the guest no longer requires a block, but we’re not quite there yet.]

This is of course a problem if you fill up the guest disk and then delete the files: the host file does not regain its sparseness.
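You can see this on the host with an ordinary sparse file standing in for a raw disk image. The sketch below assumes GNU coreutils and a filesystem with sparse-file support (ext4, XFS and tmpfs all qualify); the temporary file is only an example. It writes 10M of random data into a hole, as a guest writing a file would, then overwrites the same region with zeroes. The allocated size does not go back down.

```shell
#!/bin/sh
# Shows that blocks written into a sparse file stay allocated even after
# they are overwritten with zeroes. Assumes GNU coreutils (truncate, du, dd).
set -eu

img=$(mktemp)                  # stand-in for a raw disk image (example only)
trap 'rm -f -- "$img"' EXIT

# Allocated (not apparent) size in KiB, as reported by du.
allocated_kib() { du -k -- "$1" | cut -f1; }

truncate -s 100M "$img"        # 100M apparent size, nothing allocated yet
sync
empty_kib=$(allocated_kib "$img")

# The "guest" writes 10M of data at the start of the image.
dd if=/dev/urandom of="$img" bs=1M count=10 conv=notrunc iflag=fullblock status=none
sync
written_kib=$(allocated_kib "$img")

# The "guest" then overwrites the same 10M with zeroes.
dd if=/dev/zero of="$img" bs=1M count=10 conv=notrunc status=none
sync
zeroed_kib=$(allocated_kib "$img")

echo "allocated: empty ${empty_kib} KiB, written ${written_kib} KiB, zeroed ${zeroed_kib} KiB"
```

Throughout, ls -l would report 100M; only the allocated size reported by du changes.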

So how do you sparsify a disk image?

There is a technique that is simple to understand and implement, but it does require taking the guest offline.

First, fill the empty space in the guest with zeroes. A simple way to do this for a Linux guest is to run these commands within each guest filesystem:

dd if=/dev/zero of=zerofile bs=1M
# dd keeps writing zeroes until the filesystem is full, then fails with
# "No space left on device"; that error is expected
sync
rm zerofile
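If a guest has several filesystems, the same three steps can be wrapped in a small function and run once per mount point. This is a sketch, not part of the original procedure: the function name zero_free_space and its optional second argument (a cap in MiB, handy for trying it out without filling a real disk) are hypothetical. Run it inside the guest, as root, passing the mount points as arguments.

```shell
#!/bin/sh
# Fill the free space of each filesystem named on the command line with
# zeroes, then delete the filler file (run inside the guest).
# zero_free_space and its MAX_MB cap are hypothetical helpers.
set -u

zero_free_space() {
    dir=$1
    max_mb=${2:-}
    f="$dir/zerofile"
    if [ -n "$max_mb" ]; then
        # Capped run: write only MAX_MB MiB of zeroes.
        dd if=/dev/zero of="$f" bs=1M count="$max_mb" status=none
    else
        # Uncapped run: dd stops with "No space left on device" once
        # the filesystem is full. That error is expected here.
        dd if=/dev/zero of="$f" bs=1M status=none || true
    fi
    sync          # flush the zeroes to disk before deleting the file
    rm -f -- "$f"
}

for mnt in "$@"; do
    echo "zeroing free space in $mnt"
    zero_free_space "$mnt"
done
```

For example, saved under the (example) name zero-free-space.sh: sh zero-free-space.sh / /home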

Now shut down the guest.

Copy the guest disk image using either qemu-img convert or cp --sparse=always. cp is the faster of the two, but it can only sparsify a raw-format disk image:

cp --sparse=always guest-disk.img guest-disk-copy.img
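To see what cp --sparse=always does, the sketch below builds a fully allocated raw "disk image" that is mostly zeroes, like a guest disk after the dd step above, copies it, and compares allocated sizes. It assumes GNU coreutils and a filesystem with sparse-file support; the file names are examples.

```shell
#!/bin/sh
# Build a fully allocated raw "disk image" that is mostly zeroes,
# sparsify it with cp --sparse=always, and compare allocated sizes.
# Assumes GNU coreutils; file names are examples.
set -eu

workdir=$(mktemp -d)
trap 'rm -rf -- "$workdir"' EXIT
src="$workdir/guest-disk.img"
dst="$workdir/guest-disk-copy.img"

allocated_kib() { du -k -- "$1" | cut -f1; }

# 50M of zeroes that are really written to disk (no holes), followed
# by 1M of random data at the end of the image.
dd if=/dev/zero of="$src" bs=1M count=50 status=none
dd if=/dev/urandom of="$src" bs=1M count=1 seek=50 conv=notrunc iflag=fullblock status=none

# Runs of zeroes in the source become holes in the copy.
cp --sparse=always -- "$src" "$dst"
sync

src_kib=$(allocated_kib "$src")
dst_kib=$(allocated_kib "$dst")
echo "original: ${src_kib} KiB allocated, copy: ${dst_kib} KiB allocated"

# The contents (and apparent sizes) are identical; only allocation differs.
cmp -- "$src" "$dst" && echo "contents identical"
```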

A little-known feature of the qemu-img convert subcommand is that it automatically sparsifies any disk sector which contains all zeroes, and of course it can convert the format at the same time:

qemu-img convert -f raw -O qcow2 guest-disk.img guest-disk-copy.qcow2

In both cases the copy is now sparse, and hopefully a lot smaller than the original.
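To check how much smaller a raw copy really is, compare its apparent size with its allocated size, since ls only shows the former. A minimal sketch, assuming GNU stat; sparse_report is a hypothetical helper name.

```shell
#!/bin/sh
# Print apparent size, allocated size and the allocated percentage of
# each file given on the command line. Uses GNU stat format sequences:
#   %s = apparent size in bytes, %b = allocated blocks, %B = bytes per block
set -eu

sparse_report() {
    for f in "$@"; do
        size=$(stat -c %s -- "$f")
        alloc=$(( $(stat -c %b -- "$f") * $(stat -c %B -- "$f") ))
        if [ "$size" -gt 0 ]; then
            pct=$(( 100 * alloc / size ))
            echo "$f: apparent $size bytes, allocated $alloc bytes (${pct}%)"
        else
            echo "$f: apparent 0 bytes, allocated $alloc bytes"
        fi
    done
}

if [ "$#" -gt 0 ]; then
    sparse_report "$@"
fi
```

For a qcow2 copy, the apparent size is that of the qcow2 file itself rather than the virtual disk; qemu-img info reports both the virtual size and the disk size.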

Addendum: Instead of running dd by hand inside each guest, you can use the following libguestfs script to achieve the same result. Note that the guest must be shut down first; otherwise you will get disk corruption.

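#!/usr/bin/perl -w
# ./phil-space.pl (disk.img|GuestName)
# Requires libguestfs >= 1.5.
# The guest must be shut down before running this script.

use strict;
use Sys::Virt;
use Sys::Guestfs;
use Sys::Guestfs::Lib qw(open_guest);

die "$0: recent version of libguestfs >= 1.5 is required\n"
    unless defined (Sys::Guestfs->can ("list_filesystems"));

die "$0 (disk.img|GuestName)\n" unless @ARGV >= 1;

# Open the disk image(s) or libvirt guest read-write.
my $g = open_guest (\@ARGV, rw => 1);
$g->launch ();

# Maps each device to its filesystem type ("swap" or "unknown" if
# the device does not hold a mountable filesystem).
my %filesystems = $g->list_filesystems ();

foreach (keys %filesystems) {
    # Devices that cannot be mounted make mount_options die;
    # the outer eval skips them.
    eval {
        $g->mount_options ("", $_, "/");

        print "filling empty space in $_ with zeroes ...\n";

        my $filename = "/deleteme.tmp";
        # dd fails once the filesystem is full, which is expected,
        # so its error is deliberately ignored.
        eval { $g->dd ("/dev/zero", $filename) };
        $g->sync (); # make sure the last part of the file is written
        $g->rm ($filename);
    };
    $g->umount_all ();
}

# Flush all remaining writes to the disk image.
$g->sync ();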


2 responses to “Tip: Making a disk image sparse”

  1. Eli

    Hi,
    I need help.
    I created a CentOS 7 VM on a CentOS 7 physical machine.
    On the VM I ran the commands:
    # dd if=/dev/zero of=zerofile bs=1M
    # sync
    # rm zerofile
    I shut down the VM, and on the physical machine I ran the command:
    # cp --sparse=always um.img /tmp/guest.img
    but the file size is the same as the size of the guest disk.

    • A sparse file’s size isn’t the same as its allocated size. For example, let’s create a 10G sparse file:
      $ truncate -s 10G foo

      Its size is 10G:
      $ ls -lh foo
      -rw-rw-r--. 1 mbooth mbooth 10G Jan 21 10:22 foo

      However, its disk usage is zero:
      $ du -h foo
      0 foo

      Let’s write 100M of data into it:
      $ dd if=/dev/urandom of=foo bs=1M count=100 conv=nocreat,notrunc

      It’s still 10G:
      $ ls -lh foo
      -rw-rw-r--. 1 mbooth mbooth 10G Jan 21 10:22 foo

      But its disk usage is now 100M:
      $ du -h foo
      100M foo

      The technique in the post likewise preserves the file’s size but reduces its disk usage, so check the result with du, not ls.
