Tag Archives: xz

Creating a modifiable gzipped disk image

Dusty Mabe set me a challenge yesterday. He wants to create several compressed disk images that have slightly different content, but are otherwise mostly the same. The disk images are large and compressing them takes a long time (30 minutes each, apparently), so ideally what we’d want to do is compress the disk image just the once and then do the updates on the gzipped image.

Modifying a file which has already been compressed is not usually possible.

However if we make some relatively uncontroversial assumptions and accept a few limitations then we can create a compressed disk image which is modifiable in this way, certainly for gzip and xz (I need to investigate zstd).

Firstly let’s assume we’re using some kind of block-based compression with fixed size blocks. This is true for gzip and xz already. Secondly let’s assume that we want to only modify a single, small partition of the image (Dusty only wants to modify the /boot partition). Lastly we’ll assume that the partition boundaries are aligned to the compression blocks. Since partitions can be placed under control of whoever creates the disk image this last one is pretty easy to arrange.

The trick is to use uncompressed blocks for the part of the input covering the partition you want to modify, and compress the rest of the file normally. Both gzip and xz have an uncompressed block type. (In fact, just about any reasonable compression algorithm works like this – if the input data doesn’t become smaller after trying to compress it, the algorithm will emit the data uncompressed since that takes less space.)

Normal tools won’t let you create files like this, but I wrote one for gzip here, and I’m confident that one could be written for xz (exercise for readers! … or me if we decide to productise this).

I created a normal Fedora 36 image using virt-builder and used guestfish to find the partition boundaries:

$ guestfish --ro -a /var/tmp/fedora-36.img -i part-list /dev/sda
[0] = {
  part_num: 1
  part_start: 1048576
  part_end: 2097151
  part_size: 1048576
}
[1] = {
  part_num: 2
  part_start: 2097152
  part_end: 1075838975
  part_size: 1073741824
}
[2] = {
  part_num: 3
  part_start: 1075838976
  part_end: 6441402367
  part_size: 5365563392
}

We will compress partition #3 while leaving partitions #1 and #2 uncompressed so they can be modified in place later. The tool is a bit crude, but:

$ ./partial-deflate /var/tmp/fedora-36.img /var/tmp/fedora-36.img.gz 0 1075838976 6441402368 

The output is a regular gzip file (albeit rather large because the first 1GB is uncompressed – if I was doing this for real I’d make that boot partition smaller):

$ ll /var/tmp/fedora-36.img*
-rw-r--r--. 1 rjones rjones 6442450944 Nov 30 17:03 /var/tmp/fedora-36.img
-rw-r--r--. 1 rjones rjones 1698849910 Dec  1 16:23 /var/tmp/fedora-36.img.gz
$ zcat /var/tmp/fedora-36.img.gz | md5sum 
57cbd5ebcbe59613378c8aee7ad9e40d  -
$ md5sum /var/tmp/fedora-36.img
57cbd5ebcbe59613378c8aee7ad9e40d  /var/tmp/fedora-36.img

Then I went ahead and modified some known content inside the gzip file (but not compressed). I used a hex editor for this, but you could play around with guestfish + nbdkit-offset-filter for something more supportable:

And the result is a gzipped file with the modifications:

$ zcat /var/tmp/fedora-36.img.gz > /var/tmp/fedora-36.img-modified
gzip: /var/tmp/fedora-36.img.gz: invalid compressed data--crc error
$ guestfish --ro -a /var/tmp/fedora-36.img-modified -i cat /boot/mydata.txt 
helloworld------

…and a CRC error. That’s to be expected, as I haven’t yet worked out how to update CRC-32 after making changes, but it should be easy to solve (with brute force if necessary).

1 Comment

Filed under Uncategorized

libnbd + FUSE = nbdfuse

I’ve talked before about libnbd, our NBD client library. New in libnbd 1.2 is a tool called nbdfuse which lets you turn NBD servers into virtual files.

A couple of weeks ago I mentioned you can use libnbd as a C library to edit qcow2 files. Now you can turn qcow2 files into virtual raw files:

$ mkdir dir
$ nbdfuse dir/file.raw \
      --socket-activation qemu-nbd -f qcow2 file.qcow2
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 1073741824 Jan  1 10:10 file.raw

Reads and writes to file.raw are backed by the original qcow2 file which is updated in real time.

Another fun thing to do is to use nbdkit, xz filter and curl to turn xz-compressed remote disk images into uncompressed local files:

$ mkdir dir
$ nbdfuse dir/disk.img \
      --command nbdkit -s curl --filter=xz \
                       http://builder.libguestfs.org/fedora-30.xz
$ ls -l dir/
total 0
-rw-rw-rw-. 1 nbd nbd 6442450944 Jan  1 10:10 disk.img
$ file dir/disk.img
dir/disk.img: DOS/MBR boot sector
$ qemu-system-x86_64 -m 4G \
      -drive file=dir/disk.img,format=raw,if=virtio,snapshot=on

1 Comment

Filed under Uncategorized

nbdkit + xz + curl

I’ve submitted a talk about nbdkit, our flexible, pluggable NBD server, to FOSDEM next year about how you can use nbdkit as a replacement for loopback mounts (or “loop mounts” as I was told off for not calling them last week). In preparation for that talk I ran through it in private to a small Red Hat audience on Monday. If I can I will release that video some time, but I may have to edit out Red Hat “super-secret” stuff first (or most likely not because there aren’t any secrets in it, but I’m still waiting for the internal video to be released).

Anyway this attracted a lot of interest and one question that was asked was why the xz plugin which lets you transparently open and uncompress XZ files on the fly was a plugin at all. Surely it would make more sense for it to be a filter? So it could be used not just to uncompress local files, but also xz-compressed cloud images over HTTPS.

The answer is yes it would! So I fixed it. XZ is now a filter (the plugin is left around but we’ll deprecate it eventually).

You can use it on top of the file plugin, curl plugin or other plugins:

$ nbdkit --filter=xz file file.xz
$ nbdkit --filter=xz curl https://download.fedoraproject.org/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.raw.xz

This is fun and you can use this to boot the cloud image entirely remotely:

$ qemu-system-x86_64 -machine accel=kvm:tcg \
    -cpu host -m 2048 \
    -drive file=nbd:localhost:10809,if=virtio

However it’s incredibly slow. One problem is that the Fedora mirror sites aren’t very happy about you issuing lots of small HTTP Range requests and I observed that they throttle the connection quite aggressively. The second problem is that the xz block size for these cloud images is too large.

The XZ format (or rather, LZMA format) is divided into streams and blocks. We don’t normally use streams, and many XZ files use a single block. But it’s possible to tell the xz program to use a smaller than default block size, and in that case the output is divided into indexed blocks. Note the block size applies to the uncompressed input, the compressed blocks will have varying sizes, but the index that is created lets us find the block boundaries easily. When a byte is requested we can use a binary search to take us quickly to the compressed block, uncompress it (and cache it), and answer the request. We will only uncompress at most one block instead of the whole file.

For disk images I normally advocate a 16M block size. The current cloud images use (I think) a 192M block size, so both a huge amount of data has to be read over HTTPS to read one uncompressed byte, plus we have to cache very large blocks in RAM.

As an experiment I recompressed the cloud image using xz --block-size=$((16 * 1024 * 1024)) and hosted it locally, and booting is much quicker (albeit still slow because the cloud image contains cloud-init).

But even better we already ship a variety of disk images compressed with a 16M block size for virt-builder here, and these can be booted directly too:

$ nbdkit -U - --filter=xz curl \
        http://builder.libguestfs.org/fedora-29.xz \
        --run \
    'qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 -drive file=$nbd,if=virtio'

… although you can’t log in because they all have locked root accounts (virt-builder normally customizes them after download).

3 Comments

Filed under Uncategorized

nbdkit for loopback pt 4: loopback-mounting compressed images

nbdkit is a pluggable NBD server with lots of plugins and filters. Two of the plugins[1] handle compressed files (for gzip and xz respectively). We can uncompress and serve a file on the fly. For gzip it’s kind of inefficient. For xz it’s very efficient provided you prepared your xz files ahead of time with a smaller than default block size.

Let’s use nbdkit to loopback mount an xz file:

$ nbdkit -fv xz file=/var/tmp/fedora-28.img.xz
# nbd-client -b 512 localhost /dev/nbd0
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 6144MB
Connected /dev/nbd0
# ls /dev/nbd0p*
/dev/nbd0p1  /dev/nbd0p2  /dev/nbd0p3  /dev/nbd0p4
# fdisk -l /dev/nbd0
Device        Start      End Sectors  Size Type
/dev/nbd0p1    2048     4095    2048    1M BIOS boot
/dev/nbd0p2    4096  2101247 2097152    1G Linux filesystem
/dev/nbd0p3 2101248  3360767 1259520  615M Linux swap
/dev/nbd0p4 3360768 12580863 9220096  4.4G Linux filesystem
# mount -o ro /dev/nbd0p4 /mnt

Of course it’s read-only. To write to a compressed file would involve changing the size of inner parts of the file. Use qcow2 compression if you want a writable compressed file (although writes to that format are not compressed).

Also loopback mounting in general is unsafe. Use libguestfs to safely mount untrusted disk images.

[1] These should really be filters, not plugins, so that you can chain an uncompression filter into an existing plugin, and one day I’ll get around to writing that.

Leave a comment

Filed under Uncategorized

Parallel xzcat .. causes virt-builder to slow down

Eric Sandeen solved the mystery. See the end.

If you look at the ordinary output of virt-builder:

$ virt-builder ubuntu-12.04 
[   1.0] Downloading: http://libguestfs.org/download/builder/ubuntu-12.04.xz
[   3.0] Creating disk image: ubuntu-12.04.img
[   3.0] Uncompressing: http://libguestfs.org/download/builder/ubuntu-12.04.xz
[  29.0] Running virt-resize to expand the disk to 4.2G
[  86.0] Opening the new disk
[  89.0] Setting a random seed
[  89.0] Random root password: 2oMCHc3GaqoPKpXf [did you mean to use --root-password?]
[  89.0] Finishing off
Output: ubuntu-12.04.img
Total usable space: 3.1G
Free space: 2.4G (75%)

The (in bold) uncompressing step takes 26 seconds or so. This is running xzcat which is single threaded.

This afternoon I wrote a parallel pxzcat tool. On its own, it’s about 3½ times faster on 4 cores, so that’s pretty scalable.

Unfortunately, although the compression step goes faster, virt-builder actually slows down:

$ virt-builder ubuntu-12.04 
[   1.0] Downloading: file:///home/rjones/d/libguestfs/builder/website/ubuntu-12.04.xz
[   1.0] Creating disk image: ubuntu-12.04.img
[   2.0] Uncompressing: file:///home/rjones/d/libguestfs/builder/website/ubuntu-12.04.xz
[  12.0] Running virt-resize to expand the disk to 4.2G
[  86.0] Opening the new disk
[ 100.0] Setting a random seed
[ 100.0] Random root password: KQIhD8gKxksbC36k [did you mean to use --root-password?]
[ 100.0] Finishing off
Output: ubuntu-12.04.img
Total usable space: 3.1G
Free space: 2.4G (75%)

This result is quite reproducible.

(Also we should run virt-resize at all if the user didn’t request a --size argument).

Update:

Thanks to Eric Sandeen who solved the problem. It turns out to be a heuristic in ext4 (auto_da_alloc), which causes a file that has been truncated to be flushed out on close.

2 Comments

Filed under Uncategorized

xz plugin for nbdkit

I’ve now written an xz plugin for nbdkit (previous discussion on this blog).

This is useful if you’re building up a library of xz-compressed disk images using virt-sparsify and xz, and you want to access them without having to uncompress them.

I certainly learned a lot about the xz file format and liblzma this weekend …

The xz file format consists of multiple streams (but usually one). Each stream contains zero or more blocks of compressed data, followed by an “index”. Like zip, everything in an xz file happens from the end, so the block index is at the end of the stream (this allows xz files to be streamed when writing without needing any reverse seeks).

Crucially the index contains the offset of each block both in the actual xz file and in the uncompressed data, so once you’ve read the index from a file you can find the position of any uncompressed byte and seek to the beginning of that block and read the data. Random access!

Preparing xz files correctly is important in order to be able to get good random access performance with low memory overhead:

$ xz --list /tmp/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1     384  2,120.1 MiB  6,144.0 MiB  0.345  CRC64   /tmp/winxp.img.xz

A file with lots of small blocks like the above (16 MB block size) is relatively easy to seek inside. At most 16 MB of data has to be uncompressed to reach any byte.

Perhaps ironically, if your machine has lots of free memory then xz appears to choose a large block size, resulting in some one-block files. Here’s the same file when I originally compressed it for my guest library:

$ xz --list guest-library/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1  2,100.0 MiB  6,144.0 MiB  0.342  CRC64   guest-library/winxp.img.xz

So unfortunately you may need to recompress some of your xz files using the new xz --block-size option:

$ xz --best --block-size=$((16*1024*1024)) winxp.img

Here’s how you use the new nbdkit xz plugin:

$ nbdkit plugins/nbdkit-xz-plugin.so file=winxp.img.xz
$ guestfish --ro -a nbd://localhost -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Microsoft Windows XP
/dev/sda1 mounted on /

><fs> ll /
total 1573209
drwxrwxrwx  1 root root       4096 Apr 16  2012 .
drwxr-xr-x 23 1000 1000       4096 Jun 24 13:57 ..
-rwxrwxrwx  1 root root          0 Oct 11  2011 AUTOEXEC.BAT
-rwxrwxrwx  1 root root          0 Oct 11  2011 CONFIG.SYS
drwxrwxrwx  1 root root       4096 Oct 11  2011 Documents and Settings
-rwxrwxrwx  1 root root          0 Oct 11  2011 IO.SYS
-rwxrwxrwx  1 root root          0 Oct 11  2011 MSDOS.SYS
[...]

8 Comments

Filed under Uncategorized

Three plugins for nbdkit

So far I’ve written six plugins for nbdkit. However three of those are examples which don’t really count.

The first (more) interesting plugin is called file and it just turns nbdkit into a regular NBD server, serving files or devices. It’s almost complete, the only significant missing features being access logging and hole punching.

The second interesting plugin is called libvirt-plugin. It serves disks from libvirt guests. An example of using it can be found here.

The final interesting plugin is called gzip. It uses the zlib API to open a .gz file, exposing it uncompressed. Because zlib is a stream-oriented API it’s not very usable at the moment (especially for large images) because it has to uncompress the data stream as it’s seeking. However it may be possible to improve that by caching positions in the stream. What would be more interesting for me would be a lzma-based plugin to support xz files.

2 Comments

Filed under Uncategorized

Using virt-sparsify and xz, we can really compress VMs for storage

17G	debian5x64.img.orig

The original image has about 4.7 GB of data, plus a large swap partition, according to virt-df:

$ virt-df -a debian5x64.img.orig -h
Filesystem                        Size       Used  Available  Use%
debian5x64.img.orig:/dev/sda1     322M        66M       239M   21%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/home
                                  3.4G       359M       2.9G   11%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/root
                                  320M       301M       2.4M   95%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/tmp
                                  300M       8.2M       276M    3%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/usr
                                  3.4G       2.0G       1.2G   60%
debian5x64.img.orig:/dev/debian5x64.home.annexia.org/var
                                  2.6G       2.0G       536M   76%

Using virt-sparsify, all unused space in the image is made sparse. The most recent version can sparsify swap partitions too:

4.6G	debian5x64.img

xz --best -T 0 reduces the final image to under a gigabyte:

971M	debian5x64.img.xz

5 Comments

Filed under Uncategorized