Tag Archives: gzip

Creating a modifiable gzipped disk image

Dusty Mabe set me a challenge yesterday. He wants to create several compressed disk images that have slightly different content, but are otherwise mostly the same. The disk images are large and compressing them takes a long time (30 minutes each, apparently), so ideally what we’d want to do is compress the disk image just the once and then do the updates on the gzipped image.

Modifying a file which has already been compressed is not usually possible.

However if we make some relatively uncontroversial assumptions and accept a few limitations then we can create a compressed disk image which is modifiable in this way, certainly for gzip and xz (I need to investigate zstd).

Firstly let’s assume we’re using some kind of block-based compression with fixed size blocks. This is true for gzip and xz already. Secondly let’s assume that we want to only modify a single, small partition of the image (Dusty only wants to modify the /boot partition). Lastly we’ll assume that the partition boundaries are aligned to the compression blocks. Since partitions can be placed under control of whoever creates the disk image this last one is pretty easy to arrange.

The trick is to use uncompressed blocks for the part of the input covering the partition you want to modify, and compress the rest of the file normally. Both gzip and xz have an uncompressed block type. (In fact, just about any reasonable compression algorithm works like this – if the input data doesn’t become smaller after trying to compress it, the algorithm will emit the data uncompressed since that takes less space.)

Normal tools won’t let you create files like this, but I wrote one for gzip here, and I’m confident that one could be written for xz (exercise for readers! … or me if we decide to productise this).

I created a normal Fedora 36 image using virt-builder and used guestfish to find the partition boundaries:

$ guestfish --ro -a /var/tmp/fedora-36.img -i part-list /dev/sda
[0] = {
  part_num: 1
  part_start: 1048576
  part_end: 2097151
  part_size: 1048576
}
[1] = {
  part_num: 2
  part_start: 2097152
  part_end: 1075838975
  part_size: 1073741824
}
[2] = {
  part_num: 3
  part_start: 1075838976
  part_end: 6441402367
  part_size: 5365563392
}

We will compress partition #3 while leaving partitions #1 and #2 uncompressed so they can be modified in place later. The tool is a bit crude, but:

$ ./partial-deflate /var/tmp/fedora-36.img /var/tmp/fedora-36.img.gz 0 1075838976 6441402368 

The output is a regular gzip file (albeit rather large because the first 1GB is uncompressed – if I was doing this for real I’d make that boot partition smaller):

$ ll /var/tmp/fedora-36.img*
-rw-r--r--. 1 rjones rjones 6442450944 Nov 30 17:03 /var/tmp/fedora-36.img
-rw-r--r--. 1 rjones rjones 1698849910 Dec  1 16:23 /var/tmp/fedora-36.img.gz
$ zcat /var/tmp/fedora-36.img.gz | md5sum 
57cbd5ebcbe59613378c8aee7ad9e40d  -
$ md5sum /var/tmp/fedora-36.img
57cbd5ebcbe59613378c8aee7ad9e40d  /var/tmp/fedora-36.img

Then I went ahead and modified some known content inside the gzip file (but not compressed). I used a hex editor for this, but you could play around with guestfish + nbdkit-offset-filter for something more supportable:

And the result is a gzipped file with the modifications:

$ zcat /var/tmp/fedora-36.img.gz > /var/tmp/fedora-36.img-modified
gzip: /var/tmp/fedora-36.img.gz: invalid compressed data--crc error
$ guestfish --ro -a /var/tmp/fedora-36.img-modified -i cat /boot/mydata.txt 
helloworld------

…and a CRC error. That’s to be expected, as I haven’t yet worked out how to update CRC-32 after making changes, but it should be easy to solve (with brute force if necessary).

1 Comment

Filed under Uncategorized

nbdkit for loopback pt 4: loopback-mounting compressed images

nbdkit is a pluggable NBD server with lots of plugins and filters. Two of the plugins[1] handle compressed files (for gzip and xz respectively). We can uncompress and serve a file on the fly. For gzip it’s kind of inefficient. For xz it’s very efficient provided you prepared your xz files ahead of time with a smaller than default block size.

Let’s use nbdkit to loopback mount an xz file:

$ nbdkit -fv xz file=/var/tmp/fedora-28.img.xz
# nbd-client -b 512 localhost /dev/nbd0
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 6144MB
Connected /dev/nbd0
# ls /dev/nbd0p*
/dev/nbd0p1  /dev/nbd0p2  /dev/nbd0p3  /dev/nbd0p4
# fdisk -l /dev/nbd0
Device        Start      End Sectors  Size Type
/dev/nbd0p1    2048     4095    2048    1M BIOS boot
/dev/nbd0p2    4096  2101247 2097152    1G Linux filesystem
/dev/nbd0p3 2101248  3360767 1259520  615M Linux swap
/dev/nbd0p4 3360768 12580863 9220096  4.4G Linux filesystem
# mount -o ro /dev/nbd0p4 /mnt

Of course it’s read-only. To write to a compressed file would involve changing the size of inner parts of the file. Use qcow2 compression if you want a writable compressed file (although writes to that format are not compressed).

Also loopback mounting in general is unsafe. Use libguestfs to safely mount untrusted disk images.

[1] These should really be filters, not plugins, so that you can chain an uncompression filter into an existing plugin, and one day I’ll get around to writing that.

Leave a comment

Filed under Uncategorized

Three plugins for nbdkit

So far I’ve written six plugins for nbdkit. However three of those are examples which don’t really count.

The first (more) interesting plugin is called file and it just turns nbdkit into a regular NBD server, serving files or devices. It’s almost complete, the only significant missing features being access logging and hole punching.

The second interesting plugin is called libvirt-plugin. It serves disks from libvirt guests. An example of using it can be found here.

The final interesting plugin is called gzip. It uses the zlib API to open a .gz file, exposing it uncompressed. Because zlib is a stream-oriented API it’s not very usable at the moment (especially for large images) because it has to uncompress the data stream as it’s seeking. However it may be possible to improve that by caching positions in the stream. What would be more interesting for me would be a lzma-based plugin to support xz files.

2 Comments

Filed under Uncategorized