I’ve now written an xz plugin for nbdkit (previous discussion on this blog).
This is useful if you’re building up a library of xz-compressed disk images using virt-sparsify and xz, and you want to access them without having to uncompress them.
I certainly learned a lot about the xz file format and liblzma this weekend …
The xz file format consists of one or more streams (usually just one). Each stream contains zero or more blocks of compressed data, followed by an “index”. As with zip, the metadata lives at the end: the block index comes after the blocks in each stream, which lets xz write a file as a stream without ever needing to seek backwards.
Crucially, the index contains the offset of each block both in the actual xz file and in the uncompressed data. Once you’ve read the index from a file, you can work out which block holds any uncompressed byte, seek to the beginning of that block, and decompress from there. Random access!
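To make the lookup concrete, here is a minimal sketch in Python. It is only a model of the idea, not the plugin's code and not liblzma's API: the BlockEntry type, the locate_block function and the offsets in the sample index are all made up for illustration.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Tuple

MIB = 1024 * 1024


class BlockEntry(NamedTuple):
    """Hypothetical in-memory form of one index record."""
    uncompressed_offset: int  # where the block starts in the uncompressed data
    compressed_offset: int    # where the block starts in the .xz file
    uncompressed_size: int    # how many uncompressed bytes the block holds


def locate_block(index: List[BlockEntry], pos: int) -> Tuple[BlockEntry, int]:
    """Find the block holding uncompressed byte `pos`.

    Returns the block and the number of bytes that must be decompressed
    and thrown away before reaching `pos`.
    """
    starts = [e.uncompressed_offset for e in index]
    # Last block whose start is <= pos.
    i = bisect_right(starts, pos) - 1
    if i < 0:
        raise ValueError("position is before the start of the data")
    entry = index[i]
    skip = pos - entry.uncompressed_offset
    if skip >= entry.uncompressed_size:
        raise ValueError("position is past the end of the data")
    return entry, skip


# Made-up index: two 16 MiB blocks and a final 8 MiB block.
index = [
    BlockEntry(0, 12, 16 * MIB),
    BlockEntry(16 * MIB, 5_000_000, 16 * MIB),
    BlockEntry(32 * MIB, 11_000_000, 8 * MIB),
]

entry, skip = locate_block(index, 20 * MIB)
print(f"seek to file offset {entry.compressed_offset}, "
      f"then discard {skip} uncompressed bytes")
```

The point of the sketch is that the cost of a read depends on the size of the block it lands in, since decompression always has to start from the beginning of that block.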
Preparing xz files correctly matters if you want good random access performance with low memory overhead:
$ xz --list /tmp/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1     384  2,120.1 MiB  6,144.0 MiB  0.345  CRC64   /tmp/winxp.img.xz
A file with lots of small blocks like the above (16 MiB block size) is relatively easy to seek inside. At most 16 MiB of data has to be decompressed to reach any byte.
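The block size can be estimated from the xz --list columns by dividing the uncompressed size by the number of blocks. The sketch below does that arithmetic in Python; the function name average_block_mib is made up for illustration, and the result is only an average (xz --list rounds the sizes it prints, and the last block of a file may be shorter than the rest).

```python
def average_block_mib(uncompressed_mib: float, blocks: int) -> float:
    """Average uncompressed block size in MiB, from the xz --list columns."""
    if blocks <= 0:
        raise ValueError("a seekable file needs at least one block")
    return uncompressed_mib / blocks


# The file listed above: 6,144.0 MiB uncompressed in 384 blocks.
many = average_block_mib(6144.0, 384)
# A hypothetical file of the same size compressed as a single block.
single = average_block_mib(6144.0, 1)

print(f"384 blocks: about {many:.1f} MiB to decompress per seek at worst")
print(f"1 block:    about {single:.1f} MiB to decompress per seek at worst")
```

For the file above this comes to 16 MiB per block. A single-block file of the same size would need up to the whole 6,144 MiB decompressed to reach a byte near the end.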
Perhaps ironically, if your machine has lots of free memory then xz appears to choose a large block size, resulting in some one-block files. Here’s the same file when I originally compressed it for my guest library:
$ xz --list guest-library/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1  2,100.0 MiB  6,144.0 MiB  0.342  CRC64   guest-library/winxp.img.xz
So unfortunately you may need to recompress some of your xz files using the new xz --block-size option:
$ xz --best --block-size=$((16*1024*1024)) winxp.img
Here’s how you use the new nbdkit xz plugin:
$ nbdkit plugins/nbdkit-xz-plugin.so file=winxp.img.xz
$ guestfish --ro -a nbd://localhost -i
Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.
Type: 'help' for help on commands
'man' to read the manual
'quit' to quit the shell
Operating system: Microsoft Windows XP
/dev/sda1 mounted on /
><fs> ll /
total 1573209
drwxrwxrwx 1 root root 4096 Apr 16 2012 .
drwxr-xr-x 23 1000 1000 4096 Jun 24 13:57 ..
-rwxrwxrwx 1 root root 0 Oct 11 2011 AUTOEXEC.BAT
-rwxrwxrwx 1 root root 0 Oct 11 2011 CONFIG.SYS
drwxrwxrwx 1 root root 4096 Oct 11 2011 Documents and Settings
-rwxrwxrwx 1 root root 0 Oct 11 2011 IO.SYS
-rwxrwxrwx 1 root root 0 Oct 11 2011 MSDOS.SYS
[...]