Tag Archives: compression

Compressed RAM disks

There was a big discussion last week about whether zram swap should be the default in a future version of Fedora.

This lead me to think about the RAM disk implementation in nbdkit. In nbdkit up to 1.20 it supports giant virtual disks up to 8 exabytes using a sparse array implemented with a 2-level page table. However it’s still a RAM disk and so you can’t actually store more real data in these disks than you have available RAM (plus swap).

But what if we compressed the data? There are some fine, very fast compression libraries around nowadays — I’m using Facebook’s Zstandard — so the overhead of compression can be quite small, and this lets you make limited RAM go further.

So I implemented allocators for nbdkit ≥ 1.22, including:

$ nbdkit memory 1T allocator=zstd

Compression ratios can be really good. I tested this by creating a RAM disk and filling it with a filesystem containing text and source files, and was getting 10:1 compression. (Note that filesystems start with very regular, easily compressible metadata, so you’d expect this ratio to quickly drop if you filled the filesystem up with a lot of files).

The compression overhead is small, although the current nbdkit-memory-plugin isn’t very smart about locking so it has rather poor performance under multi-threaded loads anyway. (A fun little project to fix that for someone who loves pthread and C.)

I also implemented allocator=malloc which is a non-sparse direct-mapped RAM disk. This is simpler and a bit faster, but has rather obvious limitations compared to using the sparse allocator.

2 Comments

Filed under Uncategorized

xz plugin for nbdkit

I’ve now written an xz plugin for nbdkit (previous discussion on this blog).

This is useful if you’re building up a library of xz-compressed disk images using virt-sparsify and xz, and you want to access them without having to uncompress them.

I certainly learned a lot about the xz file format and liblzma this weekend …

The xz file format consists of multiple streams (but usually one). Each stream contains zero or more blocks of compressed data, followed by an “index”. Like zip, everything in an xz file happens from the end, so the block index is at the end of the stream (this allows xz files to be streamed when writing without needing any reverse seeks).

Crucially the index contains the offset of each block both in the actual xz file and in the uncompressed data, so once you’ve read the index from a file you can find the position of any uncompressed byte and seek to the beginning of that block and read the data. Random access!

Preparing xz files correctly is important in order to be able to get good random access performance with low memory overhead:

$ xz --list /tmp/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1     384  2,120.1 MiB  6,144.0 MiB  0.345  CRC64   /tmp/winxp.img.xz

A file with lots of small blocks like the above (16 MB block size) is relatively easy to seek inside. At most 16 MB of data has to be uncompressed to reach any byte.

Perhaps ironically, if your machine has lots of free memory then xz appears to choose a large block size, resulting in some one-block files. Here’s the same file when I originally compressed it for my guest library:

$ xz --list guest-library/winxp.img.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1  2,100.0 MiB  6,144.0 MiB  0.342  CRC64   guest-library/winxp.img.xz

So unfortunately you may need to recompress some of your xz files using the new xz --block-size option:

$ xz --best --block-size=$((16*1024*1024)) winxp.img

Here’s how you use the new nbdkit xz plugin:

$ nbdkit plugins/nbdkit-xz-plugin.so file=winxp.img.xz
$ guestfish --ro -a nbd://localhost -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Microsoft Windows XP
/dev/sda1 mounted on /

><fs> ll /
total 1573209
drwxrwxrwx  1 root root       4096 Apr 16  2012 .
drwxr-xr-x 23 1000 1000       4096 Jun 24 13:57 ..
-rwxrwxrwx  1 root root          0 Oct 11  2011 AUTOEXEC.BAT
-rwxrwxrwx  1 root root          0 Oct 11  2011 CONFIG.SYS
drwxrwxrwx  1 root root       4096 Oct 11  2011 Documents and Settings
-rwxrwxrwx  1 root root          0 Oct 11  2011 IO.SYS
-rwxrwxrwx  1 root root          0 Oct 11  2011 MSDOS.SYS
[...]

8 Comments

Filed under Uncategorized