FUSE mounting on top of a file

Our tool nbdfuse lets you mount an NBD block device as a file, using Linux FUSE. For example, you could create a directory containing a single file (called nbd) which holds the contents of the NBD export:

$ mkdir /var/tmp/test
$ nbdfuse /var/tmp/test --command nbdkit -s memory 1G &
$ ls -l /var/tmp/test/
total 0
-rw-rw-rw-. 1 rjones rjones 1073741824 Nov  4 13:25 nbd
$ fusermount -u /var/tmp/test

This is cool, but wouldn’t it be nice to get rid of the directory and create the file anywhere? Recently Max Reitz found out you can mount a FUSE filesystem over a regular file.

It works! (After a few adjustments to the nbdfuse code)

$ touch /var/tmp/disk.img
$ nbdfuse /var/tmp/disk.img --command nbdkit -s memory 1G &
$ ls -l /var/tmp/disk.img
-rw-rw-rw-. 1 rjones rjones 1073741824 Nov  4 13:29 /var/tmp/disk.img
$ fusermount -u /var/tmp/disk.img

BLKDISCARD, BLKZEROOUT, BLKDISCARDZEROES, BLKSECDISCARD

Recent Linux has four ioctls related to discarding blocks on block devices: BLKDISCARD, BLKZEROOUT, BLKDISCARDZEROES and BLKSECDISCARD. As far as I’m aware these are not documented anywhere, but this posting describes what they do and how to use them. For a good all-round introduction to thin provisioning, see Paolo Bonzini’s talk from DevConf (video here).

BLKDISCARD

This is the simplest ioctl. Given a range described as offset and length (both expressed in bytes), this code:

uint64_t range[2] = { offset, length };
ioctl (fd, BLKDISCARD, range);

will tell the underlying block device (fd) that it may discard the blocks which are contained in the given byte range.

The kernel code expects a range which is aligned to 512 bytes. There may be further restrictions on the range, which you can find out about by reading /sys/block/disk/queue/discard_alignment, /sys/block/disk/queue/discard_granularity, and /sys/block/disk/queue/discard_max_bytes (where disk stands for the name of the device, such as sda).

If discard_max_bytes == 0 then discard isn’t supported at all on this device.
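As a rough sketch of that check, the following C reads a single number from a sysfs file and tests whether discard_max_bytes is non-zero. The function names read_queue_attr and discard_supported are made up for this example, and the code assumes the attribute holds one unsigned decimal value.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Read one unsigned decimal number from a sysfs-style text file.
   Returns 0 on success, -1 if the file cannot be opened or parsed. */
int
read_queue_attr (const char *path, uint64_t *value)
{
  FILE *fp = fopen (path, "r");
  if (fp == NULL)
    return -1;
  int ok = fscanf (fp, "%" SCNu64, value) == 1;
  fclose (fp);
  return ok ? 0 : -1;
}

/* Hypothetical helper: 'disk' is a device name such as "sda".
   Returns 1 if discard_max_bytes > 0, 0 if it is zero,
   and -1 if the sysfs file could not be read. */
int
discard_supported (const char *disk)
{
  char path[256];
  uint64_t max_bytes;

  snprintf (path, sizeof path,
            "/sys/block/%s/queue/discard_max_bytes", disk);
  if (read_queue_attr (path, &max_bytes) == -1)
    return -1;
  return max_bytes > 0;
}
```

The same read_queue_attr helper could be pointed at discard_alignment or discard_granularity, since those files have the same shape.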

Discard is voluntary. The device might ignore it silently. Also, what you read back from the discarded blocks might not be zeroes: you might read back stale data or random data (but see below).
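Putting the 512-byte alignment rule and the ioctl together, a wrapper might look something like the sketch below. The names range_is_aligned and discard_range are invented for illustration, and the alignment check only covers the 512-byte rule, not the device-specific limits in sysfs.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKDISCARD */

#define DISCARD_ALIGN 512       /* alignment expected by the kernel */

/* Returns 1 if both offset and length are multiples of 512 bytes. */
int
range_is_aligned (uint64_t offset, uint64_t length)
{
  return offset % DISCARD_ALIGN == 0 && length % DISCARD_ALIGN == 0;
}

/* Hypothetical wrapper around BLKDISCARD.  Returns 0 on success,
   or -1 with errno set.  Unaligned ranges are rejected with EINVAL
   before the ioctl is attempted. */
int
discard_range (int fd, uint64_t offset, uint64_t length)
{
  if (!range_is_aligned (offset, length)) {
    errno = EINVAL;
    return -1;
  }

  uint64_t range[2] = { offset, length };
  return ioctl (fd, BLKDISCARD, range);
}
```

A return value of 0 only means the request was accepted. As noted above, the device may still ignore it.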

BLKZEROOUT

BLKZEROOUT is a bit like BLKDISCARD but it writes zeroes. The code is similar:

uint64_t range[2] = { offset, length };
ioctl (fd, BLKZEROOUT, range);

Again note that offset and length are in bytes, but the kernel wants you to pass a 512-byte aligned range.

As far as I can tell from the implementation, the kernel implements this call itself. There is no help needed from devices, nor any device-specific optimization available.
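One way a program might use this in practice is to try BLKZEROOUT first and fall back to writing zeroes by hand if the ioctl fails, for example because fd is not a block device. This is only a sketch of that idea: zero_range and zero_by_writing are hypothetical names, and the fallback is triggered by any ioctl error, which may be broader than you want.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for pwrite under strict -std modes */
#endif
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/fs.h>           /* BLKZEROOUT */

/* Write 'length' zero bytes starting at 'offset' using pwrite.
   Returns 0 on success, -1 on failure. */
int
zero_by_writing (int fd, uint64_t offset, uint64_t length)
{
  char buf[65536];

  memset (buf, 0, sizeof buf);
  while (length > 0) {
    size_t n = length < sizeof buf ? (size_t) length : sizeof buf;
    ssize_t r = pwrite (fd, buf, n, (off_t) offset);
    if (r <= 0)                 /* error, or no progress made */
      return -1;
    offset += (uint64_t) r;
    length -= (uint64_t) r;
  }
  return 0;
}

/* Try BLKZEROOUT; if the ioctl fails for any reason, fall back to
   ordinary writes.  Returns 0 on success, -1 on failure. */
int
zero_range (int fd, uint64_t offset, uint64_t length)
{
  uint64_t range[2] = { offset, length };

  if (ioctl (fd, BLKZEROOUT, range) == 0)
    return 0;
  return zero_by_writing (fd, offset, length);
}
```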

BLKDISCARDZEROES

I mentioned above that discarded blocks might read back as stale data. However some devices guarantee that discarded blocks read back as zeroes (which means, I assume, that BLKZEROOUT would not be needed on such block devices).

You can find out if the device you are currently using has this guarantee, either by reading the sysfs file /sys/block/disk/queue/discard_zeroes_data, or by using this code:

unsigned int arg;
int discard_zeroes =
    ioctl (fd, BLKDISCARDZEROES, &arg) == 0 && arg;

BLKSECDISCARD

Finally, secure discard tells the device that you want to do a secure erase operation on the blocks. Again, pass a byte range (which has the same alignment requirements as BLKDISCARD):

uint64_t range[2] = { offset, length };
ioctl (fd, BLKSECDISCARD, range);

The ioctl will return an error (EOPNOTSUPP) for devices which cannot do secure erase.
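Since a caller will usually want to tell "not supported" apart from other failures, the errno check could be sketched like this. The enum and the function name secure_discard_range are made up for this example.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>           /* BLKSECDISCARD */

enum secdiscard_result {
  SECDISCARD_OK,                /* the device accepted the request */
  SECDISCARD_UNSUPPORTED,       /* ioctl failed with EOPNOTSUPP */
  SECDISCARD_FAILED             /* any other error; see errno */
};

/* Hypothetical wrapper: issue BLKSECDISCARD on a byte range and
   classify the outcome. */
enum secdiscard_result
secure_discard_range (int fd, uint64_t offset, uint64_t length)
{
  uint64_t range[2] = { offset, length };

  if (ioctl (fd, BLKSECDISCARD, range) == 0)
    return SECDISCARD_OK;
  return errno == EOPNOTSUPP ? SECDISCARD_UNSUPPORTED
                             : SECDISCARD_FAILED;
}
```

On SECDISCARD_UNSUPPORTED it is probably better to report the problem than to fall back quietly to a plain discard, because a plain discard gives no secure erase guarantee.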
