BLKDISCARD, BLKZEROOUT, BLKDISCARDZEROES, BLKSECDISCARD

Recent Linux has four ioctls related to discarding blocks on block devices: BLKDISCARD, BLKZEROOUT, BLKDISCARDZEROES and BLKSECDISCARD. As far as I’m aware these are not documented anywhere, so this posting describes what they do and how to use them. For a good all-round introduction to thin provisioning, see Paolo Bonzini’s talk from DevConf (video here).

BLKDISCARD

This is the simplest ioctl. Given a range described as offset and length (both expressed in bytes), this code:

uint64_t range[2] = { offset, length };
ioctl (fd, BLKDISCARD, range);

will tell the underlying block device (fd) that it may discard the blocks which are contained in the given byte range.
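The two-line snippet above ignores the return value. Here is a minimal sketch of the same call with error reporting. The function name discard_range is made up for illustration, and I’m assuming the device was opened for writing (for example with O_WRONLY), which as far as I can tell the kernel requires.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARD */

/* Hypothetical helper: discard the byte range [offset, offset+length)
 * on an already-open block device.
 * Returns 0 on success, -1 on failure with errno left as ioctl set it. */
int discard_range (int fd, uint64_t offset, uint64_t length)
{
    uint64_t range[2] = { offset, length };

    if (ioctl (fd, BLKDISCARD, range) == -1) {
        int saved_errno = errno;        /* fprintf may clobber errno */
        fprintf (stderr, "BLKDISCARD failed: %s\n", strerror (saved_errno));
        errno = saved_errno;
        return -1;
    }
    return 0;
}
```

A caller would open the device, call discard_range (fd, offset, length), and treat a -1 return as "nothing was discarded".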

The kernel code wants you to pass a range which is aligned to 512 bytes. There may be further restrictions on the range, which you can find out about by reading these sysfs files (where disk is the name of the device, eg. sda): /sys/block/disk/queue/discard_alignment, /sys/block/disk/queue/discard_granularity, and /sys/block/disk/queue/discard_max_bytes.
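To make the alignment rules concrete, here is a sketch of a helper that shrinks a byte range so both ends fall on discard boundaries. The function name is hypothetical. It also assumes that boundaries sit at discard_alignment plus a multiple of discard_granularity, which is my reading of those two values and not something the kernel promises for every device.

```c
#include <stdint.h>

/* Hypothetical helper: shrink [offset, offset+length) so that the start is
 * rounded up and the end is rounded down to a discard boundary.
 * ASSUMPTION: boundaries are at alignment + k * granularity.
 * Returns 0 and fills *out_offset / *out_length, or -1 if no aligned
 * range is left (or granularity is 0).  Overflow of offset+length is
 * not checked. */
int shrink_to_granularity (uint64_t offset, uint64_t length,
                           uint64_t granularity, uint64_t alignment,
                           uint64_t *out_offset, uint64_t *out_length)
{
    uint64_t end, start, stop;

    if (granularity == 0)
        return -1;

    end = offset + length;
    alignment %= granularity;

    /* Round the start up to the next boundary. */
    if (offset <= alignment)
        start = alignment;
    else
        start = alignment
            + ((offset - alignment + granularity - 1) / granularity) * granularity;

    /* Round the end down to the previous boundary. */
    if (end < alignment)
        return -1;
    stop = alignment + ((end - alignment) / granularity) * granularity;

    if (stop <= start)
        return -1;

    *out_offset = start;
    *out_length = stop - start;
    return 0;
}
```

For example, with a granularity of 4096 and an alignment of 0, the range starting at byte 1000 with length 10000 shrinks to offset 4096, length 4096.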

If discard_max_bytes == 0 then discard isn’t supported at all on this device.
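This check can be done from C by reading the sysfs file directly. The sketch below uses two hypothetical helper names and assumes the attribute contains a single decimal number, which is what I see on my machines.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: read one numeric attribute from
 * /sys/block/<disk>/queue/<attr>.
 * Returns 0 and stores the number in *value, or -1 if the file is
 * missing or does not start with a decimal number. */
int read_queue_attr (const char *disk, const char *attr, uint64_t *value)
{
    char path[256];
    FILE *fp;
    int n, ok;

    n = snprintf (path, sizeof path, "/sys/block/%s/queue/%s", disk, attr);
    if (n < 0 || n >= (int) sizeof path)
        return -1;              /* path did not fit in the buffer */

    fp = fopen (path, "r");
    if (fp == NULL)
        return -1;

    ok = fscanf (fp, "%" SCNu64, value) == 1;
    fclose (fp);
    return ok ? 0 : -1;
}

/* Nonzero if the disk advertises discard support (discard_max_bytes > 0). */
int disk_supports_discard (const char *disk)
{
    uint64_t max_bytes;

    return read_queue_attr (disk, "discard_max_bytes", &max_bytes) == 0
        && max_bytes > 0;
}
```

The same read_queue_attr call works for discard_granularity and discard_alignment.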

Discard is voluntary. The device might ignore it silently. Also what you read back from the discarded blocks might not be zeroes — you might read back stale data or random data (but see below).

BLKZEROOUT

BLKZEROOUT is a bit like BLKDISCARD but it writes zeroes. The code is similar:

uint64_t range[2] = { offset, length };
ioctl (fd, BLKZEROOUT, range);

Again note that offset and length are in bytes, but the kernel wants you to pass a 512-byte aligned range.
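Because a misaligned range is easy to pass by accident, it can be worth checking alignment before making the call. This is a sketch with a hypothetical function name; the 512-byte check mirrors the requirement described above and does not replace whatever checks the kernel makes itself.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKZEROOUT */

/* Hypothetical helper: write zeroes over [offset, offset+length).
 * Rejects ranges that are not 512-byte aligned before calling the kernel.
 * Returns 0 on success, -1 on failure with errno set. */
int zero_range (int fd, uint64_t offset, uint64_t length)
{
    uint64_t range[2] = { offset, length };

    if ((offset | length) & 511) {
        errno = EINVAL;         /* offset or length not a multiple of 512 */
        return -1;
    }
    return ioctl (fd, BLKZEROOUT, range);
}
```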

As far as I can tell from the implementation, the kernel implements this call itself. There is no help needed from devices, nor any device-specific optimization available.

BLKDISCARDZEROES

I mentioned above that discarded blocks might read back as stale data. However some devices guarantee that discarded blocks read back as zeroes (which means, I assume, that BLKZEROOUT would not be needed on such block devices).

You can find out if the device you are currently using has this guarantee, either by reading the sysfs file /sys/block/disk/queue/discard_zeroes_data, or by using this code:

unsigned int arg;
int discard_zeroes =
    ioctl (fd, BLKDISCARDZEROES, &arg) == 0 && arg;
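Wrapped up as a function, the query might look like the sketch below. The name discard_zeroes_data is borrowed from the sysfs file and is otherwise my own choice. Treating any ioctl failure as "no guarantee" is a conservative assumption on my part.

```c
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKDISCARDZEROES */

/* Hypothetical helper: returns 1 if the device reports that discarded
 * blocks read back as zeroes, 0 if it does not or if the query fails. */
int discard_zeroes_data (int fd)
{
    unsigned int arg = 0;

    if (ioctl (fd, BLKDISCARDZEROES, &arg) == -1)
        return 0;               /* assume no guarantee on error */
    return arg != 0;
}
```

A caller that needs zeroes and gets 0 back here should use BLKZEROOUT and not rely on discard.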

BLKSECDISCARD

Finally secure discard tells the device that you want to do a secure erase operation on the blocks. Again, pass a byte range (which has the same alignment requirements as BLKDISCARD):

uint64_t range[2] = { offset, length };
ioctl (fd, BLKSECDISCARD, range);

The ioctl will return an error (EOPNOTSUPP) for devices which cannot do secure erase.
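Since "not supported" is an expected outcome here and not a real failure, a caller may want to tell it apart from other errors. A sketch, with a hypothetical function name and return convention:

```c
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* BLKSECDISCARD */

/* Hypothetical helper: secure-erase [offset, offset+length).
 * Returns  0 on success,
 *          1 if the device cannot do secure erase (EOPNOTSUPP),
 *         -1 on any other error (errno is left as ioctl set it). */
int secure_discard_range (int fd, uint64_t offset, uint64_t length)
{
    uint64_t range[2] = { offset, length };

    if (ioctl (fd, BLKSECDISCARD, range) == 0)
        return 0;
    if (errno == EOPNOTSUPP)
        return 1;
    return -1;
}
```

On a return of 1 the caller has to decide for itself what to do instead; an ordinary discard does not give the same guarantee.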

8 responses to “BLKDISCARD, BLKZEROOUT, BLKDISCARDZEROES, BLKSECDISCARD”

  1. dmyablonski@gmail.com

    Very timely article. I notice my CentOS 6.3 does not have blkdiscard command, while the 6.5 version does. Do you know when exactly this binary was included with Linux distros?

    • rich

      Sep 2012:

      commit d964b669c8d8675af1b7d7e1742ee8b68dc285ef
      Author: Lukas Czerner 
      Date:   Wed Sep 12 17:49:15 2012 -0400
      
          blkdiscard: add new command
          
          blkdiscard is used to discard device sectors. This is useful for
          solid-state drivers (SSDs) and thinly-provisioned storage. Unlike
          fstrim this command is used directly on the block device.
          
          blkkdiscard uses BLKDISCARD ioctl or BLKSECDISCARD ioctl for the secure
          discard.
          
          All data in the discarded region on the device will be lost!
      
  2. Dee

    Lack of documentation on Linux drives me nuts!! Looking up documentation for an ioctl is fairly easy with Windows via the msdn documentation which makes it fun to program. Thanks for documenting this – I can have some fun now!!

  3. Yaniv

    Looking at http://lxr.free-electrons.com/source/block/blk-lib.c#L273, it seems that BLKZEROOUT is smart enough to know whether it can use BLKDISCARDZEROES-like functionality.

  4. Yaniv

    blkdiscard(8) just got BLKZEROOUT support, using ‘-z’ or ‘--zeroout’.

  5. Steve

    Thanks a lot! There is so little information on the web on this subject.

    You write: “BLKZEROOUT is a bit like BLKDISCARD but it writes zeroes”

    Do I understand correctly that a kernel routine first fills the specified range with zeros and then informs the device controller (via TRIM) that the overwritten blocks can be discarded?

    • rich

      You’d probably want to look at the kernel code, but I would *hope* that it would optimize this case if possible using (eg) the SCSI “WRITE SAME” command.

  6. Steve

    I went through the kernel sources. The codebase of the BLKDISCARD, BLKSECDISCARD and BLKZEROOUT ioctls has been substantially rewritten since you posted this article.

    BLKZEROOUT ioctl: Does not issue a discard request, not even if the device would guarantee to zero out the blocks or return zeros when reading the blocks (the BLKDEV_ZERO_NOUNMAP flag is set in the function call chain). Furthermore, REQ_OP_WRITE_SAME seems to have been removed from the kernel in version 5.18 (see blk_types.h; not sure if device drivers issue WRITE SAME themselves when they get a request to zero blocks).

    BLKDISCARDZEROES ioctl: The BLKDISCARDZEROES ioctl and the sysfs file “/sys/block//queue/discard_zeroes_data” now always return zero, regardless of what the device actually supports.

    Unfortunately, the documentation of these changes is rather poor. I think this is one of the reasons why there are so many conflicting statements on the web about how BLKDISCARD etc. works. It would definitely be great and much appreciated if someone 😉 would write an article on this topic.
