Tag Archives: blkid

udev unexpectedness

This was unexpected:

Write something to a partition device (e.g. /dev/vda1), close it, and immediately call blockdev --rereadpt /dev/vda to re-read the partition table of the whole device. Sometimes (about 50% of the time for me) the blockdev command fails with:

blockdev: BLKRRPART: Device or resource busy

Nothing else was using /dev/vda and nothing on it was mounted. The error was intermittent, which points to a race condition.

Why this happens:

udev has a rule that runs blkid -o udev -p /dev/vda1. It does this every time a block device that was open for writing is closed, so that blkid can rescan the contents of the device.

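Because blkid runs very briefly behind our backs, it can still have /dev/vda1 open while blockdev is running. The kernel refuses to re-read the partition table of a disk while one of its partitions is open, so the ioctl fails with EBUSY.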

Adding udevadm settle between the close and the blockdev call fixed the problem for us, although this is also inherently racy: what happens if udevadm settle runs before the kernel has sent its event to udev?
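A more defensive option is to retry the re-read when it fails with EBUSY, settling udev before each attempt. The sketch below is a minimal Python illustration under stated assumptions, not the code we actually use: retry_on_ebusy, udev_settle and reread_partition_table are hypothetical names, the attempt count and delay are arbitrary, and the BLKRRPART value is written out by hand because Python's fcntl module doesn't define it (0x125F is _IO(0x12, 95) on x86 and most other architectures, but it differs on e.g. PowerPC). It needs root.

```python
import errno
import fcntl
import os
import subprocess
import time

# BLKRRPART is _IO(0x12, 95) in <linux/fs.h>. Assumption: 0x125F is correct
# for x86 and most architectures; it is different on e.g. PowerPC.
BLKRRPART = 0x125F


def retry_on_ebusy(op, attempts=5, delay=0.2, before_each=None):
    """Call op() until it stops failing with EBUSY or attempts run out.

    Any other error, or EBUSY on the last attempt, is re-raised.
    """
    for i in range(attempts):
        if before_each is not None:
            before_each()
        try:
            return op()
        except OSError as e:
            if e.errno != errno.EBUSY or i == attempts - 1:
                raise
            time.sleep(delay)


def udev_settle():
    # Wait for udev to drain its event queue (including the blkid probe).
    # Still racy on its own: the kernel may not have queued the event yet.
    subprocess.run(["udevadm", "settle"], check=False)


def reread_partition_table(disk):
    """Roughly `blockdev --rereadpt DISK`, retried on EBUSY. Needs root."""
    fd = os.open(disk, os.O_RDONLY)
    try:
        retry_on_ebusy(lambda: fcntl.ioctl(fd, BLKRRPART), before_each=udev_settle)
    finally:
        os.close(fd)
```

Called as reread_partition_table("/dev/vda"). Retrying doesn't remove the race either, but it turns an intermittent failure into a short delay.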

Filed under Uncategorized

New in libguestfs — export blkid info

We’ve been using blkid successfully in libguestfs for a while, but only piecemeal. New in libguestfs ≥ 1.15.9 is a blkid command that returns all the information blkid knows about any device:

><fs> blkid /dev/vg_f15x32/lv_root
UUID: 9293385f-3200-4694-8f4b-e20bb8d73c37
VERSION: 1.0
TYPE: ext4
USAGE: filesystem
MINIMUM_IO_SIZE: 512
PHYSICAL_SECTOR_SIZE: 512
LOGICAL_SECTOR_SIZE: 512
><fs> blkid /dev/vda1
UUID: d2cd4319-f515-4be2-9a5c-fc8b57b53723
VERSION: 1.0
TYPE: ext4
USAGE: filesystem
MINIMUM_IO_SIZE: 512
PHYSICAL_SECTOR_SIZE: 512
LOGICAL_SECTOR_SIZE: 512
PART_ENTRY_SCHEME: dos
PART_ENTRY_TYPE: 0x83
PART_ENTRY_FLAGS: 0x80
PART_ENTRY_NUMBER: 1
PART_ENTRY_OFFSET: 2048
PART_ENTRY_SIZE: 1024000
PART_ENTRY_DISK: 252:0
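If you capture this output from a script, the partition fields are easy to put to use. libblkid counts PART_ENTRY_OFFSET and PART_ENTRY_SIZE in 512-byte sectors, so /dev/vda1 above starts 1 MiB into the disk and is 500 MiB long. Here is a minimal Python sketch, assuming the "KEY: VALUE" text format shown above. parse_blkid_output and partition_extent_bytes are hypothetical helpers, not part of the libguestfs API.

```python
SECTOR = 512  # libblkid reports partition offsets/sizes in 512-byte sectors


def parse_blkid_output(text):
    """Turn guestfish `blkid` output ("KEY: VALUE" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skips lines without ": ", such as the "><fs> blkid ..." prompt
            info[key.strip()] = value.strip()
    return info


def partition_extent_bytes(info):
    """Return (offset, size) of the partition in bytes."""
    offset = int(info["PART_ENTRY_OFFSET"]) * SECTOR
    size = int(info["PART_ENTRY_SIZE"]) * SECTOR
    return offset, size


sample = """\
PART_ENTRY_SCHEME: dos
PART_ENTRY_OFFSET: 2048
PART_ENTRY_SIZE: 1024000
"""
print(partition_extent_bytes(parse_blkid_output(sample)))  # (1048576, 524288000)
```

The other fields decode similarly: PART_ENTRY_FLAGS 0x80 is the MBR "active" (bootable) flag, and PART_ENTRY_DISK is the major:minor number of the whole disk that the partition belongs to.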

Thanks to Wanlong Gao [awesome photo!] for adding this.

How does mount load the right kernel module?

On any recent Linux distro, you can mount any filesystem type directly. For example:

# dd if=/dev/zero of=/tmp/test.img bs=4k count=4096
# mkfs.xfs /tmp/test.img
# mount -v -o loop /tmp/test.img /mnt/tmp

The mount command works even though I didn’t have the xfs.ko kernel module loaded, and I didn’t tell mount that it’s xfs.

How does it do that? I asked several people at work and no one could give me the correct answer, so in this article I’ll describe exactly how it works.

First of all, I’ll mention two wrong answers to this: (a) the kernel doesn’t have a magic “mount any filesystem” syscall, and (b) it’s nothing to do with either /proc/filesystems or /etc/filesystems.

For (a), the mount(2) syscall clearly takes the filesystem type as a string argument. As for (b), /proc/filesystems only lists filesystems that the kernel already knows about, i.e. ones that are built in or whose module has already been loaded. Since I didn’t have the xfs module loaded, xfs wasn’t listed in /proc/filesystems before I ran the mount command.
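You can check this yourself. /proc/filesystems has one type per line, prefixed with nodev for filesystems that don't need a block device. A small Python parser (registered_filesystems and is_registered are hypothetical names) that reports whether a type is currently registered:

```python
def registered_filesystems(text):
    """Return the filesystem type names listed in /proc/filesystems-style text."""
    names = set()
    for line in text.splitlines():
        fields = line.split()  # e.g. ["nodev", "proc"] or just ["ext4"]
        if fields:
            names.add(fields[-1])
    return names


def is_registered(fstype, path="/proc/filesystems"):
    """True if the running kernel already knows about fstype."""
    with open(path) as f:
        return fstype in registered_filesystems(f.read())
```

On a system where xfs is built as a module that hasn't been loaded yet, is_registered("xfs") is False before the mount and True afterwards.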

All of this suggests there must be some utility in userspace that can identify the type just by looking at the header of an arbitrary filesystem. That utility is blkid, which used to be part of e2fsprogs but has since moved into util-linux-ng.

blkid can probe a filesystem that it has not seen before and tell what type it is:

# blkid /tmp/test.img 
/tmp/test.img: UUID="c80ebc11-3b26-4b93-acbb-f52bdfaa9ac5" TYPE="xfs" 

Looking at the source for blkid confirms this: there is a directory full of probing code covering nearly every conceivable filesystem.
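To give a flavour of what those probes do, here is a toy prober in Python that recognises just two types by their magic numbers. XFS superblocks begin with the bytes XFSB, and ext2/3/4 superblocks start 1024 bytes into the device and carry the magic 0xEF53 (little-endian) 56 bytes into the superblock. This is a simplified illustration, not libblkid's code: the real probes check many more fields, tell ext2, ext3 and ext4 apart using feature flags, and handle far more formats. MAGICS and probe_type are hypothetical names.

```python
import struct

# (type, byte offset, magic bytes): a tiny subset of what libblkid checks.
MAGICS = [
    ("xfs", 0, b"XFSB"),                                 # XFS superblock magic
    ("ext2/3/4", 1024 + 56, struct.pack("<H", 0xEF53)),  # ext s_magic, little-endian
]


def probe_type(path):
    """Guess a filesystem type from magic numbers alone; None if unknown."""
    with open(path, "rb") as f:
        for fstype, offset, magic in MAGICS:
            f.seek(offset)
            if f.read(len(magic)) == magic:
                return fstype
    return None
```

Run against the /tmp/test.img made earlier, probe_type should return "xfs", since mkfs.xfs writes the superblock at the very start of the image.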

The mount utility calls out to blkid (strictly, to the libblkid library rather than the command-line tool, but it comes to the same thing).

So /bin/mount knows what it’s mounting, and requests the “xfs” filesystem type when it issues the system call into the kernel.

That still leaves the question of how the xfs module gets loaded. When the kernel is asked to mount a filesystem type it doesn’t yet know about, the mount syscall eventually calls the kernel function __request_module. This unusual function calls out to the userspace /sbin/modprobe binary, which loads the module, while the mount syscall itself waits. And yes, it even deals with the recursive situation where modprobe might need to mount filesystems or load other modules in order to succeed.
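From userspace you can see which helper the kernel will run: its path is stored in /proc/sys/kernel/modprobe. On newer kernels the request is also not for the bare type name. The kernel asks for the module alias fs-<type> (for example fs-xfs), which filesystem modules declare with MODULE_ALIAS_FS, and runs the helper as modprobe -q -- fs-xfs. The Python sketch below only builds that command line. modprobe_helper and request_fs_module_argv are hypothetical names, and actually running the result needs root.

```python
def modprobe_helper(sysctl="/proc/sys/kernel/modprobe"):
    """Path of the helper the kernel runs from __request_module."""
    try:
        with open(sysctl) as f:
            return f.read().strip()
    except OSError:
        return "/sbin/modprobe"  # the usual default if the sysctl can't be read


def request_fs_module_argv(fstype, helper):
    """Command line a newer kernel builds when it needs a filesystem module."""
    return [helper, "-q", "--", "fs-" + fstype]
```

Passing request_fs_module_argv("xfs", modprobe_helper()) to subprocess.run as root loads the module by hand, the same way the kernel would.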

So there you have it: mounting a filesystem can magically load the right kernel module for that filesystem, using a combination of userspace probing and some kernel trickery.
