Tag Archives: ext3

“mount -o ro” writes to the disk

mount -o ro ..., or the equivalent libguestfs command mount-ro, writes to the disk.

This is easy to show. Create a disk image containing an ext3 filesystem and a single file:

$ guestfish -N fs:ext3 -n -m /dev/sda1 touch /hello-world : sync
$ md5sum test1.img
a1f6684a8a04d14f7599902bc0ab4aaa  test1.img

Explanation:

  1. The guestfish -N option creates a prepared disk called test1.img in the current directory.
  2. The guestfish -n option turns off autosync, so the disk will not be cleanly unmounted after the command has finished.
  3. -m /dev/sda1 mounts the prepared filesystem.
  4. touch creates an empty file (/hello-world) on that filesystem.
  5. sync writes the changes to disk without unmounting the filesystem (so it is left dirty).
  6. md5sum computes the MD5 hash of the disk image we’ve just created.

Now let’s open the disk, mount it with the -o ro option, and read the root directory:

$ guestfish -a test1.img run : mount-ro /dev/sda1 / : ll /
total 17
drwxr-xr-x  3 root root  1024 Feb  3 17:49 .
drwxr-xr-x 23  500  500  4096 Feb  3 17:50 ..
-rw-r--r--  1 root root     0 Feb  3 17:49 hello-world
drwx------  2 root root 12288 Feb  3 17:49 lost+found
$ md5sum test1.img
8fab31ef115cb8a6edcbc71db61fcafc  test1.img

Explanation:

  1. -a test1.img adds the prepared disk (test1.img) to the libguestfs appliance. Note that it is not added read-only.
  2. run starts the appliance.
  3. mount-ro mounts the prepared filesystem with the -o ro flag.

Notice the MD5 hash of the disk has changed!

Try repeating the second command and you’ll see that the MD5 hash stays the same.

It appears that because the filesystem is ext3 and was dirty, mount -o ro replays the journal and thus writes to the underlying disk. (It is also possible that it is merely updating the superblock to mark the filesystem clean, but either way the point stands: it is writing to the disk.)

None of this is unexpected, but it shows that you must use the libguestfs add-drive-ro call if you really want to look at a disk image without making any changes to it. That call uses a qemu snapshot to ensure that write operations never reach the disk, but are discarded in an anonymous snapshot overlay.

If you use the guestfish --ro option then any -a or -d drives are added read-only.
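
For example (a sketch reusing the image from above; the hash shown is simply the value from the second run), repeating the read-only mount with --ro leaves the image untouched:

$ md5sum test1.img
8fab31ef115cb8a6edcbc71db61fcafc  test1.img
$ guestfish --ro -a test1.img run : mount-ro /dev/sda1 / : ll /
$ md5sum test1.img
8fab31ef115cb8a6edcbc71db61fcafc  test1.img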

Create a partitioned device from a collection of filesystems

Xen has a feature where it can export virtual partitions directly to virtual machines. You can configure a Xen VM like this example:

disk = ['phy:raidvg/devroot,hda1,w','phy:raidvg/devswap,hda2,w']

Notice that host device /dev/raidvg/devroot is mapped to a partition inside the guest (/dev/hda1), and on the host this device directly contains a filesystem:

host# file - < /dev/raidvg/devroot
/dev/stdin: Linux rev 1.0 ext3 filesystem data, UUID=... (needs journal recovery)

The guest sees /dev/hda1 and /dev/hda2, but no /dev/hda device or partition table.

This is actually a nice feature of Xen because resizing filesystems directly is much easier than resizing a partitioned block device. You can just make the host device bigger (lvresize -L sizeG /dev/raidvg/devroot), reboot the guest so it sees the increased device size, then resize the filesystem (resize2fs — this can even be done live if you want to make the filesystem bigger).
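
A sketch of that workflow (the device name and size here are illustrative):

host# lvresize -L +5G /dev/raidvg/devroot
# reboot the guest so it sees the larger device, then inside the guest:
guest# resize2fs /dev/hda1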

Imagine if we’d been dealing with a KVM partitioned block device instead:

+-+---------------------+------------+
|M| hda1                | hda2       |
|B| (root filesystem)   | (swap)     |
|R|                     |            |
+-+---------------------+------------+

Resizing this is much more painful. You first have to extend the host block device:

+-+---------------------+------------+-------+
|M| hda1                | hda2       | space |
|B| (root filesystem)   | (swap)     |       |
|R|                     |            |       |
+-+---------------------+------------+-------+

Now what do you do? Easiest is probably to create a third (hda3) partition in that extra space. If you didn’t have the foresight to use LVM, then this means your root filesystem cannot be extended — you can only create another extra filesystem (say for /var) and copy files over. This is very inflexible.

Instead you could recalculate the MBR and move (ie. copy block by block) hda2 up to the end of the device. (Imagine it wasn’t swap space, which you could just throw away and recreate, but a partition holding valuable files.) Recalculating the MBR is generally error-prone because partitions have strange limitations and alignment requirements.

One day I intend to write a program to do these kinds of complex resizing operations …

Anyhow, this wasn’t even what this rambling blog entry was about. It is a companion to last week’s tip about extracting filesystems from disk images. Can we do the opposite, ie. create a partitioned device from a collection of Xen filesystems?

Answer, yes we can, with guestfish.

In fact I’m starting with the filesystem and swap contents copied from my Xen server, and I first need to know their exact sizes in 1024-byte blocks:

$ ls --block-size=1024 -l devroot devswap
-rw-rw-r--. 1 rjones rjones 3145728 2010-03-17 14:18 devroot
-rw-rw-r--. 1 rjones rjones 1048576 2010-03-17 14:19 devswap

I’m going to put this into a 5G disk image, giving me space to expand the root filesystem to fit. Inexplicably I’ve decided to keep the swap partition content even though in reality I would just throw it away and recreate the swap partition (imagine there’s some important filesystem content in there instead). I want devswap to precisely fit at the end of the new disk image.

Let’s create the disk image and find out how big it is in sectors:

$ rm -f disk.img
$ truncate -s 5G disk.img
$ guestfish -a disk.img -a devroot -a devswap
><fs> run
><fs> blockdev-getsz /dev/vda
10485760  # size in 512 byte sectors

Now I need to do some back of the envelope calculations to work out how I will size and place each partition. (This is a huge pain in the neck — I had to do several runs to get the numbers to come out right …)
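
For the record, here is the arithmetic (all sectors are 512 bytes):

devswap      = 1048576 blocks of 1024 bytes = 2097152 sectors
whole disk   = 10485760 sectors
swap start   = 10485760 - 2097152 = 8388608

So vda1 occupies sectors 64 through 8388607, and vda2 runs from sector 8388608 to the end of the disk.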

><fs> part-init /dev/vda mbr
# numbers below are in units of 512 byte sectors:
><fs> part-add /dev/vda primary 64 8388607
><fs> part-add /dev/vda primary 8388608 -1
><fs> sfdisk-l /dev/vda

Disk /dev/vda: 10402 cylinders, 16 heads, 63 sectors/track
Units = cylinders of 516096 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/vda1          0+   8322-   8322-   4194272   83  Linux
/dev/vda2       8322+  10402-   2081-   1048576   83  Linux
/dev/vda3          0       -       0          0    0  Empty
/dev/vda4          0       -       0          0    0  Empty

Notice the number of (1024-byte) blocks for devswap is exactly the correct size: 1048576.

The sfdisk-l command is also telling me that my partitions aren’t aligned on “cylinders”, which I don’t care about. The swap partition should, however, be well aligned for the underlying device, because sector 8388608 == 8192 * 1024.

With the hard bit out of the way, I can now copy across my filesystems. Notice that I added devroot and devswap as extra drives (the -a options to guestfish). They appear in the appliance as /dev/vdb and /dev/vdc respectively, so I can just dd them into the right places:

><fs> dd /dev/vdb /dev/vda1
><fs> dd /dev/vdc /dev/vda2

and resize the root filesystem to fit the space available:

><fs> e2fsck-f /dev/vda1
><fs> resize2fs /dev/vda1

Now I have a single partitioned device, suitable for use with KVM (mind you, not bootable because it still contains a Xen paravirt kernel):

$ virt-list-filesystems -al disk.img
/dev/sda1 ext3
/dev/sda2 swap

As you can see there is much scope for automation …
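
Here, for instance, is a rough, untested sketch of how the steps above might be scripted for the same devroot and devswap files. The partition arithmetic is done with stat instead of by hand, and it assumes the swap image size is a multiple of whatever alignment you care about:

#!/bin/bash -
# Untested sketch: build a partitioned disk.img from devroot + devswap.
set -e

rm -f disk.img
truncate -s 5G disk.img

# sizes in 512 byte sectors
disksectors=$(( $(stat -c %s disk.img) / 512 ))
swapsectors=$(( $(stat -c %s devswap) / 512 ))
swapstart=$(( disksectors - swapsectors ))

guestfish -a disk.img -a devroot -a devswap <<EOF
  run
  part-init /dev/vda mbr
  part-add /dev/vda primary 64 $(( swapstart - 1 ))
  part-add /dev/vda primary $swapstart -1
  dd /dev/vdb /dev/vda1
  dd /dev/vdc /dev/vda2
  e2fsck-f /dev/vda1
  resize2fs /dev/vda1
EOF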

Is ext2/3/4 faster? On LVM?

This question arose at work — is LVM a performance penalty compared to using straight partitions? To save you the trouble, the answer is “not really”. There is a very small penalty, but as with all benchmarks it does depend on what the benchmark measures versus what your real workload does. In any case, here is a small guestfish script you can use to compare the performance of various filesystems with or without LVM, with various operations. Whether you trust the results is up to you, but I would advise caution.

#!/bin/bash -

tmpfile=/tmp/test.img

for fs in ext2 ext3 ext4; do
    for lvm in off on; do
        rm -f $tmpfile
        if [ $lvm = "on" ]; then
            guestfish <<EOF
              sparse $tmpfile 1G
              run
              part-disk /dev/sda efi
              pvcreate /dev/sda1
              vgcreate VG /dev/sda1
              lvcreate LV VG 800
              mkfs $fs /dev/VG/LV
EOF
            dev=/dev/VG/LV
        else # no LVM
            guestfish <<EOF
              sparse $tmpfile 1G
              run
              part-disk /dev/sda efi
              mkfs $fs /dev/sda1
EOF
            dev=/dev/sda1
        fi
        echo "fs=$fs lvm=$lvm"
        sync
        guestfish -a $tmpfile -m $dev <<EOF
          time fallocate /file1 200000000
          time cp /file1 /file2
EOF
    done
done

And here is the output:

fs=ext2 lvm=off
elapsed time: 2.74 seconds
elapsed time: 4.52 seconds
fs=ext2 lvm=on
elapsed time: 2.60 seconds
elapsed time: 4.24 seconds
fs=ext3 lvm=off
elapsed time: 2.62 seconds
elapsed time: 4.31 seconds
fs=ext3 lvm=on
elapsed time: 3.07 seconds
elapsed time: 4.79 seconds

# notice how ext4 is much faster at fallocate, because it
# uses extents

fs=ext4 lvm=off
elapsed time: 0.05 seconds
elapsed time: 3.54 seconds
fs=ext4 lvm=on
elapsed time: 0.05 seconds
elapsed time: 4.16 seconds
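
The same harness is easy to adapt to operations that are closer to your real workload. For example (an untested sketch, where /tmp/files.tar stands in for a tarball of representative test data on the host), the timing step could become:

guestfish -a $tmpfile -m $dev <<EOF
  # /tmp/files.tar is a hypothetical host-side tarball of test data
  time tar-in /tmp/files.tar /
EOF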

mkfs compared on different filesystems

How long does it take to mkfs a 10GB disk with all the different filesystems out there?

See my test results here using the new guestfish sparse / filesystem support. btrfs is “best” and ext3 comes off “worst”.

As a test this is interesting, but it’s not that relevant for most users — they will be most interested in how well the filesystem performs for their workload, which is not affected by mkfs time and is hard to measure in general benchmarks anyway.

Update

In response to Stephen’s comment, I retested this using a memory-backed block device so there is no question about whether the host backing store affects the test:

$ for fs in ext2 ext3 ext4 xfs jfs reiserfs nilfs2 ntfs msdos btrfs hfs hfsplus gfs gfs2
    do guestfish sparse /dev/shm/test.img 10G : run : echo $fs : sfdiskM /dev/sda , : \
        time mkfs $fs /dev/sda1
    done
ext2
elapsed time: 1.45 seconds
ext3
elapsed time: 2.71 seconds
ext4
elapsed time: 2.58 seconds
xfs
elapsed time: 0.13 seconds
jfs
elapsed time: 0.27 seconds
reiserfs
elapsed time: 0.33 seconds
nilfs2
elapsed time: 0.08 seconds
ntfs
elapsed time: 2.07 seconds
msdos
elapsed time: 0.14 seconds
btrfs
elapsed time: 0.07 seconds
hfs
elapsed time: 0.17 seconds
hfsplus
elapsed time: 0.17 seconds
gfs
elapsed time: 0.84 seconds
gfs2
elapsed time: 2.76 seconds
