Tag Archives: tcl

NBD graphical viewer

Ever wondered what is really happening when you write to a disk? Which blocks the filesystem writes to, and so on? With nbdkit, our flexible, plug-in based NBD server, and a little Tcl/Tk program I wrote, you can now visualise this.

As in this video (MP4)


… which shows me opening a blank disk, partitioning it, creating an ext4 filesystem and writing some files.

There’s a lot going on in this video, which I’ll explain below. First, note that each pixel corresponds to a 4K block on disk: the total disk size is 64M, which is 128×128 pixels, so each row is half a megabyte. Red pixels are writes. Black flashing pixels show reads. Light purple is for trim requests, and white pixels are zero requests.
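The block-to-pixel arithmetic above can be sketched as follows. This is only an illustration in Python, not code taken from nbdview; the constant names and the `pixels_for_request` helper are my own assumptions.

```python
BLOCK_SIZE = 4096               # one pixel per 4K block
WIDTH = 128                     # pixels per row
DISK_SIZE = 64 * 1024 * 1024    # 64M disk

def pixels_for_request(offset, count):
    """Return (x, y) for every block touched by bytes [offset, offset+count)."""
    if count <= 0:
        return []
    first = offset // BLOCK_SIZE
    last = (offset + count - 1) // BLOCK_SIZE
    # Blocks are laid out left to right, then top to bottom.
    return [(b % WIDTH, b // WIDTH) for b in range(first, last + 1)]

# An 8K write starting half a megabyte into the disk lands at the
# start of the second row and covers two pixels.
touched = pixels_for_request(512 * 1024, 8192)
```

A request that is not aligned to 4K still colours every block it overlaps, which is why the last block is computed from the final byte rather than from `offset + count`.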

nbdkit was run with the following command line:

$ nbdkit -fv \
    --filter=log \
    --filter=delay \
    memory size=$((64*1024*1024)) \
    logfile=/tmp/log \
    rdelay=40ms wdelay=40ms

This means that we’re using the memory plugin to create a throwaway blank disk of 64M. In front of this plugin we place two filters. The delay filter delays all reads and writes by 40ms, which makes it easier to see what’s going on. The log filter records all requests in a log file (/tmp/log).

The second command reads this log file asynchronously, while nbdkit is still appending to it, to generate the graphical image:

$ ./nbdview.tcl /tmp/log $((64*1024*1024))
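Reading a log that another process is still writing means coping with lines that are only half written at the moment you look. A minimal sketch of one way to do that in Python follows. It is not how nbdview.tcl is implemented, and `read_new_lines` is a hypothetical helper. It deliberately does not parse the lines, since the log filter's line format isn't shown here.

```python
def read_new_lines(fh, pending=""):
    """Return (complete_lines, leftover) for text appended since the last call.

    `fh` is a file opened for reading in text mode. `pending` is the
    leftover returned by the previous call: the start of a line whose
    newline had not been written yet.
    """
    data = pending + fh.read()      # read() returns "" when nothing is new
    lines = data.split("\n")
    leftover = lines.pop()          # text after the final newline, if any
    return lines, leftover
```

A viewer would call this from a timer, handle each complete line, and pass `leftover` back in on the next call so that a request record split across two reads is never seen in pieces.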

So to the video (MP4):

  • 00:07: I start guestfish connected to the NBD server. This boots a Linux appliance, and you can see from the flashes of black how the Linux kernel probes the disk every which way to try to detect any kind of partition or filesystem signature. (Recall that I’m intentionally delaying all read requests, which is why the appliance boot and probing seem to take so long. In reality these probes happen near instantaneously.) Of course the disk is all zeroes at this point, so nothing is found.
  • 00:23: I partition the disk using GPT. The partitioning is done under the hood by GNU parted, and as you can see there is a considerable amount of probing, first by parted and then by the kernel, which is still looking for filesystem signatures. Eventually we end up with two blocks of red (written) data, because GPT creates both a primary and a secondary partition table, at the beginning and end of the disk respectively.
  • 00:36: I create an ext4 filesystem inside the partition. After even more probing by mkfs the first major operation is to trim/discard all data on the disk (shown by the disk filling up with light purple). Then mkfs writes a large block of data in the middle of the disk which I believe is the journal, followed by four dots which I believe could be backup superblocks.
  • 00:48: Interestingly, filesystem creation has not finished. ext4 (like other modern filesystems) defers a lot of work to the kernel, and this is obvious when I mount the disk. Notice that a few seconds after the mount (around 00:59) the kernel starts zeroing parts of the disk. I believe this is the inode table and block free bitmap for the first block group. For larger disks this lazy initialization could go on for a long time.
  • 01:05: I unpack a tarball into the filesystem. As expected the operation finishes almost instantaneously, and nothing is actually written to disk. However issuing an explicit sync at 01:11 causes the files and directories to be written, filling first the data blocks and then the inodes and block free bitmap (is there a reason these are written last, or is it just coincidence? Also does the Linux page cache retain the order that the filesystem wrote the blocks?)
  • 01:18: I delete the directory tree I just created. As you’d expect nothing is written to disk, and even after a sync nothing much changes.
  • 01:26: In contrast when I fstrim the filesystem, all the now-deleted data blocks are discarded (light purple). This is the same principle which virt-sparsify --in-place uses to make a disk image sparse.
  • 01:32: Finally after unmounting the filesystem I issue a blkdiscard command which throws the whole thing away. Even after this Linux is probing the partition to see if somehow a filesystem signature could be present.

You can easily reproduce this and similar results yourself using nbdkit and nbdview, and I’ve submitted a talk to FOSDEM about this and much more fun you can have with nbdkit.


Filed under Uncategorized

You can now write nbdkit plugins in Tcl

I have a soft spot for the Tcl programming language because Tcl/Tk was one of the earliest and best rapid GUI development environments available on Unix.

Well now you can write nbdkit plugins in Tcl.

Here’s an example:

# This is called from: nbdkit tcl example.tcl --dump-plugin
proc dump_plugin {} {
    puts "example_tcl=1"
}

# We expect a file=... parameter pointing to the file to serve.
proc config {key value} {
    global file

    if { $key eq "file" } {
        set file $value
    } else {
        error "unknown parameter $key=$value"
    }
}

# Check the file parameter was passed.
proc config_complete {} {
    global file

    if { ![info exists file] } {
        error "file parameter missing"
    }
}

# Open a new client connection.
proc plugin_open {readonly} {
    global file

    # Open the file.
    if { $readonly } {
        set flags "r"
    } else {
        set flags "r+"
    }
    set fh [open $file $flags]

    # Stop Tcl from trying to convert to and from UTF-8.
    fconfigure $fh -translation binary

    # We can return any Tcl object as the handle.  In this
    # plugin it's convenient to return the file handle.
    return $fh
}

# Close a client connection.
proc plugin_close {fh} {
    close $fh
}

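# Return the size of the served file in bytes.
proc get_size {fh} {
    global file

    return [file size $file]
}

# Read count bytes starting at offset.
proc pread {fh count offset} {
    seek $fh $offset
    return [read $fh $count]
}

# Write buf at offset.  -nonewline stops puts from appending
# a newline to the data.
proc pwrite {fh buf offset} {
    seek $fh $offset
    puts -nonewline $fh $buf
}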
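
For readers less familiar with Tcl channels, the seek-then-read and seek-then-write pattern used by pread and pwrite looks like this in Python. This only illustrates the I/O pattern on a throwaway file. It is not nbdkit's Python plugin API, and the temporary file is a stand-in for the file= parameter.

```python
import os
import tempfile

def pread(fh, count, offset):
    """Read count bytes starting at offset."""
    fh.seek(offset)
    return fh.read(count)

def pwrite(fh, buf, offset):
    """Write buf at offset, with no newline or encoding translation."""
    fh.seek(offset)
    fh.write(buf)

# Create a 1 MiB sparse file to play the role of the served disk.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.truncate(1024 * 1024)
    path = tmp.name

# "r+b" plays the role of Tcl's "r+" plus "-translation binary".
with open(path, "r+b") as fh:
    pwrite(fh, b"\x00\xff\n\r", 4096)
    data = pread(fh, 4, 4096)
    size = os.path.getsize(path)

os.remove(path)
```

The bytes written include a newline and a carriage return on purpose: without binary mode, those are exactly the bytes a text-mode channel might rewrite.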


miniexpect: A small expect library for C

The rewrite of virt-p2v dumps the old system where the front end and the back end talked over a custom protocol. In the new version, virt-p2v ssh’es into the conversion server and runs commands directly.

However to do this I needed to be able to control the interactive ssh process from the program.

The classical way to do this is using Tcl’s expect tool. You can write expect scripts like this:

#!/usr/bin/expect --
set timeout 30
spawn ssh user@remote
expect {
    password: {
        send "secret\r"
    }
    "yes/no)?" {
        send "yes\r"
        set timeout -1
    }
    timeout {
        exit
    }
    eof {
        exit
    }
}

This doesn’t work so well from C, but luckily there is libexpect. Unfortunately the libexpect API isn’t thread-safe (we ❤ globals!), doesn’t adhere to modern C standards, and the library is quite buggy.

How hard would it be to write an expect replacement? At first glance it seems like it would be hard. libexpect has to contain a complete regular expression implementation (and a glob implementation too), because it has to be able to run the regular expression matcher over the incomplete data that it is reading from the subprocess.

However here is an observation: PCRE provides an implementation of partial pattern matching meaning that you can match on the incomplete data you’ve already received from the subprocess and decide whether you have a complete match or a partial match (in the partial match case you just wait for more data to arrive).
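To make the idea of matching on incomplete data concrete, here is a small Python sketch. Python's standard re module has no partial matching, so this is a simpler stand-in: it buffers whatever has arrived and rescans the buffer after every chunk. Unlike a PCRE partial match, it cannot say whether the data so far could still turn into a match. The `Expecter` class and the pattern names are hypothetical, and this is not how miniexpect is written.

```python
import re

class Expecter:
    """Accumulate subprocess output and report the first pattern that matches."""

    def __init__(self, patterns):
        # patterns: list of (name, regular expression) pairs, tried in order.
        self.patterns = [(name, re.compile(rx)) for name, rx in patterns]
        self.buffer = ""

    def feed(self, chunk):
        """Add newly received text; return the matching name, or None."""
        self.buffer += chunk
        for name, rx in self.patterns:
            m = rx.search(self.buffer)
            if m:
                # Discard everything up to the end of the match.
                self.buffer = self.buffer[m.end():]
                return name
        return None     # no complete match yet: wait for more data

e = Expecter([("password", r"password:"), ("hostkey", r"yes/no\)\?")])
first = e.feed("user@remote's pass")    # prompt split across two reads
second = e.feed("word: ")
```

Here `first` is None because only part of the prompt has arrived, and `second` is "password" once the rest shows up. With real partial matching, the first call could additionally report that a match is in progress.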

Using PCRE partial matches, I was able to write a pretty good mini-library for expect in just under 500 lines of code. The library is here:

http://git.annexia.org/?p=miniexpect.git;a=tree
