Tag Archives: linux

An NBD block device written using Linux ublk (user block device)

Commits [1] and [2] and more here.

ublk is a Linux-only io_uring-based user block device. It lets you write block devices in userspace. nbdublk is an NBD client written using ublk.

# modprobe ublk_drv
# nbdublk /dev/ublkb0 nbd://remote
# ublk list

# blockdev --getsize64 /dev/ublkb0
# mke2fs /dev/ublkb0
# (etc)

# ublk del -n 0

Filed under Uncategorized

SSH from RHEL 9 to RHEL 5 or RHEL 6

RHEL 9 no longer lets you ssh to RHEL ≤ 6 hosts out of the box. You can weaken security of the whole system but there’s no easy way to set security policy per remote host. Here’s how to set up ssh so it works for a RHEL 5 or RHEL 6 host:

First edit your .ssh/config file, adding an entry for the host:

Host rhel5or6-host
    KexAlgorithms +diffie-hellman-group14-sha1
    MACs +hmac-sha1
    HostKeyAlgorithms +ssh-rsa
    PubkeyAcceptedKeyTypes +ssh-rsa
    PubkeyAcceptedAlgorithms +ssh-rsa

That’s not enough on its own, because RHEL 9 also maims the openssl library by disabling SHA1 support by default. To fix that, create /var/tmp/openssl.cnf with:

.include /etc/ssl/openssl.cnf
[openssl_init]
alg_section = evp_properties
[evp_properties]
rh-allow-sha1-signatures = yes

Now you can ssh to RHEL 5 or RHEL 6 hosts like this:

OPENSSL_CONF=/var/tmp/openssl.cnf ssh rhel5or6-host
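
If you do this often, a small shell function saves retyping the variable. This is only a sketch: the function name legacy_ssh and the LEGACY_OPENSSL_CONF override are made up for illustration, and it assumes you have already created the openssl.cnf file described above.

```shell
# legacy_ssh: run ssh with the SHA1-enabled OpenSSL configuration.
# The function name and the LEGACY_OPENSSL_CONF variable are hypothetical;
# the default path is the file created earlier in this post.
legacy_ssh() {
    conf=${LEGACY_OPENSSL_CONF:-/var/tmp/openssl.cnf}
    if [ ! -r "$conf" ]; then
        echo "legacy_ssh: cannot read $conf; create it first" >&2
        return 1
    fi
    # The assignment applies only to this one ssh invocation.
    OPENSSL_CONF=$conf ssh "$@"
}

# Usage (not run here): legacy_ssh rhel5or6-host
```

Because the variable is set only for that single command, every other program on the system keeps the default RHEL 9 crypto policy.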

Thanks Laszlo Ersek for working out most of this. Related bugs:

2064740 – RFE: Make it easier to configure LEGACY policy per service or per host

2062360 – RFE: Virt-v2v should replace hairy “enable LEGACY crypto” advice with a more targeted mechanism

FUSE mounting on top of a file

Our tool nbdfuse lets you mount an NBD block device as a file, using Linux FUSE. For example you could create a directory with a single file in it (called nbd) which contains the contents of the NBD export:

$ mkdir /var/tmp/test
$ nbdfuse /var/tmp/test --command nbdkit -s memory 1G &
$ ls -l /var/tmp/test/
total 0
 -rw-rw-rw-. 1 rjones rjones 1073741824 Nov  4 13:25 nbd
$ fusermount -u /var/tmp/test

This is cool, but wouldn’t it be nice to get rid of the directory and create the file anywhere? Recently Max Reitz found out you can mount a FUSE filesystem over a regular file.

It works! (After a few adjustments to the nbdfuse code)

$ touch /var/tmp/disk.img
$ nbdfuse /var/tmp/disk.img --command nbdkit -s memory 1G &
$ ls -l /var/tmp/disk.img
 -rw-rw-rw-. 1 rjones rjones 1073741824 Nov  4 13:29 /var/tmp/disk.img
$ fusermount -u /var/tmp/disk.img 

nbdkit with BitTorrent

nbdkit is our high performance Network Block Device server for serving disk images from unusual sources. One (usual) source for Linux installers is to download an ISO from a website like Get Fedora or debian.org. However that costs the host money and is also a central point of failure, so another way to download Linux installers is over BitTorrent. Many Linux distros offer torrents of their installers including Fedora and Debian. By using these you are helping to redistribute Linux and defraying the cost of hosting these ISOs.

Now I’ve written a BitTorrent plugin for nbdkit so you can download, redistribute and install Linux all at the same time!

$ url=https://torrent.fedoraproject.org/torrents/Fedora-Server-dvd-x86_64-32.torrent
$ wget $url
$ nbdkit -U - torrent Fedora-Server-*.torrent \
         --run 'qemu-system-x86_64 -m 2048 -cdrom $nbd -boot d'

So what’s the serious use for this? It has the interesting property that the more people who are installing your Linux distro, the less bandwidth it uses and the faster it runs! This could be interesting technology for any kind of distributed environment where you have lots of machines accessing the same fixed/read-only filesystem or disk image.

If you want to get started with nbdkit it’s already in all popular Linux distributions, and compiles from source on Linux, FreeBSD and OpenBSD.

New nbdkit “remote tmpfs” (tmpdisk plugin)

I was making some thin clients for the Fedora RISC-V project a few weeks ago. These are based on the HiFive Unleashed U540 board and so they have no local SATA, only slow, unreliable SD cards. Any filesystems that might be heavily used must be network filesystems.

As these are Linux clients we essentially have three possible choices for network filesystems: NFS, NBD or a cluster FS. As I didn’t want to set up multiple server nodes or need the redundancy, cluster filesystems are immediately discounted. NFS is — “fine”. And indeed I selected that for /home where performance is not a problem. But for the clients that are build servers I selected NBD as a high-performance block-based storage. I’m using nbdkit, the high performance flexible NBD server.

However nbdkit didn’t quite do what I wanted, so I had to write a new plugin. I needed a “remote tmpfs“.

To get a new “remote tmpfs”, a fresh filesystem each time, the client does:

modprobe nbd
nbd-client server /dev/nbd0
mount /dev/nbd0 /var/scratch

Backing this is a flexible new plugin called tmpdisk. By default it creates a new disk for each connection, formats it with mkfs and serves it to the client. But it’s also scriptable, so you can substitute any command you like instead of mkfs to create your own custom disks (I recommend looking at mke2fs -d in case you need to have pre-populated scratch disks).

It’s important to note that these are scratch disks so when the client unmounts the filesystem it is completely deleted from the server. But that’s fine for temporary directories and (for my purposes) package builds.

To find out more about nbdkit, watch my video from FOSDEM 2019.

New nbdkit data strings

You can use nbdkit, our infinitely flexible Network Block Device server, to serve small disks and test images with the nbdkit data plugin. For example you can cut and paste this command into your shell to demonstrate a bootable disk image which prints “hello, world”:

nbdkit data data='
    0xb4 0 0xb0 3 0xcd 0x10 0xb4 0x13
    0xb3 0x0a 0xb0 1 0xb9 0x0e 0 0xb6
    0 0xb2 0 0xbd 0x19 0x7c 0xcd 0x10
    0xf4 0x68 0x65 0x6c 0x6c 0x6f 0x2c 0x20
    0x77 0x6f 0x72 0x6c 0x64 0x0d 0x0a
    @0x1fe 0x55 0xaa
' --run 'qemu-system-i386 -fda $nbd'

(As an aside, what is the smallest nbdkit data string that can boot to a “hello, world” message?)
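
To see which part of that string is the message rather than machine code, you can print the last run of bytes (0x68 through 0x0a) on their own. This is just an illustrative sketch using printf with octal escapes; boot_message is a made-up helper name and nothing here talks to nbdkit.

```shell
# The bytes 0x68 0x65 0x6c 0x6c 0x6f 0x2c 0x20 0x77 0x6f 0x72 0x6c 0x64 0x0d 0x0a
# from the data string above, rewritten as octal escapes for POSIX printf.
# boot_message is a hypothetical helper name.
boot_message() {
    printf '\150\145\154\154\157\054\040\167\157\162\154\144\015\012'
}

boot_message | wc -c    # number of bytes in the message
boot_message | od -c    # show each character, including the trailing CR and LF
```

The 0x0e that follows 0xb9 earlier in the string appears to be this message length, and 0x19 0x7c looks like the address of the message once the sector is loaded at 0x7c00, which is consistent with the message starting 25 bytes into the sector.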

The data parameter is a mini-language, and I recently extended it in an interesting way. It wasn’t possible to make repeated patterns easily before. If you wanted a disk containing 0x55 0xAA repeated (the binary bit patterns 01010101 10101010) then the only way to get that was to literally write:

nbdkit data data='0x55 0xAA 0x55 0xAA [repeated many times ...]'

but now you can group things together and write:

nbdkit data data='( 0x55 0xAA )*256'

The nesting works by recursively creating a new parser, which means you can use any data expression. For example to get 4 sectors containing half blank and half test data you can now do:

nbdkit data data='( @256 ( 0x55 0xAA )*128 )*4'
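
If you want to check what such an expression should contain without starting nbdkit, you can build the same bytes by hand. The sketch below assumes the reading given above (four 512-byte sectors, each 256 zero bytes followed by 0x55 0xAA repeated 128 times); test_pattern is a made-up helper name.

```shell
# Emit the bytes that '( @256 ( 0x55 0xAA )*128 )*4' is described as producing.
# test_pattern is a hypothetical helper, not part of nbdkit.
test_pattern() {
    sector=0
    while [ "$sector" -lt 4 ]; do
        head -c 256 /dev/zero        # the "@256" part: 256 blank bytes
        pair=0
        while [ "$pair" -lt 128 ]; do
            printf '\125\252'        # octal escapes for 0x55 0xAA
            pair=$((pair + 1))
        done
        sector=$((sector + 1))
    done
}

test_pattern | wc -c                 # total size in bytes
test_pattern | od -A d -t x1         # hex dump with decimal offsets
```

Comparing this dump with a dump of the disk nbdkit serves is a quick way to confirm that a nested expression means what you think it means.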

This gives you lots of ways to make disks containing test patterns, which you could then use to test Linux programs using /dev/nbd0 loop devices.

Cascade – a turn-based text arcade game

I wrote this game about 20 years ago. Glad to see it still compiled out of the box on the latest Linux distro! Download it from here. If anyone can remember the name or any details of the original 1980s MS-DOS game that I copied the idea from, please let me know in the comments.

AMD Ryzen 9 3900X – nice!

This thing really screams. It’s nice being able to do make -j24 (threads) builds so quickly.

NBD’s state machine

Eric and I are writing a Linux NBD client library. There were lots of requirements, but the central one for this post is that it has to be a library callable from programs written in C and other programming languages (Python, OCaml and Rust being important). We don’t control those programs, so they may be single-threaded or multithreaded, or may use non-blocking main loops like gio and glib.

An NBD command involves sending a request over a socket to a remote server and receiving a reply. You can also have multiple requests “in flight” and the reply can be received in multiple parts. On top of this the “fixed newstyle” NBD protocol has a complex multi-step initial handshake. Complicating it further we might be using a TLS transport which has its own handshake.

It’s complicated and we mustn’t block the caller.

There are a few ways to deal with this in my experience — one is to ignore the problem and insist that the main program uses a thread for each NBD connection, but that pushes complexity onto someone else. Another way is to use some variation of coroutines or call/cc — if we get to a place where we would block then we save the stack, return to the caller, and have some way to restore the stack later. However this doesn’t necessarily work well with non-C programming languages. It likely won’t work with either OCaml or Ruby’s garbage collectors since they both involve stack walking to find GC roots. I’d generally want to avoid “tricksy” stuff in a library.

The final way that I know about is to implement a state machine. However large state machines are hellishly difficult to write. Our state machine has 75 states (so far — it’s nowhere near finished). So we need a lot of help.

I came up with a slightly nicer way to write state machines.

The first idea is that states in a large state machine could be grouped. You can consider each group like a mini state machine — it has its own namespace, lives in a single file (example), and may only be entered via a single START state (so you don’t need to consider what happens if another part of the state machine jumps into the middle of the group).

Secondly groups can be hierarchical. This lets us organise the groups logically, so for example there is a group for “fixed newstyle handshaking” and within that there are separate groups for negotiating each newstyle option. States can refer to each other using either relative or absolute paths in this hierarchy.

Thirdly all states and transitions are defined and checked in a single place, allowing us to enforce rules about what transitions are permitted.

Fourthly the final C code that implements the state machine is generated (mostly). This lets us generate helper functions to (eg) turn state transitions into debug messages, or return whether the connection is in a mode where it’s expecting to read or write from the socket (making it easier to integrate with main loops).
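
To make the "enter a group only through START" rule concrete, here is a toy check written in shell. It is not our generator and not the real state list: the dotted state paths, group_of and transition_allowed are all invented for illustration, and the only rule it enforces is the single-entry one.

```shell
# A state is written as a dotted path: GROUP[.SUBGROUP...].STATE
# All names here are hypothetical and exist only for this sketch.

# group_of PATH: print everything before the last dot (the state's group).
group_of() {
    printf '%s\n' "${1%.*}"
}

# transition_allowed FROM TO: succeed if the move obeys the single-entry rule.
transition_allowed() {
    from_group=$(group_of "$1")
    to_group=$(group_of "$2")
    if [ "$from_group" = "$to_group" ]; then
        return 0                     # moves inside one group are not restricted here
    fi
    case "$2" in
        *.START) return 0 ;;         # entering another group through its START state
        *)       return 1 ;;         # jumping into the middle of another group
    esac
}

for pair in \
    "MAGIC.RECV NEWSTYLE.START" \
    "NEWSTYLE.START NEWSTYLE.OPT_STARTTLS.START" \
    "MAGIC.RECV NEWSTYLE.OPT_STARTTLS.SEND"
do
    set -- $pair                     # split "FROM TO" into $1 and $2
    if transition_allowed "$1" "$2"; then
        echo "ok:       $1 -> $2"
    else
        echo "rejected: $1 -> $2"
    fi
done
```

Keeping a check like this in one place means a bad transition is caught when the state machine is defined, rather than turning up later as a confusing bug at runtime.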

The final code looks like this and currently generates 173K lines of C (although as it’s mostly large switch statements it compiles down to a reasonably small size).

Has anyone else implemented a large state machine in a similar way?

nbdkit / FOSDEM test presentation about better loop mounts for Linux

I’ve submitted a talk about nbdkit, our flexible pluggable NBD server, to FOSDEM next February. This is going to be about using NBD as a better way to do loop mounts in Linux.

In preparation I gave a very early version of the talk to a small Red Hat audience.

Video link: http://oirase.annexia.org/rwmj.wp.com/rjones-nbdkit-tech-talk-2018-11-19.mp4

Sorry about the slow start. You may want to skip to 2 mins to get past the intro.

Summary of what’s in the talk:

  1. Demo of regular, plain loop mounting.
  2. Demo of loop mounting an XZ-compressed disk image using NBD + nbdkit.
  3. Slides about how the loop device compares to NBD.
  4. Slides about nbdkit plugins and filters.
  5. Using VMware VDDK to access a VMDK file.
  6. Creating a giant disk costing EUR 300 million(!)
  7. Visualizing a single filesystem.
  8. Visualizing RAID 5.
  9. Writing a plugin in shell script (live demo).
  10. Summary.
