Yearly Archives: 2018

Does virt-v2v preserve sparseness?

A common question is: does virt-v2v preserve sparseness (aka thin provisioning) when you convert a guest from VMware to a KVM-based hypervisor like oVirt/RHV or OpenStack? The very short answer is: “what?”. The medium answer is: the question doesn’t really make sense. For the long answer, read on …

First of all we need to ask: what is thin provisioning? For complex and essentially historical reasons, when you provision a new guest you have to decide the maximum disk space you want it to have. Let’s say you choose 10 GB. However, because a fresh guest install might only take, say, 1 GB, you can also decide whether you want the whole 10 GB to be preallocated up front, or whether you want the disk to be thin provisioned or sparse, meaning it’ll only take up 1 GB of host space, but that will grow as the guest creates new files. There are pros and cons to preallocation. Preallocating the full 10 GB obviously takes a lot of extra space (and what if the guest never uses it?), but it can improve guest performance in some circumstances.
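
To make the difference concrete, on a KVM host you can create one sparse and one fully preallocated image and compare how much host storage each actually consumes (a quick illustration only; the filenames are made up):

$ qemu-img create -f qcow2 thin.qcow2 10G
$ qemu-img create -f qcow2 -o preallocation=full fat.qcow2 10G
$ du -sh thin.qcow2 fat.qcow2

The first file occupies only a few hundred kilobytes of host storage until the guest writes data; the second occupies the full 10 GB from the start.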

That is what happens initially. By the time we come to do a virt-v2v conversion that guest may have been running for years and years. Now what does the disk look like? It doesn’t matter if the disk was initially thin provisioned or fully allocated; what matters is what the guest did during those years.

Did it repeatedly fill up the disk and/or delete those files? – In which case your initially thin provisioned guest could now be fully allocated.

Did it have trimming enabled? – In which case your initially preallocated guest might now have become sparsely allocated.
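
On a Linux guest, for example, trimming usually means mounting filesystems with the discard option or running fstrim periodically:

# fstrim -av

This tells the underlying storage which blocks are no longer in use, so the host can deallocate them.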

In any case VMware doesn’t store this initial state, nor does it make it very easy to find out which bits of the disk are actually backed by host storage and which bits are sparse (well, maybe this is possible, but not using the APIs we use when liberating your guests from VMware).

Furthermore, as I explained in this talk (slides) from a few years ago, virt-v2v tries to avoid copying any unused, zeroed or deleted parts of the disk for efficiency reasons, and so it will always make the disk maximally sparse when copying it (subject to what the target hypervisor does; read on).

When virt-v2v comes to creating the target guest, the default is to create a maximally sparse guest, but there are two ways to change this:

  1. You can specify the -oa preallocated option, in which case virt-v2v will try to ask the target hypervisor to fully preallocate the target disks of the guest (there’s an example after this list).
  2. For some hypervisors, especially RHV, your choice of backend storage may mean you have no option but to use preallocated disks (unfortunately I cannot give clear advice here, best to ask a RHV expert).
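
Here is roughly what option 1 looks like on the command line. This is only a sketch: the vCenter path, guest name and RHV export storage domain are placeholders, and you may need other input/output options depending on your setup.

# virt-v2v -ic vpx://vcenter.example.com/Datacenter/esxi?no_verify=1 \
    "My Guest" -o rhv -os rhv-storage.example.com:/export -oa preallocated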

The basic rule is that when converting guests you need to think about whether you want the guest to be sparse or preallocated after conversion, based on your own performance vs storage criteria. Whether it happened to be thin provisioned when you set it up on VMware years earlier isn’t a relevant issue.


New in nbdkit: Create an ISO image on the fly

nbdkit is the pluggable Network Block Device server that Eric and I wrote. I have submitted a talk to FOSDEM next February about the many weird and wonderful ways you can use nbdkit as a flexible replacement for loopback mounting.

Anyway, new in nbdkit 1.7.6 you can now create ISO 9660 (CD-ROM) disk images on the fly from a directory:

# nbdkit iso /boot params="-JrT"
# nbd-client -b 512 localhost /dev/nbd0
# file -bsL /dev/nbd0
ISO 9660 CD-ROM filesystem data 'CDROM'
# mount /dev/nbd0 /tmp/mnt
# ls /tmp/mnt
config-4.18.0-0.rc8.git2.1.fc29.x86_64
config-4.19.0-0.rc1.git3.2.fc30.x86_64
config-4.19.0-0.rc6.git0.1.fc30.x86_64
efi
extlinux
grub2
[etc]
# umount /tmp/mnt
# nbd-client -d /dev/nbd0
# killall nbdkit

That ISO wouldn’t actually be bootable, but you could create one (eg. an El Torito ISO) by adding the appropriate extra parameters.
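
For instance, if the directory contained an isolinux tree you could pass the usual genisoimage boot options through params. This is only a sketch (a stock /boot directory won’t normally contain these files):

# nbdkit iso /boot params="-JrT -b isolinux/isolinux.bin -c isolinux/boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table"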

To head off the first question: If you copy files into the directory while nbdkit is running, do they appear in the ISO? Answer: No! This is largely impossible with the way Linux block devices work.


Fedora/RISC-V now mirrored as a Fedora “alternative” architecture

Fedora/RISC-V nightly package builds are now available as a Fedora “alternative” architecture at https://dl.fedoraproject.org/pub/alt/risc-v/repo/fedora/29/latest/. These packages now get mirrored further by the Fedora mirror system, eg. to https://mirror.math.princeton.edu/pub/alt/risc-v/repo/fedora/29/latest/

If you grab the latest nightly Fedora/RISC-V builds you can switch to the mirrors by editing the /etc/yum.repos.d/*.repo file.
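
The edit is just a matter of pointing baseurl at one of the mirrors, something along these lines (a sketch only: the repo name is made up and the exact path layout under latest/ may differ on your image):

[fedora-riscv]
name=Fedora RISC-V
baseurl=https://mirror.math.princeton.edu/pub/alt/risc-v/repo/fedora/29/latest/
enabled=1
gpgcheck=0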

We also got some additional help, so we now have loads more build hosts! These were provided by Facebook, with hosting by Oregon State University Open Source Lab (see cfarm), so thanks to them.

Thanks to David Abdurachmanov and Laurent Guerby for doing all the work (I did nothing).


Programming a game on the ZX81

This took me back: the ZX81 was my first ever computer, and since I had no games I had to program my own. I would recommend that David buy a RAMPACK.


Run Fedora RISC-V with X11 GUI in your browser

https://bellard.org/jslinux/vm.html?cpu=riscv64&url=https://bellard.org/jslinux/fedora29-riscv-xwin.cfg&graphic=1&mem=256

Without the GUI it’s a bit faster: https://bellard.org/jslinux/vm.html?cpu=riscv64&url=https://bellard.org/jslinux/fedora29-riscv-2.cfg&mem=256


Creating Windows templates for virt-builder

virt-builder is a tool for rapidly creating customized Linux images. Recently I’ve added support for Windows, although for rather obvious licensing reasons we cannot distribute the Windows templates that would be needed to make Windows support available to everyone. However you can build your own Windows templates as described below, and then:

$ virt-builder -l | grep windows
windows-10.0-server      x86_64     Windows Server 2016 (x86_64)
windows-6.2-server       x86_64     Windows Server 2012 (x86_64)
windows-6.3-server       x86_64     Windows Server 2012 R2 (x86_64)
$ virt-builder windows-6.3-server
[   0.6] Downloading: http://xx/builder/windows-6.3-server.xz
[   5.1] Planning how to build this image
[   5.1] Uncompressing
[  60.1] Opening the new disk
[  77.6] Setting a random seed
virt-builder: warning: random seed could not be set for this type of guest
virt-builder: warning: passwords could not be set for this type of guest
[  77.6] Finishing off
                   Output file: windows-6.3-server.img
                   Output size: 10.0G
                 Output format: raw
            Total usable space: 9.7G
                    Free space: 3.5G (36%)

To build a Windows template repository you will need the latest libguestfs sources checked out from https://github.com/libguestfs/libguestfs and you will also need a suitable Windows Volume License, KMS or MSDN developer subscription. Also the final Windows templates are at least ten times larger than Linux templates, so virt-builder operations take correspondingly longer and use lots more disk space.

First download install ISOs for the Windows guests you want to use.

After cloning the latest libguestfs sources, go into the builder/templates subdirectory. Edit the top of the make-template.ml script to set the path which contains the Windows ISOs. You will also possibly need to edit the names of the ISOs later in the script.

Build a template, eg:

$ ../../run ./make-template.ml windows 2k12 x86_64

You’ll need to read the script to understand what the arguments do. The script will ask you for the product key, where you should enter the volume license key or your MSDN key.

Each time you run the script successfully you’ll end up with two files called something like:

windows-6.2-server.xz
windows-6.2-server.index-fragment

The version numbers are Windows internal version numbers.

After you’ve created templates for all the Windows guest types you need, copy them to any (private) web server, and concatenate all the index fragments into the final index file:

$ cat *.index-fragment > index

Finally create a virt-builder repo file pointing to this index file:

# cat /etc/virt-builder/repos.d/windows.conf
[windows]
uri=http://xx/builder/index

You can now create Windows guests in virt-builder. However note that they are not sysprepped. We can’t do this because it requires Windows tooling. So while these guests are fine for small tests and similar, they’re not suitable for creating long-lived Windows VMs. To do that you will need to add a sysprep.exe step somewhere in the template creation process.


Write nbdkit plugins in shell script(!)

nbdkit is a pluggable NBD server and you can write plugins in C or several other scripting languages. But not shell script – until now. Shell script turns out to be a reasonably nice language for this:

case "$1" in
  get_size)
    stat -L -c '%s' $f ||
      exit 1 ;;
  pread)
    dd iflag=skip_bytes,count_bytes skip=$4 count=$3 if=$f ||
      exit 1 ;;
  pwrite)
    dd oflag=seek_bytes conv=notrunc seek=$4 of=$f ||
      exit 1 ;;

Now that util-linux provides the fallocate program we can even implement efficient trim and zeroing operations.
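
For example, the trim and zero methods might look something like the following. This is only a sketch, using the same calling convention as above where $3 is the count and $4 is the offset:

  trim)
    fallocate -p -o $4 -l $3 -n $f ||
      exit 1 ;;
  zero)
    fallocate -z -o $4 -l $3 -n $f ||
      exit 1 ;;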


nbdkit for loopback pt 7: a slow disk

What happens to filesystems and programs when the disk is slow? You can test this using nbdkit and the delay filter. This command creates a 4G virtual disk in memory and injects a 2 second delay into every read operation:

$ nbdkit --filter=delay memory size=4G rdelay=2
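
(The filter can also delay writes, eg. nbdkit --filter=delay memory size=4G rdelay=2 wdelay=2, but for this test only reads are delayed.)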

You can loopback mount this as a device on the host:

# modprobe nbd
# nbd-client -b 512 localhost /dev/nbd0
Warning: the oldstyle protocol is no longer supported.
This method now uses the newstyle protocol with a default export
Negotiation: ..size = 4096MB
Connected /dev/nbd0

Partitioning and formatting is really slow!

# sgdisk -n 1 /dev/nbd0
Creating new GPT entries in memory.
... sits here for about 10 seconds ...
The operation has completed successfully.
# mkfs.ext4 /dev/nbd0p1
mke2fs 1.44.3 (10-July-2018)
waiting ...

Actually I killed it and decided to restart the test with a smaller delay. Since the memory plugin was rewritten to use a sparse array, we’re serializing all requests as an easy way to lock the sparse array data structure. This doesn’t matter normally because requests to the memory plugin are extremely fast, but once you inject delays it means that every request into nbdkit is serialized. Thus, for example, two reads issued at the same time by the kernel are delayed by 2+2 = 4 seconds instead of 2 seconds in total.

However shutting down the NBD connection reveals likely kernel bugs in the NBD driver:

[74176.112087] block nbd0: NBD_DISCONNECT
[74176.112148] block nbd0: Disconnected due to user request.
[74176.112151] block nbd0: shutting down sockets
[74176.112183] print_req_error: I/O error, dev nbd0, sector 6144
[74176.112252] print_req_error: I/O error, dev nbd0, sector 6144
[74176.112257] Buffer I/O error on dev nbd0p1, logical block 4096, async page read
[74176.112260] Buffer I/O error on dev nbd0p1, logical block 4097, async page read
[74176.112263] Buffer I/O error on dev nbd0p1, logical block 4098, async page read
[74176.112265] Buffer I/O error on dev nbd0p1, logical block 4099, async page read
[74176.112267] Buffer I/O error on dev nbd0p1, logical block 4100, async page read
[74176.112269] Buffer I/O error on dev nbd0p1, logical block 4101, async page read
[74176.112271] Buffer I/O error on dev nbd0p1, logical block 4102, async page read
[74176.112274] Buffer I/O error on dev nbd0p1, logical block 4103, async page read

Note nbdkit did not return any I/O errors, but the connection was closed with in-flight delayed requests. Well at least our testing is finding bugs!

I tried again with a 500ms delay and using the file plugin which is fully parallel:

$ rm -f temp
$ truncate -s 4G temp
$ nbdkit --filter=delay file file=temp rdelay=500ms

Because of the shorter delay, and because parallel kernel requests are now delayed “in parallel”, I was able to partition and create a filesystem much more easily [same steps as above], and then mount it on a temp directory:

# mount /dev/nbd0p1 /tmp/mnt

The effect is rather strange, like using an NFS mount from a remote server. Initial file reads are slow, and then they are fast (as they are cached in memory). If you drop Linux caches:

# echo 3 > /proc/sys/vm/drop_caches

then everything becomes slow again.

Confident that parallel requests were being delayed in parallel, I also increased the delay back up to 2 seconds (still using the file plugin). This is like swimming in treacle or what I imagine it would be like to mount an NFS filesystem from the other side of the world over a 56K modem.

I wasn’t able to find any further bugs, but this should be useful for someone who wants to test this kind of thing.


nbdkit for loopback pt 6: giant file-backed disks for testing

In part 1 and part 5 of this series I created some giant disks with a virtual size of 2⁶³-1 bytes (8 exabytes). However these were stored in memory using nbdkit-memory-plugin so you could never allocate more space in these disks than available RAM plus swap.

This is a problem when testing some filesystems because the filesystem overhead (the space used to store superblocks, inode tables, block free maps and so on) can be 1% or more.

The solution is to back the virtual disks with a sparse file instead. XFS lets you create sparse files up to 2⁶³-1 bytes, and you can serve them using nbdkit-file-plugin:

$ rm -f temp
$ truncate -s $(( 2**63 - 1 )) temp
$ stat -c %s temp
9223372036854775807
$ nbdkit file file=temp

nbdkit-file-plugin recently got a lot of updates to ensure it always maintains sparseness where possible and supports efficient zeroing, so make sure you’re using at least nbdkit ≥ 1.6.
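
If you’re not sure which version you have, nbdkit --version will tell you.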

Now you can serve this in the ordinary way and you should be able to allocate as much space as is available on the host filesystem:

# nbd-client -b 512 localhost /dev/nbd0
Negotiation: ..size = 8796093022207MB
Connected /dev/nbd0
# blockdev --getsize64 /dev/nbd0
9223372036854774784
# sgdisk -n 1 /dev/nbd0
# gdisk -l /dev/nbd0
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048  18014398509481948   8.0 EiB     8300

This command will still probably fail unless you have a lot of patience and a huge amount of space on your host (the -K option stops mkfs.xfs from trying to discard blocks first):

# mkfs.xfs -K /dev/nbd0p1


nbdkit for loopback pt 5: 8 exabyte btrfs filesystem

Thanks to Chris Murphy for noting that btrfs can create and mount 8 EB (approx 2⁶³ byte) filesystems effortlessly:

$ nbdkit -fv memory size=$(( 2**63-1 ))
# modprobe nbd
# nbd-client -b 512 localhost /dev/nbd0
# blockdev --getss /dev/nbd0
512
# gdisk /dev/nbd0
Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048  18014398509481948   8.0 EiB     8300  Linux filesystem
# mkfs.btrfs -K /dev/nbd0p1
btrfs-progs v4.16
See http://btrfs.wiki.kernel.org for more information.

Detected a SSD, turning off metadata duplication.  Mkfs with -m dup if you want to force metadata duplication.
Label:              (null)
UUID:               770e5592-9055-4551-8416-b6376802a2ad
Node size:          16384
Sector size:        4096
Filesystem size:    8.00EiB
Block group profiles:
  Data:             single            8.00MiB
  Metadata:         single            8.00MiB
  System:           single            4.00MiB
SSD detected:       yes
Incompat features:  extref, skinny-metadata
Number of devices:  1
Devices:
   ID        SIZE  PATH
    1     8.00EiB  /dev/nbd0p1

# mount /dev/nbd0p1 /tmp/mnt
# df -h /tmp/mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/nbd0p1     8.0E   17M  8.0E   1% /tmp/mnt

I created a few files in there and it all seemed to work although I didn’t do any extensive testing. Good job btrfs!
