Tag Archives: guestfish

Tip: Read guest disks from VMware vCenter using libguestfs

virt-v2v can import guests directly from vCenter. It uses all sorts of tricks to make this fast and efficient, but the basic technique uses plain https range requests.

Making it all work was not so easy and involved a lot of experimentation and bug fixing, and I don’t think it has been documented up to now. So this post describes how we do it. As usual the code is the ultimate repository of our knowledge so you may want to consult that after reading this introduction.

Note this is read-only access. Write access is possible, but you’ll have to use ssh instead.

The VMware ESXi hypervisor has a web server, but it doesn’t support range requests. So although you can download an entire disk image in one go from ESXi, you will need VMware vCenter to random-access the image using libguestfs. You should first check that virsh dumpxml works against your vCenter instance by following these instructions. If that doesn’t work, the rest of this post is unlikely to work either.

You will need to know:

  1. The hostname or IP address of your vCenter server,
  2. the username and password for vCenter,
  3. the name of your datacenter (probably Datacenter),
  4. the name of the datastore containing your guest (could be datastore1),
  5. .. and of course the name of your guest.

Tricky step 1 is to construct the vCenter https URL of your guest.

This looks like:


https://root:password@vcenter/folder/guest/guest-flat.vmdk?dcPath=Datacenter&dsName=datastore1

where:

  root:password   username and password
  vcenter         vCenter hostname or IP address
  guest           guest name (appears twice in the URL)
  Datacenter      datacenter name
  datastore1      datastore name
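
As a minimal sketch, the URL can be assembled from those five pieces with a small shell function. The function name vcenter_url is made up for this example, and it assumes that none of the components contain characters that need URL-encoding.

```shell
#!/bin/sh
# Build the vCenter HTTPS URL for a guest's flat VMDK.
# Hypothetical helper: assumes no component needs URL-encoding.
vcenter_url() {
    user=$1
    password=$2
    server=$3
    guest=$4
    dcpath=$5
    dsname=$6
    # The guest name appears twice: as the folder and in the file name.
    printf 'https://%s:%s@%s/folder/%s/%s-flat.vmdk?dcPath=%s&dsName=%s\n' \
        "$user" "$password" "$server" "$guest" "$guest" "$dcpath" "$dsname"
}

# Prints the example URL shown above.
vcenter_url root password vcenter guest Datacenter datastore1
```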

Once you’ve got a URL that looks right, try to fetch the headers using curl. This step is important, not just because it checks that the URL is good, but because it gives us a cookie. Without the cookie, vCenter will break under the load when we start to access it for real.

$ curl --insecure -I https://....
HTTP/1.1 200 OK
Date: Wed, 5 Nov 2014 19:38:32 GMT
Set-Cookie: vmware_soap_session="52a3a513-7fba-ef0e-5b36-c18d88d71b14"; Path=/; HttpOnly; Secure; 
Accept-Ranges: bytes
Connection: Keep-Alive
Content-Type: application/octet-stream
Content-Length: 8589934592

The cookie is the vmware_soap_session=... part including the quotes.
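
If you want to script this step, something like the following should pull the cookie out of the headers. It is only a sketch: extract_cookie is a name invented here, and it assumes the header looks like the one above.

```shell
#!/bin/sh
# Read HTTP response headers on stdin and print the vCenter session
# cookie, including its double quotes.
# Hypothetical helper: assumes a header shaped like the example above.
extract_cookie() {
    # Strip the CRs that HTTP headers carry, then keep only the
    # vmware_soap_session="..." part of the Set-Cookie line.
    tr -d '\r' |
        sed -n 's/^[Ss]et-[Cc]ookie: *\(vmware_soap_session="[^"]*"\).*$/\1/p'
}

# Typical use (the URL is elided here, as in the text above):
#   cookie=$(curl --insecure -s -I 'https://....' | extract_cookie)
```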

Now let’s make a qcow2 overlay which encodes our https URL and the cookie as the backing file. This requires a reasonably recent qemu, probably 2.1 or above.

$ qemu-img create -f qcow2 /tmp/overlay.qcow2 \
    -b 'json: { "file.driver":"https",
                "file.url":"https://..",
                "file.cookie":"vmware_soap_session=\"...\"",
                "file.sslverify":"off",
                "file.timeout":1000 }'

You don’t need to include the password in the URL here, since the cookie acts as your authentication. You might also want to play with the "file.readahead" parameter. We found it makes a big difference to throughput.
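
The fiddly part of the json: string is escaping the quotes inside the cookie. Here is a rough sketch of a helper that does the escaping and also sets "file.readahead". The name backing_json is invented for this example, and the readahead value of 1048576 bytes is only an illustration, not a tuned recommendation.

```shell
#!/bin/sh
# Print a qemu "json:" backing-file string for an HTTPS URL plus cookie.
# Hypothetical helper: readahead is given in bytes and the value used
# in the usage line below is just an example.
backing_json() {
    url=$1
    cookie=$2
    readahead=$3
    # The cookie contains double quotes, which must be escaped in JSON.
    escaped=$(printf '%s' "$cookie" | sed 's/"/\\"/g')
    printf 'json: { "file.driver":"https", "file.url":"%s", "file.cookie":"%s", "file.sslverify":"off", "file.timeout":1000, "file.readahead":%s }\n' \
        "$url" "$escaped" "$readahead"
}

# Usage (URL elided as in the text above):
#   qemu-img create -f qcow2 /tmp/overlay.qcow2 \
#       -b "$(backing_json 'https://..' "$cookie" 1048576)"
```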

Now you can open the overlay file in guestfish as usual:

$ export LIBGUESTFS_BACKEND=direct
$ guestfish
><fs> add /tmp/overlay.qcow2 copyonread:true
><fs> run
><fs> list-filesystems
/dev/sda1: ext4
><fs> mount /dev/sda1 /

and so on.

Filed under Uncategorized

Making a bootable CD-ROM/ISO from virt-builder

virt-builder can throw out new virtual machines with existing operating systems in a few seconds, and you can also write these directly to a USB key or hard disk:

# virt-builder fedora-20 -o /dev/sdX

What you’ve not been able to do is create a bootable CD-ROM or ISO image.

For that I was using the awful livecd-creator program. This needs root and is incredibly fragile. You can have a kickstart that works one day, but not the next, and requires massive hacks to get working … which is the exact reason why I set off to find out how to make virt-builder create ISOs.

Read-only

The background as to why this is difficult: CDs are not writable.

You can take all the files from a Fedora guest built by virt-builder, turn them into an ISO, and put ISOLINUX on it, but such a guest would not be able to boot, or at least it would fail the first time it tried to write to the disk. One day overlayfs (which just went upstream a few days ago) will solve this, but until that is widely available in upstream kernels, we’re going to need something that creates a writable overlay at boot time.

Boot Time

I have chosen dracut (another tool I have a love/hate, mainly hate, relationship with), which has a useful module called dmsquash-live. This implements the boot side of making a live CD writable, for Fedora and RHEL. It’s what livecd-creator uses.

dmsquash-live demands a very particular ISO layout, but it wasn’t hard to reverse engineer it by reading the code carefully and with a lot of trial and error.

It requires that we have a filesystem containing a squashfs in a particular location on the CD:

/LiveOS/squashfs.img

That squashfs has to contain inside it a disk image with this precise name:

/LiveOS/rootfs.img

and the disk image is the root filesystem.

The Script

The script below creates all of this, and effectively replaces livecd-creator with something manageable that doesn’t require root, and is only 100 lines of shell (take that OO/Python!)

Update: Kashyap notes that the script will fail if you’re using tmp-on-tmpfs, so you might need to disable that or modify the script to use /var/tmp instead.
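
One way to handle the tmpfs case is to pick the working directory from the filesystem type of /tmp. This is only a sketch: workdir_for is a made-up helper, and the usage line assumes GNU stat.

```shell
#!/bin/sh
# Pick a working directory given the filesystem type of /tmp.
# Hypothetical helper, not part of the script in this post.
workdir_for() {
    case $1 in
        tmpfs) echo /var/tmp ;;   # tmp-on-tmpfs: use /var/tmp instead
        *)     echo /tmp ;;
    esac
}

# Usage, replacing the script's "cd /tmp" (stat -f -c %T is GNU coreutils):
#   cd "$(workdir_for "$(stat -f -c %T /tmp)")"
```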

Once you’ve run the script you can try booting the image using:

$ qemu-kvm -m 2048 -cdrom boot.iso -boot d

The Future

One improvement to this script would be to remove the dependency on dmsquash-live. We don’t need the baroque complexity of that module, and could write a custom dracut module (perhaps even a tiny self-contained initramfs) which would do what we need. It could even use overlayfs to simplify things greatly.

#!/bin/bash -

set -e

# Make bootable ISO from virt-builder
# image.
#
# This requires the Fedora
# squashfs/rootfs machinery.  See:
# /lib/dracut/modules.d/90dmsquash-live/dmsquash-live-root.sh

cd /tmp

# Build the regular disk image, but also
# build a special initramfs which has
# the dmsquash-live & pollcdrom modules
# enabled.  We also need to kill SELinux
# relabelling, and hence SELinux.
cat > postinstall <<'EOF'
#!/bin/bash -
version=$(rpm -q kernel | sort -rV | head -1 | sed 's/kernel-//')
echo "installed kernel version: $version"
dracut --no-hostonly --add "dmsquash-live pollcdrom" /boot/initrd0 "$version"
EOF

virt-builder fedora-20 \
    --install kernel \
    --root-password password:123456 \
    --edit '/etc/selinux/config:
        s/SELINUX=enforcing/SELINUX=disabled/' \
    --delete /.autorelabel \
    --run postinstall

# Extract the root filesystem (as an ext3/4 disk image).
guestfish --progress-bars --ro -a fedora-20.img -i \
    download /dev/sda3 rootfs.img

# Update /etc/fstab in the rootfs (but NOT in the original guest)
# so it works for the CD
virt-customize -a rootfs.img \
  --write '/etc/fstab:/dev/root / ext4 defaults 1 1'

# Turn the rootfs.img into a squashfs
# which must contain the layout
# /LiveOS/rootfs.img
rm -rf CDroot
rm -f squashfs.img
mkdir -p CDroot/LiveOS
mv rootfs.img CDroot/LiveOS
mksquashfs CDroot squashfs.img

# Create the CD layout.
rm -rf CDroot
mkdir -p CDroot/LiveOS

cp squashfs.img CDroot/LiveOS/

mkdir CDroot/isolinux

# Get the kernel (only) from the disk
# image.
pushd CDroot/isolinux
virt-builder --get-kernel ../../fedora-20.img
mv vmlinuz* vmlinuz0
rm init*
popd

# Get the special initrd that we built
# above.
guestfish --ro -a fedora-20.img -i \
    download /boot/initrd0 CDroot/isolinux/initrd0

# ISOLINUX configuration.
cat > CDroot/isolinux/isolinux.cfg <<EOF
prompt 1
default 1
label 1
    kernel vmlinuz0
    append initrd=initrd0 rd.live.image root=CDLABEL=boot rootfstype=auto rd.live.debug console=tty0 rd_NO_PLYMOUTH
EOF

# Rest of ISOLINUX installation.
cp /usr/share/syslinux/isolinux.bin CDroot/isolinux/
cp /usr/share/syslinux/ldlinux.c32 CDroot/isolinux/
cp /usr/share/syslinux/libcom32.c32 CDroot/isolinux/
cp /usr/share/syslinux/libutil.c32 CDroot/isolinux/
cp /usr/share/syslinux/vesamenu.c32 CDroot/isolinux/

# Create the ISO.
rm -f boot.iso
mkisofs -o boot.iso \
    -J -r \
    -V boot \
    -b isolinux/isolinux.bin -c isolinux/boot.cat \
    -no-emul-boot -boot-load-size 4 -boot-info-table \
    CDroot

nbdkit now supports cURL — HTTP, FTP, and SSH connections

nbdkit is a liberally licensed NBD (Network Block Device) server designed to let you connect all sorts of crazy disk image sources (like Amazon, Glance, VMware VDDK) to the universal network protocol for sharing disk images: NBD.

New in nbdkit 1.1.8: cURL support. This lets you turn any HTTP, FTP, TFTP or SSH server that hosts a disk image into an NBD server.

For example:

$ nbdkit -r curl url=http://onuma/scratch/boot.iso

and then you can read the disk image using guestfish, qemu or any other nbd client:

$ guestfish --ro -a nbd://localhost -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

/dev/sda mounted on /

><fs> _

If you are using a normal SSH server like OpenSSH which supports the SSH File Transfer Protocol (aka SFTP), then you can use SFTP to access images:

$ nbdkit -r curl url=sftp://rjones@localhost/~/fedora-20.img

I’m hoping to enable write support in a future version.

It doesn’t work at the moment because I haven’t worked out how to switch between read (GET) and write (POST) requests in a single cURL handle. Perhaps I need to use two handles? The documentation is confusing.

New in libguestfs: virt-log

In libguestfs ≥ 1.27.17, there’s a new tool called virt-log for displaying the log files from a disk image or virtual machine:

$ virt-log -a disk.img | less

Previously you could write:

$ virt-cat -a disk.img /var/log/messages

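That worked for some Linux guests, but it breaks down for guests that log to the systemd journal or that keep their plain text logs somewhere other than /var/log/messages.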

Virt-log is designed to do the right thing automatically (although at the moment Windows support is not finished). In particular it will automatically decode and display the systemd journal, and it knows the different locations where some Linux distros store their plain text log files.

Caseless virtualization cluster, part 4

AMD supports nested virtualization a bit more reliably than Intel, which was one of the reasons to go for AMD processors in my virtualization cluster. (The other reason is that they are much cheaper.)

But how well does it perform? Not too badly as it happens.

I tested this by creating a Fedora 20 guest (the L1 guest). I could create a nested (L2) guest inside that, but a simpler way is to use guestfish to carry out some baseline performance measurements. Since libguestfs is creating a short-lived KVM appliance, it benefits from hardware virt acceleration when available. And since libguestfs ≥ 1.26, there is a new option that lets you force software emulation so you can easily test the effect with & without hardware acceleration.

L1 performance

Let’s start on the host (L0), measuring L1 performance. Note that you have to run the commands shown at least twice, both because supermin will build and cache the appliance the first time and because it’s a fairer test of hardware acceleration if everything is cached in memory.

This AMD hardware turns out to be pretty good:

$ time guestfish -a /dev/null run
real	0m2.585s

(2.6 seconds is the time taken to launch a virtual machine, all its userspace and a daemon, then shut it down. I’m using libvirt to manage the appliance.)

Forcing software emulation (disabling hardware acceleration):

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.995s

L2 performance

Inside the L1 Fedora guest, we run the same tests. Note this is testing L2 performance (the libguestfs appliance running on top of an L1 guest), i.e. nested virt:

$ time guestfish -a /dev/null run
real	0m5.750s

Forcing software emulation:

$ time LIBGUESTFS_BACKEND_SETTINGS=force_tcg guestfish -a /dev/null run
real	0m9.949s

Conclusions

These are just some simple tests. I’ll be doing something more comprehensive later. However:

  1. First level hardware virtualization performance on these AMD chips is excellent.
  2. Nested virt is about 45% of non-nested speed (2.6 seconds versus 5.8 seconds to launch the appliance).
  3. TCG performance is slower as expected, but shows that hardware virt is being used and is beneficial even in the nested case.
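
For reference, the ratios can be worked out from the launch times above with a few lines of shell and awk. The helper name speed_pct is made up for this sketch. It treats launch time as the only measure of speed, which is a simplification.

```shell
#!/bin/sh
# Print how fast the slower launch is relative to the faster one, as a
# whole-number percentage: 100 * fast / slow.
speed_pct() {
    awk -v fast="$1" -v slow="$2" 'BEGIN { printf "%.0f\n", 100 * fast / slow }'
}

speed_pct 2.585 5.750   # L1 vs L2 with hardware virt: prints 45
speed_pct 5.750 9.949   # L2 hardware virt vs L2 TCG: prints 58
```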

Other data

The host has 8 cores and 16 GB of RAM. /proc/cpuinfo for one of the host cores is:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD FX(tm)-8320 Eight-Core Processor
stepping	: 0
microcode	: 0x6000822
cpu MHz		: 1400.000
cache size	: 2048 KB
physical id	: 0
siblings	: 8
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core perfctr_nb arat cpb hw_pstate npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold bmi1
bogomips	: 7031.39
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 48 bits physical, 48 bits virtual
power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

The L1 guest has 1 vCPU and 4 GB of RAM. /proc/cpuinfo in the guest:

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 21
model		: 2
model name	: AMD Opteron 63xx class CPU
stepping	: 0
microcode	: 0x1000065
cpu MHz		: 3515.548
cache size	: 512 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx pdpe1gb lm rep_good nopl extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c hypervisor lahf_lm svm abm sse4a misalignsse 3dnowprefetch xop fma4 tbm arat
bogomips	: 7031.09
TLB size	: 1024 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management:

Update

As part of the discussion in the comments about whether this has 4 or 8 physical cores, here is the lstopo output:

[Image: lstopo output for the host]

Transactions with guestfish

I was asked a few days ago if libguestfs has a way to apply a group of changes to an image together. The question was really about transaction support — applying a group of changes and then committing them or doing a rollback, with the final image either containing all the changes or none of them.

Although libguestfs doesn’t support this, you can do it using libguestfs and the qemu-img tool together. This post shows you how.

First I use virt-builder to quickly get a test image that I can play with:

$ virt-builder fedora-20

We create an overlay which will store the changes until we decide to commit or roll back:

$ qemu-img create -f qcow2 -b fedora-20.img overlay.img

Now open the overlay and make your changes:

$ guestfish -a overlay.img -i

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Fedora release 20 (Heisenbug)
/dev/sda3 mounted on /
/dev/sda1 mounted on /boot

><fs> write-append /etc/issue.net \
    "THIS IS A CHANGE TO ISSUE.NET\n"
><fs> cat /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET
><fs> exit

The base image (fedora-20.img) is untouched, and the overlay contains the changes we made:

$ virt-cat -a fedora-20.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
$ virt-cat -a overlay.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET

Rollback

Rollback is pretty simple!

$ rm overlay.img

Commit

The more interesting one is how to commit the changes back to the original file. Using qemu-img you just do:

$ qemu-img commit overlay.img
Image committed.
$ rm overlay.img

The changes are now contained in the original image file:

$ virt-cat -a fedora-20.img /etc/issue.net
Fedora release 20 (Heisenbug)
Kernel \r on an \m (\l)
THIS IS A CHANGE TO ISSUE.NET

ACID

Have we discovered the ACID properties of disk images? Not quite.

Although the change is atomic (A) [1], the disk image is consistent (C) before and after the change, and the change is durable (D) [2], the final property is not satisfied.

There is no isolation (I). Because it is infeasible to resolve conflicts at the block layer where qemu-img operates, using this technique in parallel on the same disk image is guaranteed to corrupt it. The only way to make it work reliably is to serialize every operation on the disk image with a mutex.

[1] The change is only atomic if you don’t look at the backing file for the short time that qemu-img commit runs.

[2] Strictly speaking, you must call sync or fsync after the qemu-img commit in order for the change to be durable.
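
Putting the pieces together, the whole create, change, then commit-or-rollback cycle could be wrapped in a shell function along these lines. This is a sketch under several assumptions: transact and apply_changes are names invented here, the base image is assumed to be raw (hence -F raw, which recent qemu-img versions require), and nothing else may touch the base image while it runs.

```shell
#!/bin/sh
# transact BASE CMD [ARGS...]
# Run CMD with a temporary qcow2 overlay of BASE appended to its
# arguments. Commit the overlay if CMD succeeds, discard it otherwise.
# Hypothetical helper: assumes BASE is a raw image, and it is not safe
# to run concurrently on the same BASE (there is no isolation).
transact() {
    base=$1
    shift
    overlay="$base.overlay.$$"
    # The backing file name is resolved relative to the overlay, which
    # lives next to BASE, so only the base name is recorded.
    qemu-img create -f qcow2 -F raw -b "$(basename "$base")" "$overlay" \
        >/dev/null || return 1
    if "$@" "$overlay"; then
        # Commit, then sync so that the change is durable.
        if qemu-img commit "$overlay" >/dev/null && sync; then
            rm -f "$overlay"
            return 0
        fi
        echo "commit failed, keeping $overlay" >&2
        return 1
    fi
    rm -f "$overlay"    # rollback: just throw the overlay away
    return 1
}

# Usage (apply_changes is a hypothetical script that takes the image
# to modify as its last argument):
#   transact fedora-20.img ./apply_changes
```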

New in nbdkit: Run nbdkit as a captive process

In nbdkit ≥ 1.1.6, you can run nbdkit as a “captive process” under external programs like qemu or guestfish. This means that nbdkit runs for as long as qemu/guestfish is running, and when they exit it cleans up and exits too.

Here is a rather involved way to boot a Fedora 20 guest:

$ virt-builder fedora-20
$ nbdkit file file=fedora-20.img \
    --run 'qemu-kvm -m 1024 -drive file=$nbd,if=virtio'

The --run parameter is what tells nbdkit to run as a captive under qemu-kvm. $nbd on the qemu command line is substituted automatically with the right nbd: URL for the port or socket that nbdkit listens on. As soon as qemu-kvm exits, nbdkit is killed and cleaned up.

Here is another example using guestfish:

$ nbdkit file file=fedora-20.img \
    --run 'guestfish --format=raw -a $nbd -i'

Welcome to guestfish, the guest filesystem shell for
editing virtual machine filesystems and disk images.

Type: 'help' for help on commands
      'man' to read the manual
      'quit' to quit the shell

Operating system: Fedora release 20 (Heisenbug)
/dev/sda3 mounted on /
/dev/sda1 mounted on /boot

><fs>

The main use for this is not to run the nbdkit file plugin like this, but in conjunction with perl and python plugins, to let people easily open and edit OpenStack Glance/Cinder and other unconventional disk images.
