Tag Archives: libguestfs

Great new changes coming to nbdkit

Eric Blake has been doing some great stuff for nbdkit, the flexible plugin-based NBD server.

  • Full parallel request handling.
    You’ve always been able to tell nbdkit that your plugin can handle multiple requests in parallel from a single client, but until now that didn’t actually do anything (only parallel requests from multiple clients worked).
  • An NBD forwarding plugin. If you have another NBD server that doesn’t support a feature such as encryption or the new-style protocol, you can put nbdkit, which does, in front of it.
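Here is a toy Python model of what handling several requests from one client in parallel means. It is not nbdkit code: `fake_pread`, `serve` and the `(handle, offset, length)` request tuples are made up for this sketch. Replies can finish out of order, which is fine because each NBD reply carries the handle of the request it answers.

```python
"""Toy model of parallel request handling for a single NBD client.

Hypothetical sketch: requests are (handle, offset, length) tuples and
the "plugin" is a plain function; none of this is nbdkit's real API.
"""
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_pread(offset: int, length: int) -> bytes:
    # Stand-in for a plugin read callback; larger reads take longer.
    time.sleep(length / 1_000_000)
    return bytes(length)


def serve(requests, workers: int = 4):
    """Run requests concurrently; replies may complete out of order.

    Each reply is tagged with the client's handle, which is what lets
    the client match replies to requests regardless of ordering.
    """
    replies = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(fake_pread, off, ln): h for h, off, ln in requests}
        for fut in as_completed(futures):
            replies.append((futures[fut], len(fut.result())))
    return replies


if __name__ == "__main__":
    reqs = [(1, 0, 4096), (2, 4096, 512), (3, 8192, 65536)]
    for handle, nbytes in serve(reqs):
        print(f"reply handle={handle} bytes={nbytes}")
```

A thread pool is just the simplest way to show the idea; how nbdkit actually schedules requests across its threads isn’t covered here.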

As well as that, he’s fixed lots of small NBD compliance bugs, so hopefully we’re now much closer to the protocol spec (we always check that we interoperate with qemu’s NBD client, but it’s nice to know that we also comply with the spec). He also fixed a potential DoS where nbdkit would try to handle very large writes, which could tie up a thread in the server indefinitely.
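As a rough illustration of that kind of check, this Python sketch unpacks a 28-byte NBD request header and refuses oversized reads and writes before any buffer is allocated for them. The header layout follows the NBD protocol spec as I understand it; the 64 MiB `MAX_REQUEST_SIZE` and the `ProtocolError` name are assumptions for the sketch, not nbdkit’s actual code.

```python
"""Sketch: parse an NBD request header and refuse oversized requests.

Assumptions: the 28-byte request layout below follows the NBD protocol
spec; MAX_REQUEST_SIZE is an illustrative limit, not nbdkit's code.
"""
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_READ, NBD_CMD_WRITE = 0, 1
# Big-endian: magic(u32) flags(u16) type(u16) handle(u64) offset(u64) length(u32)
REQUEST = struct.Struct(">IHHQQI")
MAX_REQUEST_SIZE = 64 * 1024 * 1024  # assumed limit for this sketch


class ProtocolError(Exception):
    pass


def parse_request(header: bytes):
    magic, flags, cmd, handle, offset, length = REQUEST.unpack(header)
    if magic != NBD_REQUEST_MAGIC:
        raise ProtocolError("bad request magic")
    # Reject huge reads/writes before allocating a buffer for them; what
    # the server then does with the connection is out of scope here.
    if cmd in (NBD_CMD_READ, NBD_CMD_WRITE) and length > MAX_REQUEST_SIZE:
        raise ProtocolError(f"request length {length} exceeds limit")
    return flags, cmd, handle, offset, length
```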


Also this week, I wrote an nbdkit plugin for handling the weird Xen XVA file format. The whole thread is worth reading because three people came up with three different solutions to this problem.

Filed under Uncategorized

Fedora 27 virt-builder images

Fedora 27 has just been released, and I’ve just uploaded virt-builder images so you can try it right away:

$ virt-builder -l | grep fedora-27
fedora-27                aarch64    Fedora® 27 Server (aarch64)
fedora-27                armv7l     Fedora® 27 Server (armv7l)
fedora-27                i686       Fedora® 27 Server (i686)
fedora-27                ppc64      Fedora® 27 Server (ppc64)
fedora-27                ppc64le    Fedora® 27 Server (ppc64le)
fedora-27                x86_64     Fedora® 27 Server
$ virt-builder fedora-27 \
      --root-password password:123456 \
      --install emacs \
      --selinux-relabel \
      --size 30G
$ qemu-system-x86_64 \
      -machine accel=kvm:tcg \
      -cpu host -m 2048 \
      -drive file=fedora-27.img,format=raw,if=virtio &

Tip: Changing the qemu product name in libguestfs

20:30 < koike> Hi. Is it possible to configure the dmi codes for libguestfs? I mean, I am running cloud-init inside a libguestfs session (through python-guestfs) in GCE, the problem is that cloud-init reads /sys/class/dmi/id/product_name to determine if the machine is a GCE machine, but the value it read is Standard PC (i440FX + PIIX, 1996) instead of the expected Google Compute Engine so cloud-init fails.

The answer is yes, using the guestfs_config API that lets you set arbitrary qemu parameters:

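import guestfs
g = guestfs.GuestFS(python_return_dict=True)
# Call before g.launch(): adds "-smbios type=1,product=..." to the qemu command line.
g.config('-smbios', 'type=1,product=Google Compute Engine')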
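For context, the failing check boils down to comparing a sysfs file with a fixed string. Here is a hypothetical, much simplified Python version; `looks_like_gce` is invented for illustration, and cloud-init’s real detection logic may check more than this.

```python
"""Hypothetical, simplified version of the check cloud-init performs.

This is not cloud-init's code; it only shows why the DMI product name
matters. The path is a parameter so it can be pointed at a test file.
"""
from pathlib import Path

DMI_PRODUCT_NAME = Path("/sys/class/dmi/id/product_name")


def looks_like_gce(path: Path = DMI_PRODUCT_NAME) -> bool:
    try:
        product = path.read_text().strip()
    except OSError:
        return False  # no DMI information available
    # In a default qemu guest this reads "Standard PC (i440FX + PIIX, 1996)";
    # with -smbios type=1,product=... it reads the product given there.
    return product == "Google Compute Engine"
```

Once the appliance runs with the extra `-smbios` parameter, a check like this one should see the expected product name.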

Fedora 26 is out, virt-builder images available

Fedora 26 was released today. virt-builder images are already available for almost all architectures:

$ virt-builder -l | grep fedora-26
fedora-26                aarch64    Fedora® 26 Server (aarch64)
fedora-26                armv7l     Fedora® 26 Server (armv7l)
fedora-26                i686       Fedora® 26 Server (i686)
fedora-26                ppc64      Fedora® 26 Server (ppc64)
fedora-26                ppc64le    Fedora® 26 Server (ppc64le)
fedora-26                x86_64     Fedora® 26 Server

For example:

$ virt-builder fedora-26
$ qemu-system-x86_64 -machine accel=kvm:tcg -cpu host -m 2048 \
    -drive file=fedora-26.img,format=raw,if=virtio

Why not s390x? Because qemu doesn’t yet emulate enough of the s390x instruction set and architecture for us to run Fedora under TCG emulation.

virt-builder Debian 9 image available

Debian 9 (“Stretch”) was released last week, and it’s now available in virt-builder, the fast way to build virtual machine disk images:

$ virt-builder -l | grep debian
debian-6                 x86_64     Debian 6 (Squeeze)
debian-7                 sparc64    Debian 7 (Wheezy) (sparc64)
debian-7                 x86_64     Debian 7 (Wheezy)
debian-8                 x86_64     Debian 8 (Jessie)
debian-9                 x86_64     Debian 9 (stretch)

$ virt-builder debian-9 \
    --root-password password:123456
[   0.5] Downloading: http://libguestfs.org/download/builder/debian-9.xz
[   1.2] Planning how to build this image
[   1.2] Uncompressing
[   5.5] Opening the new disk
[  15.4] Setting a random seed
virt-builder: warning: random seed could not be set for this type of guest
[  15.4] Setting passwords
[  16.7] Finishing off
                   Output file: debian-9.img
                   Output size: 6.0G
                 Output format: raw
            Total usable space: 3.9G
                    Free space: 3.1G (78%)

$ qemu-system-x86_64 \
    -machine accel=kvm:tcg -cpu host -m 2048 \
    -drive file=debian-9.img,format=raw,if=virtio \
    -serial stdio

New in libguestfs: Rewriting bits of the daemon in OCaml

libguestfs is a C library for creating and editing disk images. In the most common (but not the only) configuration, it uses KVM to sandbox access to disk images. The C library talks to a separate daemon running inside a KVM appliance, as in this Unicode-art diagram taken from the fine manual:

 ┌───────────────────┐
 │ main program      │
 │                   │
 │                   │           child process / appliance
 │                   │          ┌──────────────────────────┐
 │                   │          │ qemu                     │
 ├───────────────────┤   RPC    │      ┌─────────────────┐ │
 │ libguestfs  ◀╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍╍▶ guestfsd        │ │
 │                   │          │      ├─────────────────┤ │
 └───────────────────┘          │      │ Linux kernel    │ │
                                │      └────────┬────────┘ │
                                └───────────────│──────────┘
                                                │
                                                │ virtio-scsi
                                         ┌──────┴──────┐
                                         │  Device or  │
                                         │  disk image │
                                         └─────────────┘

The library has to be written in C because it needs to be linked into any main program. The daemon (guestfsd in the diagram) is also written in C, but there’s no specific reason for that, except that it’s what we did historically.

The daemon is essentially a big pile of functions, most corresponding to a libguestfs API. Writing the daemon in C is painful, to say the least. Because it’s a long-running process in a memory-constrained environment, we have to be very careful about memory management, religiously checking every return value from malloc, strdup, etc., which makes even the simplest task non-trivial and full of untested code paths.

So last week I modified libguestfs so you can now write APIs in OCaml if you want to. OCaml is a high-level language that compiles down to object files, and it’s entirely possible to link the daemon from a mix of C object files and OCaml object files. Another advantage of OCaml is that you can call between C and OCaml with relatively little glue code (although a disadvantage is that you still need to write that glue mostly by hand). Most simple calls turn into direct CALL instructions, with just a simple bit shift required to convert ints and bools between their C and OCaml representations. More complex calls passing strings and structures are not too difficult either.
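To make the bit shift concrete: OCaml stores an int n in a machine word as 2n + 1, with the low bit set to mark it as an immediate value rather than a pointer. The Python functions below model the `Val_int`, `Int_val` and `Val_bool` macros from OCaml’s `caml/mlvalues.h`. The lowercase Python names are my own; real glue code uses the C macros.

```python
"""Python model of how OCaml represents ints and bools as tagged words.

This mirrors the Val_int/Int_val/Val_bool macros from OCaml's
<caml/mlvalues.h>; real glue code uses those C macros, not this.
"""


def val_int(n: int) -> int:
    # Shift left one bit and set the low bit: a low bit of 1 marks an
    # immediate integer, distinguishing it from a heap pointer.
    return (n << 1) | 1


def int_val(v: int) -> int:
    # Arithmetic shift right drops the tag bit (works for negatives too).
    return v >> 1


def val_bool(b: bool) -> int:
    # OCaml false/true are the tagged integers 0 and 1.
    return val_int(1 if b else 0)


def bool_val(v: int) -> bool:
    return int_val(v) != 0
```

So OCaml `false` is the word 1 and `true` is the word 3, which is why converting a C int or bool costs only a shift and an OR.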

OCaml also turns memory errors into a single exception, which unwinds the stack cleanly, so we don’t litter the code with memory handling. We can still run the mixed C/OCaml binary under valgrind.

Code gets quite a bit shorter. For example, the case_sensitive_path API, which is all string handling and directory lookups, goes from 183 lines of C code to 56 lines of OCaml code (and is much easier to understand, too).
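The core idea behind case_sensitive_path can be sketched in a few lines of Python: walk the path one component at a time and, in each directory, pick the entry whose name matches when case is ignored. This is a hypothetical illustration, not the libguestfs implementation. In particular, keeping the unresolved remainder of the path as given is an assumption of the sketch, and `root` is a parameter I added so it can run against a scratch directory.

```python
"""Sketch of what a case_sensitive_path-style lookup does.

Hypothetical Python, not the libguestfs OCaml or C implementation: for
each path component, find a directory entry that matches it when case
is ignored, and use the entry's real spelling. As an assumption of this
sketch, once a component is missing the rest is appended unchanged.
"""
import os


def case_sensitive_path(path: str, root: str = "/") -> str:
    parts = [p for p in path.split("/") if p]
    current = root  # real directory on disk resolved so far
    resolved = []
    for i, part in enumerate(parts):
        try:
            entries = os.listdir(current)
        except OSError:
            entries = []  # not a directory, or unreadable
        # If several entries match, this takes whichever listdir returns first.
        match = next((e for e in entries if e.casefold() == part.casefold()), None)
        if match is None:
            resolved.extend(parts[i:])  # keep the remainder as given
            break
        resolved.append(match)
        current = os.path.join(current, match)
    return "/" + "/".join(resolved)
```

For example, against a directory containing `WINDOWS/System32`, asking for `/windows/system32` returns `/WINDOWS/System32`.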

I’m reimplementing a few APIs in OCaml, but the plan is definitely not to convert them all. I think we’ll have C and OCaml APIs in the daemon for a very long time to come.

How many disks can you add to a (virtual) Linux machine? (contd)

In my last post, I tried to see what happens when you add thousands of virtio-scsi disks to a Linux virtual machine. Above 10,000 disks, the qemu command line grew too long for the host to handle. Several people pointed out that I could use the qemu -readconfig parameter to read the disks from a file, so I modified libguestfs to allow that. What will be the next limit?
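The point of -readconfig is that each disk becomes a short section in a file, so the command line stays the same length however many disks there are. Here is a hedged Python sketch that writes `[drive "hdN"]` sections. The section syntax is qemu’s config-file format as far as I know, but the drive ids and keys are illustrative (the post doesn’t show what libguestfs actually writes), and a real setup also needs `[device]` sections to attach each drive to the virtio-scsi controller, which are omitted here.

```python
"""Sketch: write qemu -readconfig sections instead of -drive arguments.

Assumption: the [drive "id"] / key = "value" layout is qemu's config
file syntax as I understand it; the ids and keys below are illustrative,
and the [device] sections a real guest needs are omitted.
"""


def drive_config(images, fmt: str = "raw") -> str:
    sections = []
    for i, image in enumerate(images):
        sections.append(
            f'[drive "hd{i}"]\n'
            f'  file = "{image}"\n'
            f'  format = "{fmt}"\n'
            f'  if = "none"\n'
        )
    return "\n".join(sections)


if __name__ == "__main__":
    # Hypothetical image names; save the output and pass: -readconfig disks.cfg
    print(drive_config(f"disk{i}.img" for i in range(2)))
```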

18,278

Linux uses a strange scheme for naming disks which I’ve covered before on this blog. In brief, disks are named /dev/sda through /dev/sdz, then /dev/sdaa through /dev/sdzz, then /dev/sdaaa onwards, so the 18,278th drive (26 + 26² + 26³) is /dev/sdzzz. What’s special about zzz? Nothing really, but historically Linux device drivers would fail beyond this point, although that is not a problem for modern Linux.
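The naming scheme is bijective base-26 numbering with the letters a to z as digits. There is no zero digit, which is why sdz is followed by sdaa rather than sdba. Here is a small Python sketch using 0-based indexes, so that index 0 is sda. `disk_name` and `count_names` are names I made up; this is an illustration, not the kernel’s C code.

```python
"""Sketch of Linux's sd disk naming: bijective base-26 with letters a..z.

Index 0 is sda. Illustrative Python, not the kernel's implementation.
"""


def disk_name(index: int) -> str:
    if index < 0:
        raise ValueError("index must be non-negative")
    suffix = ""
    n = index + 1  # bijective numeration has no zero digit
    while n > 0:
        n -= 1
        suffix = chr(ord("a") + n % 26) + suffix
        n //= 26
    return "/dev/sd" + suffix


def count_names(max_letters: int) -> int:
    # Number of names with 1..max_letters letters: 26 + 26^2 + ...
    return sum(26 ** k for k in range(1, max_letters + 1))
```

For example, `disk_name(26)` gives `/dev/sdaa`, `disk_name(18277)` gives `/dev/sdzzz`, and `count_names(3)` is 18,278.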

20,000

In any case, I created a Linux guest with 20,000 drives with no problem except for the enormous boot time: after more than 12 hours I killed it. Most of the time was being spent in:

-   72.62%    71.30%  qemu-system-x86  qemu-system-x86_64  [.] drive_get
   - 72.62% drive_get
      - 1.26% __irqentry_text_start
         - 1.23% smp_apic_timer_interrupt
            - 1.00% local_apic_timer_interrupt
               - 1.00% hrtimer_interrupt
                  - 0.82% __hrtimer_run_queues
                       0.53% tick_sched_timer

Drives are stored inside qemu on a linked list, and the drive_get function iterates over this linked list, so of course everything is extremely slow when this list grows long.
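A toy cost model shows why a linear lookup hurts at this scale. Assume each of n drives triggers one lookup that walks the list from the head; that call pattern is an assumption of the sketch, and the real one inside qemu may differ. Then the k-th lookup visits k nodes and the total is n(n + 1)/2 node visits. The `Node`, `lookup` and `total_visits` names are hypothetical.

```python
"""Why a linked-list lookup per drive gets slow: a toy cost model.

Assumption: each of n drives triggers one lookup by unit number that
walks the list from the head (a linear search, as drive_get does over
qemu's list). This counts node visits; it is not qemu code.
"""


class Node:
    def __init__(self, unit, next_node=None):
        self.unit = unit
        self.next = next_node


def build_list(n: int):
    head = None
    for unit in reversed(range(n)):  # head ends up holding unit 0
        head = Node(unit, head)
    return head


def lookup(head, unit):
    visits = 0
    node = head
    while node is not None:
        visits += 1
        if node.unit == unit:
            return node, visits
        node = node.next
    return None, visits


def total_visits(n: int) -> int:
    head = build_list(n)
    return sum(lookup(head, unit)[1] for unit in range(n))
```

For n = 20,000 the formula gives 200,010,000 node visits. An index keyed by unit (a dict, say) would make each lookup roughly constant time instead.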

QEMU bug filed: https://bugs.launchpad.net/qemu/+bug/1686980

Edit: Dan Berrange posted a hack which gets me past this problem, so now I can add 20,000 disks.

The guest boots fine, albeit taking about 30 minutes (and udev hasn’t finished creating device nodes in that time; it’s still going on in the background).

><rescue> ls -l /dev/sd[Tab]
Display all 20001 possibilities? (y or n)
><rescue> mount
/dev/sdacog on / type ext2 (rw,noatime,block_validity,barrier,user_xattr,acl)

As you can see, the modern Linux kernel and userspace handle “four-letter” drive names like a champ.

Over 30,000

I managed to create a guest with 30,000 drives. I had to give the guest 50 GB of RAM (yes, that’s not a mistake) to get this far. With less RAM, disk probing fails with:

scsi_alloc_sdev: Allocation failure during SCSI scanning, some SCSI devices might not be configured

I’d seen SCSI probing run out of memory before, and I made a back-of-the-envelope calculation that each disk consumed 200 KB of RAM. However, that cannot be correct: there must be a non-linear relationship between the number of disks and the RAM used by the kernel.
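Spelling out why the per-disk figure can’t be the whole story (taking 1 GB as 10^6 KB):

```latex
30\,000 \times 200\ \text{KB} = 6 \times 10^{6}\ \text{KB} \approx 6\ \text{GB} \ll 50\ \text{GB}
```

So a purely linear 200 KB per disk would account for only about an eighth of the RAM the guest was given.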

Because my development machine simply doesn’t have enough RAM to go further, I wasn’t able to add more than 30,000 drives, so that’s where we have to end this little experiment, at least for the time being.

><rescue> ls -l /dev/sd???? | tail
brw------- 1 root root  66, 30064 Apr 28 19:35 /dev/sdarin
brw------- 1 root root  66, 30080 Apr 28 19:35 /dev/sdario
brw------- 1 root root  66, 30096 Apr 28 19:35 /dev/sdarip
brw------- 1 root root  66, 30112 Apr 28 19:35 /dev/sdariq
brw------- 1 root root  66, 30128 Apr 28 19:35 /dev/sdarir
brw------- 1 root root  66, 30144 Apr 28 19:35 /dev/sdaris
brw------- 1 root root  66, 30160 Apr 28 19:35 /dev/sdarit
brw------- 1 root root  66, 30176 Apr 28 19:24 /dev/sdariu
brw------- 1 root root  66, 30192 Apr 28 19:22 /dev/sdariv
brw------- 1 root root  67, 29952 Apr 28 19:35 /dev/sdariw
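The device numbers in that listing look odd (minors stepping by 16, and a jump to major 67 on the last line), but they follow from how the sd driver splits a disk index across majors and minors. This Python sketch is based on my reading of the Linux sd driver (16 minors per disk, majors 8, 65 to 71 and 128 to 135, with extended minors beyond the first 256 disks), so treat it as an illustration rather than a reference. It does reproduce the numbers shown in the listing. The function names are my own.

```python
"""Sketch: how the sd driver maps a disk index to (major, minor).

Based on my reading of the Linux sd driver; an illustration, not a
reference. Function names are hypothetical.
"""


def index_from_suffix(suffix: str) -> int:
    # Inverse of the naming scheme: "a" -> 0, "z" -> 25, "aa" -> 26, ...
    n = 0
    for ch in suffix:
        n = n * 26 + (ord(ch) - ord("a") + 1)
    return n - 1


def sd_major(i: int) -> int:
    # i is a 4-bit group number (0..15).
    if i == 0:
        return 8  # SCSI_DISK0_MAJOR
    if i <= 7:
        return 64 + i  # SCSI_DISK1_MAJOR (65) .. SCSI_DISK7_MAJOR (71)
    return 128 + i - 8  # SCSI_DISK8_MAJOR (128) .. SCSI_DISK15_MAJOR (135)


def dev_numbers(index: int):
    # Bits 4-7 of the index choose the major.
    major = sd_major((index & 0xF0) >> 4)
    # Bits 0-3 times 16, plus bits 8-19 kept in place, give the first
    # minor, leaving 16 minors per disk for partitions.
    minor = ((index & 0xF) << 4) | (index & 0xFFF00)
    return major, minor
```

Index 30000 (/dev/sdariw) is where bits 4 to 7 of the index roll over to a new value, so it lands on major 67 while its neighbours in the listing are on 66.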
