How many disks can you add to a (virtual) Linux machine?

><rescue> ls -l /dev/sd[tab]
Display all 4001 possibilities? (y or n)

Just how many virtual hard drives is it practical to add to a Linux VM using qemu/KVM? I tried to find out. I started by modifying virt-rescue to raise the limit on the number of scratch disks that can be added¹:

virt-rescue --scratch=4000

I hit some interesting limits in our toolchain along the way.

256

256 is the maximum number of virtio-scsi disks in unpatched virt-rescue / libguestfs. A single virtio-scsi controller supports 256 targets, with up to 16,384 SCSI logical units (LUNs) per target. We were assigning one disk per target, and giving them all unit number 0, so of course we couldn’t add more than 256 drives, but virtio-scsi supports very many more. In theory each virtio-scsi controller could support 256 × 16,384 = 4,194,304 drives. You can even add more than one controller to a guest.
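
As an illustration of how target/LUN addressing works (a sketch with made-up file names, not the actual command line virt-rescue generates), on the qemu command line you can put several disks on the same target by varying the lun property of scsi-hd instead of scsi-id:

-device virtio-scsi-pci,id=scsi0 \
-drive file=d0.img,format=raw,if=none,id=d0 \
-device scsi-hd,drive=d0,bus=scsi0.0,channel=0,scsi-id=0,lun=0 \
-drive file=d1.img,format=raw,if=none,id=d1 \
-device scsi-hd,drive=d1,bus=scsi0.0,channel=0,scsi-id=0,lun=1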

About 490-500

At around 490-500 disks, any monitoring tools that use libvirt to collect disk statistics from your VMs will crash (https://bugzilla.redhat.com/show_bug.cgi?id=1440683).
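
Such tools typically collect these numbers through libvirt’s bulk statistics API. If you want to poke at a similar code path by hand, something like the following should do it (the guest name here is made up):

$ virsh domstats --block myguest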

About 1000

qemu uses one file descriptor per disk (possibly two per disk if you are using ioeventfd), so it quickly hits the default open file limit of 1024 (ulimit -n). You can raise this to something much larger by creating this file:

$ cat /etc/security/limits.d/99-local.conf
# So we can run qemu with many disks.
rjones - nofile 65536

It’s called /etc/security for a reason, so you should be careful adjusting settings here except on test machines.
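
The “-” in that line sets both the soft and hard limits for the rjones user. Assuming PAM applies limits.d at login (the usual setup), you can check from a fresh shell that the new limit took effect:

$ ulimit -n
65536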

About 4000

The Linux guest kernel uses quite a lot of memory simply enumerating each SCSI drive. My default guest had 512 MB of RAM (no swap), and ran out of memory and panicked when I tried to add 4000 disks. The solution was to increase guest RAM to 8 GB for the remainder of the test.
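
For reference, a couple of ways to give a libguestfs appliance more memory, as a sketch (--scratch=4000 still needs the patched virt-rescue from note ¹, and the memory size is in megabytes):

$ virt-rescue --memsize 8192 --scratch=4000

# or, for any libguestfs tool:
$ export LIBGUESTFS_MEMSIZE=8192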

Booting with 4000 disks took 10 minutes², and free showed that about a gigabyte of memory had disappeared:

><rescue> free -m
              total        used        free      shared  buff/cache   available
Mem:           7964         104        6945          15         914        7038
Swap:             0           0           0

What was also surprising is that increasing the number of virtual CPUs from 1 to 16 made no difference to the boot time (in fact it was a bit slower). So even though SCSI LUN probing is not deterministic, it appears that it is not running in parallel either.
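
If you want to see which scanning mode the guest kernel is using, the scsi_mod parameter is visible from inside the rescue shell (typical values are "none", "sync" or "async"; whether async scanning would actually speed this case up is another question):

><rescue> cat /sys/module/scsi_mod/parameters/scan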

About 8000

If you’re using libvirt to manage the guest, it will fail at around 8000 disks because the XML document describing the guest is too large to transfer over libvirt’s internal client-to-daemon connection (https://bugzilla.redhat.com/show_bug.cgi?id=1443066). For the remainder of the test I instructed virt-rescue to run qemu directly.
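
With libguestfs, running qemu directly means selecting the direct backend rather than the libvirt one, roughly like this (the same caveats about the patched virt-rescue apply):

$ export LIBGUESTFS_BACKEND=direct
$ virt-rescue --memsize 8192 --scratch=8000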

My guest with 8000 disks took 77 minutes to boot. About 1.9 GB of RAM was missing, and my ballpark estimate is that each extra drive takes about 200 KB of kernel memory.

Between 10,000 and 11,000

We pass the list of drives to qemu on the command line, with each disk taking perhaps 180 bytes to express. Somewhere between 10,000 and 11,000 disks, this long command line fails with:

qemu-system-x86_64: Argument list too long
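
That is consistent with a back-of-the-envelope check against the usual Linux execve limits: 10,000 disks × ~180 bytes ≈ 1.8 MB, which is right at the typical total argument size limit (ARG_MAX defaults to a quarter of the 8 MB stack rlimit, i.e. about 2 MB). You can see the limit on a particular machine with:

$ getconf ARG_MAX
2097152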

To be continued …

So that’s the end of my testing, for now. I managed to create a guest with 10,000 drives, but I was hoping to explore what happens when you add more than 18278 drives (18278 = 26 + 26² + 26³, the highest disk count that can still be named with a one-, two- or three-letter /dev/sd* suffix, i.e. up to /dev/sdzzz), since some parts of the kernel or userspace stack may not be quite ready for that.

Continue to part 2 …

Notes

¹That command will not work with the virt-rescue program found in most Linux distros. I have had to patch it extensively and those patches aren’t yet upstream.

²Note that the uptime command within the guest is not an accurate way to measure the boot time when dealing with large numbers of disks, because it doesn’t include the time taken by the BIOS, which has to scan the disks too. To measure boot times, use the wallclock time from launching qemu.

Thanks: Paolo Bonzini

Edit: 2015 KVM Forum talk about KVM’s limits.

9 responses to “How many disks can you add to a (virtual) Linux machine?”

  1. FYI: “You are not authorized to access bug #1440683.” (with a RH account.)

  2. Laszlo Ersek

    Cool research 🙂

    For moving beyond the command line size limit, you could play with “-readconfig” (see the QEMU manual).

    • rich

      Yeah it’s rather undocumented, e.g. how does it work, how does escaping work? I ended up reverse engineering it from -writeconfig output.

      Edit: Also: https://bugs.launchpad.net/qemu/+bug/1686364

      • Laszlo Ersek

        Huh, sorry I didn’t respond to your comment; I’m seeing it only now on your website. I expected to get an email notification, as I always request that option when commenting.

        Anyway, I believe you’ve figured it all out (and I couldn’t have told you about the escaping intricacies anyway 🙂 )

  3. Do you know if there are plans to have

    virsh migrate --offline --persistent e-devel qemu+ssh://kvm2/system

    also have an option to copy the disks? Right now it expects shared storage.

    virsh can do it with live migration, so offline should be an even simpler case?

    A problem with the above command is that it doesn’t really “migrate”, it just copies. So if the destination host has an older CPU, then the guest won’t start. One has to modify the guest XML, which should be part of a migration, I’d say.

    Do you have any insights of this?

  4. Pingback: How many disks can you add to a (virtual) Linux machine? – LUG Mureş

  5. Pingback: How many disks can you add to a (virtual) Linux machine? (contd) | Richard WM Jones
