Tag Archives: linux

RHEV-M 3.0 beta part 1

To get access to the RHEV-M 3.0 beta, you must have an active Red Hat Enterprise Virtualization subscription. Go to this RHN page to see links to the beta channels. See this page for discussion around the beta. There is also a Webinar taking place today (18th August). Finally here is the official announcement.

I’m getting ready to install RHEV-M 3.0 beta, and that starts with buying some cheap hardware.

RHEV-M requires two physical servers, one running our minimal hypervisor RHEV-H and one running the management console. Starting with RHEV-M 3.0 the management console runs on Linux [PDF] (you can still run it on Windows if you want). The management console can be run in a VM, but unfortunately not in a VM on top of RHEV-H. That is a chicken-and-egg problem: the management console needs to talk to RHEV-H to instruct it to start VMs.

I’m doing this on the cheap, so the hardware I’ve ordered is not the recommended way. Performance is expected to be fairly abysmal.

I ordered two HP Proliant Microservers, and upgrades to the RAM and disks.

2 x HP Microservers @ £250 each inc tax/delivery: £500
2 x 1 TB Samsung HD103SJ @ £44.80 each inc tax/delivery: £89.60
2 x 8 GB RAM @ £67.99 each + £27.20 tax, delivery included: £163.18
Total: £752.78

HP have extended the cashback offer on these servers through August 2011, so I should be able to claim £200 back.

Filed under Uncategorized

libguestfs build — an open ended problem

librarian made a very true observation (Google translate) about libguestfs. It’s a Swiss army chainsaw, but it’s damn hard to build from source.

With RHEL and Fedora I’ve made it my aim that no one should need to build libguestfs from source, because we offer the highest quality packages with every feature compiled in. I also build Debian and Ubuntu packages when I can, until someone steps up to do that.

But why is libguestfs a difficult package to build?

The primary reason is that we package up, make an API for, and rigorously test, something like 200 different Linux packages. Essentially if you use (say) the guestfs_part_* API then in the background you’re using parted. If you’re using another API, you might be using e2fsck or resize2fs or lvm or grep or file or the kernel or any one of dozens of other programs. And to compound the problem, we don’t just “ship and forget”. We test these programs, and if they break, then we break. Our test suite has about 600 different tests and takes 2 hours to run.

And we test against Fedora Rawhide. The latest and buggiest.

Consequently we hit all the new bugs. Just today I hit a Linux 3.0 bug and another kernel/ftrace bug. Two weeks ago it was a bug in the file command, another bug in udev on Debian, and you can never exclude the possibility of stupidity by Ubuntu kernel maintainers.

It’s routine that I discover qemu, kernel and other bugs for the first time, because often a libguestfs build in Koji is the first build that boots up and runs the new software.

So what’s my point? It would be good if the Fedora kernel and qemu maintainers didn’t just push out a new package, but they tested that one can run inside the other. But while that would improve the situation for me, the real problem is that integrating software is hard, and it’s unfortunate that libguestfs has got into a situation where we are the first people to integrate and run Rawhide.

My first Linux 3.0 bug?

I discovered that parted fails to run if the Linux kernel version is 3.0, because parted is expecting the utsname release string to contain 3 numbers (like 2.6.39).

Linux version 3.0-0.rc1.git0.2.fc16.x86_64 (mockbuild@x86-03.phx2.fedoraproject.org) (gcc version 4.6.0 20110530 (Red Hat 4.6.0-9) (GCC) ) #1 SMP Fri Jun 3 12:47:56 UTC 2011
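To illustrate the kind of parsing that copes with both forms, here is a minimal sketch in Python. It is not parted's actual C code; the function name parse_kernel_release is made up for this example, and treating a missing component as 0 is an assumption about what a tolerant parser might do.

```python
import re


def parse_kernel_release(release):
    """Return (major, minor, patch) from a utsname release string.

    Hypothetical helper: the second and third numbers are optional, and a
    missing one is assumed to be 0, so "3.0-0.rc1..." gives (3, 0, 0).
    """
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) if part is not None else 0
                 for part in match.groups())
```

With this, the old style "2.6.39" and the new style "3.0-0.rc1.git0.2.fc16.x86_64" both parse instead of the second one being rejected.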

What is sVirt?

There seems to be a lot of confusion about the “sVirt” feature that we added in Fedora 11. Really it’s very simple, and in this posting I hope to explain it in very simple terms.

Let’s start with the problem that sVirt tries to solve: If you run lots of KVM virtual machines on a single host, then probably all those qemu processes are running as the same user. They could all be running as root (very bad!). Better, they might all be running as a separate qemu.qemu user/group. Also, any disks they are using are probably chowned to qemu.qemu too.

The problem is that the interface between the virtual machines and the containing qemu process is very complicated and hacked together in C. It’s very likely that this boundary is full of undiscovered insecurities that allow a user in the virtual machine to take over the qemu process, in other words to escape from the confinement provided by the hypervisor. (Xen and other hypervisors have similar problems; this is not something that’s special to KVM.)

If all your qemu processes are running as the same user, there is literally no protection between the virtual machines if one is compromised like this. Two processes running as the same user can send signals, insert data into each other using ptrace, and lots more — if one qemu process “goes bad”, you’ve lost control of all the other qemu processes on the host. Furthermore, because all the disk images (and other resources) were accessible by the single qemu.qemu user, you’ve also lost all those too.

What’s the solution? Well, sVirt of course. One thing you could do is to run all the qemu processes as different users, but that’s not very convenient because it would mean reserving a block of hundreds of UIDs and GIDs. Step forward SELinux: without needing to reserve anything, you can give each qemu process a different SELinux label. This firstly prevents a compromised qemu from attacking other processes, and also allows you to label the precise set of resources that each process can see — so a compromised qemu can only attack its own disk images.

You can see an example of sVirt labels on this page.
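To make the labelling idea concrete, here is a rough sketch in Python. It is a toy model, not libvirt's implementation: the function names are invented, the label is reduced to just the MCS part (a sensitivity plus a pair of categories, assumed to range over c0 to c1023), and the access check is simplified to "labels must match exactly".

```python
import random


def allocate_label(in_use, rng=random):
    """Pick a pair of MCS categories not already used by another guest.

    in_use is a set of (low, high) category pairs already handed out.
    Hypothetical helper for illustration only.
    """
    while True:
        low, high = sorted(rng.sample(range(1024), 2))
        if (low, high) not in in_use:
            in_use.add((low, high))
            return "s0:c%d,c%d" % (low, high)


def can_access(process_label, file_label):
    """Toy access check: a qemu process may only touch files carrying
    exactly the same label (real SELinux MCS rules are more general)."""
    return process_label == file_label
```

Each guest's qemu process and its disk images get the same freshly allocated label, so a compromised qemu with one label fails the check against every other guest's images, without reserving any UIDs or GIDs.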

What happens if you turn off SELinux (or use a distribution that doesn’t have libvirt, SELinux or sVirt in the first place)? You’re trusting that huge, hacked together boundary between the VM and the hypervisor to keep you safe. Good luck.

Tip: Code for getting DHCP address from a virtual machine disk image

Previously (1) and previously (2) I showed there are many different ways to get the IP address from a virtual machine.

The example below shows one way to use libguestfs and hivex from a C program (virt-dhcp-address) to get the DHCP address that a virtual machine has picked up.

“mount -o ro” writes to the disk

mount -o ro ..., or the equivalent libguestfs command mount-ro, writes to the disk.

This is easy to show. Create a disk image containing an ext3 filesystem and a single file:

$ guestfish -N fs:ext3 -n -m /dev/sda1 touch /hello-world : sync
$ md5sum test1.img
a1f6684a8a04d14f7599902bc0ab4aaa  test1.img

Explanation:

  1. The guestfish -N option creates a prepared disk called test1.img in the current directory.
  2. The guestfish -n option turns off autosync, so the disk will not be cleanly unmounted after the command has finished.
  3. -m /dev/sda1 mounts the prepared filesystem.
  4. touch creates a file on the prepared filesystem.
  5. sync is needed to write the changes to disk without unmounting the filesystem (so the filesystem is left dirty).
  6. md5sum computes the MD5 hash of the disk we’ve just created.

Now let’s open the disk, mount it with the -o ro option, and read the root directory:

$ guestfish -a test1.img run : mount-ro /dev/sda1 / : ll /
total 17
drwxr-xr-x  3 root root  1024 Feb  3 17:49 .
drwxr-xr-x 23  500  500  4096 Feb  3 17:50 ..
-rw-r--r--  1 root root     0 Feb  3 17:49 hello-world
drwx------  2 root root 12288 Feb  3 17:49 lost+found
$ md5sum test1.img
8fab31ef115cb8a6edcbc71db61fcafc  test1.img

Explanation:

  1. -a test1.img adds the prepared disk (test1.img) to the libguestfs appliance, but note, not read-only
  2. run starts the appliance
  3. mount-ro mounts the prepared filesystem with the -o ro flag

Notice the MD5 hash of the disk has changed!

Try repeating the second command and you’ll see that the MD5 hash stays the same.

It appears that because the filesystem is ext3 and was originally dirty, mount -o ro replays the journal and thus writes to the underlying disk. (It is also possible that it is merely updating the superblock to mark it clean, but in any case the point is that it is writing to the disk.)

All of this is not unexpected, but it shows that you must use libguestfs add-drive-ro if you really want to look at a disk image without making any changes to it. That command uses qemu snapshots to ensure that any write operations never make it to the disk, but get discarded in an anonymous snapshot overlay.

If you use the guestfish --ro option then any -a or -d drives are added read-only.
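The before-and-after md5sum comparison used above can be wrapped up as a small check. This is only a sketch in Python: file_md5 and modifies_file are names invented here, and the action you pass in (for example, something that runs guestfish against the image) is up to you.

```python
import hashlib


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks so that large
    disk images do not have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def modifies_file(path, action):
    """Run action() and report whether the file's contents changed.

    Hypothetical helper: action is any zero-argument callable.
    """
    before = file_md5(path)
    action()
    return file_md5(path) != before
```

If the action opens a dirty ext3 image with plain -a and mount-ro, you would expect this to report a change the first time, as in the session above.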

How are Linux drives named beyond drive 26 (/dev/sdz, ..)?

[Edit: Thanks to adrianmonk for correcting my math]

It’s surprisingly hard to find a definitive answer to the question of what happens with Linux block device names when you get past drive 26 (i.e. counting from one, the first disk is /dev/sda and the 26th disk is /dev/sdz; what comes next?). I need to find out because libguestfs is currently limited to 25 disks, and this really needs to be fixed.

Anyhow, looking at the code we can see that it depends on which driver is in use.

For virtio-blk (/dev/vd*) the answer is:

Drive # — Name
1 vda
26 vdz
27 vdaa
28 vdab
52 vdaz
53 vdba
54 vdbb
702 vdzz
703 vdaaa
704 vdaab
18278 vdzzz

Beyond 18278 drives the virtio-blk code would fail, but that’s not currently an issue.
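The table follows a "bijective base 26" scheme, where the letters stand for 1 to 26 rather than 0 to 25. Here is a short Python sketch that reproduces it. This is my own reconstruction of the pattern in the table, not a translation of the kernel code, and drive_name is a made-up name.

```python
def drive_name(index, prefix="vd"):
    """Map a 1-based drive number to its name: 1 -> vda, 27 -> vdaa.

    Illustrative only: it does not enforce the kernel's upper limit.
    """
    if index < 1:
        raise ValueError("drive numbers count from 1")
    suffix = ""
    while index > 0:
        # Subtract 1 first so that 26 maps to 'z' rather than rolling over.
        index, remainder = divmod(index - 1, 26)
        suffix = chr(ord("a") + remainder) + suffix
    return prefix + suffix
```

For example, drive_name(703) gives vdaaa, and drive_name(27, prefix="sd") gives sdaa.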

For SATA and SCSI drives under a modern Linux kernel, the same as above applies, except that the code to derive names works properly beyond sdzzz, up to (in theory) sd followed by 29 z’s! [Edit: or maybe not?]

As you can see, virtio and SCSI/SATA don’t use common code to name disks. In fact there are also many other block devices in the kernel, all using their own naming scheme. Most of these use numbers instead of letters, e.g. /dev/loop0, /dev/ram0, /dev/mmcblk0 and so on.

If disks are partitioned, then the partitions are named by adding the partition number on the end (counting from 1). But if the drive name already ends with a number then a letter p is added between the drive name and the partition number, thus: /dev/mmcblk0p1.
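The partition rule is simple enough to write down directly. A small Python sketch (partition_name is a name chosen for this example):

```python
def partition_name(disk, number):
    """Return the device name of partition `number` (counting from 1).

    A 'p' separator is inserted only when the disk name ends in a digit.
    """
    if number < 1:
        raise ValueError("partition numbers count from 1")
    separator = "p" if disk[-1].isdigit() else ""
    return "%s%s%d" % (disk, separator, number)
```

So partition_name("/dev/sda", 1) is /dev/sda1, while partition_name("/dev/mmcblk0", 1) is /dev/mmcblk0p1.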

What can affect a process?

I’ve been following an interesting thread on fedora-devel which set me thinking. What is the complete list of different things that can affect the way your process runs?

Here’s my list below. If you have any other ideas, post a comment and I’ll keep this list updated.

  1. Environment variables. Obviously there are direct effects, like if $PATH is different then you may end up running different sub-processes. But there are more subtle differences like what happens if the environment is too large or completely empty? Also $LD_* variables can make a big difference to what is inside your process.
  2. ulimits. Too small and your program could fail to allocate memory or fail to open a file.
  3. Signal masks. Often overlooked, but I’ve hit this one: If a signal is masked, your program can behave quite differently. There is a famous bug where SIGPIPE was masked in the whole of Fedora, because some early program (login) was using dbus which promiscuously masked the signal, then login was forking every other program with this signal masked.
  4. Program arguments. You could put this in the “too obvious” class if you want, but consider also argv[0] which might affect your program but not be an immediately visible change.
  5. The PID. I have actually seen this: a program (sshd) was trying to create some lock file, something like /var/lock/sshd.<pid> at boot time in order to ensure only one instance was running. It was consistently failing to start sshd at boot. It took me some time to work out that because the boot was exactly predictable (and thus the PID was always the same), it was falling over its own lock file left from last time the machine was shut down.
  6. The file descriptors. Does the program change behaviour if fds 0, 1, 2 (stdin, stdout, stderr) are not open? How about if other open fds are leaked from the parent process?
  7. Current working directory. Affects what files are opened by relative paths. I guess you could include the chroot here too, but that is quite an obvious change.
  8. Number of other processes. This is like an “unofficial” ulimit, since as normally configured Linux will only allow 32766(?) processes (less PID 0 which is reserved and PID 1 for init).
  9. UIDs and GIDs. If these are very large, Bad Things can happen. External utilities like cpio and tar will fail.
  10. SELinux context. (Suggested by David Malcolm) One thing to note is that the SELinux context of a root login can be different from the context of, say, a daemon started at boot.
  11. Wallclock time or other timers. (Alexander E. Patrakov)
  12. Filesystem journalling mode, filesystem type. (see Bruno Wolff’s comment)
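Several of the items above can be inspected from inside a running process. Here is a rough Python sketch that gathers a few of them in one place. It assumes Linux (it reads /proc/self/fd), process_snapshot is a name invented here, and it covers only part of the list.

```python
import os
import resource
import signal


def process_snapshot():
    """Collect some of the inherited state that can change how a process
    behaves. Linux-only sketch; the key names are arbitrary."""
    return {
        "pid": os.getpid(),
        "cwd": os.getcwd(),
        "uid": os.getuid(),
        "gid": os.getgid(),
        "env_count": len(os.environ),
        "ld_vars": sorted(k for k in os.environ if k.startswith("LD_")),
        # (soft, hard) limit on the number of open files.
        "nofile_limit": resource.getrlimit(resource.RLIMIT_NOFILE),
        # Blocking an empty set changes nothing but returns the current mask.
        "blocked_signals": sorted(
            int(s) for s in signal.pthread_sigmask(signal.SIG_BLOCK, [])),
        # Includes the descriptor opened temporarily to list the directory.
        "open_fds": sorted(int(fd) for fd in os.listdir("/proc/self/fd")),
    }
```

Running this at the start of a program under two different launch methods (say, a root login versus a daemon started at boot) and comparing the results is one way to spot which of these differs.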

That’s all I can think of for now. Post a comment if you can think of any more.

Tip: Desktop Effects + more than 4 desktops

“Desktop Effects” (aka compiz) is nice but it reduces the number of virtual desktops to 4. Come on, who can work in just 4 desktops!!?!

I finally worked out there is a way to increase this, although it’s completely non-obvious and not exposed through GNOME (bug 382901).

# yum install ccsm

Run ccsm (CompizConfig Settings Manager), which has the strangest, most non-intuitive interface since Blender. Go to General → General Options. Go to the Desktop Size tab. Change Horizontal Virtual Size from 4 to the number of desktops desired.

That’s it. Not clear if you need to save that setting, but I guess I’ll find out when I next log in.

Yubikey arrived

It arrived this morning, about 20 hours after I placed the order. I haven’t had a chance to do anything other than plug it in yet.

usb 1-1.1: new low speed USB device using ehci_hcd and address 5
usb 1-1.1: New USB device found, idVendor=1050, idProduct=0010
usb 1-1.1: New USB device strings: Mfr=1, Product=2, SerialNumber=0
usb 1-1.1: Product: Yubico Yubikey II
usb 1-1.1: Manufacturer: Yubico
input: Yubico Yubico Yubikey II as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.1/1-1.1:1.0/input/input9
generic-usb 0003:1050:0010.0001: input,hidraw0: USB HID v1.11 Keyboard [Yubico Yubico Yubikey II] on usb-0000:00:1a.0-1.1/input0

lsusb output:

Bus 001 Device 005: ID 1050:0010 Yubico.com Yubikey

Pressing the one button on the yubikey sends a string of random letters (as if typed on the keyboard or another USB input device). The string is different each time.

That’s all for now folks!
