Tag Archives: grub

Odd/scary RHEL 5 bug

Yesterday my colleague gave me a RHEL 5 VM disk image which failed to boot after converting it using the latest virt-v2v.  Because it booted before conversion but not afterwards, the fingers naturally pointed at something that we were doing during the conversion process, which is not unusual as v2v conversion is highly complex.

[Screenshot: the “GRUB _” prompt after conversion]

The thing is that we don’t reinstall grub during conversion, but we do edit a few grub configuration files. Could editing grub configuration cause this error?

I wanted to understand what the grub-legacy “GRUB _” prompt means. There are lots and lots and lots of people reporting this bug, but as is often the case I could find no coherent explanation anywhere of what grub-legacy means when it gets into this state. Lots of the blind leading the blind, and random suggestions about how people had rescued such machines (probably coincidentally), but no hard data anywhere. So I had to go back to first principles and debug qemu to find out what happens just before the message is printed.

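Tip: To break at the point where the Master Boot Record (first sector) has been loaded, start qemu with -s -S (listen for gdb on TCP port 1234 and pause the CPU at startup), then in gdb do: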

target remote tcp::1234
set architecture i8086
b *0x7c00
cont

After an evening of debugging, I found that it’s the first sector (known in grub-legacy as “stage 1”) which prints the GRUB<space> message. (The same happens to be true of grub2). The stage 1 boot sector has, written into it at a fixed offset, the location of the /boot/grub/stage2 file, ie. the literal disk start sector and length of this file. It issues BIOS int $0x13 calls to load those sectors into memory at address 0x8000, and jumps there to start stage 2 of grub. The boot sector is only 512 bytes, so there is no room to do anything fancier than print those 5 characters. All the nice graphical stuff happens only after the stage2 file has been loaded.

Unfortunately in the image after conversion, the stage2 data loaded into memory was all zeroes, and that’s why the boot fails and you see GRUB<space><cursor> and then the VM crashes.
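
If you want to check an image for this without firing up a debugger, something like the following Python sketch may help. It reads a 32-bit sector number out of the boot sector and tests whether the sector it points at is all zeroes. Note the assumptions: the default field offset 0x44 is what I recall from grub-legacy's stage1.h and you should verify it against your grub version, the function names are made up for this example, and it only handles a raw disk image with 512 byte sectors.

```python
import struct

SECTOR_SIZE = 512


def read_stage2_sector(image_path, field_offset=0x44):
    """Return the 32-bit little-endian sector number stored in the boot sector.

    field_offset=0x44 is an assumption (recalled from grub-legacy's
    stage1.h); pass a different offset if your stage 1 differs.
    """
    with open(image_path, "rb") as f:
        boot = f.read(SECTOR_SIZE)
    # Every valid MBR-style boot sector ends with the bytes 0x55 0xAA.
    if len(boot) != SECTOR_SIZE or boot[510:512] != b"\x55\xaa":
        raise ValueError("no valid boot sector signature")
    (lba,) = struct.unpack_from("<I", boot, field_offset)
    return lba


def sectors_all_zero(image_path, start_sector, count=1):
    """Return True if every byte in the given run of sectors is zero."""
    with open(image_path, "rb") as f:
        f.seek(start_sector * SECTOR_SIZE)
        data = f.read(count * SECTOR_SIZE)
    if len(data) != count * SECTOR_SIZE:
        raise ValueError("sector range extends past the end of the image")
    return not any(data)
```

Calling sectors_all_zero(path, read_stage2_sector(path)) on a raw image would then tell you whether the sector stage 1 is about to load is empty.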

The mystery was how conversion could be changing the location of the /boot/grub/stage2 file so that it could no longer be loaded at the fixed offset encoded in the boot sector.

This morning it dawned on me what was really happening …

The new virt-v2v tries very hard to avoid copying any unused data from the guest, just to save time. No point wasting time copying deleted files and empty space. This makes virt-v2v very fast, but it has an unusual side-effect: If a file is deleted on the source, the contents of the file are not copied over to the target, and turn into zeroes.

It turns out if you take the source disk image and simply zero all of the empty space in /boot, then the source doesn’t boot either, even though virt-v2v is not involved. Yikes … this could be a bug in RHEL 5. Grub is generating a bootloader that references a deleted file.

This is where we are right now with this bug. It appears that a valid sequence of steps can make a RHEL 5 bootloader that references a deleted file, but still works as long as you never overwrite the sectors used by that file.

I have written a simple test script that you can download to find out if your RHEL ≤ 6 virtual machines could be affected by this problem. I’m interested if anyone else sees this. I ran the test over a selection of RHEL 3 – 5 guests, and could not find any which had the problem, but my collection is not very extensive, and there are likely to be common modes in how they were created.

The next steps will likely be to test a lot more RHEL 5 installs to see if this bug is really common or a strange one-off. I will also probably add a workaround to virt-v2v so it doesn’t trim the boot partition — the reason is that we cannot go back and fix old RHEL 5 installs, we have to work with them if they are broken. If it turns out to be a real bug in RHEL 5 then we will need to issue a fix for that.

New in libguestfs: Use SYSLINUX or EXTLINUX to make bootable guests

Although grub support in libguestfs is currently on hold because of an unfortunate situation, the latest libguestfs now supports SYSLINUX and EXTLINUX, which are (let’s be frank about this) much simpler and more sane bootloaders than grub/grub2.

In fact, you can make a bootable Linux guest real easily now. Here’s a script:

#!/usr/bin/perl
# Copyright (C) 2013 Red Hat Inc.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.

# This ambitious script creates a complete, bootable guest.

use strict;
use warnings;

use Sys::Guestfs;

my $disk = "syslinux-guest.img";

# Find prerequisites.
my $mbr = "/usr/share/syslinux/mbr.bin";
unless (-f $mbr) {
    $mbr = "/usr/lib/syslinux/mbr.bin";
    unless (-f $mbr) {
        die "$0: mbr.bin (from SYSLINUX) not found\n";
    }
}
print "mbr: $mbr\n";

my $mbr_data;
{
    local $/ = undef;
    # Slurp mbr.bin as raw bytes (3-arg open, binary mode).
    open my $fh, "<", $mbr or die "$mbr: $!";
    binmode $fh;
    $mbr_data = <$fh>;
    close $fh;
}
die "invalid mbr.bin" unless length ($mbr_data) == 440;

my $kernel = `ls -1rv /boot/vmlinuz* | head -1`;
chomp $kernel;
unless ($kernel) {
    die "$0: kernel could not be found\n";
}
print "kernel: $kernel\n";

print "writing to: $disk ...\n";

# Create the disk.
unlink "$disk";
open DISK, ">$disk" or die "$disk: $!";
truncate DISK, 100*1024*1024;
close DISK;

my $g = Sys::Guestfs->new ();
$g->add_drive ($disk, format => "raw");
$g->launch ();

unless ($g->feature_available (["syslinux"])) {
    die "$0: 'syslinux' feature not available in this version of libguestfs\n";
}

# Format the disk.
$g->part_disk ("/dev/sda", "mbr");
$g->mkfs ("msdos", "/dev/sda1");
$g->mount ("/dev/sda1", "/");

# Install the kernel.
$g->upload ($kernel, "/vmlinuz");

# Install the SYSLINUX configuration file.
$g->write ("/syslinux.cfg", <<_END);
DEFAULT linux
LABEL linux
  SAY Booting the kernel from /vmlinuz
  KERNEL vmlinuz
  APPEND ro root=/dev/sda1
_END

$g->umount_all ();

# Install the bootloader.
$g->pwrite_device ("/dev/sda", $mbr_data, 0);
$g->syslinux ("/dev/sda1");
$g->part_set_bootable ("/dev/sda", 1, 1);

# Finish off.
$g->shutdown ();

After running the script, you can try booting the minimal “guest” (note it only contains a kernel, not any userspace):

$ qemu-kvm -hda syslinux-guest.img
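
In case the 440 byte check in the script looks odd: in the classic MBR layout only the first 440 bytes of the sector are boot code. They are followed by a 4 byte disk signature, 2 reserved bytes, the 64 byte partition table at offset 446 and the 0x55 0xAA signature in the last two bytes. That is why the script can write mbr.bin at offset 0 after partitioning without clobbering the partition table. Here is a small Python sketch of the same idea working on an in-memory sector (the function name is invented for this illustration):

```python
BOOT_CODE_SIZE = 440          # bytes 0..439 hold the bootstrap code
PARTITION_TABLE_OFFSET = 446  # four 16-byte partition entries start here
SECTOR_SIZE = 512


def install_boot_code(sector, boot_code):
    """Return a copy of an MBR sector with only its boot code replaced.

    Everything from byte 440 onwards (disk signature, partition table,
    0x55 0xAA signature) is carried over unchanged.
    """
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR sector must be exactly 512 bytes")
    if len(boot_code) != BOOT_CODE_SIZE:
        raise ValueError("boot code must be exactly 440 bytes")
    return bytes(boot_code) + bytes(sector[BOOT_CODE_SIZE:])
```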

Tip: VM won’t boot, troubleshoot with guestfish

Unbootable virtual machine? Here are three useful guestfish commands to help. (You can also consider using virt-rescue).

1. Edit /boot/grub/grub.conf

$ guestfish -i Rawhide

Welcome to guestfish, the libguestfs filesystem interactive shell for
editing virtual machine filesystems.

Type: 'help' for help with commands
      'quit' to quit the shell

><fs> ls /boot/
System.map-2.6.32.1-9.fc13.x86_64
System.map-2.6.32.3-21.fc13.x86_64
System.map-2.6.33-0.40.rc7.git0.fc13.x86_64
config-2.6.32.1-9.fc13.x86_64
config-2.6.32.3-21.fc13.x86_64
config-2.6.33-0.40.rc7.git0.fc13.x86_64
[...]

Use the “edit”, “emacs” or “vi” commands to edit grub.conf:

><fs> vi /boot/grub/grub.conf

From here you can change the boot kernel, change it to boot in single user mode, enable the grub menu, remove the “rhgb quiet” option so you can see boot messages, and much more.
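
If you have a lot of guests to fix, you might prefer to script one of those edits rather than do it by hand in vi. The following Python sketch removes “rhgb” and “quiet” from the kernel lines of a grub.conf passed in as a string. It is only a sketch: the function name is invented for this example, it assumes the grub-legacy layout where the kernel command line sits on a line starting with the word “kernel”, and it collapses runs of whitespace on the lines it rewrites.

```python
def strip_quiet_boot(grub_conf, options=("rhgb", "quiet")):
    """Return grub_conf with the given options removed from kernel lines."""
    out = []
    for line in grub_conf.splitlines():
        words = line.split()
        if words and words[0] == "kernel":
            # Keep the original indentation, drop the unwanted options.
            indent = line[: len(line) - len(line.lstrip())]
            kept = [w for w in words if w not in options]
            line = indent + " ".join(kept)
        out.append(line)
    return "\n".join(out) + "\n"
```

One way to use it would be to fetch grub.conf with the guestfish “download” command, run it through the function, and put the result back with “upload”.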

2. Look at the /init script

When the kernel panics because it cannot mount root, it’s often because the initrd or initramfs is broken in some way. Two commands help here:

><fs> initrd-list /boot/initramfs-2.6.33-0.40.rc7.git0.fc13.x86_64.img | less
><fs> initrd-cat /boot/initramfs-2.6.33-0.40.rc7.git0.fc13.x86_64.img init | less

The first command lists all the files in the initrd, which lets you see if the right drivers got included for the (virtual) hardware. The second command prints the init script, which is the shell script that runs first, before the OS proper starts to boot.
