How to fix a CentOS 5 VM not booting with a kernel panic

While I moved on from CentOS 5 a long time ago, on rare occasions I need to access an old CentOS 5 VM. Today was such an occasion and it resulted in the VM not booting with a kernel panic. It took a bit of digging to figure out what was going on and how to fix it, so I thought I’d share it to save you some time.

The symptoms

When booting the CentOS 5 VM I saw the following messages:
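
The relevant part, give or take the exact wording of the panic line, was:

Trying to resume from label=SWAP-vda3
Unable to access resume device (LABEL=SWAP-vda3)
...
Kernel panic - not syncing: Attempted to kill init!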

Note the Trying to resume from label=SWAP-vda3 followed by Unable to access resume device (LABEL=SWAP-vda3). Apparently the last time this VM was used (running on a CentOS 5 host) it was paused or saved, which is why it now wants to resume from swap. And something seems wrong with the swap space in the VM: it might be corrupt, or running this CentOS 5 VM on a CentOS 7 host instead of its old CentOS 5 host might require some changes.

So the swap partition and the references to it in /etc/fstab need to be checked and fixed if necessary. At least that’s obvious.

What’s not so obvious is that when a CentOS 5 VM ends up in such a state, the initrd will have been recreated with instructions to resume from swap. So not only do you need to fix the swap space and check and fix any references in /etc/fstab, you also need to recreate the initrd.

Here are the steps to fix both issues:

Boot VM with rescue mode

Boot the VM and press F12 during boot so you can select PXE. Once the PXE options are available, don’t boot from the local hard disk but instead boot a rescue image. I just used a CentOS 6 rescue entry which I always have available when booting with PXE.

The PXE label config looks like this:
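
Something along these lines (the kernel/initrd paths and the kickstart URL are just placeholder examples):

label centos6-rescue
  menu label CentOS 6 rescue
  kernel centos6/vmlinuz
  append initrd=centos6/initrd.img rescue ks=http://pxeserver.example.com/rescue.ks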

And the rescue.ks kickstart file looks like this:
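
For rescue mode the kickstart only really needs to set the basics and point at an install tree; as a sketch, with a placeholder URL:

lang en_US.UTF-8
keyboard us
network --bootproto=dhcp
url --url=http://pxeserver.example.com/centos/6/os/x86_64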

Once you have booted into rescue mode your CentOS 5 VM hard disks should be automagically mounted and you are presented with the option to start a shell, run a diagnostic or reboot. Select Start shell and activate the chroot:
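
Assuming the rescue environment mounted everything under /mnt/sysimage (the default), that is:

chroot /mnt/sysimage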

Quick tip: you can now start the SSH server so you can ssh into the VM and do your work from a terminal instead of the VM console.
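
Inside the chroot that should be as simple as:

service sshd start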

Check and fix swap

Let’s see what partitions this VM has:
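
fdisk will show them (in this VM the swap partition turned out to be /dev/sda3, which the commands below assume):

fdisk -l /dev/sda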

And let’s see what the label of the swap partition is:
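
Assuming the swap signature is still readable, blkid should show it:

blkid /dev/sda3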

And if that fails, try:
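
One low-level fallback is to read the 16-byte volume label straight from the swap header (it sits at byte offset 1052 of a version 1 swap signature):

dd if=/dev/sda3 bs=1 skip=1052 count=16 2>/dev/null; echo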

So there are references to SWAP-vda3 (where vda refers to a virtio disk) in a VM that only ever had IDE named devices. That does not seem right. Let’s recreate the swap partition with the proper label:
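
Since the disk shows up as sda in this VM, give it a label that matches:

mkswap -L SWAP-sda3 /dev/sda3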

Check and fix /etc/fstab

Let’s see what’s in /etc/fstab:
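
It looked roughly like a stock CentOS 5 fstab; the important part is the swap line referencing the vda3 label:

LABEL=/                 /               ext3    defaults        1 1
LABEL=/boot             /boot           ext3    defaults        1 2
tmpfs                   /dev/shm        tmpfs   defaults        0 0
devpts                  /dev/pts        devpts  gid=5,mode=620  0 0
sysfs                   /sys            sysfs   defaults        0 0
proc                    /proc           proc    defaults        0 0
LABEL=SWAP-vda3         swap            swap    defaults        0 0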

Note again the reference to the vda3 (virtio) partition while the label of our swap partition is now SWAP-sda3. So let’s fix the swap line in /etc/fstab so it has the proper label:
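
You can edit the file by hand or fix the label in one go with sed:

sed -i 's/LABEL=SWAP-vda3/LABEL=SWAP-sda3/' /etc/fstab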

Optionally install the latest updates

Before recreating the initrd you can optionally install the latest updates if you have not updated the VM recently and feel adventurous. The reason I mention adventurous is that if you run yum update while booted from a rescue image and accessing the VM via chroot, and a new kernel is installed, you will probably see the following error during the kernel installation:
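
The exact output depends on the package versions involved, but it is typically grubby that trips up, with something along these lines:

grubby fatal error: unable to find a suitable template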

The reason for the error is that when booting via a rescue image and accessing the VM via chroot, /dev/root does not point to the root partition (/dev/sda2 in this VM). Unfortunately the scripts that are run when a new kernel is installed, new-kernel-pkg and grubby, can’t handle that situation gracefully so grub.conf is most likely not updated with the new kernel details. If that is the case you will need to manually add an entry for the new kernel to /boot/grub/grub.conf:
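
Copy the newest existing entry and adjust the version; for this VM the result would look something like this (the root= argument and (hd0,0) are whatever your existing entries already use):

title CentOS (2.6.18-371.12.1.el5)
        root (hd0,0)
        kernel /vmlinuz-2.6.18-371.12.1.el5 ro root=LABEL=/
        initrd /initrd-2.6.18-371.12.1.el5.img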

Note that (hd0,0) might not apply to your setup and might require a different entry.

Recreate the initrd

Go to the /boot directory and check what the version is of the most recent kernel:
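
For example:

cd /boot
ls -1 vmlinuz-*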

So the version is 2.6.18-371.12.1.el5.

Next move the old initrd out of the way:
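
Assuming the standard initrd-<version>.img naming:

mv initrd-2.6.18-371.12.1.el5.img initrd-2.6.18-371.12.1.el5.img.bak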

Now let’s recreate the initrd:
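
mkinitrd takes the target image name followed by the kernel version:

mkinitrd /boot/initrd-2.6.18-371.12.1.el5.img 2.6.18-371.12.1.el5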

Also check if the latest kernel is the default one that gets booted. There should be an entry default=0 in /boot/grub/grub.conf. The 0 means that the first kernel entry (the one at the top with kernel version 2.6.18-371.12.1) will be used.

Before you reboot

Before you reboot, check the hardware details of the VM. In my case the VM required the Disk bus of the hard disk to be IDE. Check the setting in virt-manager or in the XML config file of the VM and make sure you have the correct settings.

Reboot

Finally reboot the VM to see if it worked. From the console:

sh-3.2# sync
sh-3.2# exit
bash-4.1# exit

And then select reboot followed by Ok.

How to replace a failed disk with Linux software RAID (mdadm)

Recently one of the disks died in a RAID1 setup which uses the excellent Linux software RAID (mdadm). Here is a quick overview of the steps to exchange the failed disk with a new one. Obviously you should replace the X in /dev/mdX with the proper number of your RAID device. The same applies to /dev/sddY. Replace it with the actual device and partition you are using. So don’t copy and paste these commands. All commands should be executed as root or with proper sudo rights.

Step 1 – mark broken disk as failed
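
Using the placeholders from above:

mdadm --manage /dev/mdX --fail /dev/sddY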

Step 2 – remove broken disk from array
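
Once it is marked as failed, take it out of the array:

mdadm --manage /dev/mdX --remove /dev/sddY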

Step 3 – create partition on the new disk

You can partition the new disk manually using your favourite tool (fdisk, gparted, sfdisk, etc.). But it’s probably faster and less error-prone to copy the partition table from an existing RAID disk to the new disk. Here’s an example of how to do that using sfdisk. Requirement: the disk must be smaller than 2TB and should not be using GPT.
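
Assuming /dev/sda is a healthy disk that is already part of the array and /dev/sdd is the new, empty disk:

sfdisk -d /dev/sda | sfdisk /dev/sdd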

Make sure that all the partitions that are part of the RAID set have the proper ‘fd’ ID (Linux RAID autodetect). An example of setting the ID of partitions 1, 2 and 3 on the new disk to ‘fd’:
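
Again assuming the new disk is /dev/sdd (depending on your sfdisk version the option is spelled --id/-c or --part-type):

sfdisk --id /dev/sdd 1 fd
sfdisk --id /dev/sdd 2 fd
sfdisk --id /dev/sdd 3 fd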

Step 4 – add new disk to RAID array
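
Once partitioned, add the new disk’s partition back into the array:

mdadm --manage /dev/mdX --add /dev/sddY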

Step 5 – check status of RAID array
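
Either of these will show the array status and the rebuild progress:

cat /proc/mdstat
mdadm --detail /dev/mdX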

Step 6 – increase speed of the rebuild

By default mdadm will rebuild the RAID array in the background. If you want to speed up the resync process you can change the values of:

/proc/sys/dev/raid/speed_limit_max
/proc/sys/dev/raid/speed_limit_min

For example to set the minimum resync speed to 150MB/s:
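
The values are in KB/s, so 150 MB/s is roughly:

echo 150000 > /proc/sys/dev/raid/speed_limit_min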

You can watch the progress of the resync in a terminal with the following command:
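
For example:

watch cat /proc/mdstat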

Finally, for your next RAID setup also have a look at Partitionable RAID.