Ask Klaus!


Klaus Knopper is the creator of Knoppix and co-founder of the LinuxTag expo. He currently works as a teacher, programmer, and consultant. If you have a configuration problem, or if you just want to learn more about how Linux works, send your questions to: klaus@linux-magazine.com

Boot Sequence

Question:

Thanks for answering our "average user" questions so diligently and simply. I am not new to Linux. In fact, I have been using it since the mid-90s when I installed a mail relay and Internet gateway using Red Hat 6. I also jumped with joy when my ipchains-based firewall setup actually worked. At the time, I could find no support or training in my country, and I had to figure everything out myself.

I have a problem that has been annoying me for some time. Would you detail the steps involved in booting a Linux distribution such as Red Hat? Describe every step taken in the boot sequence - loading of bootsector, kernel, mounting of partitions, etc. I would also like to know which files are involved, and in what order they are read and used.

Answer:

The full story would probably fill a book, but here are some things that I think are most important to know about the boot process. The following answer may be specific to what is known as the "PC Architecture," usually meaning Intel/AMD-based processors and corresponding boards. On other hardware architectures, the boot process is similar, but the details vary with the hardware design.

Usually, when you switch on your computer, the hardware runs a short "self-test" sequence that can partly be influenced by the settings stored in the BIOS. Parts of this self test are:

After this, the system is up and running and you can log in (if you have not automatically been logged in already). Most errors happen (if not due to kernel misconfiguration) in the init scripts, which are not always written with easy recovery in mind.

For example, if a corruption in the root filesystem is detected during the file system check, you may be unable to log in and "repair errors manually" in the way that is suggested by the error messages at this step.

In this case, you can often use a trick that allows you to skip most of the boot process and just start a single program right away after kernel and initrd are loaded. This can, referring back to your question, be extremely helpful if you accidentally misconfigured iptables or low-level settings so that the system does not reach a stage that would allow you to log into it.

To get such an "emergency shell" at the lilo or grub boot prompt, type:

linux init=/bin/sh

You will then be prompted to enter commands immediately after the kernel's hardware detection, at least if your root filesystem is still in a somewhat usable condition.

In most setups, the mounted root file system is still read-only, so if you plan, for example, to reset the root password without knowing the old one, you should remount your root file system read-write first, using:

mount -o remount,rw /

Then you can manually try to repair everything from the shell.
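
Before changing anything, it can help to confirm how the root file system is currently mounted. The following is only a minimal sketch: the helper name root_mount_mode is made up for illustration, and it assumes that /proc is available, which under init=/bin/sh usually means mounting it by hand first (mount -t proc proc /proc).

```shell
# root_mount_mode [FILE]
# Print "ro" or "rw" for the last entry mounted on / in a mounts table.
# FILE defaults to /proc/mounts; the helper name is illustrative only.
root_mount_mode() {
    awk '$2 == "/" {
             n = split($4, opts, ",")
             for (i = 1; i <= n; i++) {
                 if (opts[i] == "ro" || opts[i] == "rw") { mode = opts[i] }
             }
         }
         END { if (mode != "") print mode }' "${1:-/proc/mounts}"
}
```

Called without arguments, the function reads /proc/mounts. If several entries are mounted on /, the last one listed is reported, since that is the one on top.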

Be very careful not to hit the ctrl-alt-del key combination during your system repair attempts, since, without using init, that key combination is not rerouted to a clean shutdown and would reset your computer without any further notice or question, probably without giving the kernel a chance to save any buffered changes.

After you are done with your emergency rescue shell session, just remount filesystems as read-only:

mount -o remount,ro /

This will then allow you to just hit the reset button or use the previously mentioned immediate reset by keyboard.
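
The whole rescue session described above can be sketched as a pair of shell functions. This is an illustration under stated assumptions, not a tested recovery tool: the names run, rescue_session, and DRY_RUN are invented here, the repair step is just the root password reset mentioned earlier, and the explicit sync is an extra precaution before the read-only remount.

```shell
# Illustrative wrapper: with DRY_RUN set, print each command instead of
# running it, so the sequence can be reviewed on a healthy system.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# rescue_session: the remount / repair / remount sequence in one place.
# The repair step here is a root password reset; substitute your own fix.
rescue_session() {
    run mount -o remount,rw / || return 1   # make / writable
    run passwd root                         # example repair step
    run sync                                # flush buffered writes to disk
    run mount -o remount,ro /               # back to read-only before reset
}
```

With DRY_RUN=1 set, calling rescue_session only prints the four commands in order, which is a safe way to check the sequence before you need it in an actual emergency shell.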

Figure 1: init reads its configuration from the inittab file.

Figure 2: The fstab file contains information on drives and mount points.

Ruined Everything!

Question:

I'm not sure what's wrong. My computer recently gave the message "operating system error" when I tried to start it. Now a message says "searching for a boot record," and it never stops searching. And now my DVD/CD drive is missing.

Answer:

It may be a hardware error, which cannot be repaired by software. If a ruined BIOS setup is likely, try setting all BIOS options to the "slow but safe" variant. (Sometimes the option labeled as safe is not actually the safest one, and the "normal" setting is.)

A major failure can also arise from a changed hardware configuration, or even from the BIOS merely thinking that you changed something and then automatically changing the chipset settings, interrupt numbers, or IO addresses on its own. I've seen it happen. Hardware is evil.

I've seen computers that locked up hard without even attempting to boot into a usable system, and that worked again after I just changed the order of the network cards or removed a sound card that was installed in addition to the onboard one.

Also, connectors do age. Despite what hardware vendors want you to believe, there is such a thing as oxidation of contacts, and changes in the physical geometry can make signal cables lose contact. So, if you make sure that all hard disk and CD-ROM connectors are firmly seated in their sockets, and no plastic attachments are loose or apparently broken, there is a good chance that your computer will magically work again even though it earlier refused to recognize any drive attached to the IDE or SATA sockets.

You can also try new cables. I don't believe in contact/cleaning sprays, though, because these can easily short-circuit modern connectors, disintegrate insulation, and - in the worst case - ruin your board. Also, RAM may need to be replaced because it is failing after a few billion read/write cycles.

So much for the hardware rescue attempt. For the software side, sometimes a BIOS upgrade can help eliminate problems that only showed up after installing a new operating system version.

Sometimes hardware features won't work well, and instead of patching hardware with a soldering iron, it can be wise not to use some features that are not urgently needed, such as IRQ balancing or ACPI.

Knoppix has a good - yet long - boot command line for checking whether such problems vanish. They are often caused by an incomplete onboard implementation of features that have found their way into the Linux kernel:

knoppix acpi=off noapic nolapic pnpbios=off pci=bios

nolapic is not a spelling error. I recently discovered some notebooks that would not boot without the "nolapic" option.

Using nolapic, everything - including ACPI, CPU frequency scaling, and all onboard controllers - worked flawlessly, while without nolapic, the kernel stopped dead at recognition of the hard-disk controllers (where the fault was apparently neither the disks, nor the controllers).
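
To verify which of these options the running kernel was actually booted with, you can look at the kernel command line. Here is a small sketch with a hypothetical helper, has_boot_option, that matches whole words only, so that a search for apic does not accidentally match noapic. It assumes /proc is mounted when no command line is passed in explicitly.

```shell
# has_boot_option OPTION [CMDLINE]
# Succeed if OPTION appears as a whole word on the kernel command line.
# CMDLINE defaults to the contents of /proc/cmdline.
has_boot_option() {
    opt=$1
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $opt "*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Example use on a running system (left as a comment so that loading
# this file has no side effects):
#   for o in acpi=off noapic nolapic pnpbios=off pci=bios; do
#       has_boot_option "$o" && echo "$o is active"
#   done
```

Dropping the options one at a time and rebooting then shows which of them your hardware really needs.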

Root Password

Question:

I have a live CD of Knoppix 3.7. I would like to use it to do recovery and disk partitioning, but when I boot, it comes up in user mode, and I need root to run administrative programs. How do I set the root password for a Live CD?

Answer:

Most setup scripts in Knoppix (located in the Knoppix menu, beneath the KDE menu) will use "sudo" to run setup tasks under root privileges. When you start a program that is designed to ask for a root password, instead of being run by sudo, you probably do need a root password. Passwords in Knoppix are locked/invalid. There are no backdoors or "default passwords." Set a new root password with:

sudo passwd

or switch to the root console with control-alt-F1 and type:

passwd

Then just switch back to X with control-alt-F5.
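
If you want to see for yourself what "locked/invalid" means, you can look at the password field of the root entry in the shadow file. The sketch below assumes the usual /etc/shadow layout, in which a password field starting with ! or * cannot match any password. The helper name password_state is invented for this example.

```shell
# password_state SHADOW_LINE
# Classify the second field of an /etc/shadow entry:
#   "locked" if it starts with ! or *, "empty" if blank, else "set".
password_state() {
    field=$(printf '%s\n' "$1" | cut -d: -f2)
    case $field in
        '')        echo empty ;;
        '!'*|'*'*) echo locked ;;
        *)         echo set ;;
    esac
}
```

For example, password_state "$(sudo grep '^root:' /etc/shadow)" reports the state of the root account, and it should change to "set" after you run sudo passwd.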