Zack's Kernel News

The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching 10,000 messages in a week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.

Kernel Cleanliness

A routine pull request turned into an ugly-code witch hunt recently when Phillip Lougher tried to add LZMA compression to SquashFS. His patches ended up copying some ugly code out of include/linux/mm.h into several other files, and Linus Torvalds was worried that this was just going to spread the problem, when it should be fixed up at the root, especially since someone (Phillip) was already doing work around that area of the kernel.

One of Linus's big complaints was that the mm.h header file actually contained C code for a malloc() implementation, which, first, was out of place in a header file; second, was tangled up with other weird #ifdefs in that file; and, third, was not a very good implementation of malloc() either. The problem arose in the first place because of some deep, dark boot-time issues; however, Linus felt the situation could be fixed, or at least contained, rather than spread to other files, as Phillip's patches would have done.

Phillip, meanwhile, had been aware of the problem and had fixed a bunch of other problems in that file over the course of time, but for these patches, he was mainly trying to avoid making too many big changes while still getting his SquashFS updates into the kernel. So he was ready to go back and fix the mess before submitting his patches; Linus asked H. Peter Anvin to help him with it. Peter had some ideas about how to get it done, but one thing he mentioned was that this code was used by nearly every architecture, so it would take some delicate footwork to make sure that the problems themselves were fixed and that nothing else broke in the meantime. He figured the changes might have to be timed with a new kernel release, which meant Phillip's SquashFS update would also have to be delayed.

The really interesting thing about the whole exchange is the way Linus's insistence on what he calls "tasteful coding" has spread to be a more general approach among his various "lieutenants" and code maintainers. The Linux kernel is most definitely not just a big pile of code that its hundreds and thousands of contributors glom more and more of their features onto. Certainly ugly code does get into the kernel, but there's a very strong culture of opposition to that. Sometimes the opposition takes the form of Linus complaining about something like mm.h; sometimes it takes the form of people being encouraged to just poke around the kernel for stuff that needs to be cleaned up. This part of kernel development culture didn't just happen; it's the result of debate and discussion and an evolution of policy taking place over years.

Status of Remote Controller Subsystem

Mauro Carvalho Chehab announced that he and some other folks were completely rewriting the Remote Controller subsystem. The previous code had been too tied to Video4Linux and DVB, whereas Mauro and others wanted to support generic remote control devices; in fact, they wanted to support sending arbitrary raw events to arbitrary userspace programs.

The rewrite already included changes to the SysFS interface, and other big behavioral changes, and at the time of his announcement, both the subsystem and a userspace test application were working well enough to be tested.

Seems like some wild times are in store for remote control users.

Memory Management for Virtual Systems

Dmitry Torokhov from VMware announced a new VMware balloon driver that was no longer tied to the virtio subsystem. The driver was intended to handle VMware memory allocation between the host system and the virtual machine running on top of it. VMware isn't the only virtualization system that uses a balloon driver. Balloon drivers create an area of memory that can be expanded and contracted on the basis of need, so the virtual machine only keeps control of the RAM it actually needs at any given time and frees the rest to be used by other programs running on the host. Xen and KVM both have needs similar to VMware's for this type of driver.
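To make the ballooning idea concrete, here is a toy sketch in Python. It is not VMware's driver or any kernel interface; the Guest and Host classes, their method names, and the page counts are all invented for illustration. It only models the bookkeeping: when the balloon inflates, the guest gives up pages that the host can hand to something else, and when it deflates, the guest gets them back.

```python
class Guest:
    """Toy virtual machine with a balloon (hypothetical model, not a real driver)."""

    def __init__(self, total_pages):
        self.total_pages = total_pages  # pages the host assigned to this guest
        self.balloon_pages = 0          # pages currently held inside the balloon

    @property
    def usable_pages(self):
        # Pages the guest can actually use: everything outside the balloon.
        return self.total_pages - self.balloon_pages

    def inflate(self, pages):
        """Grow the balloon; returns how many pages the guest really gave up."""
        pages = min(pages, self.usable_pages)
        self.balloon_pages += pages
        return pages

    def deflate(self, pages):
        """Shrink the balloon; returns how many pages the guest got back."""
        pages = min(pages, self.balloon_pages)
        self.balloon_pages -= pages
        return pages


class Host:
    """Toy host that tracks only its pool of free pages."""

    def __init__(self, free_pages):
        self.free_pages = free_pages

    def reclaim_from(self, guest, pages):
        # Ask the guest to inflate its balloon; the surrendered pages become free.
        got = guest.inflate(pages)
        self.free_pages += got
        return got

    def give_back_to(self, guest, pages):
        # Let the guest deflate, limited by what the host can spare right now.
        pages = min(pages, self.free_pages)
        returned = guest.deflate(pages)
        self.free_pages -= returned
        return returned


host = Host(free_pages=100)
guest = Guest(total_pages=1000)
host.reclaim_from(guest, 300)   # guest is idle: the balloon inflates
host.give_back_to(guest, 500)   # guest is busy again: the balloon deflates
```

Throughout the exchange the guest keeps running; only the number of pages it may touch changes, which is the point Dan Magenheimer makes when he contrasts ballooning with hibernation.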

The problem is that these virtualization systems have evolved fairly independently, so they each feed their own memory management patches upstream into the kernel; the kernel developers prefer that similar functionality be implemented in one spot and simply used by all the different tools that might need it. So in response to Dmitry's announcement, there was a fair amount of discussion about which other tools might use the same driver and how to keep the driver from becoming too VMware-specific. But another line of discussion involved whether to simply extend an existing part of the kernel, for example, the memory handling portion of Linux's hibernation code, with the features needed by these virtualization systems. That was Andrew Morton's suggestion.

This choice is not necessarily obvious. As Dan Magenheimer remarked, "Hibernation assumes everything in the machine is going to stop for a while. Ballooning assumes that the machine has lower memory need for a while but is otherwise fully operational. Think of it as hot-plug memory at a page granularity."

The discussion veered in a couple of technical directions, with folks advocating their various preferred approaches; however, it does seem as though Andrew will probably insist on some sort of more generic solution that addresses the way Linux handles memory in general, rather than focusing on the needs of specific virtualization systems.

Status of the Big Kernel Lock

As part of the long-term effort to get rid of the big kernel lock (BKL), Arnd Bergmann recently announced that his own branch of the kernel source tree, including patches from lots of different folks, had confined the BKL entirely to drivers, most of them obscure (and a few less so). The entire rest of his tree was BKL-free, and the remaining drivers that did depend on it, he'd clearly marked in the config system, so it would be possible to build a completely BKL-free kernel by excluding those drivers.

Some of his patches he admitted were probably just placeholders. His code replacing the TTY layer's BKL with a global mutex, for example, would probably be left out in favor of Alan Cox's planned BKL work on the TTY layer. Also, Arnd said it would be a fairly large effort to merge his tree with Linus Torvalds' upstream sources, given the large differences between the two. But, he said Frederic Weisbecker had volunteered to help him with that part.

Some key drivers still rely on the BKL, including USB, DRM, and VFAT. Clearly there is more work to do, but it seems the big kernel lock will soon be at least an optional part of most kernel builds.

The really amazing thing about eradicating the BKL is that it was so deeply embedded in all parts of the kernel that many developers thought for a long time it would be totally impossible to get rid of it. The problem was that all the different places that used it in the kernel actually had slightly (and some not so slightly) different requirements for the kind of locking they needed. The BKL satisfied all needs, but trying to replace the BKL with something less extreme seemed like an intractable problem for a long time.

Ultimately, the solution was, first, not to try to replace the BKL at all but just push the BKL code farther and farther to the periphery of the kernel, into the drivers. Then, anything that depends on the BKL in one location would be distinct from anything that depends on it in a different location. Once that was accomplished, each case of the BKL could be dealt with individually, replaced with a mutex, semaphore, or other special-case locking variant, depending on the particular needs of just the little bit of code that wanted a lock. The ultimate value in eliminating the BKL is that multi-threaded user software will run into fewer circumstances in which the entire kernel has to come to a screeching halt just to perform some particular operation before moving on to the next timeslice.
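The difference between one big lock and many small ones can be sketched in a few lines of Python. This is a userspace toy, not kernel code: the Subsystem class, the can_run_concurrently() helper, and the "tty" and "usb" names are all hypothetical, and threading.Lock is only a stand-in that ignores the finer details of how the real BKL behaved. The sketch simply asks whether one piece of code can take its lock while another piece is still holding its own.

```python
import threading


class Subsystem:
    """Toy stand-in for a chunk of kernel code protected by some lock."""

    def __init__(self, name, lock):
        self.name = name
        self.lock = lock
        self.operations = 0

    def operate(self):
        # Do one unit of "work" while holding this subsystem's lock.
        with self.lock:
            self.operations += 1


def can_run_concurrently(a, b):
    """Return True if b's lock can be taken while a's lock is held."""
    with a.lock:
        got = b.lock.acquire(blocking=False)  # try once, never wait
        if got:
            b.lock.release()
        return got


# One big lock: every subsystem serializes on the same object.
big_lock = threading.Lock()
tty_old = Subsystem("tty", big_lock)
usb_old = Subsystem("usb", big_lock)

# After the cleanup: each subsystem gets a lock of its own.
tty_new = Subsystem("tty", threading.Lock())
usb_new = Subsystem("usb", threading.Lock())

shared_result = can_run_concurrently(tty_old, usb_old)    # False
separate_result = can_run_concurrently(tty_new, usb_new)  # True
```

With the shared lock, work in one subsystem holds up the other even though the two have nothing to do with each other; with separate locks, each waits only on its own. Choosing the right kind of lock for each case (mutex, semaphore, or something else) is the part this sketch leaves out.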