diff -u: What's New in Kernel Development

Zack Brown

Issue #278, June 2017

Some PCI devices include their own RAM, and Logan Gunthorpe wanted to make it available to the system as general-purpose memory. He understood that there could be a slowdown when using RAM from those devices relative to the RAM chips on the motherboard, but he figured that in cases of heavy load, it could be worth it. Sometimes what you really need is every last drop of memory, regardless of any other consideration.

He posted a patch to implement p2pmem, a peer-to-peer memory driver for PCI devices. But to avoid too much slowdown, he restricted his code to using memory only from devices that all sat behind the same PCI switch.

But Sinan Kaya didn't like this, saying it wasn't a portable solution. He wanted Logan to remove any such restrictions and let users decide for themselves if the performance hit was too terrible or if the code wouldn't work at all with a given device. That way, Logan's patch would work the same on all architectures.

They went back and forth about this. Logan felt it was important to ensure good performance, which required the code to include a certain amount of understanding of the hardware it controlled. The simplest approach was to support PCI devices that were all behind the same switch; anything more generic than that, he said, risked exploding the complexity of the code, as well as the need to list tons of specific devices and their compatibility issues.

But ultimately, Sinan made the point that Logan's code simply could be generic and allow users to shoot themselves in the foot if they so desired. Logan's patch would be off by default anyway, so there was no harm in letting users make the final call based on their own knowledge of the hardware on their systems.

This actually had been Logan's inclination from the start, but he'd received pushback from folks at LSF/MM (the Linux Storage, Filesystem and Memory-Management Summit), who preferred things to be simple and functional. But with Sinan's argument about portability, Logan said it made more sense to remove the requirement that all shared memory devices be behind the same PCI switch and just let users make the decision themselves.

The discussion ended there, but presumably, the LSF folks will have their own objections, and the whole patch will have to go through several more iterations before everyone is fully satisfied, especially the kernel maintainers themselves.
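The two positions in this debate can be illustrated with a toy model. The sketch below is not Logan's patch and does not use the kernel's PCI API; the `toy_pci_dev` structure, the `behind_same_switch()` helper and the `user_override` knob are all hypothetical names invented for illustration. It also assumes a simplified topology where "behind the same switch" just means sharing the same immediate upstream node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of a node in a PCI topology. */
struct toy_pci_dev {
    const char *name;
    const struct toy_pci_dev *upstream; /* parent switch or bridge, NULL at the root */
};

/* True when both devices hang directly off the same upstream node. */
static bool behind_same_switch(const struct toy_pci_dev *a,
                               const struct toy_pci_dev *b)
{
    return a && b && a->upstream != NULL && a->upstream == b->upstream;
}

/*
 * Policy check. The restrictive variant insists on a shared switch;
 * with user_override set, the decision is left entirely to the user.
 */
static bool p2p_allowed(const struct toy_pci_dev *a,
                        const struct toy_pci_dev *b,
                        bool user_override)
{
    if (!a || !b)
        return false;
    if (user_override)
        return true;
    return behind_same_switch(a, b);
}
```

In this model, the restrictive policy is the call with `user_override` set to `false`, and the generic policy that Sinan argued for is the same call with it set to `true`. The hardware knowledge lives in one small helper, which is roughly the simplicity argument for the same-switch rule.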

Sometimes it's useful to have a whole separate thread running a particular kernel operation. If something is complicated and can take an arbitrary amount of time, giving it its own thread can fix latency issues by adding it to the normal process rotation in the scheduler. Then it can take however long it wants, without inconveniencing other parts of the kernel.

The printk() function is a good example of something that would benefit from having its own kthread. The printk() function sends messages to the console, to keep users informed of any problems with the system. Unfortunately, printk() works only in certain contexts and can take an arbitrary amount of time to execute. For people writing code in any given part of the kernel, it can be annoying to keep track of where and how to call printk() such that it will work. Putting printk() in its own kernel thread would greatly simplify that whole question.

Sergey Senozhatsky recently posted some code to do this. It received some initial interest, but folks like Peter Zijlstra objected.

The problem with putting something into its own kernel thread is that each thread is a permanent drain on overall performance. The scheduler must cycle through every single thread, many times per second. As the number of threads on a system goes up, the system performance can become choppier and choppier. As a result, any new feature that requires a new kernel thread typically must have something really stupendous to offer. Maybe it resolves a security issue or makes people's lives much easier than they were before, or maybe it's something that just naturally belongs in a separate thread.

The printk() function may turn out to have a valid need for its own thread. But since printk() works fairly well as currently implemented, there are bound to be plenty of other folks like Peter who need to be convinced.
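The general idea of moving console output out of the caller's context can be sketched in a few lines. This is not Sergey's code and not how the kernel's printk() is implemented; `toy_log`, `toy_printk()` and `toy_printk_flush()` are hypothetical names. To stay self-contained, the sketch creates no real thread: it only shows the split between a cheap enqueue step that any caller could use and a drain step that a dedicated thread would run in a loop.

```c
#include <stdio.h>

#define LOG_SLOTS   8   /* number of queued messages the buffer can hold */
#define LOG_MSG_LEN 64  /* maximum stored length of one message */

/* Hypothetical ring buffer shared by callers and the printing thread. */
struct toy_log {
    char msgs[LOG_SLOTS][LOG_MSG_LEN];
    unsigned int head;    /* count of messages ever queued */
    unsigned int tail;    /* count of messages ever printed */
    unsigned int dropped; /* messages lost because the buffer was full */
};

/* Caller side: copy the message and return; never touches the console. */
static int toy_printk(struct toy_log *log, const char *msg)
{
    if (log->head - log->tail == LOG_SLOTS) {
        log->dropped++;
        return -1;
    }
    snprintf(log->msgs[log->head % LOG_SLOTS], LOG_MSG_LEN, "%s", msg);
    log->head++;
    return 0;
}

/* Thread side: write everything queued so far to the (slow) console. */
static unsigned int toy_printk_flush(struct toy_log *log, FILE *console)
{
    unsigned int printed = 0;

    while (log->tail != log->head) {
        fputs(log->msgs[log->tail % LOG_SLOTS], console);
        fputc('\n', console);
        log->tail++;
        printed++;
    }
    return printed;
}
```

The caller's cost is bounded by one short copy, while the unpredictable part, writing to the console, happens only in the flush step. A real implementation would also need locking or atomic operations between the two sides, which this sketch leaves out.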

The AVR32 architecture is being removed from the kernel. It's an old system-on-a-chip that came out of Atmel Corporation. But Atmel hasn't supported it in quite some time, and it has become a drag on other architectures that share drivers with it, such as the Atmel ARM SoCs.

Hans-Christian Noren Egtvedt posted a patch to get rid of it, and a bunch of people cheered out loud. Various folks also suggested additional parts of the kernel source that could be included in the AVR32 wipeout.

In some ways, it's sad to lose an architecture like this. Sometimes it's fun to think about running a modern version of Linux on an old TRS-80 Color Computer or whatnot. But the kernel is a living project. And it's somewhat uplifting to remember that, aside from its mysterious absence from desktop systems, Linux does indeed essentially run the entire internet: everything connected to it and everything not connected to it. So I guess we can do without the AVR32 architecture.

Some things we don't like and we can fix. Other things we don't like, but the fix is worse. Recently, Christoph Hellwig railed against the fact that system calls would accept any input flag at all and just ignore the ones that weren't supported. The reason the kernel does this is so that user code can run on older kernels without making the system calls choke on unknown input.

The bad part, unfortunately, is that it makes it impossible for user code to probe a system call to see whether a given feature is supported. And it turns out that some user code really would benefit from being able to do things like that—for example, atomic input/output.

One problem with fixing the system calls to reject unsupported input flags is that it would break existing binaries running in the world. All such binaries would need to be fixed and recompiled. That is a problem when the binaries are very old and the source code is no longer available, which is a real possibility in some cases.

A change that breaks existing binaries is an ABI (application binary interface) break, and it's allowed only under very extreme circumstances, for example, if it's the only way to plug a given security hole.

But Linus Torvalds didn't like Christoph's idea for another reason entirely. He said, “probing for flags is why we *could* add things like O_NOATIME etc - exactly because it 'just worked' with old kernels, and people could just use the new flags knowing that it was a no-op on old kernels.”

So even though some users would benefit from being able to probe for features, even more users benefit from being able to pass a new flag without worrying that an older kernel will reject it.
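The trade-off between the two behaviors can be made concrete with a toy model. Nothing below is a real system call: the `TOY_FLAG_*` bits and both `toy_call_*()` functions are hypothetical, and the `known` parameter simply stands in for the set of flags a given kernel version understands.

```c
#include <errno.h>

/* Hypothetical flag bits for an imaginary system call. */
#define TOY_FLAG_SYNC    0x1u
#define TOY_FLAG_NOATIME 0x2u /* pretend this was added in a newer kernel */

/* Flag sets understood by an "old" and a "new" kernel in this model. */
#define TOY_OLD_KNOWN (TOY_FLAG_SYNC)
#define TOY_NEW_KNOWN (TOY_FLAG_SYNC | TOY_FLAG_NOATIME)

/* Lenient: silently ignore unknown bits. Returns the flags actually honored. */
static int toy_call_lenient(unsigned int flags, unsigned int known)
{
    return (int)(flags & known);
}

/* Strict: reject unknown bits, so callers can probe for support. */
static int toy_call_strict(unsigned int flags, unsigned int known)
{
    if (flags & ~known)
        return -EINVAL;
    return (int)(flags & known);
}
```

With the lenient version, a program that passes `TOY_FLAG_NOATIME` to an "old" kernel keeps working, but it cannot tell that the flag was ignored. With the strict version, the same call returns `-EINVAL`, which lets the program detect the missing feature but also breaks any existing binary that happened to pass bits the kernel does not recognize.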