It's odd that printk() would pose so many problems for kernel development, given that it's essentially just a replacement for printf() that doesn't require linking the standard C library into the kernel.
And yet, it's famously a mess, full of edge cases, corner cases, deadlocks, race conditions and a variety of other tough-to-solve problems. The reason for this is, unlike printf(), the printk() system call has to produce reasonable behavior even when the entire system is in the midst of crashing. That's really the whole point—printk() needs to report errors and warnings that can be used to debug whatever strange and unexpected catastrophe has just hit a running system.
Trying to fix all the deadlocks and other problems at the same time would be too large a task for anyone, especially since each one is a special case defined by the particular context in which the printk() call appeared. But, sometimes a bunch of instances in a particular region of code can be addressed all together.
Sergey Senozhatsky recently tried to address some printk() deadlocks, although he acknowledged he wouldn't address any instances that were caused by the printk() code itself triggering a separate recursive printk() call. He wanted to concern himself with non-recursion-based deadlocks only.
Sergey focused on the console code, which was where printk() generally sent its output, and which was one place where printk() could deadlock. He added a very small safeguard to the code, but the result seemed to be that drivers all throughout the kernel would have to be updated to use the new safeguard.
His code was not met with universal acclaim. Alan Cox noticed that Sergey's safeguard added code to the "fast path"—a region of code that needed to be as fast and efficient as possible, because it was run all the time, many times per second. Slowing down the fast path would slow down the whole system. Alan suggested instead of this, it would be better for the kernel simply not to call printk() if the console code would be in a position to deadlock.
Sergey was not in any way satisfied, however. He pointed out that his patch solved real-world problems that users had reported experiencing directly. He didn't see how it would help anything simply to pull out the printk() instances that triggered the problem, especially if those instances were doing important work like reporting on the real reason the system was crashing and so on.
Sergey wanted to keep the printk() instances and implement the safeguards to protect them. However, at this point Linus Torvalds joined the discussion, saying:
The rule is simple: DO NOT DO THAT THEN.
Don't make recursive locks. Don't make random complexity. Just stop doing the thing that hurts.
There is no valid reason why an UART driver should do a printk() of any sort inside the critical region where the console is locked.
Just remove those printks, don't add new crazy locking.
If you had a spinlock that deadlocked because it was inside an already spinlocked region, you'd say "that's buggy".
This is the exact same issue. We don't work around buggy garbage. We fix the bug—by removing the problematic printk.
Sergey pointed out that the printk() instances were called from all those drivers he wanted to change. It wasn't a case of some simple part of the kernel having an extra printk(). The drivers all needed to be updated with the safeguard, or they would continue to report the wrong thing.
The conversation ended with no conclusion. It's difficult to know when something should be fixed versus removed. There are all sorts of technical questions that come up, including wondering if the fix is worth all the fuss.
At a time when many companies are rushing to internationalize their products and services to appeal to the broadest possible market, the Linux kernel is actively resisting that trend, although it already has taken over the broadest possible market—the infrastructure of the entire world.
David Howells recently created some sample code for a new kernel library, with some complex English-language error messages that were generated from several sources within the code. Pavel Machek objected that it would be difficult to automate any sort of translations for those messages, and that it would be preferable simply to output an error code and let something in userspace interpret the error at its leisure and translate it if needed.
In this case, however, the possible number of errors was truly vast, based on a variety of possible variables. David argued that representing each and every one with a single error code would use a prohibitively large number of error codes.
Ordinarily, I might expect Pavel to be on the winning side of this debate, with Linus Torvalds or some other top developer insisting that support for internationalization was necessary in order to give the best and most useful possible experience to all users.
However, Linus had a very different take on the situation:
We don't internationalize kernel strings. We never have. Yes, some people tried to do some database of kernel messages for translation purposes, but I absolutely refused to make that part of the development process. It's a pain.
For some GUI project, internationalization might be a big deal, and it might be "TheRule(tm)". For the kernel, not so much. We care about the technology, not the language.
So we'll continue to give error numbers for "an error happened". And if/when people need more information about just what _triggered_ that error, they are as English-language strings. You can quote them and google them without having to understand them. That's just how things work.
[...]
There are places where localization is a good idea. The kernel is *not* one of those places.
He added later:
I really think the best option is "Ignore the problem". The system calls will still continue to report the basic error numbers (EINVAL etc), and the extended error strings will be just that: extended error strings. Ignore them if you can't understand them.
That said, people have wanted these kinds of extended error descriptors forever, and the reason we haven't added them is that it generally is more pain than it is necessarily worth.
Pavel still felt that, since David's code was all new, there was no ancient cruft standing in the way of implementing internationalization in this one new area. He agreed there was no point in a lot of other cases, but for this one, it felt like being given a fresh chance.
But Linus said, "Really. No translation. No design for translation. It's a nasty nasty rat-hole, and it's a pain for everybody."
He added, "the fact is, I want simple English interfaces. And people who have issues with that should just not use them. End of story. Use the existing error numbers if you want internationalization, and live with the fact that you only get the very limited error number. It's really that simple."
The discussion ended shortly thereafter. It's a fascinating rejection of a very politically popular attitude, based on the technical consideration that keeping the programming interface simple is worth more than keeping the user interface friendly.
Various efforts always are underway to implement Secure Boot and to add features that will allow vendors to lock users out of controlling their own systems. In that scenario, users would look helplessly on while their systems refused to boot any kernels but those controlled by the vendors.
The vendors' motivation is clear—if they control the kernel, they can then stream media on that computer without risking copyright infringement by the user. If the vendor doesn't control the system, the user might always have some secret piece of software ready to catch and store any streamed media that could then be shared with others who would not pay the media company for the privilege.
Recently, Chen Yu and other developers tried to submit patches to enhance Secure Boot so that when the user hibernated the system, the kernel itself would encrypt its running image. This would appear to be completely unnecessary, since as Pavel Machek pointed out, there is already uswsusp (userspace software suspend), which encrypts the running image before suspending the system. As Pavel said, the only difference was that uswusp ran in userspace and not kernel space.
Perhaps in an effort to draw Chen into admitting the deeper motives behind the patch submission, Pavel asked Chen to elucidate exactly what security hole his patches addressed and how they would deal with them. Pavel would ask that question over and over again before the end of the discussion, and he would not receive an answer.
Chen offered a variety of justifications for the patch, including letting users do less work, but none of them answered the fundamental question: why was this patch needed as a security enhancement in the first place? And eventually, Pavel called it like he saw it. He said, "Purpose here is to prevent the user from reading/modifying kernel memory content on machine he owns. Strange as it may sound, that is what 'secure' boot requires (and what Disney wants)."
The discussion ended inconclusively, but not utterly. It's clear that Pavel, and a group of core kernel developers including Linus Torvalds, will continue to guard against allowing vendors to control user systems. This seems to be one of the fundamental values of the Linux kernel—to prevent the reemergence of the kind of situation we had in the 1980s, where vendors had ultimate control over virtually all software, while users were at the mercy of business decisions they didn't agree with but could do nothing about.
Note: if you're mentioned in this article and want to send a response, please send a message with your response text to ljeditor@linuxjournal.com and we'll run it in the next Letters section and post it on the website as an addendum to the original article.