Zack's Kernel News



The Linux kernel mailing list comprises the core of Linux development activities. Traffic volumes are immense, often reaching ten thousand messages in a given week, and keeping up to date with the entire scope of development is a virtually impossible task for one person. One of the few brave souls to take on this task is Zack Brown.

Our regular monthly column keeps you abreast of the latest discussions and decisions, selected and summarized by Zack. Zack has been publishing a weekly online digest, the Kernel Traffic newsletter, for over five years now. Even reading Kernel Traffic alone can be a time-consuming task.

Linux Magazine now provides you with the quintessence of Linux Kernel activities, straight from the horse's mouth.

RelayFS

The RelayFS filesystem, which is intended for high-speed data transfer between the kernel and user-space, is apparently so successful that it is no longer needed! Paul Mundt has recently submitted a set of patches that abstract the "channel buffer" mechanism behind RelayFS's high-speed data transfer and make it available to all filesystems via a consistent API. With that in place, RelayFS itself can be removed as redundant.
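For those who haven't used it, the relay model is straightforward: the kernel writes data into per-CPU "channel buffers," and user-space reads the results back through files exposed by some host filesystem, such as debugfs. The following is only a rough sketch of what an in-kernel client of the abstracted interface might look like, assuming the generic API settled into roughly the form later found in include/linux/relay.h; the module name, filenames, and buffer sizes are placeholders, not taken from Paul's patches.

/*
 * Illustrative sketch of a kernel module using the generic relay
 * interface. Names and sizes are placeholders.
 */
#include <linux/module.h>
#include <linux/relay.h>
#include <linux/debugfs.h>

static struct rchan *chan;

/* Expose each per-CPU channel buffer as a file in debugfs. */
static struct dentry *example_create_buf_file(const char *filename,
                                              struct dentry *parent,
                                              umode_t mode,
                                              struct rchan_buf *buf,
                                              int *is_global)
{
        return debugfs_create_file(filename, mode, parent, buf,
                                   &relay_file_operations);
}

static int example_remove_buf_file(struct dentry *dentry)
{
        debugfs_remove(dentry);
        return 0;
}

static struct rchan_callbacks example_cbs = {
        .create_buf_file = example_create_buf_file,
        .remove_buf_file = example_remove_buf_file,
};

static int __init example_init(void)
{
        static const char msg[] = "hello from the kernel\n";

        /* Two 64KB sub-buffers per CPU, visible as example0, example1, ... */
        chan = relay_open("example", NULL, 65536, 2, &example_cbs, NULL);
        if (!chan)
                return -ENOMEM;

        relay_write(chan, msg, sizeof(msg) - 1);
        return 0;
}

static void __exit example_exit(void)
{
        relay_close(chan);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

User-space would then read the data from the corresponding debugfs files, which is exactly the kind of high-speed, low-overhead path RelayFS was built to provide.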

While this new feature, or something like it, is undoubtedly a great boon for the myriad filesystems out there today, the kernel developers know it is not such a simple matter to remove a filesystem that users may currently depend upon. After all, as Dave Jones has pointed out, it has taken years to remove DevFS even after the decision was made; why should RelayFS have it any easier?

DevFS may be an odd example, given the tremendous, anguished controversy that has surrounded it. But the overarching point, avoiding breaking user-space, would seem to be valid in this case, and Paul has submitted additional code to keep the current user-space behavior unchanged.

However, even that accommodation, according to Christoph Hellwig, may be unnecessary. Apparently, before RelayFS was merged into the tree, Andrew Morton assured the RelayFS developers that it would be OK to continue development and to change things at a high level, so long as no other in-kernel components came to rely on the earlier behavior. In the case of RelayFS, at least during these early days, the rest of the kernel itself would be the "user-space" that must not break.

In light of this, along with the fact that Paul has been willing to submit patches to allow even non-kernel uses to continue to work unchanged, it looks like a safe bet that RelayFS as we've known it will disappear and that the core features of RelayFS will reappear as a thin API layer available throughout filesystem-space.

Directory Hardlinks?

For many years, a perennial question has been, "Why can't we have hard links to directories?" Recently Joshua Hudson asked this question in the form of a patch to enable the feature within the VFS. Unfortunately, merely enabling this hard-link feature does not solve the underlying problems it creates, as Horst von Brand has pointed out. Hard-linked directories make loops in the directory tree possible, and loops are apparently something the Linux kernel cannot cope with and still remain fast. Garbage collection, currently done by reference counts, would require a new, complex infrastructure, and that infrastructure would require a lot of memory to get right.
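To see why reference counting falls down here, imagine /a containing a hard link to /b and /b containing a hard link back to /a: once the last outside path to the pair is removed, both directories are unreachable, yet both link counts stay above zero. The toy user-space program below (not kernel code, just an illustration with made-up structures) walks through the arithmetic:

/* Toy illustration of why reference counts cannot reclaim a cycle of
 * directory hard links. Not kernel code. */
#include <stdio.h>

struct dir {
        const char *name;
        int nlink;          /* how many links point at this directory */
        struct dir *child;  /* the directory hard link it contains */
};

int main(void)
{
        struct dir a = { "a", 0, NULL };
        struct dir b = { "b", 0, NULL };

        a.nlink++;                      /* link from the parent directory */
        a.child = &b; b.nlink++;        /* /a/b points at b */
        b.child = &a; a.nlink++;        /* /b/a points at a, closing the loop */

        a.nlink--;                      /* now unlink /a from its parent */

        /* The pair is unreachable, but neither count has hit zero,
         * so neither directory would ever be freed. */
        printf("a.nlink = %d, b.nlink = %d\n", a.nlink, b.nlink);
        return 0;
}

Detecting and reclaiming such cycles requires walking the graph rather than just counting links, which is exactly the kind of expensive infrastructure Horst was describing.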

It would be possible to simplify those requirements and still allow directory hard links, but only by enforcing certain restrictions on what users could do with their directory structure. And while it would be straightforward to implement these restrictions, Horst says, they would appear arbitrary and hard for users to understand.

So Joshua's patch will almost certainly have to live off-tree. Judging from the fact that this debate has really been over for years, and that no existing filesystem is clamoring for directory hard links, the decision appears to have been made long since: keep the general interface simple and free, and keep a single clear rule against directory hard links.

Software Suspend Status

The software suspend project has had a rough life. When Nigel Cunningham forked the code away from Pavel Machek years ago, the split was quite acrimonious. Then in 2004 it looked as though both hackers had resolved their differences and agreed to work together again with Pavel as the lead. When that plan fell through, the two projects continued in their separate directions, with Pavel's work gaining more developers, and Nigel's maintaining a significant following as well. In recent days Nigel has made a big career change, and suggested that, unless other major developers rise up to support his suspend work, he would make no further effort to merge his code with the official tree, though he does plan to continue maintaining his patches. Given how unpredictable this project has been up till now, I would expect any future direction to be different from anything the current situation might suggest. But that's the current situation.

Kernel ABI Stability

The kernel's ABI (Application Binary Interface) consists of the user-visible interfaces, such as system calls, that user programs can call and rely on. (The same name is also used by a separate project, described as "a patch to the Linux kernel that allows a Linux system to run foreign binaries," which provides Linux support for binaries intended for SCO, Caldera OpenUnix 8, Sun Solaris 2, IBM System V, and other Unix-based systems; that project is not the subject here.)

When the ABI changes, applications that rely on it can suddenly break. Since there is no way to know how many applications rely on a given function in the ABI, it is difficult to estimate how much work an ABI change will require on the part of application developers throughout the world. So in general, ABI changes are frowned upon.

Unfortunately, it's not always clear how stable a given binary interface should be. Keeping functions forever unchanged would prevent Linux from improving those interfaces and would make it more difficult for Linux to respond flexibly to the shifting needs of the world. In practice, the ABI does change. The problem is how to mitigate that change so that application developers do not have to constantly rewrite the guts of their software, and so that kernel hackers can continue to make the kernel really great.

One approach is to document the issue of ABI stability, and Greg Kroah-Hartman has decided to do just that. First of all, he's identified about half a dozen levels of stability, going beyond a simple stable/unstable dichotomy and including categories for obsolete interfaces (like DevFS), private interfaces (like the ALSA code), and interfaces that have already been removed from the kernel after having been obsoleted. The final organization of this system is still to be determined, and Linus Torvalds has already expressed some dissatisfaction with portions of the plan; but the basic idea of documenting ABI stability seems to be acceptable to Linus and on its way into the tree.
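In the form the scheme eventually took in the kernel tree, the documentation lives under Documentation/ABI, with subdirectories named stable, testing, obsolete, and removed, and one short text file per interface. The field layout below is a sketch of that format with placeholder values, not a quote from Greg's actual submission:

    What:           the user-visible file or interface being documented
                    (a sysfs path, for example)
    Date:           when the interface was introduced
    KernelVersion:  the kernel release it first appeared in
    Contact:        the developer or mailing list responsible for it
    Description:    what the interface does, and any guarantees or
                    caveats users should know about

The point of the per-interface files is that an application developer can check which directory an interface lives in before deciding how heavily to depend on it.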

That is not to say the issue is not controversial. A significant number of big-time kernel hackers are opposed to the whole idea. Theodore Y. Ts'o, for instance, has said flatly that the kernel should not accept any user-visible interfaces it does not intend to support in an ongoing way. And it is clear that ABI changes themselves, regardless of any documentation effort, will continue to cause consternation among application developers. But perhaps Greg's approach will at least help make the ideal of kernel development mesh better with the realities.

Git-bisect and Patch Submission Processes

The advent of the git-bisect tool has been a fantastic boon to developers, who are now able to identify much more quickly the precise patch that introduced a given bug. But git-bisect has also begun to regulate, to some extent, the way kernel development takes place. Linus Torvalds has begun rejecting patch sets in which any single patch leaves the kernel in an uncompilable state, even if applying the full set does not.

It has never been a good idea to include a patch that breaks compilation, but in the old days patches were split into discrete parts chiefly for easy review. As long as a single patch accomplished a single thing, actual compilation was not part of the equation, at least not explicitly, especially if the full series of patches would result in a working kernel. The job of splitting the patches was seen by many as a mere housekeeping task.

But now that git-bisect has shown its great value, the ability of each intermediate patch to compile successfully becomes much more important. A failed compilation means that the run-time bug being hunted cannot be tested for in that kernel, which means git-bisect will be slower to identify the precise patch that introduced it. And since each candidate kernel must be tested by compiling and running it, "slower" could mean hours slower. If many intermediate patches leave uncompilable kernels, the value of git-bisect will quickly drop.
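For readers who haven't tried it, a bisection run is just a short loop of commands; a sketch of a typical session (the version tag is a placeholder) looks something like this:

    git bisect start
    git bisect bad                 # the currently checked-out kernel shows the bug
    git bisect good v2.6.15        # the newest release known to be free of it
                                   # git now checks out a revision roughly halfway
                                   # in between; build it, boot it, and test, then:
    git bisect good                # (or "git bisect bad", as appropriate)
                                   # ...repeat until git names the guilty patch...
    git bisect reset

It is at the "build it, boot it" step that an uncompilable intermediate patch derails the process: the tester can neither confirm nor rule out the bug in that revision and has to hunt by hand for a nearby kernel that does build.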

It's interesting to observe the various development policies as they change. In this case, git-bisect only turned out to be as useful as it has been because of the earlier policy of patch-splitting; and now that usefulness has inspired a further refinement of the same policy.