Last month, we gave an introduction to block device drivers. This month, we look at some tricks that are useful when writing block device drivers, starting with the most basic “trick” of using hardware interrupts where available and describing some neat infrastructure that block device drivers can take advantage of by adding five lines of code and one function.
Block devices, which are usually intended to hold file-systems, may or may not be interrupt-driven. Interrupt-driven block device drivers have the potential to be faster and more efficient than non- interrupt-driven block device drivers.
Last month, I gave an example of a very simplistic block device driver that reads its request queue one item at a time, satisfying each request in turn, until the request queue is emptied, and then returning. Some block device drivers in the standard kernel are like this. The ramdisk driver is the obvious example; it does very little more than the simplistic block device driver I presented. Less obvious to the casual observer, few of the CD-ROM drivers (actually none of them, as I write this) are interrupt-driven. It is easy to determine which drivers are interrupt-driven by reading drivers/block/blk.h, searching for the string DEVICE_INTR, and noting which devices use it.
I'm tired of typing “block device driver”, and you are probably tired of reading it. For the rest of this article, I will use “driver” to mean “block device driver”, except where stated otherwise.
Interrupt-driven drivers have the potential to be more efficient than non-interrupt-driven ones because the drivers have to spend less time busy-waiting—sitting in a tight loop, waiting for the device to become ready or finish executing a command. They also have the potential to be faster, because it may be possible to arrange for multiple requests to be satisfied at once, or to take advantage of peculiarities of the hardware.
In particular, the SCSI disk driver tries to send the SCSI disk one command to read multiple sectors and satisfy each of the requests as the data for each block arrives from the disk. This is a big win considering the way the SCSI interface is designed; because initiating a SCSI transfer takes some complex negotiation, it takes a significant amount of time to negotiate a SCSI transfer, and when the SCSI driver can ask for multiple blocks at the same time, it only has to negotiate the transfer once, instead of once for each block.
This complex negotiation makes SCSI a robust bus that is useful for many things besides disk drives. It also makes it necessary to pay attention to timing when writing the driver, in order to take advantage of the possibilities without being extremely slow. Before certain optimizations were added to the generic, high-level SCSI driver in Linux, SCSI performance did not at all approach its theoretical peak. Those optimizations made for throughput 3 to 10 times greater on most devices.
As another example, the original floppy driver in Linux was very slow. Each time it wanted a block, it read it in from the media. The floppy hardware is very slow and has high latency (it rotates slowly and if you wanted to read the block that just started going past the head, you had to wait until the disk made a full revolution), which kept it very slow.
Around version .12, Lawrence Foard added a track buffer. Since it only takes approximately 30% to 50% more time to read an entire track off the floppy as it does to wait for the block you want to read to come around and be read (depending on the type of disk and the position of the disk at the start of the request), it makes sense, when reading a block, to read the entire track the block is in.
As soon as the requested block has been read into the track buffer, it is copied into the request buffer, the process waiting for it to be read can continue, and the rest of the track is read into a private buffer area belonging to the floppy driver. The next request for a block from that floppy is often for the very next block, and that block is now in the track buffer and ready immediately to be used to fulfill the request. This is true approximately 8 times out of 9 (assuming 9 blocks, or 18 sectors, per track). This single change turned the floppy driver from a very slow driver into a very fast driver.
So, you are convinced that interrupt-driven drivers have a lot more potential, and you want to know how to turn the non-interrupt-driven driver you wrote last month into an interrupt-driven one. I can't give you all the information you need in a single article, but I can get you started, and after reading the rest of this article, you will be better prepared to read the source code for real drivers, which is the best preparation for writing your own driver.
The basic control flow of a request for a block from a non-interrupt-driven driver usually runs something like this simplification alert:
user program calls read() read() (in the kernel) asks the buffer cache to get and fill in the block buffer cache notices that it doesn't have the data in the cache buffer cache asks driver to fill in a block with correct data driver satisfies request and returns buffer cache passes newly-filled-in block back to read() read() copies the data into the user program and returns user program continues An interrupt-driven driver runs more like this simplification alert: user program calls read() read() (in the kernel) asks the buffer cache to get and fill in the block buffer cache notices that it doesn't have the data in the cache buffer cache asks driver to fill in a block with correct data driver starts the process of satisfying the request and returns buffer cache waits for block to be read by sleeping on an event Some other processes run for a while, perhaps causing other I/O on the device. the physical device has the data available and interrupts the driver driver reads the data from the device and wakes up the buffer cache buffer cache passes the newly-filled-in block back to read(). read() copies the data into the user program and returns user program continues
Note that read() is not the only way to initiate I/O.
One thing to note about this is that just about anything can be done before waking up the process(es) waiting for the request to complete. In fact, other requests might be added to the queue. This seems, at first, like a troublesome complication, but really is one of the important things that makes it possible to do some worthwhile optimizations. This will become obvious as we start to optimize the driver. We will start, though, by taking our non-interrupt-driven driver and making it use interrupts.
I am going to take the foo driver I started developing last month, and add interrupt service to it. It is hard to write good, detailed code for a hypothetical and vaguely defined device, so (as usual) if you want to understand better after reading this, take a look at some real devices. I suggest the hd and floppy devices; start from the do_hd_request() and do_fd_request() routines and follow the logic through.
static void do_foo_request(void) { if (foo_busy) /* another request is being processed; this one will automatically follow */ return; foo_busy = 1; foo_initialize_io(); } static void foo_initialize_io(void) { if (CURRENT->cmd == READ) { SET_INTR(foo_read_intr); } else { SET_INTR(foo_write_intr); } /* send hardware command to start io based on request; just a request to read if read and preparing data for entire write; write takes more code */ } static void foo_read_intr(void) { int error=0; CLEAR_INTR; /* read data from device and put in CURRENT->buffer; set error=1 if error This is actually most of the function... */ /* successful if no error */ end_request(error?0:1); if (!CURRENT) /* allow new requests to be processed */ foo_busy = 0; /* INIT_REQUEST will return if no requests */ INIT_REQUEST; /* Now prepare to do I/O on next request */ foo_initialize_io(); } static void foo_write_intr(void) { int error=0; CLEAR_INTR; /* data has been written. error=1 if error */ /* successful if no error */ end_request(error?0:1); if (!CURRENT) /* allow new requests to be processed */ foo_busy = 0; /* INIT_REQUEST will return if no requests */ INIT_REQUEST; /* Now prepare to do I/O on next request */ foo_initialize_io(); }
In blk.h, we need to add a few lines to the FOO_MAJOR section:
#elif (MAJOR_NR == FOO_MAJOR) #define DEVICE_NAME "foobar" #define DEVICE_REQUEST do_foo_request #define DEVICE_INTR do_foo #define DEVICE_NR(device) (MINOR(device) > 6) #define DEVICE_ON(device) #define DEVICE_OFF(device) #endif
Note that many functions are missing from this; this is the important part to understanding interrupt-driven device drivers; the “heart”, if you will. Also note that, obviously, I haven't tried to compile or run this hypothetical driver. I may have made some mistakes—you are bound to make mistakes of your own while writing your driver, and finding bugs in this skeleton will be good practice for finding bugs in your own driver, if you are so inclined. I do suggest that when you write your own driver, you start with code from some other working driver rather than starting from this skeleton code.
An explanation of some of the new ideas here is in order. The first new idea is (obviously, I hope) the use of interrupt routines to do part of servicing the hardware and walking down the request list. I used separate routines for reading and writing; this isn't fundamentally necessary, but it does generally help allow cleaner code and smaller, easier-to-read interrupt service routines. Most (all?) of the interrupt-driven device drivers of any kind in the real kernel use separate routines for reading and writing.
We also have a separate routine to do most of the I/O setup instead of doing it in the request() procedure. This is so that the interrupt routines can call the separate routings to set up the next request, if necessary, upon completion of a request. Again, this is a design feature that makes most real-world drivers smaller and easier to write and debug.
It must be noted that any routine that is called from an interrupt is different than all the other routines I have described so far. Routines called from an interrupt do not execute in the context of any calling user-level program and cannot write to user-level memory. They can only write to kernel memory. If they absolutely need to allocate memory, they must do so with the GFP_ATOMIC priority. In general, it is best for them to write into buffers allocated from user-process-context routines with priority GFP_KERNEL before the interrupt routines are set up. They also can wake up processes sleeping on an event, as end_request() does, but they cannot sleep themselves.
The header file blk.h provides some nice macros which are used here. I won't document them all (most are documented in The Linux Kernel Hackers' Guide, the KHG), but I will mention the ones I use, which are used to manage interrupts.
Instead of setting up interrupts manually, it is easier and better to use the SET_INTR() macro. (If you want to know how to set them up manually, read the definitions of SET_INTR in blk.h.) Easier because you just do SET_INTR(interrupt_handling_function), and better because if you set up automatic timeouts (which we will cover later), SET_INTR() automatically sets them up.
Then, when the interrupt has been serviced, the interrupt service routine (foo_read_intr() or foo_write_intr() above) clears the interrupt, so that spurious interrupts don't get delivered to a procedure that thinks that it is supposed to read or write to the current request. It is possible—it only takes a little more work—to provide an interrupt routing to handle spurious interrupts. If you are interested, read the hd driver.
In blk.h, a mechanism for timing out when hardware doesn't respond is provided. If the foo device has not responded to a request after 5 seconds have passed, there is very clearly something wrong. We will update blk.h again:
#elif (MAJOR_NR == FOO_MAJOR) #define DEVICE_NAME "foobar" #define DEVICE_REQUEST do_foo_request #define DEVICE_INTR do_foo #define DEVICE_TIMEOUT FOO_TIMER #define TIMEOUT_VALUE 500 /* 500 == 5 seconds */ #define DEVICE_NR(device) (MINOR(device) > 6) #define DEVICE_ON(device) #define DEVICE_OFF(device) #endif
This is where using SET_INTR() and CLEAR_INTR becomes helpful. Simply by defining DEVICE_TIMEOUT, SET_INTR is changed to automatically set a “watchdog timer” that goes off if the foo device has not responded after 5 seconds, SET_TIMER is provided to set the watchdog timer manually, and a CLEAR_TIMER macro is provided to turn off the watchdog timer. The only three other things that need to be done are to:
Add a timer, FOO_TIMER, to linux/timer.h. This must be a #define'd value that is not already used and must be less than 32 (there are only 32 static timers).
In the foo_init() function called at boot time to detect and initialize the hardware, a line must be added:
timer_table[FOO_TIMER].fn = foo_times_out;
And (as you may have guessed from step 2) a function foo_times_out() must be written to try restarting requests, or otherwise handling the time out condition.
The foo_times_out() function should probably reset the device, try to restart the request if appropriate, and should use the CURRENT->errors variable to keep track of how many errors have occurred on that request. It should also check to see if too many errors have occurred, and if so, call end_request(0) and go on to the next request.
Exactly what steps are required depend on how the hardware device behaves, but both the hd and the floppy drivers provide this functionality, and by comparing and contrasting them, you should be able to determine how to write such a function for your device. Here is a sample, loosely based on the hd_times_out() function in hd.c:
static void hd_times_out(void) { unsigned int dev; SET_INTR(NULL); if (!CURRENT) /* completely spurious interrupt- pretend it didn't happen. */ return; dev = DEVICE_NR(CURRENT->dev); #ifdef DEBUG printk("foo%c: timeout\n", dev+'a'); #endif if (++CURRENT->errors >= FOO_MAX_ERRORS) { #ifdef DEBUG printk("foo%c: too many errors\n", dev+'a'); #endif /* Tell buffer cache: couldn't fulfill request */ end_request(0); INIT_REQUEST; } /* Now try the request again */ foo_initialize_io(); }
SET_INTR(NULL) keeps this function from being called recursively. The next two lines ignore interrupts that occur when no requests have been issued. Then we check for excessive errors, and if there have been too many errors on this request, we abort it and go on to the next request, if any; if there are no requests, we return. (Remember that the INIT_REQUEST macro causes a return if there are no requests left.)
At the end, we are either retrying the current request or have given up and gone on to the next request, and in either case, we need to re-start the request.
We could reset the foo device right before calling foo_initialize_io(), if the device maintains some state and needs a reset. Again, this depends on the details of the device for which you are writing the driver.