Designing a new library for processing large images using a minimal amount of memory.
Library design is an imprecise art. It is impractical to anticipate, in the design of a library of any significant size or complexity, all of its possible uses. Thus, it is inevitable that many libraries (and programs) reach evolutionary dead ends, where newly discovered uses or algorithms can no longer fit nicely into the existing architecture.
The quickest short-term solution is usually to hack in new interfaces, but over time the accumulation of hacks tends to erode the stability, understandability and maintainability of the code. The software industry runs up against this problem often, but fortunately has found a simple yet elegant solution: start over. Usually, rewriting lower- to middle-level interfaces is enough to clear out the accumulated hacks. However, if the design criteria in question are woven throughout the code, sometimes only a total rewrite can truly help.
This is not solely a trait of software design. Software engineering is an expression of mathematics in a confined space (your computer). Through this heritage, it shares traits with other inexact sciences such as physics, where rewriting or reworking theories is not uncommon.
During the last five years, I've written many image-processing algorithms, from specialized routines for machine vision to complete libraries for commercial video and aerial image-processing software. The last commercial library has been in use for three years and has weathered many interface changes and enhancements. But just as each library was in some ways an improvement over previous attempts, I saw ways to improve performance and capability. Following in the footsteps of the makers of The Six Million Dollar Man, I wanted to make it faster, smarter and better than before, and at a significantly reduced cost.
My commercial library had a large amount of code tied to it, so simply modifying the existing code was not an option. It seemed as if I would never be able to incorporate a new library into my commercial work because of enormous design incompatibilities. Rather than have this new library be destined to collect electronic dust on my hard drive, I decided to start completely from scratch as open source. In late November 1998, the Large Image Manipulation Program (LIMP) was born.
It's likely that even as open source, this library would have been inconspicuous enough to draw little, if any, attention. However, after a few months spent developing LIMP in my spare time, Open Source Remote Sensing (OSRS, http://remotesensing.org/) was born. I was thrilled at the thought of having an open source library that was actually useful to someone, so LIMP was moved to OSRS for public development.
The purpose of LIMP is to allow the processing of large images using a minimal amount of memory. A number of available libraries can be used for image processing, any of which could produce identical results. The differences between these libraries can often be summed up by answering a few questions:
Can the data be processed on demand, or must all the processed data be in memory or on disk?
How easy is it to write new algorithms for the library?
How efficient is it?
The simplest image processing library would first allow loading an image into memory, then provide pixel-level access to the data (read and write), and finally allow storing the data back to disk. Advantages of such a scheme include a simple set of interfaces and (given enough memory or small enough images) nearly optimal computational efficiency. Disadvantages include high memory usage and therefore poor scaling with image size or number of images.
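For concreteness, a minimal sketch of such an interface might look like the following. This is purely illustrative; the SimpleImage class and its methods are invented for this example, not taken from LIMP or any existing library:

    #include <vector>

    // A hypothetical "whole image in memory" class: the simplest
    // possible design, fast for small images, but memory use grows
    // directly with image size.
    class SimpleImage {
    public:
        SimpleImage(int width, int height)
            : w(width), h(height), samples(width * height) {}

        // Pixel-level read/write access is trivial because every
        // pixel is already resident in memory.
        unsigned char pixel(int x, int y) const  { return samples[y * w + x]; }
        void setPixel(int x, int y, unsigned char v) { samples[y * w + x] = v; }

        int width() const  { return w; }
        int height() const { return h; }

        // load()/save() would read or write the entire buffer in one
        // pass; they are omitted here to keep the sketch short.

    private:
        int w, h;
        std::vector<unsigned char> samples;  // the entire image, in RAM
    };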
If memory usage is not a problem, this would be the optimal way of dealing with images. Unfortunately, memory cannot yet be considered infinite for many image-processing demands. As an example, the very first test of my last commercial library was to load 1200 images consisting of 300 gigabytes of data and display them at once. Actual processing was done in blocks of images to avoid having a failure wipe out weeks of processing time. In an attempt to avoid repeating historical blunders, I would never say that no one will ever have several hundred gigabytes of memory. I imagine when that time comes, people will work with even larger data sets.
In handling large images, many proprietary libraries reduce memory usage by sacrificing ease of use and efficiency. Considerable effort can reduce the loss, but all such libraries remain at a slight disadvantage in these areas, and LIMP is no exception.
By learning from past experience, many interfaces in LIMP have been designed to promote speed-enhancing optimizations and to group complex code into a few internal locations, where it can be more easily maintained. Because the complex code is grouped into reusable templates, many kinds of functions and conversions can be written without having to deal with the complexity normally encountered in large-image processing.
LIMP is almost pure C++, as is most of the code I've written in the last several years. My feeling is that the compiler should do as much of the work as possible, allowing the programmer to focus on higher-level concepts. C++ is certainly not the holy grail of programming, but I feel that using an object-oriented language is much simpler than manually approximating an object-oriented interface in a non-object-oriented language. After all, everyone knows all computer languages can be reduced to assembly language; it is just a matter of how much work you have to do versus how much the compiler does for you.
The Qt library (www.troll.no) is used as both a widget set and a template library. Since all of the core library is independent of the display subsystem, a library such as the STL could have been used instead, but Qt is better documented and has no inconsistencies between platforms. I also intended to write graphical interfaces using the core libraries, so Qt was a natural choice.
Plug-in types are used for a number of interfaces. This makes it easy to add new implementations without changing existing code. Image loading, saving and serialization are accomplished using plug-in interfaces. Simple interpolation filters are also implemented this way. The plug-in manager is very generic and can handle plug-ins of any type. This also makes it possible to add run-time loading of external plug-ins, although this feature is not yet implemented.
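The following sketch suggests how such a type-agnostic manager might be organized. The names (Plugin, PluginManager and so on) are mine, invented for illustration; the actual LIMP interfaces may differ:

    #include <map>
    #include <memory>
    #include <string>

    // Hypothetical common base: each plug-in reports which service it
    // provides ("loader", "saver", "filter", ...) and its own name.
    class Plugin {
    public:
        virtual ~Plugin() {}
        virtual std::string type() const = 0;
        virtual std::string name() const = 0;
    };

    // A generic manager keyed on (type, name), so image loaders and
    // interpolation filters can share one registration mechanism.
    class PluginManager {
    public:
        void add(std::unique_ptr<Plugin> p) {
            std::string key = p->type() + "/" + p->name();
            registry[key] = std::move(p);
        }
        Plugin* find(const std::string& type, const std::string& name) {
            auto it = registry.find(type + "/" + name);
            return it == registry.end() ? nullptr : it->second.get();
        }
    private:
        std::map<std::string, std::unique_ptr<Plugin>> registry;
    };

Run-time loading of external plug-ins would then need only a small amount of dlopen-style glue that calls add() for each plug-in a shared object exports.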
An image in LIMP is nothing more than a class for caching and moving data. By itself, an image does not produce, or in any way modify, the data. All production and processing steps are performed in “layers”. An image can contain any number of layers, but the minimum for useful work is one—the source layer. This layer normally corresponds to some type of file loader (e.g., tiff), but can also be a simpler type such as an in-memory buffer, a constant-value image, or anything that can produce an image from scratch. In order to deal with large images efficiently, all data should be produced or loaded on-demand.
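In outline, the relationship between images and layers might look like the sketch below. These are invented stand-ins, much simpler than the real classes, meant only to make the division of labor concrete:

    #include <memory>
    #include <vector>

    // A minimal stand-in for a tile: a fixed-size block of samples.
    // A real tile would also carry its position, data type and so on.
    struct Tile {
        int pixels;                       // pixel count in this tile
        int samplesPerPixel;              // e.g., 3 for RGB, 1 for gray
        std::vector<unsigned char> data;  // pixels * samplesPerPixel bytes
    };

    // A layer either produces data from scratch (a source layer such
    // as a TIFF loader) or transforms the output of the layer below it.
    class Layer {
    public:
        virtual ~Layer() {}
        virtual void processTile(const Tile& in, Tile& out) = 0;
    };

    // The image itself only caches and moves data; all production and
    // processing is delegated to its stack of layers.
    class Image {
    public:
        void pushLayer(std::unique_ptr<Layer> l) {
            layers.push_back(std::move(l));
        }
    private:
        std::vector<std::unique_ptr<Layer>> layers;
    };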
Other layers could perform such functions as data format conversion, radiometric or geometric transforms, and mosaics by combining multiple images. A number of data-type conversions are predefined (e.g., from RGB to YCbCr or RGB to gray-scale), and many other conversions can be easily defined. When processing layers are added to an image, everything about the image can be affected. The 2-D properties (width/height) can change, or the depth (samples per pixel, pixel data type, etc.) could be modified. For example, one class that modifies 2-D size is the zoom layer, which produces a new virtual image that is a magnification of an existing one. This can be used not only for zooming in on a visual image, but for up-sampling nearly any supported type.
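To show how small a conversion layer can be when the framework handles the tiling, here is an RGB-to-gray layer written against the hypothetical Tile and Layer sketches above (the weights are the standard Rec. 601 luma coefficients; the real LIMP conversion may be organized differently):

    // Sketch: converts 3-sample RGB tiles to 1-sample gray tiles.
    class GrayLayer : public Layer {
    public:
        void processTile(const Tile& in, Tile& out) override {
            // The framework fills 'in' and allocates 'out' before this
            // is called, so only the pixels need to be touched.
            for (int i = 0; i < in.pixels; ++i) {
                unsigned char r = in.data[i * 3 + 0];
                unsigned char g = in.data[i * 3 + 1];
                unsigned char b = in.data[i * 3 + 2];
                // Rec. 601 luma: Y = 0.299 R + 0.587 G + 0.114 B
                out.data[i] =
                    (unsigned char)(0.299 * r + 0.587 * g + 0.114 * b);
            }
        }
    };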
By design, as much processing as possible is moved into an assembly-line approach. The basic unit for loading and processing data is a tile. All requests for data are made to an Image class, where they are broken up into tiles for processing. By determining which tiles are needed ahead of time, additional optimizations can be performed (for example, reordering the tile requests to optimize data cache hits). Tiles already in the cache are reused, and the ones that are not are created. All the complexity of chaining together layers of various types is dealt with by the Image class, which simplifies layer construction. When the time comes for a layer to process a given tile, it is presented with the input data space already filled and the output data space already allocated; therefore, it only has to process the pixels.
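A sketch of that request path, again in terms of the invented types above, might look like this; the row-major sort stands in for whatever cache-friendly ordering the real implementation uses:

    #include <algorithm>
    #include <vector>

    struct TileKey { int col, row; };  // which tile within the image

    // Requests can arrive in arbitrary order; reordering them so that
    // tiles sharing a row are handled together improves cache behavior.
    void reorderForCache(std::vector<TileKey>& requests) {
        std::sort(requests.begin(), requests.end(),
                  [](const TileKey& a, const TileKey& b) {
                      return a.row != b.row ? a.row < b.row
                                            : a.col < b.col;
                  });
    }

The Image class would then walk the reordered list, returning cached tiles directly and pushing only the missing ones through the layer stack.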
Linux is the primary development platform; however, efforts are being made to keep the library portable to other platforms. The GNU auto-configuration tools are used to test for required system characteristics. LIMP is known to build on Red Hat/ix86 5.2, IRIX/MIPS 6.3 and Red Hat/Alpha 5.2, each using egcs 1.1.2.
Support exists for TIFF images in a variety of formats; both scan-line and tiled TIFF are supported. Color and gray-scale capability is known to work, but the TIFF layer also supports a number of other sample formats, including shorts, floats and doubles, as well as multi-channel types. Recent work by others, notably Frank Warmerdam, has added support for other image formats by bridging to his open-source GDAL library.
Many optimizations designed into LIMP are not yet fully realized. This is not to say that LIMP is slow in its current state, but the design leaves room for improvement. Already, some optimizations have been added that could be difficult to retrofit into other architectures. For example, LIMP implements data-request optimizations that reorder requests to make optimal use of the image cache. Most of the optimizations are done internally, so calling objects benefit from them without taking on any extra complexity.
Today, LIMP is meant primarily for developers. The only thing usable by non-programmers is its image-viewing program, imgview. imgview is a simple display tool, capable of viewing very large images quickly and without needing much memory. It borrows user-interface ideas from other image-processing tools, allowing options such as dragging the canvas while the display updates in the background. It also supports a number of zoom filters for smoothing upsampled (i.e., enlarged) image data.
What LIMP does now is only the tip of the iceberg. The basic foundation has been laid, but many algorithms remain to be written. Every area of LIMP will most likely receive extensions as work progresses. Of course, the people who contribute to LIMP and OSRS are the ones who will drive its development. I will outline some of the planned work to give an idea of the kinds of things I believe would be useful. This is not an all-inclusive list, as there are likely many interesting features that have yet to occur to me.
Support for geographic information needs to be added at some level to allow applications to learn how images are related to the real world and each other. This is a basic requirement for high-level GIS programs. It may not be necessary to put this information directly into LIMP, but the metadata information in an image provides a potentially convenient way of storing and retrieving such information.
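One plausible shape for this, sketched below, is a simple key/value store attached to each image; both the class and the example keys are hypothetical, not a defined LIMP schema:

    #include <map>
    #include <string>

    // Hypothetical: georeferencing carried as image metadata, so
    // higher-level GIS code can relate pixels to the real world.
    class ImageMetadata {
    public:
        void set(const std::string& key, const std::string& value) {
            items[key] = value;
        }
        std::string get(const std::string& key) const {
            auto it = items.find(key);
            return it == items.end() ? std::string() : it->second;
        }
    private:
        std::map<std::string, std::string> items;
    };

    // e.g., meta.set("geo.projection", "UTM zone 11N");
    //       meta.set("geo.pixel-size", "0.5m");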
The classes from LIMP's image viewer will be extended to handle multiple overlapping images as well as vector data. This is also somewhat detached from the core of LIMP, because it would create a new display pipeline. The image-display class already includes a sophisticated drawing class, capable of ordering and computing tiles for the display with minimal impact on the GUI. That impact can be reduced to almost nothing once Qt supports threaded event handling.
A wide variety of radiometric image adjustments will be useful. These will start as simple histogram stretches for viewing, and progress to more complex color and intensity modifications. This type of modification should add very little extra overhead to LIMP, as it was designed specifically to minimize the procedural and computational overhead of such objects.
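As an example of the kind of radiometric layer meant here, a linear histogram stretch maps a chosen [low, high] input range onto the full 8-bit output range. The sketch below uses the invented Tile and Layer types from earlier and assumes high > low:

    // Sketch: values at or below 'low' map to 0, values at or above
    // 'high' map to 255, and everything in between maps linearly.
    class StretchLayer : public Layer {
    public:
        StretchLayer(unsigned char low, unsigned char high)
            : lo(low), hi(high) {}

        void processTile(const Tile& in, Tile& out) override {
            double scale = 255.0 / (hi - lo);
            int n = in.pixels * in.samplesPerPixel;
            for (int i = 0; i < n; ++i) {
                int v = in.data[i];
                if (v <= lo)
                    out.data[i] = 0;
                else if (v >= hi)
                    out.data[i] = 255;
                else
                    out.data[i] = (unsigned char)((v - lo) * scale);
            }
        }
    private:
        unsigned char lo, hi;
    };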
For a more complete list of expected modifications to LIMP, see the TODO file in the LIMP distribution. Similarly, if you are interested in the progress, see the NEWS and ChangeLog files.
As with most libraries that cater to processing extremes, LIMP is not destined for a mass-market audience. Good solutions for dealing with similar image-processing demands, such as editing, already exist. The problems facing a designer of a general editing program, where every pixel may be changed interactively, are not driving our choices. Instead, LIMP is designed to deal well with scientific image-processing needs. In this category, I am most familiar with aerial and satellite image processing, but I imagine other fields have similar needs.
Images for the GIS market typically cover a large area at a relatively low image scale. One obvious parallel in another field would be small-area images at much higher scales, as might be found in microscopy work. As everyone who has played with a fractal generator knows, if you zoom into an object far enough, the original object seen at that scale covers an immense area.
Aerial and satellite images have come to be processed and stored on computers only in recent history. As computers become more powerful and capable of accessing larger amounts of data, new attempts will undoubtedly be made to process and understand exponentially larger sets of data at even finer resolutions. LIMP is not expected to be a final answer to these problems, but is just an experiment in dealing with them while maintaining performance, ease of use and the sanity of its programmers.