Strategies for reducing Linux's footprint, leaving more resources for the application or letting engineers further reduce the hardware cost of the device.
“How small can you make this?” is a question frequently heard by embedded engineers at the start of their projects. Most of the time, the person asking this question is concerned with reducing the RAM and Flash resources with the goal of reducing a device's unit costs or energy requirements.
Because Linux and its surrounding environment originally were intended for desktop or server systems, the default configuration isn't optimized for size. However, as Linux finds its way into more embedded devices, making Linux “small” isn't as daunting a task as it once was. There are several different approaches for reducing the memory footprint of a system.
Many engineers start by reducing the size of the kernel; however, there is lower-hanging fruit at hand. This article goes into detail about how to reduce the size of the kernel, mostly by removing code that won't even be used in a typical embedded system.
A root filesystem (RFS) can be the largest consumer of memory resources in a system. A root filesystem contains the infrastructure code used by an application as well as the C library. Selecting the filesystem used for the RFS itself can have a large effect on the final size. The standard, ext3, is frightfully inefficient on several axes from an embedded engineer's perspective, but that's a topic for another article.
Even the smallest Linux distribution has at least two parts: a kernel and root filesystem. Sometimes, these components are colocated in the same file, but they're still separate and distinct components. By removing nearly all features from the kernel (networking, error logging and support for most devices) and making the root filesystem just the application, the size of a system easily can be less than 1MB. However, many users choose Linux for the networking and device support, so this isn't a realistic scenario.
The Linux kernel is interesting in that although it depends on GCC at compile time, it has no dependencies at runtime. Engineers new to Linux often mistake the initial RAM disk (the so-called initrd) for a kernel runtime dependency. The initrd is mounted first by the kernel, and a program runs that interrogates the system to figure out which modules need to be loaded to support the devices, so that the “real” root filesystem can be mounted. In fact, this two-step mounting, the initrd followed by the real root filesystem, rarely finds its way into embedded systems, as the gain in flexibility in a system that doesn't change isn't worth the additional space or time. But, this topic falls under the rubric of the root filesystem and is discussed later in this article.
Most of the effort in reducing kernel size lies in removing what's not needed. Because the kernel is configured for desktop and server systems, it has many features enabled that wouldn't be used in an embedded system.
Kernel loadable modules are relocatable code that the kernel links into itself at runtime. The typical use cases for loadable modules are allowing drivers to be loaded into the kernel from user space (typically after some probing process) and allowing the upgrade of device drivers without taking down the system. For most embedded systems, once they're out in the field, changing the root filesystem is either impractical or impossible, so the system's designer links the modules directly into the kernel, removing the need for loadable modules. The space saving in this area isn't limited to the kernel, however, as the programs managing loadable modules (such as insmod, rmmod and lsmod) and the shell scripts that load them are no longer necessary.
The Linux-tiny set of patches has been an on-again-off-again project that originally was spearheaded by Matt Mackall. The Consumer Electronics Linux Forum (CELF) has put effort into reviving the project, and the CELF Developer's Wiki has patches for the 2.6.22.5 kernel (at the time of this writing). In the meantime, many of the changes in the Linux-tiny Project have been included in the mainline kernel. Even if many of the original Linux-tiny patches have made it into the kernel, some substantial space-saving patches haven't, such as:
Fine-grain printk support: users can have control over what files can use printk. This allows engineers to reap the size benefits of excluding printk for the kernel at large while still having access to their favorite debugger in the places where it's needed most.
Change CRC from table lookup to calculation: Ethernet packets require a CRC to validate the integrity of the packet. This implementation of the CRC algorithm computes each value instead of looking it up in a precomputed table, saving about 2K at the cost of some speed.
Network tweaking: several patches reduce the supported network protocols, buffer sizes and open sockets. Many embedded devices support only a few protocols and don't need to service thousands of connections.
No panic reporting: if the device has three status lights and a serial connection, the user won't be able to see, much less act on, panic information that appears on a (nonexistent) console. If the device has a kernel panic failure, the user simply will power-cycle the device.
Reduction of inlining: an inline is where the compiler, instead of generating a call to a function, treats it as a macro, putting a copy of the code in each place it is called. Although the inline directive is technically a hint, GCC usually honors it when optimizing and also may inline small functions on its own. With inlining suppressed, the code runs slightly slower, as the compiler needs to generate code for a call and return; in exchange, however, the object file is smaller.
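The CRC item in the list above is a trade of memory against speed. As a rough illustration only (a minimal sketch with hypothetical function names, not the kernel's own CRC code or the Linux-tiny patch), the following compares a bitwise CRC-32 calculation, which needs no table, with a table-driven version, which spends 1K on a 256-entry lookup table:

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY 0xEDB88320u  /* reflected polynomial used by Ethernet */

/* Bitwise version: no table, eight shift/XOR steps per input byte. */
uint32_t crc32_bitwise(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (CRC32_POLY & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Table version: 256 entries * 4 bytes = 1K of memory, one lookup per byte.
 * The table is filled at run time here only to keep the sketch short;
 * a precomputed const array would occupy the same amount of space. */
static uint32_t crc32_table[256];

void crc32_table_init(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int bit = 0; bit < 8; bit++)
            c = (c >> 1) ^ (CRC32_POLY & (0u - (c & 1u)));
        crc32_table[n] = c;
    }
}

/* Must be called after crc32_table_init(). */
uint32_t crc32_lookup(const uint8_t *buf, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        crc = (crc >> 8) ^ crc32_table[(crc ^ buf[i]) & 0xFFu];
    return ~crc;
}
```

Both functions return the same value for the same input; the difference is where the cost lands. The bitwise form pays in CPU cycles per byte, and the table form pays in memory.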
The Linux-tiny patches are distributed in a tar archive that can be applied with the quilt utility or applied individually.
Although the Linux-tiny Project covers a lot of ground, several additional configuration changes will result in substantial footprint reductions:
Remove ext2/3 support and use a different filesystem: the ext2/3 filesystem is large, a little more than 32K. Most engineers enable a Flash filesystem, but don't disable ext2/3, wasting memory in the process.
Remove support for sysctl: sysctl allows the user to tweak kernel parameters at runtime. In most embedded devices, the kernel configuration is known and won't change, making this feature a wasted 1K.
Reduce IPC options: most systems can do without SysV IPC features (grep your code for msgget, msgctl, msgsnd and msgrcv) and POSIX message queues (grep for mq_[a-z]), and removing them scores another 18K.
The size command reports the amount of code and data in an object file. This is different from the output of the ls command, which reports the number of bytes the file occupies in the filesystem, including any symbol and debugging information.
For example, a kernel compiled with an armv5l cross-compiler reports the following:
# armv5l-linux-size vmlinux
   text    data     bss     dec     hex filename
2080300   99904   99312 2279516  22c85c vmlinux
The text section is the code (discovering the historical reason the code is in the text section is an exercise left to the reader) emitted by the compiler. The data section contains the values for globals and other values used to initialize static symbols. The bss section contains static data that is zeroed out as part of initialization.
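To make the three columns concrete, here is a small sketch (all names are hypothetical and chosen for illustration) showing which kind of C declaration typically ends up in which section:

```c
#include <stddef.h>

/* Initialized global: its value must be stored in the image, so it is
 * counted under data. */
int boot_retries = 3;

/* Static storage with no initializer: only its size is recorded and the
 * memory is zeroed at startup, so it is counted under bss. */
static unsigned char scratch[4096];

/* Compiled instructions are counted under text. */
size_t scratch_size(void)
{
    return sizeof scratch;
}

/* Store a byte in the buffer; out-of-range indexes are ignored. */
void scratch_set(size_t i, unsigned char v)
{
    if (i < sizeof scratch)
        scratch[i] = v;
}

/* Read a byte from the buffer; out-of-range indexes read as zero. */
unsigned char scratch_get(size_t i)
{
    return i < sizeof scratch ? scratch[i] : 0;
}
```

Compiling this file to an object and running size on it should show the 4096-byte buffer under bss and only a few bytes under data, although the exact figures depend on the compiler, its options and the target.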
Although this data is revealing, it doesn't show what parts of the system are consuming memory. There isn't a way to query vmlinux for that information, but looking at the files linked together to create vmlinux is the next best thing. To get this information, use find to locate the built-in.o files in the kernel project and pass those results to size:
# find . -name "built-in.o" | xargs armv5l-linux-size --totals | sort -n -k4
The output of this command is similar to the following:
   text    data     bss     dec     hex filename
 189680   16224   33944  239848   3a8e8 ./kernel/built-in.o
 257872   10056    5636  273564   42c9c ./net/ipv4/built-in.o
 369396    9184   34824  413404   64edc ./fs/built-in.o
 452116   15820   11632  479568   75150 ./net/built-in.o
 484276   36744   14216  535236   82ac4 ./drivers/built-in.o
3110478  180000  159241 3449719  34a377 (TOTALS)
This technique makes spotting code that occupies a large amount of space obvious, so engineers working on a project can remove those features first. When taking this approach, users shouldn't forget to do a clean make between builds, as dropping a feature from the kernel doesn't mean that the object file from the prior build will be deleted.
For those new to the Linux kernel, a common question is how to associate some built-in.o file with an option in the kernel configuration program. This can be done by looking at the Makefile and the Kconfig file in the directory. The Makefile will contain a line that looks like this:
obj-$(CONFIG_ATALK) += p8022.o psnap.o
which will result in the files on the right-hand side being built when the user sets the configuration variable CONFIG_ATALK. However, the kernel configuration tool typically doesn't readily expose the underlying configuration variable names. To find the link between the variable name and what's visible, look for the variable name, sans the CONFIG_ prefix, in the files (Kconfig) used to drive the kernel configuration editor:
find . -name Kconfig -exec fgrep -H -C3 "config ATALK" {} \;
which produces the following output:
./drivers/net/appletalk/Kconfig-#
./drivers/net/appletalk/Kconfig-# Appletalk driver configuration
./drivers/net/appletalk/Kconfig-#
./drivers/net/appletalk/Kconfig:config ATALK
./drivers/net/appletalk/Kconfig- tristate "Appletalk protocol support"
./drivers/net/appletalk/Kconfig- select LLC
./drivers/net/appletalk/Kconfig- ---help---
There's still some hunting to do, as the user needs to find where “Appletalk protocol support” appears in the configuration hierarchy, but at least there's a clear idea of what's being sought.
For many embedded engineers new to Linux, the notion of a root filesystem on an embedded device is a foreign concept. Embedded solutions before Linux worked by linking the application code directly into the kernel. Because Linux has a well-defined separation between the kernel and root filesystem, the work on minimizing the system doesn't end with making the kernel small. Before optimization, the size of the root filesystem dwarfs that of the kernel; however, in the Linux tradition, this part of the system has many knobs to turn to reduce the size of this component.
The first question to answer is “Do I need a root filesystem at all?” In short, yes. At the end of its startup process, the kernel looks for a root filesystem, mounts it and runs the first process (usually init; doing ps aux | head -2 will tell you what it is on your system). In the absence of either the root filesystem or the initial program, the kernel panics and stops running.
The smallest root filesystem can be one file: the application for the device. In this case, the init kernel parameter points to a file and that is the first (and only) process in userland. So long as that process is running, the system will work just fine. However, if the program exits for any reason, the kernel will panic, stop running, and the device will require a reboot. For that reason alone, even the most space-constrained systems opt for an init program. For a very small overhead, init includes the code to respawn a process that dies, preventing a kernel panic in the event of an application crash.
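The respawn behavior described above can be sketched in a few lines of C. This is only an illustration of the idea under simplifying assumptions, not the code of any real init; the function names run_once and supervise, and the /bin/app path mentioned below, are hypothetical:

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start the program named by argv[0] and block until it terminates.
 * Returns its exit status, 128 + the signal number if a signal killed
 * it, or -1 if fork() or waitpid() failed. */
int run_once(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                 /* only reached if exec failed */
    }

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)         /* retry if interrupted by a signal */
            return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}

/* Respawn loop: restarts the program every time it dies and never
 * returns, so the first process never exits. */
void supervise(char *const argv[])
{
    for (;;) {
        run_once(argv);
        sleep(1);                   /* avoid spinning if it fails at once */
    }
}
```

A main function that calls supervise with an argument vector such as { "/bin/app", NULL } would then serve as the program named by the init kernel parameter. A production init does more than this, such as reaping orphaned processes and handling shutdown, which is why the existing small init implementations are usually a better choice than writing one from scratch.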
Most Linux systems are more complex, including several executable files and frequently shared libraries containing code shared by applications running on the device. For these filesystems, several options exist to reduce the size of the RFS greatly.
Because it is so closely tied to GCC, most users don't think of the C library as a separate entity. The C language contains only 32 keywords (give or take a few), so most of the bytes in a C program are those from the standard library. The canonical C library, glibc, has been designed for compatibility, internationalization and platform support rather than size. However, several alternatives exist that have been engineered from inception to be small:
uClibc: this project started as an implementation of the C library for processors without a memory management unit (MMU-less). uClibc was created from the beginning to be small while supplying most of the functionality of glibc; it saves space by dropping features like internationalization, wide character support and binary compatibility. Furthermore, uClibc's configuration utility gives users great freedom in selecting what code goes into the library, allowing users to reduce the size further.
uClibc++: for those using C++, this library is implemented under the same design principles. With support for most of the C++ standard library, engineers easily can deploy C++-based applications on boards with only a few megabytes of memory.
Newlib: Newlib grew out of Red Hat's foray into the embedded market. Newlib has a very complete implementation of the math library and therefore finds favor with users doing control or measurement applications.
dietlibc: still the smallest of the bunch, dietlibc is the best kept secret among replacements for glibc. Extremely small, 70K small in fact, dietlibc manages to be small by dropping features, such as dynamically linked libraries. It has excellent support for ARM and MIPS.
Both Newlib and dietlibc work by providing a wrapper script that invokes the compiler with the proper set of parameters to ignore the regular C libraries included with the compiler and instead use the ones specified. uClibc is a little different, as it requires that the toolchain be built from source; the buildroot project supplies the tools to do that job.
Once you know how to invoke GCC so it uses the right C library, the next step is updating the makefiles or build scripts for the project. In most cases, the build for the project resides in a makefile with a line that looks like this:
CC=CROSS_COMPILE-gcc
In this case, all the user needs to do is run make and override the CC variable from the command line:
make CC="diet gcc"
This results in the makefile invoking GCC through dietlibc's diet wrapper. Although it's tempting, don't add compiler options to this macro; instead, use the CFLAGS variable. For example:
make CC="gcc -Os"
should be:
make CC=gcc CFLAGS="-Os"
This is important, because some rules invoke CC for things other than compilation, where those parameters make no sense and cause an error.
After selecting the C library, all of the code in the root filesystem needs to be recompiled against it, so that code can take advantage of the newer, smaller C library. At this point, it's worth evaluating whether static or shared libraries are the right choice for the target. Shared libraries work best if the device will have arbitrary code running and if that code isn't known at the time of deployment; for example, the device may expose an API and allow end users or field engineers to write modules. In this case, having the libraries on the device would afford the greatest flexibility for those implementing new features.
Shared libraries also would be a good choice if the system contained many separate programs instead of one or two programs. In this case, having one copy of the shared code would be smaller than the same code duplicated in several files.
Systems with a few programs merit closer consideration. When only a few programs are in use, the best thing to do is create a system each way and compare the resulting size. In most cases, the smaller system is the one with no shared libraries. As an added benefit, systems without shared libraries load and start running programs faster (as there's no dynamic linking step at startup), so users benefit from an efficiency perspective as well.
Although there's no magic tool for making a system smaller, there is no shortage of tools to help make a system as small as possible. Furthermore, making Linux “small” is more than reducing the size of the kernel; the root filesystem needs to be examined critically and pared down, as this component usually consumes more space than the kernel. This article concentrated on the executable image size; reducing the memory requirements of the program once it is running constitutes a separate project.