GNU Compiler Collection 4.2

Parallel Speed


The latest GNU compiler provides better support for parallel programming, and GCC also rolls out some new optimization features. We took GCC 4.2 for a test drive.

By René Rebe and Susanne Klaus

m.waidmann, photocase.com

After much debate and the usual delays, the latest version of the GNU C/C++ compiler (GCC) has finally materialized. Version 4.2 of GCC [1] arrives with many major and minor changes. For a complete list of changes, refer to the GCC homepage [2].

The most significant change with version 4.2 is support for OpenMP [3], an open standard for program parallelization - especially for systems with shared memory. OpenMP lets programmers specify how the compiler and run-time systems will distribute code segments over multiple threads for parallel execution on multi-core systems.

The implementation of OpenMP in the new GCC version simplifies parallel programming on supported systems. Developers no longer need to go through the complicated and error-prone process of manually customizing their source code to match architecture-specific APIs.

OpenMP Features

The current GCC version supports the full set of OpenMP features. Keywords in the source code determine at build time how the code will execute in parallel.

Pragmas allow compilers to build code with OpenMP extensions without being OpenMP-aware, which avoids platform-specific code and an impenetrable #ifdef jungle. The #pragma omp ... keyword tells the compiler that parallelized optimization can take place. Behind the scenes, GCC uses the Pthread library to create the matching threads on Unix-style systems such as Linux.

In the simplest case, loops with a significant iteration space - that is, wherever it is worthwhile creating new threads - are tagged as follows:

#pragma omp parallel for
for(i = 0; i < N; i++)
  a[i] += b[i];

OpenMP has flexible controls; for example, you can tell a thread in a complex algorithm to use a local variable, which the program adds at the end (reduction):

#pragma omp parallel for private(w) reduction(+:sum) schedule(static,1)
for(i = 0; i < N; i++)
{
  w = i*i;
  sum = sum + w*a[i];
}
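The reduction pattern above can be wrapped in a complete function. The following is a minimal sketch, not code from the GCC or OpenMP projects: the function name weighted_sum and its parameters are assumptions made for illustration. Each thread adds into its own private copy of sum, and OpenMP combines the copies when the loop ends.

```c
/* Hypothetical helper: returns the sum of i*i*a[i] for i in [0, n). */
double weighted_sum(const double *a, int n)
{
    double sum = 0.0;
    int i;  /* the loop variable of a parallel for is private automatically */

    #pragma omp parallel for reduction(+:sum) schedule(static, 1)
    for (i = 0; i < n; i++) {
        double w = (double)i * i;  /* declared in the loop body, so private per thread */
        sum += w * a[i];
    }
    return sum;
}
```

Building with gcc -fopenmp enables the pragma. Without that option, the compiler ignores the pragma and the loop runs serially with the same result, which is the portability benefit described earlier.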

The ability to ascertain the thread ID is more useful for testing than it is for algorithms:

#pragma omp parallel
{
  int id = omp_get_thread_num();
  printf("This is thread %d\n", id);
}
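As a slightly fuller sketch of the thread ID query, the function below lets every thread in the team print its ID and counts how many threads took part. The name report_threads is hypothetical. The _OPENMP macro, which the OpenMP standard defines when OpenMP is enabled, guards the omp.h calls so that the file still builds without OpenMP support.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: each thread reports its ID.
 * Returns the number of threads that executed the parallel region. */
int report_threads(void)
{
    int count = 0;

    #pragma omp parallel
    {
        int id = 0;  /* block-local, hence private to each thread */
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        printf("This is thread %d\n", id);

        /* count is shared, so the increment must be atomic */
        #pragma omp atomic
        count++;
    }
    return count;
}
```

When built without -fopenmp, the region runs once on a single thread and the function returns 1. With -fopenmp, the order of the printed lines can differ from run to run.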

With the x86 processor family continuing to grow, the x86 back end now has two new architecture options. The native setting (-march=native or -mtune=native) tells GCC to apply the best possible optimization for the processor in the build machine, which it identifies with the cpuid instruction, whereas -mtune=generic creates programs that will run equally well on AMD, Intel, or Via CPUs.

Bits and Bobs

A new warning, enabled with the -Waddress option and included in -Wall, points out typical programming errors involving addresses, such as using a function's address as a condition or comparing against the address of a string literal. The -Wextra option warns when an if condition is followed directly by a semicolon, which catches typos like this one:

if (a);
return 1;
return 0;
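To illustrate what -Waddress looks for, the sketch below shows the corrected form of both mistakes, with the faulty variants described in comments. The functions is_ready and check are hypothetical and exist only for this example.

```c
#include <string.h>

/* Hypothetical status function used only for illustration. */
static int is_ready(void)
{
    return 1;
}

/* Returns 1 if the system is ready and name spells "gcc", otherwise 0. */
int check(const char *name)
{
    /* Writing `if (!is_ready)` would test the function's address, which is
     * never null; -Waddress flags that. Call the function instead. */
    if (!is_ready())
        return 0;

    /* Writing `name == "gcc"` would compare pointers, not characters;
     * -Waddress flags that too. Use strcmp to compare the contents. */
    if (strcmp(name, "gcc") != 0)
        return 0;

    return 1;
}
```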

The new compiler promises to reduce program launch time, specifically the time the dynamic linker needs to resolve symbols. Developers have been unhappy with this overhead for some time, especially in the case of C++. Local symbols are no longer visible by default, and the compiler automatically applies class visibility attributes to members.
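For readers unfamiliar with symbol visibility, the following sketch shows how GCC's visibility attribute marks individual functions by hand in C. It does not demonstrate the automatic C++ behavior described above, and the function names are hypothetical.

```c
/* Hypothetical library code; the function names are illustrative only. */

/* Hidden: usable from other files in the same shared object, but not
 * exported, so the dynamic linker never has to resolve it. */
__attribute__((visibility("hidden")))
int lib_scale(int x)
{
    return x * 3;
}

/* Default: exported and resolved by the dynamic linker as usual. */
__attribute__((visibility("default")))
int lib_triple_plus_one(int x)
{
    return lib_scale(x) + 1;
}
```

Fewer exported symbols mean fewer lookups at load time. GCC's -fvisibility=hidden option applies the hidden setting to everything that is not explicitly marked otherwise.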

The -fno-toplevel-reorder option now makes it possible to output functions and variables in source code file order for code such as inline assembler that relies on a specific code order.

Note that a new overflow optimization takes effect as of optimization level -O2. Because signed integer overflow is undefined in C, the new compiler can assume that an overflow will not occur in a loop such as for (int i=1; i>0; i*=2) and thus optimize it into an infinite loop.
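One way to keep such a loop well defined is to test the bound before multiplying, so that no signed overflow can occur. The function below is a minimal sketch under that assumption; the name count_doublings is hypothetical.

```c
#include <limits.h>

/* Counts how often i can be doubled, starting at 1, without exceeding
 * INT_MAX. The check happens before the multiplication, so the loop
 * never relies on overflow to terminate. */
int count_doublings(void)
{
    int i = 1;
    int steps = 0;

    while (i <= INT_MAX / 2) {
        i *= 2;
        steps++;
    }
    return steps;
}
```

Code that really depends on wraparound can be built with GCC's -fwrapv option, which defines signed overflow as wrapping, although rewriting the loop is the more portable fix.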

The GCC developers have added functionality from the upcoming "200x" C++ standard, which is still in the standardization phase. For example, the TR1 namespace now includes <random> and <complex>. The lock-free container templates developed during Google's Summer of Code have also been integrated.

Regression

The good news is that GCC version 4.2 does not introduce many new bugs. A short test, in which we used the new compiler to build a complete system with T2 [4], yielded just two errors.

For one thing, far more memory was needed to build a couple of files that belong to the Xorg server [5] package, forcing the kernel to terminate the compiler on systems with less than 1GB RAM. For another, OpenSSL uses function pointer typecasts [6] in a way that the C standard does not define; this causes the program to quit at run time.

Benchmarks

The lab machine was an Intel Core 2 Duo with a clock speed of 2GHz and 1GB RAM.

We used the current version of Open Bench [7] to test GCC versions 3.4, 4.0, 4.1, and 4.2, building 64-bit programs with the CPU in 64-bit mode. We measured the build time (Figure 1) and the run time in seconds, or the run time per iteration in milliseconds in the case of OpenSSL (Figure 2).

Figure 1: Benchmark results for build times of programs in the test.

Figure 2: This diagram shows the run times for various compiler versions.

Shorter Run Time

On initial inspection, we noticed that version 4.2 spends more time optimizing than its predecessors. The reward for this effort is a shorter run time, even for some legacy C programs.

Faster Build Time

Although the compiler is far slower when the typical -O2 and -O3 optimization levels are enabled, the build time during software development using -O0 is faster.

A quick inspection of the logfiles for the benchmark build reveals that the new compiler vectorizes more loops - 14 for Gzip compared with 12 for GCC 4.1.

Conclusions

OpenMP integration in GCC 4.2 facilitates the task of programming on multi-core systems, which helps the free compiler project keep pace with commercial compilers.

Thanks to the widespread introduction of multi-core CPUs, parallelization has become a big topic for many programmers. The fact that each version of the compiler has taken more time to optimize code is slightly worrying.

New Projects

The new projects scheduled for completion before the new GCC version 4.3 is released include the Eclipse project's Java compiler, which has full support for Java 1.5. Integration of the MPFR library will help standardize calls to standard mathematical functions.

Support for the future 200x C++ standard will be extended in the next version of GCC. Optimization functions for more recent CPU types, such as Core 2 Duo and AMD Geode, have already made their way into the current GCC developer version.

INFO
[1] GCC homepage: http://gcc.gnu.org
[2] GCC 4.2 changelog: http://gcc.gnu.org/gcc-4.2/changes.html
[3] OpenMP: http://www.openmp.org/
[4] T2 SDE: http://www.t2-project.org
[5] Bugzilla report on Xorg server: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31172
[6] Patch for OpenSSL with GCC 4.2: http://www.nabble.com/-PATCH--OpenSSL-vs-GCC-4.2.0-t3795606.htm
[7] Open Bench: http://www.exactcode.de/site/open_source/openbench