Tutorial: Emacs for Programmers

Matt Welsh

Issue #6, October 1994

Ever wanted an all-in-one program development, compilation, and debugging environment? Look no further than Emacs.

Those of you who tuned in last month will recall “Emacs: Friend or Foe?”, a tutorial for people who can't stand anything but vi. “All right,” you're asking yourselves, “What is this card-carrying vi fundamentalist doing writing yet another article on Emacs?” Sounds fishy, doesn't it?

The truth is, once you get the hang of it, Emacs can greatly simplify editing, especially editing program source code. I now routinely use Emacs for developing and debugging programs.

It dramatically reduces turnaround time during the dreaded edit-compile-curse-debug-edit cycle. Here's how to put Emacs to use in this manner.

The bulk of this tutorial assumes that you are familiar with Emacs, as well as with customizing your Emacs environment (as discussed in last month's tutorial, in Linux Journal volume 1, issue 5). As long as you know how to add code to your .emacs startup file (or, as per last month's discussion, ~/emacs/startup.el), you're set.

The functions and commands described here work with GNU Emacs 19.24.1. By the time you read this article, newer versions may be available, in which case your mileage may vary.

Editing C Code

As you know, Emacs has several major modes associated with programming. For example, C Mode is used for editing C source, Perl Mode for Perl, and so on. First off, I'll discuss the features of C Mode, and then explain how to compile and debug C programs within Emacs.

The command M-x c-mode is used to enter C Mode. (Recall from last month: M-x is Meta-x where meta is usually the <esc> key.) However, Emacs is usually able to determine that C Mode should be used for a C source file, either by the filename extension .c, or if the magic string

-*- C -*-

appears on the first line of the file. See the Emacs documentation on major modes if you're interested in how this works.

Within C Mode, code is automatically indented according to the values of several variables. These variables include c-indent-level, c-continued-statement-offset, and so on. The Emacs Info pages describe these variables in gory detail; however, I find that the default values work quite well for a number of indentation styles. Unless you have a particularly unique artistic flair when it comes to indenting your code, I suspect that you won't have to fiddle with Emacs' indentation variables.
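As a rough illustration of the variables mentioned above, the following lines could go in your .emacs file to get four-column indentation. This is only a sketch: the values are arbitrary, and it assumes the C Mode that ships with Emacs 19. Later Emacs releases replace it with CC Mode, which controls indentation through different variables such as c-basic-offset.

```elisp
;; Illustrative values only; the defaults suit most styles.
(setq c-indent-level 4)                ; indentation of statements within a block
(setq c-continued-statement-offset 4)  ; extra indentation for continuation lines
(setq c-brace-offset 0)                ; extra indentation for a line starting with {
```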

A line is indented appropriately when you press TAB anywhere on the line. This does not cause a tab character to be inserted; it just indents the line according to the variable values mentioned above. If you want to actually insert a tab character, prefix it with C-q.

To see how this works, start up Emacs and edit a file called foo.c. Type a few lines of bogus C code, pressing RET after each. Then go back and press TAB on each line to see the results.

Pressing LFD instead of RET is equivalent to pressing RET followed by TAB—that is, you start a new line, and it is automatically indented for you. I find it particularly useful to bind RET to this function, which is newline-and-indent, so that I don't have to use LFD when writing code. In my Emacs configuration file, I include the line:

(define-key c-mode-map "\C-m" 'newline-and-indent)
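The line above assumes that c-mode-map already exists when your startup file is read. If Emacs complains that c-mode-map is void, one alternative (a sketch, with my-c-mode-keys being a made-up name) is to set the binding from c-mode-hook, which runs each time a buffer enters C Mode:

```elisp
(defun my-c-mode-keys ()
  ;; Bind RET (C-m) in the current buffer's local map, which is the C Mode map.
  (local-set-key "\C-m" 'newline-and-indent))
(add-hook 'c-mode-hook 'my-c-mode-keys)
```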

One feature that you may have noticed is that closing braces and parentheses automatically blink their opening counterparts. This can help you to check that parentheses are balanced in your code. If you don't like this feature, you can turn it off using

(setq blink-matching-paren nil)

in your .emacs file (or ~/emacs/startup.el, for those of you using the method described last month. Hereafter, we will refer only to .emacs, but keep in mind that all of these customizations can be used with both methods).

If you're running Emacs under X, the paren library will cause matching parentheses and braces to be highlighted whenever point is on an opening brace/parenthesis, or after a closing brace/parenthesis. Simply including

(load-library "paren")

in your .emacs file will enable this feature. Balanced parentheses are highlighted in the region face; you can change the color or font used with commands such as set-face-foreground, set-face-font, and so on.

For example,

(set-face-background 'region "pink")

will set the background color for this face to pink. The region face is also used to display the current region when transient-mark-mode is enabled. We'll talk a bit more about faces below.

You'll also notice that typing a closing brace (on a line by itself) will exdent the line containing the brace. Suppose you type code such as:

int foo() {
  /* Your code here */

After pressing RET, the next line will be indented relative to the comment above it (assuming, of course, that you have bound RET to newline-and-indent). Now, after typing the closing brace, you'll end up with:

int foo() {
  /* Your code here */
}

Braces are bound to the function electric-c-brace, which inserts the brace, and corrects indentation on the current line. The indentation of braces, and the text enclosed by them, is controlled by the Emacs variables c-brace-offset, c-brace-imaginary-offset, and so on.

In general, your code should follow the indentation style set forth by Emacs. Adding comments is one exception. Many programmers like to set comments out towards the right margin of the display, as in

int floof(struct shoop *s, int i) {
   s->fnum = i;     /* You are not expected
                      to understand this */
   return 0;
}

Now that TAB has lost its natural ability to add whitespace, how can we add such a comment? Emacs provides the M-; command, which begins a comment starting at the column specified by the variable comment-column, which is set to 32 by default. Of course, you can always add comments by typing /* ... */ by hand.
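If you would rather have M-; start comments further to the right, you can change comment-column. The sketch below sets it from c-mode-hook, on the assumption that C Mode assigns its own buffer-local value when it starts; the function name and the column 40 are arbitrary.

```elisp
(defun my-c-comment-column ()
  ;; comment-column is local to each buffer, so this affects C buffers only.
  (setq comment-column 40))
(add-hook 'c-mode-hook 'my-c-comment-column)
```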

You can use M-x comment-region to comment out all lines in the current region. (For Emacs neophytes, the region is defined by moving point to a particular location, using C-Space to set the “mark”, and then moving point elsewhere. The region is the block of text between point and mark. There are various other ways to set the region; for example, under X, dragging mouse-1 over a portion of text will define the region.) Likewise, M-C-\ (that's meta-control-backslash) will indent the current region.

C Mode defines several new moving commands as well. M-C-a will move point to the beginning of the current function. Similarly, M-C-e will move point to the end of the current function. Note, however, that the “current function” is denoted by an opening or closing brace in the first column of text. If you use a C indentation style such as

int foo() {
  /* Your code here */
}

Emacs won't be able to find the beginning of the function, as the opening brace is at the end of the line. For these commands to work properly, you should indent your code as so:

int foo()
{
  /* Your code here */
}

To select the region as the text of the current function, you can use M-C-h. This provides a convenient way to manipulate entire functions. For example, the quick key sequence M-C-h, C-w, M-C-e, C-y will move the current function below the following one. Impress your friends!
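If you find yourself doing this often, the key sequence can be wrapped up in a command. The following is a sketch only: my-move-function-down is a made-up name, and it uses the generic commands mark-defun and end-of-defun, which rely on the same brace-in-first-column convention described above.

```elisp
(defun my-move-function-down ()
  "Move the function around point below the function that follows it."
  (interactive)
  (mark-defun)                      ; point at start of function, mark at end
  (kill-region (point) (mark t))    ; cut the function, like C-w
  (end-of-defun)                    ; skip over the following function, like M-C-e
  (yank))                           ; paste it back, like C-y
```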

The keys M-a and M-e can be used to move to the beginning or end of the current C statement (block, semicolon-delimited expression, etc.).

One last important C Mode feature: macro expansion. If you run M-x c-macro-expand, Emacs will run the C preprocessor on the current region and display the results in another buffer. For example, given the following code:

static XtResource resource_list[] = {
  { RES_N_doputimage, RES_C_doputimage,
    XtRBoolean, sizeof(Boolean),
    XtOffset(app_data_ptr,do_putimage),
    XtRImmediate, (XtPointer) FALSE,
  },
};

Selecting this text as the region, and calling c-macro-expand gives us:

static XtResource resource_list[] = {
  { "doPutImage", "DoPutImage",
    ((char*)&XtStrings[1561]), sizeof(Boolean),
    ((Cardinal) (((char *)
    (&(((app_data_ptr)0)->do_putimage))) -
    ((char *) 0))),
    ((char*)&XtStrings[1695]), (XtPointer) 0,
  },
};

This can be useful if you're trying to debug complex macros, or need to know the definition of a given preprocessor symbol.

Many other modes exist for particular languages, such as Perl Mode, Emacs LISP Mode, Prolog Mode, and so on. Most of these modes share the basic features described above. The best way to learn about a new mode is to enter it (with a command such as M-x perl-mode) and use M-x describe-mode to get a rundown on its features.

Using faces

Emacs-19 provides support for faces, which allow different kinds of text to be displayed in various fonts and/or colors. In last month's tutorial, we described how to configure faces under Emacs; because use of faces is particularly helpful with respect to editing source code, it bears repeating here.

The command M-x list-faces-display will display the current faces, and their associated names, in another window. Faces are given names such as bold, bold-italic, and so on. These names don't necessarily have anything to do with how the faces appear—for example, your bold face needn't be in a bold font.

The functions set-face-foreground and set-face-background can be used to set the foreground and background colors for a face, respectively. set-face-font sets the font used for a particular face; set-face-underline-p specifies whether a particular face is displayed with an underline.

Faces are used most commonly within Font Lock Mode, a minor mode which causes the current buffer to be “fontified”; that is, the text is displayed in various faces depending on context. For example, when using Font Lock Mode with C Mode, function names are displayed in one face, comments in another, preprocessor directives in another, and so on. This is a pleasant visual effect when editing source code; you can easily identify function names and comments by glancing at the display.

The following function can be used in your Emacs startup file to enable Font Lock Mode and to set the colors for various faces.

(defun my-turn-on-font-lock ()
  (interactive)
  ;; Color the faces appropriately
  (set-face-foreground 'bold "lightblue")
  (set-face-foreground 'bold-italic "olivedrab2")
  (set-face-foreground 'italic "lightsteelblue")
  (set-face-foreground 'modeline "white")
  (set-face-background 'modeline "black")
  (set-face-background 'highlight "blue")
  ;; Turn off underline property for bold and underline
  (set-face-underline-p 'bold nil)
  (set-face-underline-p 'underline nil)
  (transient-mark-mode 1)
  (font-lock-mode 1))

Note that in addition to turning on font-lock-mode, I enable transient-mark-mode. In this mode, the current region is shaded using the region face. This can save you a great deal of time trying to remember where the current mark is set.

The above function is called by:

(defun my-window-setup-hook ()
   (set-foreground-color "white")
   (set-background-color "dimgray")
   (set-mouse-color "orchid")
   (set-cursor-color "red")
   (my-turn-on-font-lock))
(add-hook 'window-setup-hook 'my-window-setup-hook)

That is, the Emacs window-setup-hook (which is executed at startup time) calls my-window-setup-hook, which first sets the foreground and background colors for the window, and then enables Font Lock Mode.

You must enable Font Lock Mode separately for each buffer that you wish to use it in. For this reason, I have Emacs call my-turn-on-font-lock whenever I enter C Mode, Emacs LISP Mode, or Perl Mode:

(add-hook 'c-mode-hook 'my-turn-on-font-lock)
(add-hook 'emacs-lisp-mode-hook 'my-turn-on-font-lock)
(add-hook 'perl-mode-hook 'my-turn-on-font-lock)

The best way to determine how to configure faces to your liking is to experiment with the code given above. There are several variables which control which faces Font Lock Mode uses for particular kinds of code. For example, font-lock-comment-face is the face used for comments. By default, its value is italic, which we set above to use the foreground color of lightsteelblue. You can either set the face properties for bold, italic, and so on directly, or you can operate on font-lock-comment-face, font-lock-function-name-face, et cetera. Using M-x apropos and entering font-lock will give you a list of functions and variables associated with Font Lock Mode.
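As a sketch of the second approach, the lines below point Font Lock Mode at different faces by setting the variables directly. This assumes, as in the Emacs 19 font-lock library, that each of these variables holds the name of a face; font-lock-string-face is not mentioned above and is included only for illustration. Set these before Font Lock Mode is turned on in a buffer.

```elisp
;; Each variable names the face Font Lock Mode uses for that kind of text.
(setq font-lock-comment-face 'italic)
(setq font-lock-function-name-face 'bold-italic)
(setq font-lock-string-face 'underline)
```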

Using tags

Emacs has a number of features dealing with tags, which are simply marked locations in your source code. The most common use of tags is to mark the beginning of a function definition. You can then jump directly to that function definition, no matter what source file it lives in.

To handle tags, Emacs uses a tags file, which is (by default) named TAGS in the directory where your source files live. Before experimenting with tags, let's create a tags file. From the shell prompt, use the command

etags filenames...

where filenames are the names of the source files in the current directory. For example,

etags *.c *.h

This will create the file TAGS, based on the C source and header files in the current directory.

Let's say that we have three source files: grover.c, oscar.c, and telly.c. These files might contain code such as:

/* grover.c ********************/
int grover() {
  /* Code for grover... */
}
/* oscar.c *********************/
int oscar() {
  /* Code for oscar... */
}
/* telly.c *********************/
int telly_monster() {
  /* Code for telly_monster... */
}
int main(int argc, char *argv[]) {
  /* Code for main... */
}

Running etags on these three source files will create tags for each function in the three files. (Using etags with the -t option will also include any typedefs found in the source.)

Now, we can use commands such as M-. (that's meta-dot) which will find a given tag. When we press M-. while editing one of these source files, Emacs will ask us:

Find tag: (default oscar)

You can enter the name of a tag (function name), such as telly_monster, and the source file containing that tag will automatically be opened, and point set to the line containing the tag. This is a very quick way to move between source files when editing.

The default tag for M-. is set based on whatever word point is currently on. Therefore, if point is currently over a call to the function oscar(), pressing M-. followed by RET will take us directly to the definition of oscar().

M-x find-tag-regexp will find the tag matched by the given regular expression. Therefore, using find-tag-regexp and giving a portion of the function name will take you to that function (assuming that the regular expression that you specified was unique for that function). If you have a set of similarly-named functions, using M-0 M-. (that's meta-zero meta-dot) will take you to the next tag matched by the previous use of find-tag-regexp.

Similarly, you can use M-x tags-search, which will search for the named regular expression in any of the files named in the current TAGS file. That is, tags-search does not limit its search to tags; it will search for any text in the files listed in TAGS. You can use M-, to search for the next instance of the given regular expression.

Another useful feature is tags completion. Pressing M-TAB will attempt to complete the current word based on functions listed in the current tags file. Therefore, when calling the function telly_monster, we can type tel M-TAB which will complete the name for us. If a given word has more than one completion, a *Completions* buffer will be opened, listing all possible choices. Under X, pressing mouse-2 on a completion will select it.

There is one caveat associated with using tags—you will occasionally need to refresh the TAGS file, in case you have done major reorganization of your code. Emacs doesn't depend on the TAGS file being 100% accurate—it will search for a tag if it is not found in the exact location given in the file. However, if you mangle your code considerably, re-run etags to refresh the tags database.
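To make refreshing less of a chore, you can run etags from within Emacs. This is a minimal sketch: my-refresh-tags is a made-up name, and it assumes that your sources are the .c and .h files in the directory of the current buffer.

```elisp
(defun my-refresh-tags ()
  "Rebuild TAGS from the C sources and headers in the current directory."
  (interactive)
  ;; shell-command runs in the current buffer's default directory.
  (shell-command "etags *.c *.h"))
```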

Also note that Emacs can use more than one TAGS file at a time. Most tags-based functions assume use of the file TAGS in the current directory. If you are editing source files spread across several directories, M-x visit-tags-table can be used to load another TAGS file into Emacs' list of known tags. Alternately, you can set the variable tags-table-list to a list of files or directories where TAGS files can be found. For example, I might want Emacs to always know about tags found in common library routines. In my Emacs startup file, I would use something like:

(setq tags-table-list '("~/lib" "~/src/lib" "~/common"))

The TAGS files found in the named directories would be used in addition to TAGS in the current directory.

Updating the Change Log

Many programs are accompanied by a ChangeLog, which describes updates and modifications to the source on a day-to-day basis. Emacs allows you to semi-automatically update the ChangeLog, using the command M-x add-change-log-entry (or, C-x 4 a, which will do the same in another window).

For example, let's say that we're editing the source file grover.c, and we add a bit of code to grover(). To document this change, we use C-x 4 a, which will open a window containing:

Sun Jul 24 14:39:50 1994 Matt Welsh (mdw@loomer.debian.org)
        * grover.c (grover):

This command determines that we are within the function grover(), in the file grover.c, and indicates this at the beginning of the entry. We can now enter a new log entry and save the file in the usual way. Each source file that you add entries for will be given its own item.

Compiling and debugging code

You can compile programs, and even run a debugger, entirely within Emacs. The most basic compilation command is M-x compile, which will run make (or another command of your choice) in the directory of the current buffer.

The default compilation command is make -k. (The -k switch tells make to keep going after an error, continuing with any targets that do not depend on the one that failed.) When using M-x compile, you will be prompted for the compilation command to use, and also whether you wish to save any buffers that have changed. If you wish to change the default command, set the variable compile-command to another value. For example,

(setq compile-command "make")

will cause M-x compile to run make without the -k argument.

You can also set a value for compile-command for a particular source file. This is done by including “local variable definitions” within the source file itself. For example, we could include the following comments within a C source file:

/* Local Variables: */
/* mode: C */
/* compile-command: "make" */
/* End: */

These comments set the values of the mode and compile-command variables for the buffer containing this code. When the file is opened by Emacs, it recognizes the line containing Local Variables: and uses subsequent lines, until the line containing End:, to assign values to variables for this buffer alone. You can use this feature to set values for any Emacs variables specific to this buffer.

Now, when we use M-x compile, Emacs runs the given compilation command (here, make) in another window, with which we can monitor the progress of the compilation. To kill the compilation process, type C-c C-k in the compilation buffer.

Once the compilation completes, we can use the error messages printed (if any) to automatically visit the source which caused the errors. For example, let's say that we use M-x compile and the following errors result:

cd /amd/noon/c/mdw/test/lj/
make
gcc -O -O2 -I/usr/include -I. -c main.c -o main.o
In file included from main.c:12:
libpx.h:30: image.h: No such file or directory
libpx.h:31: misc.h: No such file or directory
make: *** [main.o] Error 1
Compilation exited abnormally with code 1 at Sun Jul 24 16:32:17

Instead of manually locating libpx.h and jumping to the line in question, you can move point to the error message in the compilation buffer, press C-c C-c, and the source corresponding to the error is automatically visited. You can then correct the bug, move to the next error, and repeat. Under X, pressing mouse-2 on an error message in the compilation buffer will jump to the corresponding source line.

If you have a large number of error messages, pressing M-n in the compilation buffer will move to the next error message (in addition to viewing the corresponding source). To cause M-n to have this behavior within the C source buffer as well, you can use the following command in your Emacs startup file:

(define-key c-mode-map "\M-n" 'next-error)

To simplify the compilation process, I use the following code within my Emacs configuration file:

(defun my-save-and-compile ()
  (interactive)
  (save-buffer 0)
  (compile "make -k"))
(define-key c-mode-map "\C-c\C-c" 'my-save-and-compile)

This defines a new function, my-save-and-compile, which will automatically save the current buffer and run make -k. This saves me the hassle of answering the various prompts given by M-x compile alone. Now, using C-c C-c within a C Mode buffer will save the source and compile it.
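One variation, sketched below, saves every modified buffer rather than just the current one, and reuses whatever compile-command is in effect (including a value set through a Local Variables block). The function name is made up; bind it to a key of your choice.

```elisp
(defun my-save-all-and-compile ()
  (interactive)
  (save-some-buffers t)        ; save all modified file buffers without asking
  (compile compile-command))   ; run the default, or buffer-local, command
```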

Once you get used to the above mechanism, fixing bugs and recompiling code becomes quite painless—you can concentrate on debugging and let Emacs locate the errors, run make, and so forth.

Running gdb

gdb is the GNU debugger. It is indispensable for run-time debugging for programs written in nearly any compiled language, most notably C. gdb can also be used for post-mortem examination of a crashed program using a core file.

Not surprisingly, Emacs provides a number of features which allow you to run gdb within an Emacs buffer, interacting with the corresponding source buffers to view and edit code. While gdb deserves a tutorial of its own, here we will introduce you to the Emacs-specific gdb features. gdb provides extensive online help, which can fill in the gaps left here. For the rest of this tutorial, we assume that you have basic familiarity with gdb, or a similar debugger such as dbx.

Let's take the following short program, which will undoubtedly cause a segmentation fault on most systems:

#include <stdio.h>
int main(void) {
  int i;
  int *data = NULL;
  data[0] = 1;   /* data is NULL, so this write crashes */
  data[1] = 2;
  for (i = 2; i < 30; i++) {
    data[i] = data[i-1] + data[i-2];
  }
  printf("Last value is %d\n", data[29]);
  return 0;
}

As you can see, we're attempting to write data into a NULL pointer. Sure enough, when we compile and run the program, we obtain:

loomer:~/test/lj/crashme% ./crashme
Segmentation fault (core dumped)

Let's use gdb to inspect the problem. Using M-x gdb gives us the prompt:

Run gdb (like this): gdb

Here, you should complete the gdb command line. In this case, we want to run gdb on the executable crashme, with the core file core. So, we complete as so:

Run gdb (like this): gdb crashme core

Emacs should open two windows—one containing the gdb interaction session, and the other containing the source file crashme.c. The gdb session will look something like:

GDB is free software and you are welcome to distribute copies of it
under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.

GDB 4.12,
Copyright 1994 Free Software Foundation, Inc...
Core was generated by "crashme".
Program terminated with signal 11, Segmentation fault.
#0  0x22bc in main () at crashme.c:5
(gdb)

We can now issue gdb commands to inspect the crash. Immediately, we notice that the crashme.c buffer contains an arrow pointing to the current source line, as so:

=>data[0] = 1;
  data[1] = 2;
  /* ... */

This arrow is not part of the source text. It can't be selected, modified, or deleted. You are free to edit the code in the source buffer; this imaginary arrow will not be saved with the edited code. The arrow only exists to let us know what gdb's idea of the current source line is. Note, however, that adding or deleting lines from the source buffer will cause gdb's information about the location of source lines to be out-of-sync with the actual code.

We can see that the crash was caused by a segmentation fault on line 5, pointed to by the arrow. Using the where command in the gdb buffer will give us a stack trace, and so on. You can correct the code in the source buffer, recompile, test, and re-run gdb (if necessary), all within Emacs.

gdb can also be used to inspect running programs. For example, we can run crashme under gdb's control, and step along a line at a time. First, however, let's correct the bug by changing the definition of data to

int data[30];

(Otherwise, crashme would crash on the first line of code, and we'd have scant little to go on by way of demonstration.)

First, we should set a breakpoint on the first line of code. Within the gdb buffer, we can use list to display the first few lines:

(gdb) list
1       #include <stdio.h>
2       int main(void) {
3         int i;
4         int data[30];
5
6         data[0] = 1;

The command break 6 will set a breakpoint at line 6:

(gdb) break 6
Breakpoint 1 at 0x22b0: file crashme.c, line 6.

Now, the run command will begin execution of the program, but will halt immediately on the first line of code. A buffer for crashme.c will be opened, with our friendly arrow pointing at the line containing the breakpoint.

Now, we can employ gdb's various commands directly, by entering them in the gdb buffer—or, we can use the Emacs key equivalents. Placing point on a line of code in the source buffer and typing C-x C-a C-b will set a breakpoint at that line. Similarly, C-x C-a C-d will delete all breakpoints on that line. (The gdb command info break will list the current breakpoints.) After setting a breakpoint, you can use C-x C-a C-r to resume execution.

All of the above commands can be used within the gdb buffer as well, using C-c instead of C-x C-a as the prefix. For convenience, C-x SPC in either buffer will set a breakpoint on the current source line.

If you find these key bindings unnecessarily lengthy, as I do, you might consider rebinding the functions gud-break, gud-remove, and gud-cont within c-mode-map. For example, I use the commands

(define-key c-mode-map "\M-b" 'gud-break)
(define-key c-mode-map "\M-d" 'gud-remove)
(define-key c-mode-map "\M-r" 'gud-cont)

Of course, this negates the previous meanings of M-b, M-d, and M-r within C Mode.

The following additional macros are available within the gdb buffer, as well as within C Mode by changing the prefix from C-c to C-x C-a.

`C-c C-s' Step one line of code, descending into function calls. (The gdb step command.)
`C-c C-n' Step one line of code, without descending into function calls. (The gdb next command.)
`C-c <' Move up one stack frame. (The gdb up command.)
`C-c >' Move down one stack frame. (The gdb down command.)
`C-c C-f' Run until the completion of the current function, and then stop. (The gdb finish command.)

Again, you may wish to bind these to shorter key sequences (such as M-s, M-n, and so on).
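For instance, the bindings might be set up as follows. This is a sketch with arbitrary key choices: it takes over M-s and M-n in C Mode (so it conflicts with binding M-n to next-error, as suggested earlier), and my-c-gud-keys is a made-up name. gud-step and gud-next are the functions behind C-c C-s and C-c C-n; they are assumed to be defined once the debugger has been started.

```elisp
(defun my-c-gud-keys ()
  (local-set-key "\M-s" 'gud-step)   ; gdb step
  (local-set-key "\M-n" 'gud-next))  ; gdb next
(add-hook 'c-mode-hook 'my-c-gud-keys)
```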

Another interesting command is C-x C-a C-p, which will (within the source buffer) take the C expression around point and pass it to gdb's print command, which evaluates the expression and prints its value. This is a very handy way to examine variables, data structures, and so forth within the debugger. You can even use this command to call functions and print the return value, if you're executing the debugged program within gdb.

For example, placing point on the line

printf("Last value is %d\n",data[19]);

and pressing C-x C-a C-p will cause the following to be printed in the gdb buffer:

(gdb)
$1 = 0

In this case, data[19] is 0, because we haven't executed the calculation loop yet. Nevertheless, we can call functions within the program (or, in fact, any arbitrary function, using the print command manually) and examine the return value.

Emacs also allows you to define your own functions for interacting with the debugger. For example, we might want to move the point to a line of code, and run the gdb until function, which will cause execution to continue until that line is reached.

This is accomplished with the gud-def function. For example, we can use

(gud-def my-until-line "until %f:%l" "\C-u" "Run until current line.")

This will define the function my-until-line which sends the string

until %f:%l

to the gdb process, where %f is replaced with the current source filename, and %l is replaced with the current line number. The new function will be bound to the key sequence C-x C-a C-u (in the source buffer), and C-c C-u (in the gdb buffer). The final argument is the documentation string for the command, printed using describe-function.

Now, we can place point on a line of code in the source buffer, type C-x C-a C-u, and execution will continue until that line of code is reached.
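Where should the gud-def call go? One possibility, sketched here, is to run it each time a gdb session starts. This assumes that the gud library runs gdb-mode-hook when M-x gdb sets up its buffer, and that gud-def is available once gud has been loaded; my-gdb-mode-setup is a made-up name.

```elisp
(require 'gud)   ; makes gud-def available before it is used below

(defun my-gdb-mode-setup ()
  ;; Define my-until-line and its key bindings for this debugging session.
  (gud-def my-until-line "until %f:%l" "\C-u" "Run until current line."))
(add-hook 'gdb-mode-hook 'my-gdb-mode-setup)
```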

We can customize interactions with the debugger in another way. For example, gdb lacks the inherent ability to automatically step along code, allowing us to monitor the execution of the program without interruption. A similar effect can be achieved by using the step command many times in succession, but we'd like Emacs to automate the process for us.

This can be accomplished using the following function:

(defun gdb-step-forever (arg)
  (interactive "NTime between steps: ")
  (while t
    (progn
      (sit-for arg)
      (gud-step 1))))

Running this function as M-x gdb-step-forever will prompt us for the amount of time to sleep between steps, in seconds. (This need not be an integral number of seconds—you can specify real values such as 0.5.) The function will then pause for the given amount of time, run gud-step, and repeat, ad infinitum. To interrupt the function, you can use the Emacs quit key, C-g.

A more general extrapolation of this idea would allow us to run a “hook” function between steps, which would allow us to print the values of variables, and so on.
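A rough sketch of that idea follows. The hook variable my-gdb-step-hook and both function names are invented for this example, and it assumes that gud-call, the function gud uses internally to send a command string to the debugger, can be called from your own code.

```elisp
(defvar my-gdb-step-hook nil
  "Functions to call after each automatic step.")

(defun my-gdb-step-with-hook (arg)
  "Step every ARG seconds, running `my-gdb-step-hook' after each step."
  (interactive "nTime between steps: ")
  (while t
    (sit-for arg)
    (gud-step 1)
    (run-hooks 'my-gdb-step-hook)))

;; Example hook function: ask gdb to print the local variables.
(defun my-gdb-show-locals ()
  (gud-call "info locals"))
(add-hook 'my-gdb-step-hook 'my-gdb-show-locals)
```

As with gdb-step-forever, C-g interrupts the loop.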

Note that the above function isn't very intelligent—if it runs into a breakpoint, or the program ceases execution for some reason, it will continue to loop naively. In these cases, you can simply interrupt the function by hand.
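A small refinement, sketched below under a made-up name, is to stop as soon as you press any key. It relies on sit-for returning nil when input arrives before the time is up. It still knows nothing about breakpoints or program exit, and the key you press is then handled as an ordinary command.

```elisp
(defun my-gdb-step-until-key (arg)
  "Step every ARG seconds until a key is pressed."
  (interactive "nTime between steps: ")
  ;; sit-for returns t only if the full wait passed with no input.
  (while (sit-for arg)
    (gud-step 1)))
```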

Given the above tour, you should be ready to tackle the other programming features that Emacs provides—including version control, customized indentation styles, and so forth. Perhaps a future issue of Linux Journal will cover these aspects of Emacs.

I welcome all suggestions, comments, and corrections for the material presented here. Please feel free to send correspondence to the author, c/o Linux Journal, or via electronic mail at mdw@sunsite.unc.edu.

Happy hacking!