Unix Power Tools

13.7. Search RCS Files with rcsgrep

Storing multiple versions of a file in RCS (Section 39.5) saves space. How can you search a lot of those files at once? You could check out all the files, then run grep -- but you'll have to remove the files after you're done searching. Or, you could search the RCS files themselves with a command like grep foo RCS/*,v -- but that can show you garbage lines from previous revisions, log messages, and other text that isn't in the latest revision of your file. This article has two ways to solve that problem.

13.7.1. rcsgrep, rcsegrep, rcsfgrep

The rcsgrep script -- and two links to it named rcsegrep and rcsfgrep -- run grep, egrep (Section 13.4), and fgrep on all files in the RCS directory. (You can also choose the files to search.)

The script tests its name to decide whether to act like grep, egrep, or fgrep. Then it checks out each file and pipes it to the version of grep you chose. The output looks just like grep's -- although, by default, you'll also see the messages from the co command (the -s option silences those messages).
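The name test and the check-out-and-pipe step can be sketched roughly as follows. This is a minimal illustration, not the rcsgrep script itself: the function names pick_grep and search_latest are made up for the example, grep -E and grep -F stand in for egrep and fgrep, and the filename: prefix is just one plausible way to label the output. The co options used (-p to print the revision to standard output, -q for quiet mode) are standard RCS options; here -q is always on, which corresponds to running the real script with -s.

```shell
#!/bin/sh
# Illustrative sketch only; this is not the actual rcsgrep script.

# pick_grep NAME
# Map the name the script was invoked under to a grep flavor.
# grep -E and grep -F are the POSIX spellings of egrep and fgrep.
pick_grep() {
    case "${1##*/}" in          # strip any leading directory
        rcsegrep) echo "grep -E" ;;
        rcsfgrep) echo "grep -F" ;;
        *)        echo "grep" ;;
    esac
}

# search_latest PROGNAME PATTERN RCSFILE...
# Check out the latest revision of each file to stdout and search it.
search_latest() {
    grepcmd=$(pick_grep "$1"); pattern=$2; shift 2
    for f in "$@"; do
        # co -p writes the revision to stdout instead of creating a
        # working file, so there is nothing to clean up afterward.
        # $grepcmd is left unquoted on purpose so "grep -E" splits
        # into a command and its option.
        co -q -p "$f" | $grepcmd -e "$pattern" |
        while IFS= read -r line; do
            printf '%s:%s\n' "$f" "$line"   # label each match (assumed format)
        done
    done
}
```

A script built this way would end with something like search_latest "$0" "$@", so that the same file behaves differently under each of its link names.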

By default, rcsgrep searches the latest revision of every file. With the -a option, rcsgrep will search all revisions of every file, from first to last. This is very handy when you're trying to see what was changed in a particular place and to find which revision(s) have some text that was deleted some time ago. (rcsgrep uses rcsrevs (Section 39.6) to implement -a.)
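In outline, searching every revision means one check-out per revision instead of one per file. The sketch below assumes the caller already has the list of revision numbers, oldest first (in the real script that list comes from rcsrevs, whose interface is not shown here). The function name search_revs and the "file revision:line" output format are assumptions made for the example.

```shell
#!/bin/sh
# Illustrative sketch only; not the code rcsgrep uses for -a.

# search_revs PATTERN RCSFILE REV...
# Search each listed revision of one RCS file, in the order given.
search_revs() {
    pattern=$1; f=$2; shift 2
    for rev in "$@"; do
        # co -rREV selects a revision; -p prints it to stdout; -q is quiet.
        co -q -p -r"$rev" "$f" | grep -e "$pattern" |
        while IFS= read -r line; do
            # Show which revision each match came from (assumed format).
            printf '%s %s:%s\n' "$f" "$rev" "$line"
        done
    done
}
```

For example, search_revs foo RCS/report,v 1.1 1.2 1.3 would show which of those three revisions contain foo, which is how you can spot the revision where a line disappeared.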

Some grep options need special handling to work right in the script: -e, -f, and -l. (For instance, -e and -f have an argument after them. The script has to pass both the option and its argument.) The script passes any other options you type to the grep command. Your grep versions may have some other options that need special handling, too. Just edit the script to handle them.
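To make the -e and -f case concrete, here is one way a shell script might work out where the grep options end and the pattern and filenames begin. It is a hedged sketch, not the parsing code from rcsgrep; count_grep_opts is a hypothetical helper, and it does nothing special for -l.

```shell
#!/bin/sh
# Illustrative sketch only; not the option handling from rcsgrep.

# count_grep_opts ARG...
# Print how many leading arguments are grep options. -e and -f each
# take the following word as their argument, so they count as two.
count_grep_opts() {
    n=0
    while [ $# -gt 0 ]; do
        case "$1" in
            -e|-f)
                if [ $# -ge 2 ]; then
                    n=$((n + 2)); shift 2    # option plus its argument
                else
                    n=$((n + 1)); shift      # argument missing; grep will complain
                fi ;;
            --)  n=$((n + 1)); break ;;      # explicit end of options
            -?*) n=$((n + 1)); shift ;;      # any other option is passed along as-is
            *)   break ;;                    # first non-option word: stop counting
        esac
    done
    echo "$n"
}
```

With the count in hand, a script can hand the first n words to grep unchanged and treat the remaining words as the pattern (unless -e or -f supplied one) and the files to search. An option of your own grep that takes a separate argument would need a case entry like the one for -e and -f.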

13.7.2. rcsegrep.fast

To search an RCS file, rcsgrep and its cousins run several Unix processes: co, grep, sed, and others. Each process takes time to start and run. If your directory has hundreds of RCS files (like our directory for this book does), searching the whole thing can take a lot of time. I could have cut the number of processes by rewriting rcsgrep in Perl; Perl has the functionality of grep, sed, and others built in, so all it would need to do is run hundreds of co processes . . . which would still make it too slow.

Go to http://examples.oreilly.com/upt3 for more information on: rcsegrep.fast

The solution I came up with was to do everything in (basically) one process: a gawk (Section 20.11) script. Instead of using the RCS co command to extract each file's latest revision, the rcsegrep.fast script reads each RCS file directly. (The rcsfile(5) manpage explains the format of an RCS file.) An RCS file contains the latest revision of its working file as plain text, with one difference: each @ character is changed to @@. rcsegrep.fast searches the RCS file until it finds the beginning of the latest revision. Then it applies an egrep-like regular expression to each line. Matching lines are written to standard output with the filename first; the -n option gives a line number after the filename.
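The idea can be illustrated with a short awk program wrapped in a shell function. This is a sketch written for this explanation, not the rcsegrep.fast source. It assumes the layout described in rcsfile(5): the first line consisting of just the word text that falls outside an @-delimited string introduces the head revision, and the revision's text is a string that opens with @ and closes with an @ at the end of a line. The function name rcs_head_grep, the RCSPAT environment variable, and the file:line-number:text output format are all assumptions, and line numbers are always printed rather than being controlled by -n.

```shell
#!/bin/sh
# Illustrative sketch only; not the actual rcsegrep.fast script.

# rcs_head_grep PATTERN FILE,v...
# Search the head revision stored in each RCS file without running co.
rcs_head_grep() {
    pat=$1; shift
    # Pass the pattern through the environment so awk does not
    # reinterpret backslashes, as it would with -v.
    RCSPAT=$pat awk '
    # Count the @ characters in s. Escaped @@ pairs keep the count even,
    # so an odd count means a string opens or closes on this line.
    function at_count(s) { return gsub(/@/, "@", s) }

    # Undo the @@ escaping, number the line, and print it if it matches.
    function emit(s) {
        gsub(/@@/, "@", s)
        lineno++
        if (s ~ ENVIRON["RCSPAT"]) print FILENAME ":" lineno ":" s
    }

    FNR == 1 { state = "scan"; instr = 0; lineno = 0 }   # new file: reset
    state == "done" { next }                             # skip older deltas
    state == "scan" {
        # A bare "text" line outside any string starts the head revision.
        if (!instr && $0 == "text") { state = "open"; next }
        if (at_count($0) % 2) instr = !instr
        next
    }
    state == "open" {
        $0 = substr($0, 2)      # drop the @ that opens the text string
        state = "text"
    }
    state == "text" {
        if (at_count($0) % 2) {
            # The closing @ is the last character on this line.
            s = substr($0, 1, length($0) - 1)
            if (s != "") emit(s)
            state = "done"
        } else
            emit($0)
    }
    ' "$@"
}
```

Called as rcs_head_grep 'foo|bar' RCS/*,v, this reads every file in a single awk process. Tracking whether the scan is inside an @ string matters because a log message can itself contain a line that says only text.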

rcsegrep.fast is sort of a kludge because it's accessing RCS files without using RCS tools. There's a chance that it won't work on some versions of RCS or that I've made some other programming goof. But it's worked very well for us. It's much faster than rcsgrep and friends. I'd recommend using rcsegrep.fast when you need to search the latest revisions of a lot of RCS files; otherwise, stick to the rcsgreps.

-- JP


Copyright © 2003 O'Reilly & Associates. All rights reserved.