

  

Shell Functions and Path Variables, Part 3

A continuation of our introduction to path variables and elements.

by Steve Collyer

In this final article in the series, I'll describe the remaining path handling functions and point out a few implementation issues. Before I do that, however, I will describe a utility called makepath. This reads either standard input or its argument list, builds a colon-separated path variable (pathvar) from what it reads and echoes the result to standard output. For example:

$ makepath /bin /usr/bin /opt/kde/bin
/bin:/usr/bin:/opt/kde/bin
makepath is used in several of the pathvar utilities to reconstruct a pathvar after its path elements (pathels) have been altered. I won't show you the innards of makepath, as they're somewhat tangential to the main topic and rather trivial.
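
For readers who want to experiment without the tar file, a minimal stand-in is sketched below. It is not the author's implementation, only a guess at the behaviour described above: it joins its arguments (or, if there are none, the lines on standard input) with colons. The use of paste is an assumption of this sketch.

```shell
# Hypothetical stand-in for makepath (not the code from the tar file).
# Joins its arguments, or the lines on standard input, with colons.
makepath() {
    if [ $# -gt 0 ]; then
        printf '%s\n' "$@"      # one argument per line
    else
        cat                     # no arguments: pass standard input through
    fi | paste -s -d: -         # join all lines into one, separated by ":"
}

makepath /bin /usr/bin /opt/kde/bin
printf '/usr/man\n/usr/local/man\n' | makepath
```

Because the arguments are first written out one per line, an argument that itself contains newlines is also joined correctly.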

listpath

First, let's look at listpath, which echoes the pathels making up a pathvar on separate lines, as in:

$ listpath -p MANPATH
/usr/man
/usr/local/man
/opt/CC/man
Using listpath has two advantages over merely echoing $MANPATH. First, it's much easier to read the pathels when they appear on separate lines; second, you can pipe its output through grep:

$ listpath | grep bin 
/opt/kde/bin
/usr/local/bin
/bin
The option-handling code contains nothing we did not already see in the addpath function, so let's look at the main code:

eval echo \$$pathvar | colon2line
This is very simple. We just echo the contents of the specified pathvar into the colon2line function (included in the tar file mentioned at the end of this article), which turns the embedded : characters into newline characters. I described the operation of this piece of code in some detail in Part 2, so I won't repeat it here. Take a look at that article (http://www.linuxjournal.com/lj.issues/issue72/3768.html) if you're not sure why the eval is there.
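
The pipeline just described can be tried out with the following sketch. Both function bodies are assumptions: colon2line is guessed to be a simple tr call, and the option handling is reduced to a bare -p check, so this is an illustration rather than the code from the tar file.

```shell
# Hypothetical sketch: turn embedded colons into newlines.
colon2line() {
    tr ':' '\n'
}

# Hypothetical, cut-down listpath: accepts only "-p VARNAME", defaults to PATH.
# The body runs in a subshell, so pathvar does not leak into the caller.
listpath() (
    pathvar=PATH
    if [ "$1" = "-p" ]; then
        pathvar=$2
    fi
    # eval forces a second pass, so \$$pathvar becomes e.g. $NEWP and is expanded
    eval echo \$$pathvar | colon2line
)

NEWP=fred:bill:steve
listpath -p NEWP
```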

delpath

We have already seen the addpath function, which performs an idempotent addition of a pathel to a pathvar. The converse of this behaviour is provided by delpath, which removes pathels from a pathvar. So, for example:

delpath /opt/CC/test/bin
will remove the /opt/CC/test/bin directory from $PATH and:

delpath -e "(bill|steve)" -p MANPATH
will remove all pathels matching the egrep-style regular expression "(bill|steve)" from $MANPATH. The command

delpath -n
removes all non-existent directories from $PATH.

Although delpath is not a function you are likely to need often, there is one place where it can be useful. Many UNIX machines have a file called /etc/PATH, which is sourced by /etc/profile. It sets up a default PATH containing directories required by all users. Too often, though, /etc/PATH is not modified for years at a time, and the directories added either no longer exist or are not truly required by all. In this case, you can call delpath at the start of an appropriate login script (.profile or .bash_profile) to remove the directories you do not need.

Let's look at the delpath code. I'll skip most of the option handling, as much of it is identical to that in addpath.

MATCH="-x"            # default
[ -n "$opt_e" ] && MATCH= # make grep use regexps
FILTER=              # default
[ -n "$opt_n" ] && FILTER="| realpath_filter"
Here, we see the final section of the option handling. The MATCH variable determines whether we handle the supplied path description as a regular expression or not. It is used later as an option to grep; grep -x tells grep to perform exact string matches.

The FILTER variable implements the -n option, i.e., the ``remove non-existent directories'' behaviour. If the user supplies the -n option, FILTER contains a string which pipes the output of a previous command through a program called realpath_filter. This program reads directory names from its standard input and writes the name to standard output only if it is an existing directory. I'll leave it as an easy exercise for the reader to implement such a filter.
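
One possible answer to that exercise is sketched below. It is only a guess at what such a filter might look like, written as a shell function rather than a separate program.

```shell
# Hypothetical realpath_filter: copy a line from standard input to
# standard output only if it names an existing directory.
realpath_filter() {
    while IFS= read -r dir; do
        if [ -d "$dir" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

printf '/\n/no/such/directory\n' | realpath_filter
```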

The remainder of delpath is as follows:

eval listpath -p $pathvar $FILTER |
  grep -v -E $MATCH "$1" > /tmp/makepath_in.$$
eval $pathvar=$(makepath < /tmp/makepath_in.$$)
rm /tmp/makepath_in.$$
The function does its work in three stages. The first command generates a file in /tmp containing those directories that are not to be deleted. The second command rebuilds the pathvar from that file using makepath. Finally we remove the file; we don't want it cluttering up the file system after the function finishes. (The shell expands $$ to the process ID of the shell running the command; I'll assume it is 20610 in this article.)

Let's look at the first line. Essentially, it uses listpath to break the appropriate pathvar into separate lines, and grep to remove those we don't want. It's slightly complicated, however, by the presence of the FILTER variable. Suppose the user types:

delpath -e "^opt"
which means ``remove all directories starting with the opt string from $PATH''. In this case, pathvar will contain PATH, while MATCH and FILTER will be empty. The first line will therefore expand to:

eval listpath -p PATH |
  grep -v -E "^opt" > /tmp/makepath_in.20610
This is straightforward--listpath writes the pathels in PATH into the grep command, which we use to echo non-matching lines only (-v). We redirect the output into our temporary file, which will contain those pathels not starting with opt. In this case, the leading eval is unnecessary. However, if the user types

delpath -n
to remove all non-existent directories from $PATH, then the first line expands to

eval listpath -p PATH | realpath_filter |
  grep -v -E -x "" > /tmp/makepath_in.20610
During the initial processing of the line (i.e., before the eval forces re-evaluation), the shell sees the pipe symbol preceding the grep, but not the one preceding realpath_filter. This is because the shell looks for | characters before it expands variables, and the | preceding realpath_filter was stored in a variable. Without the eval, the shell would treat that | as a literal character and pass it as an argument to listpath. The second evaluation caused by the eval ensures that the pipeline running realpath_filter is constructed.
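
The effect is easy to reproduce in isolation. In the sketch below, tr stands in for realpath_filter and echo for listpath; both substitutions are made purely for illustration.

```shell
FILTER="| tr a-z A-Z"     # a pipe symbol stored in a variable

# Without eval, the | is just another word handed to echo.
without_eval=$(echo hello $FILTER)

# With eval, the expanded line is parsed again and the | starts a pipeline.
with_eval=$(eval echo hello $FILTER)

echo "$without_eval"      # hello | tr a-z A-Z
echo "$with_eval"         # HELLO
```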

We now have a file containing only the required pathels. The second line in delpath rebuilds the pathvar from that file, using the following code:

eval $pathvar=$(makepath < /tmp/makepath_in.$$)
This shouldn't cause us too many problems. First, makepath simply reads the lines in the file, builds a colon-separated pathvar and echoes it. We run makepath in command-substitution mode (that's the $(...) which I described in Part 2), so makepath's output is used as the right-hand side of the variable assignment. The initial eval is required due to the order in which the shell evaluates a command. Because it looks for assignment statements before expanding variables, it won't recognize that the command contains a valid assignment. The eval ensures the assignment takes place the second time the line is processed.
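
A short experiment shows the difference. DEMOPATH and value are names invented for this sketch. The quoted form of the eval is a slightly more defensive variant of the line above, not the code from delpath: it delays expansion of the value until the second pass, so a directory name containing spaces survives intact.

```shell
pathvar=DEMOPATH
value="/bin:/opt/my tools/bin"

# Without eval, the expanded word is treated as a command name, not an assignment.
if $pathvar=/bin:/usr/bin 2>/dev/null; then
    echo "unexpected: treated as an assignment"
else
    echo "without eval: not an assignment"
fi

# With eval, the second pass sees DEMOPATH=$value and performs the assignment.
eval "$pathvar=\$value"
echo "$DEMOPATH"
```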

uniqpath

Suppose you log on to your UNIX system and discover, for reasons beyond your control, that PATH is full of duplicate entries. (Humour me. It does happen. Maybe your system administrator modified /etc/PATH inadvisedly.) Let's assume these duplicates are making your PATH undesirably long. Is there anything you can do to clean things up? Yes, you can type at the prompt:

$ uniqpath
This will remove any duplicate entries from your path, leaving the order of the remaining pathels intact. For example:

$ NEWP=fred:bill:steve:fred:dave:bill
$ uniqpath -p NEWP
$ echo $NEWP
fred:bill:steve:dave
Let's skip the options-handling code again, and look at the meat:

npath=$(listpath -p $pathvar |
        awk '{ seen[$0]++; if (seen[$0] == 1) { print } }')
eval $pathvar=$(makepath "$npath")
As usual, $pathvar contains the name of the pathvar we want to modify. The code is rather similar to that of delpath. The first line generates a variable (npath) containing the unique path elements, and the second line rebuilds the pathvar from those elements using makepath. We don't use an external file to store the pathels, but keep everything in shell variables. This is done in order to demonstrate an alternative technique--there is no deeper reason.

The first line runs listpath to break the pathvar into separate lines and pipes them through an awk filter which removes duplicate pathels. You may be wondering why we don't just use the uniq program instead of awk's magic. It's because uniq will remove duplicate lines from its input only if they happen to be adjacent. In our case, the duplicate pathels will generally not be adjacent, so uniq won't work. ``Aha,'' you say, ``why not use sort -u? That will sort the lines and remove duplicates.'' True enough; however, it would also change the directory search order if we ran uniqpath on PATH. Usually, people care about the order in which their PATH directories are searched, and it's a bad idea to modify it.
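
The three candidates can be compared side by side on the NEWP example. In this sketch, printf stands in for listpath, and the variable names are invented for illustration.

```shell
pathels=$(printf 'fred\nbill\nsteve\nfred\ndave\nbill\n')

# uniq only drops adjacent duplicates, so nothing is removed here.
with_uniq=$(printf '%s\n' "$pathels" | uniq)

# sort -u removes the duplicates but reorders the pathels.
with_sort=$(printf '%s\n' "$pathels" | sort -u)

# The awk filter removes the duplicates and keeps first-seen order.
with_awk=$(printf '%s\n' "$pathels" |
           awk '{ seen[$0]++; if (seen[$0] == 1) { print } }')

echo "uniq:    $(echo $with_uniq)"    # fred bill steve fred dave bill
echo "sort -u: $(echo $with_sort)"    # bill dave fred steve
echo "awk:     $(echo $with_awk)"     # fred bill steve dave
```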

Thus, we have the awk solution. This uses a powerful feature of awk known as an associative array or hash (if you have a Perl background). If you're a C programmer, you'll know what an array is: a group of objects of the same type, indexed by an integer. The contents of an array can be accessed by expressions like values[0] or values[20], which refer to the first and twenty-first elements, respectively. A hash is rather like an array which can be indexed by an arbitrary string of characters. So, in awk notation, we could write

age["bill"]=27
to assign 27 to the hash element indexed by the string bill in the hash called age. Let's look at the awk code shown above.

Between the single quotes, we have a block of code run each time awk reads a new line from its standard input. When awk reads a line, it is stored in a special variable called $0, and we use $0 as an index into a hash called seen. (We haven't declared this anywhere--that's okay in awk. Variables spring into existence, with numerical value 0, when they appear in the code.) We use the seen hash to tell us whether awk has already seen an identical line of input since it started executing. Let's see what happens in the NEWP example shown above.

First, listpath splits NEWP into lines containing the following strings: ``fred'', ``bill'', ``steve'', ``fred'', ``dave'' and ``bill'', which are read in that order by awk. awk stores each line it reads in $0, so $0 takes on the values ``fred'', ``bill'' and so on, in turn. Each time a line is read, the corresponding element of the seen hash is incremented (by the statement seen[$0]++), and the line is printed only if it has now been seen exactly once (by the print statement in the if block, which prints $0 to standard output by default). If we look at the hash element seen["fred"], this is initially 0 and is then set to 1 when awk reads the first ``fred'' line, remains at 1 for the next two lines, and is set to 2 when awk reads the second ``fred'' line. The line is printed only when it is seen for the first time. C programmers should note how syntactically elegant this solution is and how little code is required when compared to the equivalent in C.
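
To watch the counters change, the same filter can be instrumented to report what it does with each line. This is a tracing variant written for illustration; the action column is not part of uniqpath.

```shell
trace=$(printf 'fred\nbill\nsteve\nfred\ndave\nbill\n' |
        awk '{
            seen[$0]++                               # count this line
            action = (seen[$0] == 1) ? "print" : "skip"
            printf "%s seen=%d %s\n", $0, seen[$0], action
        }')
echo "$trace"
# fred seen=1 print
# bill seen=1 print
# steve seen=1 print
# fred seen=2 skip
# dave seen=1 print
# bill seen=2 skip
```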

edpath

The final pathvar function we're going to see is edpath. This breaks the pathels in a pathvar into separate lines, writes them to a temporary file and runs an editor on that file. You can edit the pathels to your heart's content and quit from the editor when you're finished. The pathvar is then reconstructed from the modified lines in the file. edpath allows you to perform arbitrary modifications on a pathvar. I use it most often when I wish to swap the order of directories in PATH.

The code for edpath is fairly straightforward (ignoring once again the boring details of option handling):

TEMP=/tmp/edpath.out.$$
VAR=\$$pathvar                # VAR="$LIBPATH" for example
eval export OLD$pathvar=$VAR  # store old path in e.g. OLDPATH
listpath -p $pathvar > $TEMP  # write path elements to file
${EDITOR:-vi} $TEMP           # edit the file
eval $pathvar=$(makepath < $TEMP)  # reconstruct path
/bin/rm -f $TEMP              # remove temporary file
Let's skip the first three lines for now. The real work is done by the block of code starting with listpath. This follows a pattern similar to that of delpath and uniqpath. First, we separate the pathels in the pathvar using listpath and redirect the output into a temporary file. The next line edits that file. The expression ${EDITOR:-vi} may be unfamiliar; it means ``Use the value of the EDITOR variable if it is non-null, else use vi.'' This allows the user to specify his favourite editor by setting the EDITOR environment variable (to Emacs, perhaps) but uses vi if he has not done so. Note that the edit command is run in the foreground, so the shell will wait until the editor process terminates before running any more commands from the shell function. When this occurs, the modified pathvar will be reconstructed by the line starting with eval. If you read the description of delpath given above, you'll know how this line works.
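
The behaviour of the ${EDITOR:-vi} expression can be checked without launching an editor. The pick_editor function is a name invented for this sketch, and each case runs in a subshell so that your real EDITOR setting is left alone.

```shell
# Hypothetical helper: report which editor edpath would run.
pick_editor() {
    echo "${EDITOR:-vi}"
}

( unset EDITOR; pick_editor )     # vi     (EDITOR is unset)
( EDITOR=;      pick_editor )     # vi     (EDITOR is set but null)
( EDITOR=emacs; pick_editor )     # emacs  (EDITOR has a value)
```

The colon matters here: ${EDITOR-vi}, without it, would fall back to vi only when EDITOR is unset, and would yield an empty string when EDITOR is set but null.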

Lines 2 and 3 of the code are a safety net. They store the initial value of the pathvar to be edited in a new environment variable. If the user is editing PATH, for example, then the code creates a variable called OLDPATH. If the user makes unwanted modifications to her PATH, she can simply type:

$ PATH=$OLDPATH
and all will be well.
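
The safety net can be exercised on a scratch variable before trusting it with PATH. DEMOPATH is a name made up for this sketch; the two lines under test are the ones from edpath.

```shell
pathvar=DEMOPATH
DEMOPATH=/bin:/usr/bin

VAR=\$$pathvar                  # VAR now holds the literal text $DEMOPATH
eval export OLD$pathvar=$VAR    # second pass runs: export OLDDEMOPATH=$DEMOPATH

DEMOPATH=/oops                  # simulate an unwanted edit
DEMOPATH=$OLDDEMOPATH           # recover, as with PATH=$OLDPATH

echo "$VAR"                     # $DEMOPATH
echo "$DEMOPATH"                # /bin:/usr/bin
```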

Conclusion

UNIX can present a bewildering array of tools and techniques, and it's almost impossible for any individual to be intimately familiar with all of them. In my experience, the best developers carry around a large bag of simple but useful techniques and are able to combine them rapidly into a working solution. You don't need to know every detail of every tool to do useful work, but you do need a bag of tricks you understand.

Please feel free to use any of the ideas I've described in this series. You can get a hold of the source code to the shell functions from ftp://ftp.demon.co.uk/pub/unix/misc/pathfunc.tgz. Let me know if you find any bugs, would like a new feature added, or make an improvement.

Stephen Collyer (stephen@twocats.demon.co.uk) is a freelance software developer working in the UK. His interests include scripting languages and distributed and thread-based systems. Occasionally, he finds the time to talk to his wife and two remarkably attractive and highly intelligent children.