
Work the Shell

Handling Errors and Making Scripts Bulletproof

Dave Taylor

Issue #169, May 2008

Shell scripts may be quick, easy and lightweight, but proper scripting includes the ability to anticipate and respond to error situations gracefully and without anything breaking. Dave explores some of the basic shell script error-handling options.

I realize I've been playing a bit fast and loose with my shell scripts over the last few months, because I haven't talked about how to ensure that error conditions don't break things. If you read the Letters section in Linux Journal, you know I haven't covered this topic because, well, you have covered it for me!

This topic ranges from the simple to the sophisticated, so let's start with a basic test: the return status after an application or utility is invoked.

The Magical $? Sequence

Different shells have different return-status variables (the C shell, for example, uses $status), but the Bourne shell and Bash, which are what we've focused on since I started writing Work the Shell, use the most basic one: $?.

Here's a quick example:

#!/bin/sh

mkdir /                  # fails: / already exists
echo "return status is $?"

mkdir /tmp/foobar        # succeeds
echo "return status is $?"

rmdir /tmp/foobar        # succeeds: the directory is empty
echo "return status is $?"

rmdir /tmp               # fails
echo "return status is $?"

exit 0

Run this, and you can see the difference between commands that succeed and those that fail:

mkdir: /: Is a directory
return status is 1
return status is 0
return status is 0
rmdir: /tmp: Not a directory
return status is 1

You can see that when invoking mkdir or rmdir with an error condition, they output an error and—the important part—the $? return status is nonzero.

In fact, check out the man page for a typical command like mkdir, and you'll see: “DIAGNOSTICS: The mkdir utility exits 0 on success, and >0 if an error occurs.”

In a perfect world, that >0 return code would tell you exactly what went wrong. That's true down at the C library level, where calls like mkdir(2) set errno to a specific error value, but by the time a utility exits, all the shell sees is a single numeric status.
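
By the way, if all you want to do is branch on success or failure, you don't even need to capture $? in a variable; the shell can test the command directly in an if statement. A minimal sketch, reusing the example directory from above:

if mkdir /tmp/foobar 2>/dev/null ; then
  echo "created /tmp/foobar"
else
  echo "mkdir failed with status $?"
fi

Within the else branch, $? still holds mkdir's exit status, because no other command has run in between.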

On the other hand, it's still helpful to explore how to make a shell function that does error handling too. Here's a basic example function:

makedirectory()
{
   mkdir "$1"
   status=$?

   echo "return status is $status"
}

This just makes a simple function that calls mkdir, and it should be no surprise that it works as follows if I invoke it three times—twice in error situations and once without an error:

mkdir: /: Is a directory
return status is 1
return status is 0
mkdir: /tmp/foobar: File exists
return status is 1

It's a drag to have mkdir generate an error message when you can produce your own simply by testing the $? status variable.

Here's how you can do just that:


makedirectory()
{
   mkdir "$1" > /dev/null 2>&1
   status=$?

   if [ $status -ne 0 ] ; then
      echo "makedirectory failed trying to make $1 (error $status)"
   fi
}

This is a bit tricky to understand, because you have to suppress the error message from mkdir so you can generate your own. That's done by redirecting standard output to /dev/null (the > /dev/null sequence) and then pointing standard error at the same place (the 2>&1 sequence). The order matters: redirections are processed left to right, so if 2>&1 came first, standard error would be duplicated onto the terminal before standard output was redirected, and mkdir's error messages would still appear.
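
A quick command-line demonstration makes the difference in ordering concrete:

$ mkdir / 2>&1 > /dev/null      # wrong order: the error still appears
mkdir: /: Is a directory
$ mkdir / > /dev/null 2>&1      # right order: both streams are silenced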

Tip: if you're running Bash, there's a shorthand you could use here too, if you wanted to be a bit more cryptic: &>/dev/null.

Now when running this, however, the output is far more sophisticated:

makedirectory failed trying to make / (error 1)
makedirectory failed trying to make /tmp/foobar (error 1)

That's a nice way to deal with errors, and of course, the function can also return the error code, with return $status as the last line.
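
Here's what that final version might look like, along with a caller that uses the returned status to decide whether to keep going (the bail-out behavior is just an illustration):

makedirectory()
{
   mkdir "$1" > /dev/null 2>&1
   status=$?

   if [ $status -ne 0 ] ; then
      echo "makedirectory failed trying to make $1 (error $status)"
   fi
   return $status
}

if ! makedirectory /tmp/foobar ; then
   echo "can't proceed without that directory, quitting"
   exit 1
fi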

Using test to Avoid Error Conditions

The best way to handle errors is to catch error conditions before they occur, and that's a job for the wonderful and powerful test command. For example, the two typical error conditions you'd encounter with the makedirectory() function are the directory already existing and the script not having permission to create it.

The first is pretty easy to test:

if [ -d "$1" ] ; then
  echo "Error: directory $1 already exists."
  exit 1
fi

The second is a bit trickier, because you need to grab the parent directory portion of the requested directory and then test it to see whether you have the write and execute permission needed to create the subdirectory. The parent can be extracted with the dirname command (which returns . if there's no explicit directory given), followed by tests for -w (writable) and -x (executable).
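
A quick look at dirname in action, using the same example path as before:

$ dirname /tmp/foobar
/tmp
$ dirname foobar
.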

It all combines like this:

parentdir="$(dirname "$1")"
if [ ! -x "$parentdir" -o ! -w "$parentdir" ]
then
  echo "Uh oh, can't create requested directory $1"
  exit 1
fi

This is a sophisticated use of the test command, but read “!” as “not” and “-o” as “or”, and you can see the test is “if not executable $parentdir or not writable $parentdir then...”, and that should make sense!
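
To show how the pieces might fit together, here's a sketch of makedirectory() with both tests folded in, using return rather than exit so a failure doesn't kill the calling script:

makedirectory()
{
   if [ -d "$1" ] ; then
      echo "Error: directory $1 already exists."
      return 1
   fi

   parentdir="$(dirname "$1")"
   if [ ! -x "$parentdir" -o ! -w "$parentdir" ] ; then
      echo "Uh oh, can't create requested directory $1"
      return 1
   fi

   mkdir "$1"
}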

Avoiding Output Problems with noclobber

Finally, another thing to be aware of with the shell is that it's all too easy to zap important files with a redirect. For example, the second of these two commands arguably shouldn't work:

$ who > who.output
$ ls > who.output

The second command should generate an error because the output file already exists, right? But it doesn't, and it simply trashes the who output without a warning or error—not good.

To avoid that problem, you'll want to set -o noclobber in scripts or, better, for your login shell, and let it be inherited by subshells, including those that run your shell scripts. A good place to put it could be in your .profile or .bashrc.

With noclobber set, the two commands behave differently:

$ ls > who.output
-bash: who.output: cannot overwrite existing file
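
When you genuinely do want to overwrite an existing file, you don't have to turn the option back off; the >| redirection operator overrides noclobber for that one command:

$ set -o noclobber
$ ls >| who.output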

That's useful for everyone, and doubly so for us shell script hackers, right?

Dave Taylor is a 26-year veteran of UNIX, creator of The Elm Mail System, and most recently author of both the best-selling Wicked Cool Shell Scripts and Teach Yourself Unix in 24 Hours, among his 16 technical books. His main Web site is at www.intuitive.com, and he also offers up tech support at AskDaveTaylor.com. Follow him on Twitter if you'd like: twitter.com/DaveTaylor.
