LJ Archive

How To Read 950 E-mail Messages Before Lunch

Jay D. Allen

Issue #22, February 1996

A discussion of the use of e-mail filters on Unix computers that use Sendmail-like mail systems.

I have heard it said that electronic mail is the biggest reason that people first go to the Internet, and that Usenet news is why they stay. My question is, what is the single biggest reason that people leave the Internet? The answer is junk mail! Well, maybe they haven't left. It just looks that way because they never answer e-mail. It's easy to understand why in a public forum that spans the globe and includes tens of millions of members, people might be receiving more mail than they can read.

There are thousands of opportunities to subscribe to mailing lists, and interneters can quickly overfill their plates. I know, I've done it. In this article I'll discuss a powerful set of tools that allow you to get control of your in-box and reduce your chances of heart trouble related to starting your mail reader. These tools are called “E-mail filters”. E-mail filters are programs that sort mail based on a your directions. For instance; all mail from my brother should be moved from my in-box mail folder to a folder labeled “Frank”. Filters work by processing mail, after the system delivers the mail to you, but before you actually read it.

What do I mean by that?

How an e-mail Transaction Works

Let's start at the point that mail for you is delivered. When mail is delivered to your computer on the Internet, it arrives via a mail system which is using Simple Mail Transfer Protocol (SMTP). Although there is probably a different e-mail daemon for every week of the year, sendmail is by far the most popular. For the rest of this article when I speak of the “mail system” I'll just call it sendmail, though most of what I say applies with smail, MMDF, and others. You can ignore the differences for the purposes of this article.

So, to continue, “sendmail” takes the e-mail and then decides whether the e-mail needs to be sent directly on to another system or delivered locally. If sendmail decides that the mail should be delivered locally, it goes through a short list of actions. First it looks to see if there is a .forward file in the user's home directory. If there is no .forward file, sendmail writes the mail to the user's system-wide mail file.

If the user does have a .forward file, sendmail reads the file. If the file contains an e-mail address, sendmail fowards the message to that address. If the forward file contains a pipe | character, sendmail runs the specified program, sending it the mail message, letting the program deliver the mail. This last case is how e-mail filters work. The sendmail daemon “delivers” the mail to your filter, and the filter delivers it (or doesn't deliver it, if you prefer) to the folders, following your set of rules. If you use the elm filter program, your .forward file might look like this:

"| /usr/local/bin/filter"

sendmail would deliver all your mail to the program called filter.

What would happen if filter was not there, or otherwise broken? We can guard against failure, by providing an alternative for sendmail.

"| /usr/local/bin/filter || exit 75 "

In this example, if the filter fails, the delivery to the user's .forward file will exit with an error number 75. This forces sendmail to back off the .forward file, and try again later instead. The || exit 75 only protects against catastrophe, not bad choices. If you do not correctly configure your filter, it may lose mail, but would not “fail” from the perspective of sendmail. The exit 75 would not help you to get the mail back.

The most common place the exit 75 can help you is when your home directory runs out of space. Most filters will gracefully fail, allowing your mail to be delivered to the system mailbox instead. This is especially helpful on systems that have disk quotas on the home directories, because the mail spool generally does not have user quotas.

Choosing a Mail Filter

There are at least four popular mail filters available: Procmail, Elm-filter, Mailagent, and MH's slocal. Procmail is a robust general purpose mail filter. By design, it is small, easy to install, and dependable. Elm is a user mail program for reading and sending e-mail. The Elm-filter is a separate program that comes with the Elm package, and can be used with or without the rest of Elm. Slocal is the mail agent that comes with the Rand corporation's Mail Handler (MH). You might be buying the Cadillac for the cigarette lighter if you install MH just to use slocal, though. Unfortunately, Slocal does not support regular expressions (see msort sidebar for a possible solution). In contrast to Slocal, the mail filtering package called “Mailagent” supports a very rich regular expression syntax. Unlike the other filters which are written in C, mailagent is written primarily in Perl, and uses Perl's powerful regular expressions.

Procmail can write mail to “mbox” style mail files, as well as MH style mail directories. Slocal can write to “mbox” style mail files, as well as its native MH style “folders” (directories). In general, mail agents can be used interchangably with many different e-mail readers.

I use procmail to filter my mail, MH for my mail package, and “Exmh” as an X-based frontend to MH. (Exmh is written in Tcl/Tk, and is possibly the best way to do e-mail. [I agree!---ED])

If you are lucky, one or more of the e-mail filters described here may already be installed on your system.

Filter Installation

If you are not so lucky, you may need to install an e-mail filter. These filter programs have a varying degree of ease for installation. They all can be installed without root privileges, but save yourself some trouble, and ask your system administrator to install it for general use. If you are handy with a compiler, offer to help out. If you have to do it yourself, you'll need to get the sources, compile them, and create any needed configuration files. See the sidebar “Where to get a mail filter” for more information on getting and installing the common e-mail filters.

Spend some time figuring out your needs for mail file locking. Locking is used to guarantee that sendmail does not try to write to a file at the same time you do. The documentation for each of these packages mentions this subject, so pay attention. I found out the hard way that my mail reader was using one type of lock while my filter was using another (a moment of silence for all my lost mail, please).

After you either install a mail filter (or locate it, if it's already installed) you will need to start pushing your mail through it. As in the example above, most e-mail filters can be invoked using the .forward file. If your system does not support this feature, you can still use e-mail filters, but you may need to periodically invoke the filter using a script. You can run your script every time you login, or once each time you read mail, or use cron to run it on a regular basis.

How a Filter Works

Now that we have a way to get e-mail pumped into our filter, let's talk about what happens next. What does the filter do with your mail? To start with, “E-mail filters” might be more appropriately labeled “e-mail sorters”. We all have “filters” in our kitchen: flour sifters are used to filter based on size, strainers let liquids through but not the pasta, and coffee filters let the good stuff through leaving behind the brown sludge.

Most people are not satisfied with filtering e-mail based on size, color, or fluid state. They would prefer to sort their mail based on: Where it's from, Who it's from, and What it's about. In e-mail terms, this might be done by looking at the From: or Subject: lines. Instead of “filter” we might do better to talk about an “agent”, “secretary”, “processor”, or even “e-mail dog”. We can give instructions to “an agent”, but not a coffee pot.

With that in mind, we can create a set of rules for our “filter” to follow. If an e-mail message matches on a rule, an action is taken. Procmail and Slocal can have only one action per rule, although there are a few ways to get around that limit. Unless otherwise directed, filtering stops after a match, and the action completes, making the order of rules important. Keep the most specific rules at the top (first), and the default case (what happens if no rules match) at the bottom. Let's look at a few examples:

Using the procmail program, we can create a file in our home directory called .procmailrc. Procmail calls its filtering directions “recipes”. Simple recipes look like this:

:0
* ^From.*jay@fork.com
forkmail
:0
* ^From.*cory
* ^Subject.*Elvis
/dev/null

The first recipe tells procmail to look for mail that contains a line starting with the the word From, and contains the string jay@fork.com. If procmail finds a match for this description, it stores the e-mail message in the file forkmail. The second recipe matches on two criteria; Email from cory, with a subject that includes the word Elvis, is deleted. The matches are based on regular expression syntax of egrep.

The equivalent using Elm's filter ($HOME/.elm/filter-rules) would look like:

if (from contains "jay@fork.com")  ?
        save "~/mail/forkmail"
if (from contains "cory" and subject = "Elvis") then
        delete

Elm's filter calls these stanzas “rules”. The matches are “egrep like”, but not as fully-featured as the matching you can get using procmail.

In slocal's .maildelivery file our rules would look like:

# header pattern action result string
# lines starting with a " are ignored,
# as are blank lines
>From jay@fork.com ^ "/pkgs/mh/lib/rcvstore +inbox"
>From cory         file R /dev/null
Subject Elvis     destroy N -

General Strategies for Filters

When constructing filter rules, err on the side of caution. Try constructing a few simple rules first. Test the rules by sending mail to yourself, and further test by leaving the filter in place for a few days. If it all works, you can start to add more complex rules. Make sure you have the “default” behavior of your filter set. Learn how to turn on debugging, and then to turn it off. Try out the logging functions. Using procmail's “mailstat”, you can get a summary of what actions procmail has taken. By investing a bit of time in figuring out how to make your filter work, you'll reap manifold time savings later.

While you are learning how to use your filter, you may want to keep a backup mail file. Instead of using one line in your .forward file which invokes your filter, you might want two lines:

\username
"| /path/to/filter"

where username is your login. This way, your mail will all be delivered to your standard system mailbox as well as through your mail filter. If you have misconfigured your mail filter, you will have a backup to retrieve your mail from. There are several other ways to do this, as well.

I find that people tend towards two schemes when building mail filters. The first technique is to sort out mail based primarily on who it is from. The second is to filter based on content or function. Mail from my manager could go in a folder called boss, but I get a lot of mail from him that's not particularly important, not because it's from him, but because of what it's about. For instance, mail from my manager with a subject of “Hardware budget for next year”, would better go into my “budgets” folder than my “boss” folder.

If you are not particularly worried about disk space, why not try both? Filter all the e-mail based on who it's from and on other factors like key words in the Subject line or other headers. In my case, my filter could save a copy of all mail I get from my boss in the folder called “boss”, and store all e-mail with a subject line that contains the word “budget” in the folder called “budgets”.

A side effect of using filters is the excellent tracking you'll get of your e-mail. By using the logging and reporting that comes with Mailagent, Elm-filter, and Procmail, you will learn a lot about your e-mail. How much mail do you get from the “Elvis-lifestyles” mailing list? What percent of the mail is it? How does it compare to the amount of mail you get from your brother? I use the mailstat program to gauge how much time I spend supporting different departments' computers.

More Advanced Rules

PGP key handling has a special need. People who use Phil Zimmerman's PGP program to send encrypted e-mail need to exchange “public keys” to communicate securely. A common convention is to put the public portion of the PGP key in the .plan file. Anyone who wants the key can then just finger you. This is great, unless your system does not support finger, or your account is behind a firewall. The next best thing might be some sort of automatic mail response. Using procmail, you can filter for a special phrase in the subject line, like this:

:0 h c
* !^FROM_DAEMON
* ^Subject.*SEND-PGP-KEY
| (formail -r -A"Precedence: junk";\
   cat ~/.plan ) | $SENDMAIL -t

This procmail recipe first checks to see if the mail comes from the mail daemon, to prevent e-mail loops, or “ringing”. Then if the subject includes the special phrase SEND-PGP-KEY, procmail invokes formail which automatically constructs a reply to the sender. The e-mail that is sent back includes the contents of the .plan file. If you keep your PGP public key in this file, anybody can request a copy of your key, even if they can't finger you. The equivalent using Elm-filter would look like:

if ( subject contains "SEND-PGP-KEY" ) then
  execute "cat ~/.plan | mail -s \"RE: %s\" %r"

Elm's filter uses a macro %s to represent the original subject of the message, and %r to represent the return address.

Mail agents are really great if you have special e-mail needs. I carry an alphanumeric pager. I use procmail to watch for mail that has the magic word PAGEJAY in the subject line. If procmail sees the magic word two things happen. First, a copy of the e-mail is forwarded to my e-mail->pager gateway at page-jay@pager.fork.com. When the forwarded message gets through the gateway, it will be broadcast to my pager. (The folks in my office call this “Belt-Mail”, or if the pager is set to vibrate, it's an e-mail massage.) Second, a copy of the e-mail is stored in a local folder called “pages”. Here is what the procmail recipe looks like:

:0 c
* ^Subject.*PAGEJAY
! page-jay@pager.fork.com
:0
* ^Subject.*PAGEJAY
pages/.

The first recipe uses the c flag. This tells procmail that even if there is a match, the matching should continue past. In the next recipe, the belt-mail will get filed, and the matching of this e-mail will end. The equivalent in Elm's-filter rules would be:

if ( subject contains "PAGEJAY") then
      forward "page-jay@pager.fork.com"
      save "~Mail/pages"
endif

E-mail filters/agents are powerful tools that enable people who get lots of e-mail to communicate effectively. From the mundane but important task of sorting e-mail, to the complex task of responding automatically to e-mail requests, filters can deliver. Everyone involved benefits from the use of filters. You like it because your mail is always sorted the same way (Where did I put that CERT advisory?). The people who send e-mail to you like it because you start to respond faster by e-mail than via Canadian “First Class”. The only people who don't like it are the direct-marketing-spamers who keep trying to send you ads for “Investment opportunities”. It's OK if they get mad, because you don't ever see their mail since you installed the filter.

Any time spent learning how to configure and use the filters is won back many-fold in time-savings later.

For more information about e-mail filters, see sidebar.

Jay Allen (jay@fork.com) got his BS in chemistry from Portland State University in 1991. He is currently a lead Unix Systems Engineer at Blue Cross/Blue Shield of Oregon. Among other things, he is interested in the use of cryptography for commerce on the Internet, and secure private networks.

LJ Archive