LJ Archive CD

Using Apache Proxy to Suppress Banner Ads

Raj Mathur

Issue #71, March 2000

If you find Internet advertising annoying, time wasting and resource-consuming, get rid of it.

No matter how much you have, Internet bandwidth never seems to be enough. At home, I dial up on an oldish analog exchange to the Internet, rarely getting connections better than 4800 bps; at the office, I have a (shared) 64KB leased line to our U.S. office and thence to the Internet. However, no matter where I surf (it's part of my job), I have always wished there were a way to avoid waiting for pages to display while loading banner ad graphics from the remote server.

I am running an old, completely hacked version of Red Hat (2.0 to be precise) with kernel 2.2.10 on my 66MHz 486 PC at home. I use this system for browsing (mainly using Lynx), software development and fooling around with Linux. My wife and two children also use it, primarily for browsing and e-mail using one of the web-based e-mail services with Netscape Communicator.

Since I first installed Linux on this system a few years ago, I've been running the Apache web server to serve local content. More recently, I've also enabled Apache's caching Proxy module to maintain a local cache of commonly accessed documents, speeding up Internet access somewhat. The version of Apache I'm running is 1.3.3 (yes, I should upgrade to the latest and greatest, but there's no pressing hurry).

The Apache web server (http://www.apache.org/) is a well-known HTTP server and is part of most, if not all, Linux distributions. Along with being the world's most popular web server, running on over 56% of web sites (Netcraft survey at http://www.netcraft.com/), Apache also has features which helped me replace graphics from well-known banner ad sites with an innocuous local graphic.

Since the distribution I'm running is so old, I will describe the methodology of replicating this on a more modern distribution, namely Red Hat 6.0. Most other popular distributions of Linux will require the same or similar steps.

Background

In order to use Apache for this purpose, I had to set it up as an HTTP Proxy, using mod_proxy, and enable and use the URL rewriting engine built into Apache (mod_rewrite).

In a nutshell, you must first set up Apache to act as a proxy server for the browsers on your local network (or for a single system, which is what I do at home) and tell the client browsers to use that Apache host as a proxy. Then use the powerful mod_rewrite URL rewriting engine to search for banner ad HTTP requests (these match the URLs of well-known banner ad server hosts) and substitute a local graphic for them. With this done, you can sit back and watch your web pages flow into your browser, while that irritating advertising is automatically replaced with a more soothing and altogether more relevant graphic.

You will need to be root while following the steps discussed here.

Enabling Proxy

The first step is loading and enabling the proxy module (mod_proxy) which is part of Apache. In most distributions, you enable mod_proxy by appending the following line to your Apache server's configuration file (/etc/httpd/conf/httpd.conf or /usr/local/apache/etc/httpd.conf):

ProxyRequests on

To check if the module was loaded okay, restart the web server with the command:

killall -1 httpd
Check the last few lines of the Apache error log (usually in ../logs/error_log or ../var/log/error_log relative to the httpd.conf file), and if there are no errors, celebrate! Apache's proxy feature is enabled. This has been tested on Red Hat Linux 6.0 and Debian's “Potato” GNU/Linux 2.2, and it should work with SuSE Linux 6.1.

If you get an error message like:

Invalid command 'ProxyRequests', perhaps misspelled or defined
by a module not included in the server configuration

then you need to either locate the Apache Proxy module, or download and install a version of Apache which has mod_proxy available. Installation is outside the scope of this article. See the Apache HOWTO at www.apache.org/docs-1.2/mod/mod_proxy.html.

Once you have the Apache proxy configured and running, you can tell your client browsers to use it. In Netscape Communicator, select Edit/Preferences/Advanced/Proxies and “Manual Proxy Configuration”. Click on the “View” button, and enter the name of the host running Apache in the text entry boxes for “FTP Proxy:” and “HTTP Proxy:”, using 80 as the port number for each.

For Lynx, edit the global Lynx configuration file found by default in /etc/lynx.cfg or /usr/local/lib/lynx.cfg and change these lines

#http_proxy:http://some.server.dom:port/
#http://some.server.dom:port/
#ftp_proxy:http://some.server.dom:port/

to look like:

http_proxy:http://apache.my.dom:80/
http://apache.my.dom:80/
ftp_proxy:http://apache.my.dom:80/
Then, replace apache.my.dom with the name of the host running the Apache server. Similar techniques will apply to other web browsers—consult the browser's manual for details.

Configuring the Rewrite Engine

Now that you have the Apache Proxy configured and the browsers set up to use it, you can get to the heart of the matter, which is redirecting banner ads to a local graphic, saving yourself bandwidth, time and money.

Before you actually decide which URLs to redirect, you need to do the basic setup for mod_rewrite. Add the following lines to Apache's configuration file, httpd.conf:

RewriteEngine on
RewriteLog logs/rewrite_log
RewriteLogLevel 1

You may need to twiddle the file name in the RewriteLog directive a bit to conform to your distribution's file system conventions. For example, in the default Apache build, the appropriate directive would be:

RewriteLog var/log/rewrite_log
In addition, you can increase the RewriteLogLevel from 1 up to a maximum of 9 to get more detailed information on what the rewrite engine is doing, which helps in debugging. However, once you are satisfied with the way it is working, it is a good idea to reduce the RewriteLogLevel to 0. This will stop all logging and reduce the load on the Apache server.

Having done this, you are now ready to redirect banner ads. Let us assume you want to redirect all graphics originating from URLs http://anything.unclick.net/adi_anything, http://anyhost.unclick.net/jump_anything and http://image_anything.click2com.net/anything. I chose to display the “Powered by Apache” logo (which comes with Apache in the file icons/apache_pb.gif) in place of the advertisements. Add the lines shown in Listing 1 to your httpd.conf file to display the Apache GIF instead of images from these sources, where, of course, you replace apache.your.dom with the name of the Apache server you have set up.

Listing 1

As you can see, each redirection consists of two parts: a RewriteCond and a RewriteRule. The RewriteCond identifies the host (%{HTTP_HOST}) we want to redirect from. The third string in the directive is the name of the host, given as a regular expression. In regular expressions, the period (.) stands for any character, so we use "." to denote a literal period in the domain name. ".*" stands for any arbitrary string, and the caret (^) forces a match at the beginning of the host name. Thus, for example, the regular expression ^image.*\.click2com\.net matches the hosts image.click2com.net, images.click2com.net and images05.adverts.click2com.net, but does not match ads.image.click2com.net.

Once we have found a host with RewriteCond, we use the following RewriteRule directive to see if any part of the URL matches the first string (regular expression) in the rule; if it does, we redirect to the appropriate (local) graphic. For example, having found a host in the unclick.net domain in the first RewriteCond, we then check in the RewriteRule which follows it if the string /adanything occurs anywhere in the URL. If it does, we redirect the Apache proxy to send instead the image /icons/apache_pb.gif from the proxy host. Of course, you can use any image you like, and it need not necessarily be on the proxy server host—any URL is fine. The final [R] converts the HTTP request into an HTTP REDIRECT (302). Don't worry if you don't know HTTP; I don't either, but it's required.

You can add as many rewrite conditions and rules as you like to redirect banner ad sites. I'm still looking for a comprehensive listing of well-known ad URLs—let me know if you can add some.

Notes

The method of setting up the Apache proxy I have described is purely for the sake of redirecting banner ads. The Proxy module has many other features which are not covered here. Perhaps the most useful of these is its ability to cache often-accessed documents on a local disk, and further reduce bandwidth and time needed to browse the Web. Try playing around with the cache directives in httpd.conf, and judge the effect on your browsing speed.

Also, the method of configuring the rewrite engine I have described here is only one way. This method may not be optimal, as it is the result of one day's fooling around with mod_rewrite, but it works and serves my purpose.

mod_rewrite is an extremely powerful and versatile rewriting engine, and we have seen only one application of its features in this small redirection application. More information on Apache, its modules in general, and mod_rewrite in particular, can be found in Resources.

Another tip you may find useful is to combine the two secondary Apache config files, access.conf and srm.conf, into the primary config file, httpd.conf. This speeds up Apache quite a bit, especially on heavily loaded servers. In order to do this, copy the contents of access.conf and srm.conf into httpd.conf, preferably somewhere at the end, although location doesn't really matter. Now add the following two lines near the beginning of httpd.conf:

AccessConfig    /dev/null
ResourceConfig  /dev/null

Again, location isn't very important, as long as you don't insert those lines into the middle of a multi-line directive.

A rather whimsical description of this approach is available at the Apache site, listed in Resources (see “why are there three config files?”).

Conclusion

I have been using Apache to redirect Internet advertising for a while now, and am very happy with the results. Browsing on my slow link at home is a much more enjoyable experience with highly reduced wait time.

When I first started using Apache, I never expected I would be able to use it pro-actively for this particular purpose. Ralf Engelschall, the author of mod_rewrite, didn't expect his brain-child to be used for this purpose, either. The fact that it can underlines once again the beauty of the philosophy behind UNIX, Linux and Apache—if you make the parts general purpose enough and give the facility of combining them, their sum invariably becomes greater than the whole.

Resources

email: raju@sgi.com

Raju Mathur (raju@sgi.com) ostensibly works for SGI in India, but manages to spend inordinately large amounts of time with his first love, Linux. He has been using Linux since the kernel 0.99.11 era and is currently the coordinator of the Delhi Chapter of the India Linux Users' Group. He is married to Aparna, a past-life therapist, and is the proud father of two children, Shiv and Ella.

LJ Archive CD