Access Information Through World Wide Web

Eric Kasten

Issue #13, May 1995

The World Wide Web is designed to be easy to attach to. Not just by using Mosaic, Lynx, or Netscape to read other people's pages, but also by publishing your own information. Eric Kasten tells you how to start.

Access to networked multimedia data has become commonplace on the Internet. Chances are good that you've gone surfing in search of data for a project or technical paper or out of simple curiosity. You have probably passed through a large number of sites scattered across the globe and have seen what can be done. Now you want to set up your own collection of information, images, and sound so that the networked world can come and visit. One of the first steps is to set up a world wide web (WWW) server. The remainder of this article will endeavor to provide the necessary information to successfully install the WWW server provided by CERN.

Preparation and Compilation

The three basic parts of the CERN WWW source package are the line-mode client, the daemon, and the common library. Of these, the daemon and common library source are necessary to compile the server. You may also want to retrieve and compile the client code for future use. These packages may be retrieved by clicking on the appropriate links in the document found at URL www.w3.org/hypertext/WWW/Daemon/Status.html, or by anonymous ftp from gatekeeper.dec.com in directory /put/net/infosys/www/src. The file names are shown in the tar commands below. Once you have retrieved the tar archives, move them to the directory where you wish to compile the server, then extract the source. Using GNU's tar this can be done by executing the following commands:

tar xzvf WWWDaemon.tar.Z
tar xzvf WWWLibrary.tar.Z
tar xzvf WWWLineMode.tar.Z

The final file and tar command is optional since it extracts the line-mode client. Once extracted, the source will be in subdirectory WWW. To build the server under Linux, change to subdirectory WWW and execute BUILD. This is done as follows:

cd WWW
./BUILD

The build process should recognize that it is running on a Linux system, and proceed to compile the common library followed by the server. If you have not extracted the source for the line-mode client, the build process will exit with an error once the server has been built. This is normal and can be ignored. You can verify that you have successfully built the server by examining the files in subdirectory Daemon/linux. In addition to a number of .o files there should be the following six files:

cgiparse: A tool for parsing CGI environment variables.
cgiutils: A tool for generating replies from CGI scripts.
htimage: Used to parse the input from a clickable image.
httpd_3.0: The server or daemon.
httpd: A symbolic link to httpd_3.0
htadm: An administration tool for managing server access files.

CGI stands for Common Gateway Interface. CGI is a set of specifications for how scripts and other programs can interact with a WWW server.

Sidebar: The NCSA http Daemon

Basic Installation

You may install the server and associated tools in any directory you see fit. This article assumes that the installation directory is /usr/local/WWW. If you are installing in some other directory, be sure to make the necessary changes in the following references.

To install the server list, create a user id and a group for the server. This user id and group will help control what access rights the server possesses and allow selective access to the server files for administrative users. This article uses user www and group wwwgroup. They can be created using your favorite method for adding users and groups to the system.

Next, create the directory tree where the server and associated tools and files will reside. A typical server directory will have the following directories (and possibly others):

config: Configuration files.
cgi-bin: CGI scripts and programs.
icons: Icons.
htdocs: HTML documents.

To create these directories, execute the following:

mkdir /usr/local/WWW
cd /usr/local/WWW
mkdir config cgi-bin icons htdocs

Now these directories can be populated. Return to the directory where you compiled the WWW server, and execute the following:

cd Daemon/linux
tar cf - htadm httpd httpd_3.0 | \
  (cd /usr/local/WWW; tar xvf -)
tar cf - htimage cgiparse cgiutils | \
  (cd /usr/local/WWW/cgi-bin; tar xvf -)
cd ../../server_root/icons
tar cf - * | (cd /usr/local/WWW/icons; tar xvf -)
cd ../config
tar cf - * | (cd /usr/local/WWW/config; tar xvf -)

This sequence of commands copies the server binaries, CGI utility programs, supplied icons, and several example server configuration files to the proper subdirectories in /usr/local/WWW. Note how tar is used. This is an often-used method to successfully copy files, links, and subdirectories without accidentally modifying links and other characteristics during the copy operation due to improper flags passed to cp.

You should also set the ownership and permissions on this directory tree. Here is a possible set of permissions:

cd /usr/local
chown -R www.wwwgroup WWW
chmod a+rx WWW WWW/htdocs WWW/icons \
  WWW/cgi-bin WWW/config
chmod g+rx WWW/httpd_3.0
cd WWW/icons
chmod a+r *
cd ../cgi-bin
chmod a+rx *

These are typical permissions that allow access to the server when documents or icons are requested or the execution of a CGI program or script is required. However, it is possible to remove world permissions from most of these files as long as the server runs as user www and group wwwgroup (see UserId and GroupId below).

Basic Configuration

The /usr/local/WWW/config directory contains several example configuration files. The CERN server has a rich collection of options, including caching specifications, proxy support, and access control. This article will cover only a set of the basic options to get you started.

Listing 1 is a basic configuration file, which, with the following descriptions, you can modify to get your server up and running. This file should be created in /usr/local/WWW/config/ and can have any name you desire; however, here it will be called cern_httpd.conf. The default file name that the server will search for at startup is /etc/httpd.conf; however, this is easily overridden on the command line, or you can create a symbolic link between /usr/local/WWW/cern_httpd.conf and /etc/httpd.conf. I prefer to simply override the default, thus providing an obvious indication as to which configuration file is currently in use.

Upon examining listing 1 you will find a number of options set to various values. Note that a comment line can be added to a configuration file by using the shell script convention of placing a # in the first column.

# cern_httpd.conf
# An example httpd configuration file
ServerRoot /usr/local/WWW
HostName       wwwhost.my.domain.name
Port           80
PidFile        httpd-pid
UserId         www
GroupId        wwwgroup
AccessLog      /var/log/httpd.access
ErrorLog       /var/log/httpd.error
LogFormat      Common
LogTime        LocalTime
UserDir        public_html
Welcome        welcome.html
Welcome        index.html
AlwaysWelcome  On
#
# enable/disable methods
Enable         GET
Enable         HEAD
Enable         POST
Disable        DELETE
Disable        PUT
#
# Rules
Exec    /cgi-bin/*      /usr/local/WWW/cgi-bin/*
Pass    /*              /usr/local/WWW/htdocs/*

Listing 1. WWW Configuration File

One of the first configuration options is ServerRoot. This option determines the directory which the server will use as the default root directory. This may be prepended to other option settings (such as PidFile) in the case that an absolute path is not specified. In our case, ServerRoot should be set to /usr/local/WWW.

The HostName directive should be set to the fully qualified, dot-separated host and domain name of the host your server will run on. This is necessary so that the server can properly construct references to itself. This option may also be used to specify a hostname alias to be used in constructing URLs, as opposed to the hostname which is returned by the system.

Port specifies the port the server will accept connections on. When a client (such as Mosaic or Netscape) retrieves a document from your server, it will contact your host at this port to make a request. Ports provide a fixed location at which to access a particular service on a host. Many ports have been defined universally to be the access point for certain services. You may examine the /etc/services file to discover some of the ports that have been reserved on your system. If you are setting up a WWW server which you want accessible to the general public, you should probably use port 80. This port has been established as the default port for providing a hypertext transport protocol (http) service.

The PidFile directive specifies the file in which the server should log the process id of the principle httpd server. This process id can be used to help locate the server in the event that you want to send it a signal. The path specified can be either an absolute path or a path relative to the ServerRoot. For example, setting PidFile to httpd-pid would cause the server to log its process id in the file httpd-pid in the ServerRoot directory.

The next options are UserId and GroupId. These options specify the user and group ids under which the server will execute. In this article, as pointed out earlier, I will be using www as the user and wwwgroup as the group for the server.

Next are a set of options which control the kind of information the server will log about access activity and errors. AccessLog and ErrorLog should be set to valid file names. These logs may be useful for providing insight on tuning or security. You may want to keep these in /var/log or in a directory specially created for your WWW server logs. LogFormat should be set to Common, thus indicating a logging format that is likely to be recognized by many of the tools available to help you process the log information. LogTime can be set to either GMT or LocalTime depending on how you want the log records time stamped.

UserDir specifies the name of the public HTML document directory under each user's home directory. The example specifies public_html as the directory the server should support. This means that a universal resource locator (URL) of the form http://wwwhost.my.domain.name/~username is redirected to the directory ~username/public_html/. Each user on your system can then create a public_html directory where they can set up their own home pages or other publically available documentation.

There are several Welcome directives in the example. Welcome indicates which file should be presented when a URL is passed to the server where the path specifies only a directory. For instance, using the example configuration file, the URL http://wwwhost.my.domain.name/docs/ would result in either Welcome.html, welcome.html, or index.html being retrieved from directory /usr/local/WWW/htdocs/docs/ on the server. The order that the Welcome directives appear in the configuration file determines the search order which the server will use for finding the welcome document. Only the first document found will be displayed. AlwaysWelcome should normally be set on. If this option is off, the server will differentiate between URLs specifying directories with and without a trailing /. With this option off, a URL directory without a trailing / will result in a directory listing being displayed instead of the welcome document.

The Enable and Disable directives specify which methods are enabled on the server. Methods are actions that may be conducted during client-server sessions. For instance, the GET method allows documents to be retrieved from the server while PUT allows documents to be written to the server. By default, GET, HEAD and POST are enabled and DELETE and PUT are disabled. I prefer to explicitly define the methods so as to clearly control the accesses that I wish to allow. It is usually best not to allow destructive methods, such as PUT or DELETE, since it is possible to accidentally allow insecure accesses to the server which may provide a method for an intruder to enter your system.

The last section of our example configuration file deals with rules. Rules are used to control the processing of URLs which are passed to the server. These rules may map URL strings to specific files or operations. In the example file, two rules are included to simplify URL construction and to specify an executable path for the CGI programs and scripts. The first rule is an Exec. This rule tells the server that URLs which contain the string /cgi-bin/ are to result in the execution of a CGI program or script in the physical directory /usr/local/WWW/cgi-bin/. Pass indicates a rule which causes all URLs starting with the string /to be mapped to the directory /usr/local/WWW/htdocs/. This is the directory where you should place your server's public HTML documents.

The Default Welcome Document

You will probably want to have a default document that will be displayed when a client accesses your server without specifying a particular document. For instance, a client might connect to your server using a URL of the form http://wwwhost.my.domain.name/. If there is no default welcome document, a file listing of the directory as specified by the Pass directive will be displayed. A client may then access any documents in this directory with the correct permissions. Also, keep in mind that if a document is in the htdocs directory, a client can access it by explicitly naming that file in a URL. Only documents which you want made public should be here.

To create a default welcome page for your server, you need simply create a welcome document in the directory /usr/local/WWW/htdocs using one of the file names you specified in the server configuration file with a Welcome directive. To set up a simple default welcome document create the file /usr/local/WWW/htdocs/Welcome.html containing the following text and set the file permissions to be world readable (chmod a+r):

<TITLE>Welcome!</TITLE>
Welcome to my WWW server!

This will display a document with the title Welcome! and a simple welcome message displayed in the body. Later, you will probably want to add more interesting information to this file, along with links to other documents that your server provides.

Testing and Startup

Now the server is ready for an initial test run. You will first want to start up the server in verbose mode from a terminal connection so that you can check for any configuration errors. From a terminal, enter the following:

/usr/local/WWW/httpd -v -r /usr/local/WWW/config/cern_httpd.conf &

The -v flag indicates that server should run in verbose mode, and the -r flag specifies the server configuration file. The server will print a number of initialization messages and messages about any configuration errors it encounters as it starts. Check these messages for errors or discrepancies. Once the server has completed its start-up processing, you can use your favorite WWW browser to connect to it using an URL of the form:

http://wwwhost.my.domain.name/

If all goes well, you should see the message from the default welcome document. Be sure to examine the output of the server for any errors, and correct them as necessary.

The server can be started from one of your rc scripts (probably rc.local or rc.inet2) when the system is booted. It should be started after the network is initialized. A good place to start the daemon is in the same rc script used to start sendmail or inetd. The following entry can be used to start the httpd server daemon and to warn you if it cannot be started:

# Start the CERN httpd server
if [ -x /usr/local/WWW/httpd ]; then
          echo -n ", httpd"
          /usr/local/WWW/httpd -r /usr/local/WWW/config/cern_httpd.conf &
else
          echo
          echo "=================="
          echo " httpd not found. "
          echo "=================="
fi

This is a standalone startup of the daemon. You can also start the server using inetd, but you will get better response from a standalone startup. Proper configuration of inetd is outside of the scope of this article. If you wish to use inetd, please refer to the man pages and documentation on inetd and the relevant section of CERN's documentation available at www.w3.org/hypertext/WWW/Daemon/User/Installation/Inetd.html. inetd was also documented in the System Administration column in issue 12 of Linux Journal.

Controlling the Daemon

The httpd daemon accepts various types of signals. Two of these are most notable. A KILL signal will cause the daemon to terminate, and can be issued with the following command:

kill -KILL `cat /usr/local/WWW/httpd-pid`

killall -KILL httpd

Sometimes you may want to reconfigure the daemon without shutting it down. This can be done by changing the configuration files then issuing a HUP signal to the daemon as follows:

kill -HUP `cat /usr/local/WWW/httpd-pid`

killall -HUP httpd

As always, be sure to check the error log for any possible configuration errors.

User's Home Pages

For a user to create a home page, the subdirectory specified by the UserDir directive in the configuration file (public_html in the example) must be created in the user's home directory. The user should place all publically available documents under this directory. The user may want to create a welcome page for presentation when the server resolves URLs of the form http://wwwhost.my.domain.name/~joe. The server will attempt to resolve a URL of this form to reference a welcome document in the public_html directory. If a welcome document is not found, a directory listing will be presented to the client instead.

For the server to properly access a user's home pages, the permissions on the user's home directory must be set to allow the server to read through the home directory into the public_html subdirectory. This can be done by setting the user's home directory so that it has world execute permissions (chmod a+x). The public_html directory must be set to allow world read and execute (chmod a+rx). Finally, the documents in the public_html directory should have permissions which allow world reading (chmod a+r).

An alternative to having users home pages in subdirectories of their home directories is to create a special directory tree under /usr/local/WWW/htdocs/. This directory tree will typically have a subdirectory for each user who will be creating public documents accessible via the httpd daemon. With this arrangement special permissions need not be set on users' home directories, maintaining some additional security. If this method is implemented, the UserDir directive shown in the example should probably be omitted. This will disable the support of URLs of the form http://wwwhost.my.domain.name/~username.

For example, you could create a /usr/local/WWW/htdocs/home/ directory. Under this directory you could create a subdirectory for each user who will be creating public documents. If the user is to be able to freely modify the directory contents, the ownership and permissions must be set accordingly. The permissions must also be set to allow the server to access this directory. This usually means making the directory world readable (chmod a+r). A user's home page can then be accessed using a URL such as http://wwwhost.my.domain.name/home/joe/.

You Have Just Begun...

What has been presented here has been designed to get a basic server up and running. Many topics are left out, including proxy support and special access control. The CERN httpd daemon is a very flexible and configurable WWW server. For further research you will probably want to explore the online documentation provided by CERN at www.w3.org/hypertext/WWW/Daemon/Status.html.