Using Squid's URL redirection to implement WWW usage quotas.
The University of Zululand is a “historically disadvantaged” university campus with approximately 6,000 students located in the northeastern coastal region of South Africa. Our disadvantaged legacy continues in post-apartheid South Africa, and our students are mostly black and are from financially disadvantaged backgrounds. Consequently, our operating budgets are severely constrained, and the daily challenge of doing more with less prevails.
In this environment, Linux is the obvious choice for our internet services platform, and we use it for practically everything: e-mail and WWW servers, DNS, DHCP, HTTP proxies, firewalling and dial-up access. Our management loves the “free-as-in-beer” aspect of Linux. However, our main reason for using Linux is more compelling—using Linux is the most fun way to deliver our internet services.
Early in 2000, our biggest financial challenge was to provide internet access to all our staff and students. The cost of internet bandwidth in South Africa is much higher than in the US. Our 128Kbps access circuit was costing approximately $5,000 US per month. This circuit already was saturated during the day by our 400 staff users, and we still needed to provide internet access to the 350 workstations in our heavily used student labs.
We needed more bandwidth, but our budget wouldn't cover doubling or tripling our access bandwidth. Without monitoring or placing controls on internet usage, management was uneasy about committing to large increases in spending on internet access. They were skeptical of paying large monthly bills when they didn't know who was using the Internet or for what purpose.
Our problem was really one of how to provide an acceptable quality of service (QoS) for WWW browsing within our budgetary constraints. With unregulated internet access, congestion of the access circuit is the only factor limiting demand, which usually leads to the poorest QoS that the core group of die-hard downloaders can bear.
We needed to provide some type of “cost” for internet usage in order to regulate demand. This system would educate our users about the real costs of providing internet access at the university. For students, this cost should be prepaid. The level of bad tuition debt at disadvantaged universities in South Africa is a significant problem, and we didn't want to add bills for internet access to the debt collection problem. We needed a quota system for internet usage.
Our bandwidth constraints have since eased with the introduction of a new network for higher education in 2001. We upgraded our internet access to 768Kbps (384Kbps CIR for international traffic), while maintaining our monthly costs at roughly the same level. However, we remain convinced that our quota system is still necessary for maintaining a reasonable QoS for our users.
We use NLANR's popular Squid HTTP caching proxy software to provide WWW access for our users. Our firewall ensures that staff and students must use our proxy servers to access the WWW, and we configured the proxies to require login names and passwords for WWW access. With this configuration, the access.log file generated by Squid has a complete record of all WWW activity for all users. Tallying WWW usage for a given user is straightforward with the information in access.log.
Squid can use external URL redirector programs. When a URL redirector is configured, every URL request to the proxy is sent to the standard input of the URL redirector program, which responds by either telling the proxy to fetch the requested URL or to redirect the request to a different URL. This feature is used by content filtering with programs like SquidGuard where the URL redirector checks each URL request against a database of undesirable URL patterns. Any request for an undesirable URL is redirected to a page telling the user that she may not view the requested URL.
We realised that a URL redirector also could be used to enforce usage quotas. The URL redirector checks the user's quota, and if it has been exceeded, it would redirect her to a page telling her she was out of credit. While simple in principle, this has to happen quickly or it impacts the overall performance of the proxy server. In our implementation, the URL redirector usually responds in two milliseconds when the URL request is permitted.
QUORUM (QUOta-based Resource Usage Manager) is a server that provides a simple message API with two basic message types: tallies (bill a particular account/session for an item) and queries (ask the server if a given account/session is still in credit).
We use the QUORUM server to manage the tallies of all URLs retrieved by the proxy as well as the current credit balances of user accounts and also to track user sessions, which correspond to periods of uninterrupted WWW usage by a user. The server keeps a cache of all active account and session tallies and credit balances. Updated items in these caches periodically are written to a database. The server has a WWW interface for displaying account balances, user sessions and other administrative and debugging information.
An early version of the server was written in C, using various CGI programs to provide the WWW interface. However, this approach was abandoned when we discovered Java servlets. The QUORUM server is written as a collection of Java servlets and JSPs, with JDBC for the back-end database connectivity. We currently run it under Caucho's Resin Servlet/JSP container. We use MySQL for the back-end database with Resin's built-in JDBC classes for MySQL access and database connection pooling.
Java servlets are a natural choice for implementing an application like QUORUM. The server contains several persistent shared data structures for user sessions, per-session tallies and caches of account tallies and credit balances. Since servlets are simply Java classes that are part of a container application that implements a WWW server, it is simple to create these persistent objects as part of the server, as well as background threads that check for idle sessions and periodically flush changed data to the accounting database.
When a user requests a URL, her browser sends the URL request to the Squid proxy. Squid then sends this URL request information to the URL redirector. The URL redirector sends a querySsn message to the QUORUM server to ask if the user has a current QUORUM session and if the user's account is still in credit. If everything is okay, then the URL redirector replies to Squid with a blank line telling Squid to fetch the requested URL.
If the user does not have a current session, or the querySsn response indicates that the user's account has no credit, the browser is redirected to a JSP, ssnInfo.jsp, in the QUORUM server that shows the user's current account usage and available credit and asks her to start a QUORUM session by clicking on a link to another JSP, beginSsn.jsp. If the user's account is in credit, this JSP creates a QUORUM session for the user.
The URL redirector actually sends a redirect to a servlet that sends another redirect to ssnInfo.jsp. The URL redirector passes the user name and IP address of the user as a parameter in the request to the servlet:
The servlet saves the ssn_id in the HTTP session associated with the client's browser. This lets all the other JSPs use an HTTP session variable to retrieve the user's account and session information.
Usage accounting depends on the information in Squid's access.log file. After a URL has been fetched by Squid, it appends a line to the access.log file, containing the URL fetched, the size of the object in bytes, the user ID of the requester and the IP address of the workstation making the request. A Python script, squidTallies.py, reads the tail of access.log and generates tallySsnItem request messages to the QUORUM server. The QUORUM server processes these tally messages and keeps running usage tallies for user accounts and sessions, as well as credit balances for all active user accounts.
Account balances, usage tallies and logs of user sessions are written to a MySQL database. A background thread in the QUORUM server periodically walks through the caches of session tallies, account tallies and credit balances and flushes any changed data to the database. This enables the server to keep reasonably up-to-date information in the database without overwhelming the database with multiple updates for every single URL fetched.
QUORUM request messages are simple text messages terminated by a newline. For example:
ref000001 querySsn email@example.com ccode=11000
is a request message asking the server if the user soren at IP address 220.127.116.11 has a current QUORUM session, and if the user is still in credit for “cost code” 11000. The first field of all request messages is a user reference that is echoed back to the client in the response message, which allows multithreaded clients to match responses to request messages.
As a J2EE application, you might expect a hipper message transport such as SOAP over HTTP. We considered encoding the requests as individual HTTP GET requests to servlets that would implement the various QUORUM API messages. This would have been a cleaner fit with the servlet framework, but our tests indicated that the HTTP overhead would be too large for our application. The peak demand from our student labs is 20-30 URLs per second. Since each URL request would correspond to two QUORUM request messages (one querySsn request and one tallySsnItem request), the server would need to handle 60 requests per second comfortably. We use a single QUORUM server to handle both our staff and student proxies, so we need to handle at least 100-150 requests per second. We could not achieve these rates using a separate HTTP request for each QUORUM request message, even making back-to-back requests over a single persistent HTTP connection.
Instead, a single servlet, PostMsg, handles all QUORUM request messages from clients. A client application makes an HTTP POST request and then sends QUORUM request messages on the TCP connection of the POST request, as if it was uploading a file or other data in the request. The PostMsg servlet reads request messages, one per line, from the ServletInputStream with the readLine() method of the input stream. It processes the request messages and sends responses back to the client on the ServletOutputStream. After sending the response, it calls the flushBuffer() method on the ServletHTTPResponse object to ensure the response is sent to the client. Using this technique, a single QUORUM server can handle several thousand requests per second on a single client connection.
There are two minor problems with this technique. First, the servlet container will close any HTTP connection that is idle for more than a set time, typically 30 seconds or so. This means that a client will be disconnected when it doesn't send any requests for a while. The client will then have to reconnect and send another HTTP POST request in order to send more QUORUM request messages. We wrote a Python class that provides a UNIX pipe for sending requests and reading responses. The class establishes the HTTP connection on demand and sends the HTTP POST request. An advantage of the connect-on-demand technique is that the message transport is very robust—idle connections don't hang around and there aren't problems with clients getting stuck, thinking they have a connection that the server thinks is closed.
The other problem is that HTTP/1.1 requires that a POST request has a Content-length header in both the request and the response. While we found that Tomcat ignored this restriction, Resin did not, and it closed the connection when the number of bytes sent by either client or server exceeded the Content-length value. Our workaround was to set a very large value in the Content-length header and arrange for the client to close the connection well before that number of bytes was sent.
Performance of the URL redirector is critical for the proxy as each URL request is delayed until the URL redirector responds. QUORUM's URL redirector keeps a cache of querySsn replies for all recent sessions. This makes the redirector very fast for URL requests once a session has been established and is in credit, as the URL redirector responds based on the most recent reply message in the cache. The response time in this case is typically two milliseconds or less. Even if there is a cached reply, the redirector always sends a querySsn request to the QUORUM server. The reply message is used to update the reply cache, so the cache always contains up-to-date information about the status of user sessions and credit balances.
When the redirector doesn't have a cached reply message, or if the cached reply is negative (user out of credit), then the redirector sends a querySsn request and blocks the reply to the proxy server until the response from the QUORUM server has been received. In this case, the redirector may take from 20 to 50 milliseconds to respond.
Squid can be configured to start up several copies of the URL redirector so that URL redirection requests can be processed in parallel. The QUORUM URL redirector is multithreaded so that Squid can make several simultaneous URL requests. Squid actually talks to several copies of a small stub program that forwards the URL redirector requests to the URL redirector process over unix-domain socket connections. This allows the URL redirector to share the same cache of querySsn replies for all the requests.
Painless administration of user accounts is crucial to the successful adoption of this system. In our deployment of QUORUM, students must have a usage quota for WWW access. Staff usage is currently tallied by the QUORUM server, but staff members are not yet subject to quotas. (Tracking staff usage with QUORUM has been invaluable in cracking down on the use of stolen staff accounts by students, however.)
For internet usage related to a particular course, we credit all students in the course with a specified amount of usage credit, using our database of student registration information. We do this at the start of every term for all courses where the lecturers have requested internet access.
In the past, we found that there were continual cases of students who had registered late, whose registration was not in our database for some other reason or who simply needed additional usage quota. We now have additional discretionary accounts for some departments. Nominated lecturers can transfer funds from their departmental accounts to individual student accounts. This relieves our department of an administrative burden and gives the departments some discretion for the management of student accounts.
For students who require additional usage over and above the quota they are issued, we will introduce a prepaid voucher system. This system is intentionally reminiscent of the voucher systems for prepaid cell phones, which are very popular with our students. Students will be able to buy a voucher from the university bookshop that has a secret access number. The voucher can be redeemed by typing this number into a WWW form and submitting it. The MD5 hash of the number entered is compared to hash values stored in a database table, and if the hashed value matches one of these, the student's account is credited with the amount shown on the voucher. We have had initial discussions with the suppliers of our new campus point-of-sale (PoS) system to see if we can sell our internet quota vouchers using a system similar to the one currently used to sell prepaid cell-phone vouchers at till points.
The voucher system is an important part of our goal to offer a flexible internet service to our users, subsidize internet use for academic purposes and recover the costs of other usage. We currently are testing the voucher system. There is pressure from the students to introduce the voucher system quickly, but we are being very cautious about the rollout of the system as we this will be a service that students are paying for directly. It is important that the system is well established and well understood by our student users so they can be completely comfortable with what they are paying for.
From the outset, QUORUM was designed for applications beyond WWW usage accounting and quota management. Any service that generates a log file of usage information can be parsed by a script that sends tally request messages to the QUORUM server. In the near future, we will add accounting for e-mail usage so that mail messages with large attachments are billed against the sender's usage quota. Gratuitous e-mail usage (sending that cool AVI to all your friends) has been a recurring problem at our site. In the future, our users will have to decide if sending a large e-mail attachment now might mean having to pay for WWW access later.
Usage accounting in QUORUM is structured into cost codes. Currently, WWW usage is broken down into international and national traffic, which are charged at different rates, and Squid cache hits, which are not charged for. The breakdown of charges is hierarchical, where internet usage may be split into WWW and e-mail access, each of which is split into further categories. Eventually, we plan to have additional usage cost codes for network printing, dial-up access and direct TCP/IP traffic through our firewall. The hierarchical structure of the cost codes means that we can have one quota for all services or that specific quota credits apply to only parts of the cost-code hierarchy. For example, we could decide that the quota credits issued to students in a course apply only to internet access, but prepaid voucher credits could be used for network printing as well.
We are trying to utilise our internet bandwidth more effectively by encouraging our users to do so. Our charging structure benefits users who download large files from mirror sites within South Africa rather than from overseas. It is cheaper for us to purchase guaranteed bandwidth for national destinations than for overseas ones. Similarly, we offer after-hour discounts to encourage users to download large files in the evenings when our internet access is less congested.
Another intriguing idea is to integrate a messaging/bulletin system into QUORUM, a sort of intranet instant messaging. We currently can see which users are browsing the WWW. QUORUM could be extended so messages could be sent to a user or group of users. When one of the recipients next requests a URL, the QUORUM server could redirect the user to a page that shows the message/bulletin.
We have developed a flexible accounting and quota management system that works with our existing Linux and OSS-based internet services platform. This system will play an important role in offering a well-managed internet service on our campus, both from the control it offers over usage and the information it provides about the demand for services.