
8.28. Program: Abusive User Checker

Because shared memory is fast, it's an ideal place to store data that different web server processes need to access frequently when a file or database would be too slow. Example 8-7 shows the pc_Web_Abuse_Check class, which uses shared memory to track accesses to web pages in order to cut off users who abuse a site by bombarding it with requests.

Example 8-7. pc_Web_Abuse_Check class

class pc_Web_Abuse_Check {
  var $sem_key;
  var $shm_key;
  var $shm_size;
  var $recalc_seconds;
  var $pageview_threshold;
  var $sem;
  var $shm;
  var $data;
  var $exclude;
  var $block_message;

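  // use __construct(): as of PHP 8, a method named after the class
  // is no longer treated as a constructor
  function __construct() {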
    $this->sem_key = 5000;
    $this->shm_key = 5001;
    $this->shm_size = 16000;
    $this->recalc_seconds = 60;
    $this->pageview_threshold = 30;

    $this->exclude['/ok-to-bombard.html'] = 1;
    $this->block_message =<<<END
<html>
<head><title>403 Forbidden</title></head>
<body>
<h1>Forbidden</h1>
You have been blocked from retrieving pages from this site due to
abusive repetitive activity from your account. If you believe this
is an error, please contact 
<a href="mailto:webmaster@example.com?subject=Site%20Abuse">webmaster@example.com</a>.
</body>
</html>
END;
  }
  
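  function get_lock() {
    $this->sem = sem_get($this->sem_key,1,0600);
    if ($this->sem && sem_acquire($this->sem)) {
      $this->shm = shm_attach($this->shm_key,$this->shm_size,0600);
      // shm_get_var() and shm_put_var() take an integer variable key,
      // so the tracking array is stored under key 1
      if (shm_has_var($this->shm,1)) {
        $this->data = shm_get_var($this->shm,1);
      } else {
        // nothing stored yet: start with empty lists
        $this->data = array('abusive_users' => array(),
                            'user_traffic'  => array(),
                            'traffic_start' => 0);
      }
    } else {
      error_log("Can't acquire semaphore $this->sem_key");
    }
  }

  function release_lock() {
    if (isset($this->data)) {
      shm_put_var($this->shm,1,$this->data);
    }
    shm_detach($this->shm);
    sem_release($this->sem);
  }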

  function check_abuse($user) {
    $this->get_lock();
    if (! empty($this->data['abusive_users'][$user])) {
      // if user is on the list release the semaphore & memory 
      $this->release_lock();
      //  serve the "you are blocked" page 
      header('HTTP/1.0 403 Forbidden');
      print $this->block_message;
      return true;
    } else {
      // mark this user looking at a page at this time 
      $now = time();
      if (empty($this->exclude[$_SERVER['PHP_SELF']])) {
        if (! isset($this->data['user_traffic'][$user])) {
          $this->data['user_traffic'][$user] = 0;
        }
        $this->data['user_traffic'][$user]++;
      }
      // (sometimes) tote up the list and add bad people 
      if (empty($this->data['traffic_start'])) {
        $this->data['traffic_start'] = $now;
      } else {
        if (($now - $this->data['traffic_start']) > $this->recalc_seconds) {
          // each() was removed in PHP 8, so iterate with foreach 
          foreach ($this->data['user_traffic'] as $k => $v) {
            if ($v > $this->pageview_threshold) {
              $this->data['abusive_users'][$k] = $v;
              // log the user's addition to the abusive user list; note that
              // REMOTE_ADDR belongs to the request that triggered this
              // recalculation, which isn't necessarily a request from $k 
              error_log("Abuse: [$k] (from ".$_SERVER['REMOTE_ADDR'].')');
            }
          }
          $this->data['traffic_start'] = $now;
          $this->data['user_traffic'] = array();
        }
      }
      $this->release_lock();
    }
    return false;
  }
}

To use this class, call its check_abuse( ) method at the top of a page, passing it the username of a logged-in user:

// get_logged_in_user_name() is a function that finds out if a user is logged in 
if ($user = get_logged_in_user_name( )) {
    $abuse = new pc_Web_Abuse_Check( );
    if ($abuse->check_abuse($user)) {
        exit;
    }
}

The check_abuse( ) method secures exclusive access to the shared memory segment in which information about users and traffic is stored with the get_lock( ) method. If the current user is already on the list of abusive users, it releases its lock on the shared memory, prints out an error page to the user, and returns true. The error page is defined in the class's constructor.

If the user isn't on the abusive user list, and the current page (stored in $_SERVER['PHP_SELF']) isn't on a list of pages to exclude from abuse checking, the count of pages that the user has looked at is incremented. The list of pages to exclude is also defined in the constructor. By calling check_abuse( ) at the top of every page and putting pages that don't count as potentially abusive in the $exclude array, you ensure that an abusive user will see the error page even when retrieving a page that doesn't count towards the abuse threshold. This makes your site behave more consistently.

The next section of check_abuse( ) is responsible for adding users to the abusive users list. If more than $this->recalc_seconds seconds have passed since the last time it added users to the abusive users list, it looks at each user's pageview count. Any user whose count is over $this->pageview_threshold is added to the abusive users list, and a message is put in the error log. The code that sets $this->data['traffic_start'] if it's not already set runs only the very first time check_abuse( ) is called, or the first time it's called after the data has been cleared. After adding any new abusive users, check_abuse( ) resets the per-user pageview counts and starts a new interval that lasts until the next time the abusive users list is updated. After releasing its lock on the shared memory segment, it returns false.
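
The interval logic can be tried out without shared memory. The sketch below is a hypothetical standalone function, pc_tally_traffic( ), that isn't part of pc_Web_Abuse_Check; it applies the same counting and recalculation rules to a plain array and takes the current time as an argument, so the example doesn't depend on the real clock. The usernames and timestamps are made up.

```php
<?php
// Hypothetical helper, not part of pc_Web_Abuse_Check: applies the same
// counting and recalculation rules to a plain array. $now is passed in
// so the example doesn't depend on the real clock.
function pc_tally_traffic($data, $user, $now, $recalc_seconds = 60,
                          $pageview_threshold = 30) {
    // count this pageview
    if (! isset($data['user_traffic'][$user])) {
        $data['user_traffic'][$user] = 0;
    }
    $data['user_traffic'][$user]++;

    if (empty($data['traffic_start'])) {
        // first call (or first call after a clear): start a new interval
        $data['traffic_start'] = $now;
    } elseif (($now - $data['traffic_start']) > $recalc_seconds) {
        // the interval is over: flag heavy users, then reset the counts
        foreach ($data['user_traffic'] as $k => $v) {
            if ($v > $pageview_threshold) {
                $data['abusive_users'][$k] = $v;
            }
        }
        $data['traffic_start'] = $now;
        $data['user_traffic'] = array();
    }
    return $data;
}

$data = array('abusive_users' => array(),
              'user_traffic'  => array(),
              'traffic_start' => 0);

// 31 requests from "alice" and one from "bob", all at time 1000
for ($i = 0; $i < 31; $i++) {
    $data = pc_tally_traffic($data, 'alice', 1000);
}
$data = pc_tally_traffic($data, 'bob', 1000);

// the first request more than 60 seconds later triggers the tally
$data = pc_tally_traffic($data, 'bob', 1061);

print_r($data['abusive_users']);   // alice is listed with 31 pageviews
```

In this sketch, as in check_abuse( ), a user is added to the list only when an interval ends, so a user can exceed the threshold for up to one full interval before being blocked.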

All the information check_abuse( ) needs for its calculations, such as the abusive user list, recent pageview counts for users, and the last time abusive users were calculated, is stored inside a single associative array, $data. This makes reading the values from and writing the values to shared memory easier than if the information were stored in separate variables, because only one call each to shm_get_var( ) and shm_put_var( ) is necessary.
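
As a rough illustration of the single-array approach, the sketch below wraps the whole read-modify-write cycle in one hypothetical helper, pc_shm_update( ). The helper name, the callback pc_count_hit( ), the keys 6000 and 6001, and the variable key 1 are all assumptions made for this example; they aren't part of pc_Web_Abuse_Check. It needs PHP's sysvsem and sysvshm extensions, so the demonstration at the end runs only if they are available.

```php
<?php
// Sketch only: reads one array from shared memory, passes it to
// $callback, and writes the result back, all while holding a semaphore.
function pc_shm_update($sem_key, $shm_key, $callback) {
    $sem = sem_get($sem_key, 1, 0600);
    if (! $sem || ! sem_acquire($sem)) {
        return false;
    }
    $shm = shm_attach($shm_key, 16000, 0600);
    if (! $shm) {
        sem_release($sem);
        return false;
    }
    // one read fetches everything ...
    $data = shm_has_var($shm, 1) ? shm_get_var($shm, 1) : array();
    $data = call_user_func($callback, $data);
    // ... and one write stores everything
    shm_put_var($shm, 1, $data);
    shm_detach($shm);
    sem_release($sem);
    return $data;
}

// Example callback: bump a counter kept inside the shared array.
function pc_count_hit($data) {
    $data['hits'] = isset($data['hits']) ? $data['hits'] + 1 : 1;
    return $data;
}

// 6000 and 6001 are arbitrary keys picked for this example
if (function_exists('sem_get') && function_exists('shm_attach')) {
    $data = pc_shm_update(6000, 6001, 'pc_count_hit');
    if ($data !== false) {
        print "Hits recorded in shared memory: {$data['hits']}\n";
    }
}
```

Because the segment outlives the script, the count keeps growing across runs until the segment is removed, for example with shm_remove( ).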

The pc_Web_Abuse_Check class blocks abusive users, but it doesn't provide any reporting capabilities or a way to add or remove specific users from the list. Example 8-8 shows the abuse-manage.php program, which lets you manage the abusive user data.

Example 8-8. abuse-manage.php

// the pc_Web_Abuse_Check class is defined in abuse-check.php
require 'abuse-check.php';

$abuse = new pc_Web_Abuse_Check();
$now = time();

// process commands, if any 
$abuse->get_lock();
// avoid an undefined index warning when no command was sent 
$cmd = isset($_REQUEST['cmd']) ? $_REQUEST['cmd'] : '';
switch ($cmd) {
    case 'clear':
      $abuse->data['traffic_start'] = 0;
      $abuse->data['abusive_users'] = array();
      $abuse->data['user_traffic'] = array();
      break;
    case 'add':
      // date() replaces strftime(), which is deprecated as of PHP 8.1 
      $abuse->data['abusive_users'][$_REQUEST['user']] = 'web @ '.date('r',$now);
      break;
    case 'remove':
      $abuse->data['abusive_users'][$_REQUEST['user']] = 0;
      break;
}
$abuse->release_lock();

// now the relevant info is in $abuse->data 

// date() replaces strftime(), which is deprecated as of PHP 8.1 
print 'It is now <b>'.date('r',$now).'</b><br>';
print 'Current interval started at <b>'.date('r',$abuse->data['traffic_start']);
print '</b> ('.($now - $abuse->data['traffic_start']).' seconds ago).<p>';

print 'Traffic in the current interval:<br>';
if (! empty($abuse->data['user_traffic'])) {
  print '<table border="1"><tr><th>User</th><th>Pages</th></tr>';
  foreach ($abuse->data['user_traffic'] as $user => $pages) {
    // escape the username before printing it 
    print '<tr><td>'.htmlspecialchars($user)."</td><td>$pages</td></tr>";
  }
  print "</table>";
} else {
  print "<i>No traffic.</i>";
}
print '<p>Abusive Users:';

if (! empty($abuse->data['abusive_users'])) {
  print '<table border="1"><tr><th>User</th><th>Pages</th><th></th></tr>';
  foreach ($abuse->data['abusive_users'] as $user => $pages) {
    if (0 === $pages) {
      $pages = 'Removed';
      $remove_command = '';
    } else {
      $remove_command = 
         "<a href=\"$_SERVER[PHP_SELF]?cmd=remove&amp;user=".urlencode($user)."\">remove</a>";
    }
    // escape the values before printing: usernames and manually added
    // entries come from user input 
    print '<tr><td>'.htmlspecialchars($user).'</td><td>'.htmlspecialchars($pages).
          "</td><td>$remove_command</td></tr>";
  }
  print '</table>';
} else {
  print "<i>No abusive users.</i>";
}

print<<<END
<form method="post" action="$_SERVER[PHP_SELF]">
<input type="hidden" name="cmd" value="add">
Add this user to the abusive users list:
<input type="text" name="user" value="">
<br>
<input type="submit" value="Add User">
</form>
<hr>
<form method="post" action="$_SERVER[PHP_SELF]">
<input type="hidden" name="cmd" value="clear">
<input type="submit" value="Clear the abusive users list">
</form>
END;

Example 8-8 prints out the current pageview count for each user and the current abusive user list, as shown in Figure 8-1. It also lets you add or remove specific users from the list and clear the whole list.

Figure 8-1. Abusive users

When it removes users from the abusive users list, instead of:

unset($abuse->data['abusive_users'][$_REQUEST['user']]) 

it sets the following to 0:

$abuse->data['abusive_users'][$_REQUEST['user']] 

This still causes check_abuse( ) to return false, because 0 evaluates to false, but it allows the page to explicitly note that the user was on the abusive users list and was removed. This is helpful to know in case a user who was removed starts causing trouble again.
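
The difference between a user who was removed and a user who was never listed can be seen in a small standalone sketch. The array and usernames below are sample data made up for illustration, not output from the class.

```php
<?php
// Sample data for illustration: "mallory" is blocked and "trent" was
// removed (set to 0 rather than unset). "alice" was never listed.
$abusive_users = array('mallory' => 45, 'trent' => 0);
$report = array();

foreach (array('mallory', 'trent', 'alice') as $user) {
    if (! empty($abusive_users[$user])) {
        $status = 'blocked';
    } elseif (array_key_exists($user, $abusive_users)) {
        // the entry is still there, but its value is 0
        $status = 'removed';
    } else {
        $status = 'never listed';
    }
    $report[$user] = $status;
    print "$user: $status\n";
}
```

Had the entry for trent been deleted with unset( ), the second and third cases would be indistinguishable.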

When a user is added to the abusive users list manually, the script records the time the user was added instead of a pageview count. This is helpful in tracking down who manually added the user to the list, and why.

If you deploy pc_Web_Abuse_Check and this maintenance page on your server, make sure that the maintenance page is protected by a password or otherwise inaccessible to the general public. Obviously, this code isn't very helpful if abusive users can remove themselves from the list of abusive users.
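
One possible way to add password protection is HTTP Basic authentication handled in PHP itself. The following is a minimal sketch under several assumptions: the functions pc_is_admin( ) and pc_require_admin( ), the username admin, the password, and the realm name are all hypothetical, and $_SERVER['PHP_AUTH_USER'] and $_SERVER['PHP_AUTH_PW'] are populated only in some server configurations. Protection configured in the web server itself is an equally reasonable choice.

```php
<?php
// Sketch only: $admins maps usernames to password hashes. The account
// below is a placeholder; in practice, generate the hash once with
// password_hash() and keep it outside the web root.
$admins = array('admin' => password_hash('example-password', PASSWORD_DEFAULT));

// Returns true if $user exists in $admins and $pass matches its hash.
function pc_is_admin($user, $pass, $admins) {
    return isset($admins[$user]) && password_verify($pass, $admins[$user]);
}

// Sends a 401 response and stops the script unless valid credentials
// were supplied with the request.
function pc_require_admin($admins) {
    $user = isset($_SERVER['PHP_AUTH_USER']) ? $_SERVER['PHP_AUTH_USER'] : '';
    $pass = isset($_SERVER['PHP_AUTH_PW']) ? $_SERVER['PHP_AUTH_PW'] : '';
    if (! pc_is_admin($user, $pass, $admins)) {
        header('WWW-Authenticate: Basic realm="Abuse Manager"');
        header('HTTP/1.0 401 Unauthorized');
        print 'You must log in to manage the abusive users list.';
        exit;
    }
}

// At the top of abuse-manage.php, before any other output:
// pc_require_admin($admins);
```

Basic authentication sends the password with every request in an easily decoded form, so a page protected this way should be served only over HTTPS.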




Copyright © 2003 O'Reilly & Associates. All rights reserved.