Work the Shell

Web Administration Scripts

Dave Taylor

Issue #232, August 2013

After some unpleasant experiences of his own, Dave explains how to create a script to detect DDOS attacks on a Web server.

Phew. I'm done with that Cribbage game coding, after months of shell script programming in directions doubtless unanticipated by the original Bash authors. It mostly worked, but after publishing last month's column, I did realize there are a few niggling bugs in the scoring code. Those, however, are now an exercise for you, dear reader, to identify and fix. Because you need homework, right?

During the past month or so, I've also been dealing with an aggressive DDOS (that's a “distributed denial of service”) attack on my server, one that's been a huge pain, as you might expect. What's odd is that with multiple domains on the same server, it's one of my less-popular sites that seems to have been the target of the attacks.

So, that's the jumping off point for this article's scripts: analyzing log files to understand what's going on and why.

To start, a handy check is to see how many processes are running, because my DDOS was characterized by a ridiculous number of comment and search scripts being triggered—hundreds a minute. How to check?

The ps command offers a list of running processes at any given time, but for many versions, all you see is the Web server “httpd” without any further details. The -C cmd flag narrows down output only to those processes, like this:

: ps -C httpd
  PID TTY          TIME CMD
20225 ?        00:13:21 httpd
28162 ?        00:00:01 httpd
...
5681 ?        00:00:00 httpd
5683 ?        00:00:00 httpd <defunct>

(Note the “defunct” process that's about to vanish.)

So one easy test is to see how many httpd processes are running:

$ ps -C httpd | wc -l
108

That seems like a lot, but this server is hosting several sites, including the super-busy AskDaveTaylor.com tech-support site, which sees more than 100k hits/day. So how does this vary over time? Hmm...still working on the command line:

$ while /bin/true
> do
>   ps -C httpd | wc -l
>   sleep 5
> done
108
107
103
99
94
91
87
84
91
121
120
116

So there's a max of 121 and a min of 87. But, what if I actually want to analyze this and get min, max and average over a longer period of time? Here's how I solve it:

#!/bin/sh
# Calculates the number of processes running that matches
#   a set pattern over time, producing min, max and average.
min=999; max=0; average=0; tally=0; sumtotal=0
pattern="httpd"  # ps -C pattern
while /bin/true
do
  count=$(ps -C $pattern | wc -l)
  tally=$(( $tally + 1 ))
  if [ $count -gt $max ] ; then
    max=$count
  fi
  if [ $count -lt $min ] ; then
    min=$count
  fi
  sumtotal=$(( $sumtotal + $count ))
  average=$(( $sumtotal / $tally ))
  echo "Current ps count=$count: min=$min, max=$max, tally=$tally 
  ↪and average=$average"
  sleep 5 # seconds
done
exit 0

Notice in the script that I'm not falling into the trap of calculating the average by having a running average and somehow factoring in the latest value as a diminishing additive, but instead I use a sumtotal variable that keeps having the latest processor count added. That divided by tally is always the average, although at some point this probably would be greater than MAXINT (2**32) and would start to produce bad results. On a modern computer, however, that should take a while. (And the quantum, the period of time between iterations, also can be adjusted. Five seconds might be too granular for a process that's going to be run for hours or even days.)

The following are the first few lines of output. Notice how the min and max vary as the different values are calculated:

sh processes.sh
Current ps count=132: min=132, max=132, tally=1 and average=132
Current ps count=128: min=128, max=132, tally=2 and average=130
Current ps count=124: min=124, max=132, tally=3 and average=128
Current ps count=123: min=123, max=132, tally=4 and average=126

If I let the script run for a longer period of time, the values become a bit more varied:

Current ps count=90: min=76, max=150, tally=70 and average=107

During the 15 minutes or so that I ran the script, an average of 107 “httpd” processes were running, with a minimum of 76 and a max of 150.

Armed with that information, another script could keep an eye on things via a cron job, like this:

#!/bin/sh
# DDOS - keep an eye on process count to 
# detect a blossoming DDOS attack
pattern="httpd"
max=200   # avoid false positives
admin="d1taylor@gmail.com"
count="$(ps -C $pattern | wc -l)"
if [ $count -gt $max ] ; then
  echo "Warning: DDOS in process? Current httpd count = 
  ↪$count" | sendmail $admin
fi
exit 0

That's a superficial solution, however, and it has two problems: 1) what I'd really like is to be able to identify the potential DDOS based on processor count and watch to see if it's sustained over the next few invocations of the script, and 2) once it's triggered, if it is a DDOS, in addition to everything else, I'll also start drowning in e-mail from this script saying essentially the same thing each time. Not good.

What the script needs is contextual memory so it can differentiate between a sudden spike in traffic and a persistent DDOS attack. In the former case, the script might trigger positive, then the next time it runs, it's all within acceptable limits again. In the latter case, once the attack starts, it'll probably just accelerate.

That's the opposite of the e-mail non-repeat condition though, because in the latter case, I want to know that the e-mail has been sent and not send it again within, say, a 60-minute window.

I'll dig in to both of those situations next month. For now, I need to get back to my server and keep bringing things back on-line, program by program, to try to avoid any problems. Stay tuned!