Issue #16, August 1995

Perl is often considered a scripting language for systems administrators. Jim demonstrates that it is useful to applications and scientific programmers as well—as a prototyping tool.

If you are like many Linux users you may have heard of Perl, but have been reluctant to learn another language. This was my situation several months ago. A friend suggested I give Perl a try. Since I already knew C, Perl was a snap to learn. I soon found myself doing all sorts of text reformatting using Perl. My friends and coworkers were impressed, but skeptical. Could Perl cut the mustard on big files where a ton of data had to be read, massaged and written?

Their skepticism subsided, however, when I wrote a Perl program (I prefer to call them programs rather than scripts just to separate them from shell scripts) to load over three and a half million lines of US Postal data. Now I actually teach Perl (but I digress).

In this article I would like to suggest a use for Perl which is often overlooked—Perl as a prototyping tool. Most C programmers spend a fair amount of time managing memory, and I am no exception. Memory management is a necessary function, especially if you want to keep your C programs tight—not using more memory than necessary—and well behaved—not crashing with the resulting core dumps. The problem with managing memory yourself is that it can divert attention from the program's purpose, which is typically to get an algorithm running.

With Perl, not only do you get solid memory handling routines, you get them in an interpreted/compiled environment and, thanks to the Free Software Foundation, they will not cost you a penny. In fact, if you have Linux you probably already have Perl as well. Let me illustrate how Perl can be used as a prototyping tool with two examples, a simple Monte Carlo calculation and a more substantial geometric problem.

Monte Carlo techniques use probabilistic methods to make estimates. Typically, one or more random numbers are substituted into a function and the resulting value is tested for validity. The program keeps track of both the number of satisfactory tests and the total number of tests. The result is the ratio of satisfactory to total tests and this ratio is monitored as the number of tests is increased.

#!/usr/bin/perl # Monte Carlo calculation of pi srand(time | $$); for($i = 1, $inside = 0.0; $i <= 1000000; $i++) { $x = rand; $y = rand; $inside++ if $x * $x + $y * $y < 1.0; printf "After %7d points pi ~= %9.7lf\n", $i, 4.0 * $inside / $i if ($i % 10000) == 0; }

**Figure 1. A monte Carlo Calculation of
pi**

Let us get our feet wet with a short and simple Perl program.
Figure 1 is a program which estimates pi using a Monte Carlo
calculation. Consider a circle inscribed inside a square of side
two centered at the origin. The area of the square is four and,
since the circle has a radius of one, its area is pi. The ratio of
the area of the circle to that of the square is thus pi / 4, and
this is also the chance that a random point within the square is
also inside the circle. This program repeats a loop a million
times, each time calling Perl's **rand** function
twice, once for **x** and once for
**y**. The distance of the **x**,
**y** pair from the origin is calculated and, if it
is less than one, it is counted as a successful test. (Technically,
this program uses only the parts of the circle and the square in
the first quadrant for simplicity, but, by symmetry, the ratio of
areas is the same as if the whole of both figures had been
used.)

A Linux tip—if you give your Perl programs a unique extension, like .pl, it is easy to make them stand out in ls type listings. Adding the line:

.pl 01;33 # perl programs (yellow)

to the file /etc/DIR_COLORS will make the names of all files
with .pl extensions in listings appear in yellow. See the man pages
for **ls** and the file
/etc/DIR_COLORS for details.

Note, first of all, how short the Perl program is. Also, note
how much it resembles a C program, especially the
**for** loop and the **printf**
function. There are important differences, however. Variables, like
**$i**, **$x**, etc. are used when
needed without a specific declaration. It is not even clear to the
casual observer what the types of the variables are. And what is
with these if statements **after** the
statements they control? And **rand** is used like a
function, but there are no parentheses—maybe it is a
keyword.

All of these differences are features of Perl. Perl keeps
track of your variables for you. The variables are really strings
internally, but they get converted to doubles when needed such as
when the distance of the random point from the origin is calculated
and compared to one. You needn't concern yourself with any of these
details, however. The **if** test at the end of a
line is a handy equivalent to C's **if** block (the
C style is OK in Perl also) but a lot shorter. You will find
yourself using Perl's **if** style all the time once
you get the hang of it. Perl's **if** has three
relatives (**unless**, **while** and
**until**) which also can be used before a block or
at the end of a series of comma separated statements. Finally,
**rand** really is a function—in this case the
parentheses are optional. This is the case with many functions,
including the **printf** at the bottom of the
**for** loop.

If you run the program in Figure 1, you will get an estimate
for pi after every 10,000 tests. I call the program m_c.pl and run
it by typing its name. The first line is the path to the Perl
program on my system. Change this path if yours is different. You
can also test a program for syntax errors with a
**-c** command line switch, i.e.

perl -c m_c.pl

Perl will also provide warnings, such as when you assign to a
variable and never use the variable again. Use the
**-w** command line switch to turn on warnings. This
is a handy way to uncover spelling errors which can easily crop up
in an environment without explicit variable declarations. I usually
use both tests simultaneously during development, i.e.

perl -cw m_c.pl

Next let us consider a more difficult problem. How do we test whether a point is inside a polygon? This problem is not as simple as it may first appear, especially when you take into account the special cases—such as when the point is on the boundary of the polygon. Is that inside, outside, or do you want to count it as a separate case—on the boundary? Let's break the problem down and start by writing a Perl subroutine to test whether a point is on a line.

A not-too-efficient but easy-to-code routine works as
follows. If a point is on a line the sum of the distances from the
point to each of the line endpoints is the same as the distance
between the endpoints, i.e. the length of the line. You might want
to reread the previous sentence and make a little sketch to
convince yourself that this is indeed the case. We will first need
a routine (in Perl they are called subroutines) to find the
distance between two points. C has a built-in function, hypot, but
the one-liner in Figure 2 is the Perl equivalent. The
**sub** keyword denotes a subroutine and the
subroutine's name follows. There is no parameter list in the
definition of a subroutine. Those are comments following the
subroutine name in Figure 2. We will supply the parameter list when
we call the subroutine, as you will see shortly.

sub hypot # (x, y) returns sqrt(x * x + y * y) { sqrt $_[0] * $_[0] + $_[1] * $_[1]; }

**Figure 2. Perl Subroutine to Find
Distance Between Points**

When you call a subroutine in Perl all of the parameters are
automatically put into an array **@_** (The
**@** denotes a standard array). In this case the
array has two elements, the differences in the **x**
and **y** coordinates of the two points. Note that
the elements are referenced by address so that we need to be
careful. Any modifications to these variables would change the
values outside this subroutine. In Perl you can also create local
variables within a subroutine, or inside of any block for that
matter. Normally I would do so, but since this subroutine is so
short and is likely to be called many times, I avoided the overhead
of local variables. Note again, that a Perl function,
**sqrt**, was called without any parentheses. The
return value of a subroutine is the last thing calculated (or you
can overrride this behavior with a specific return value). Here the
Pythagorean result is the last value calculated, so no explicit
**return** is necessary. You will probably find, as
I have, that explicit **return** statements are
rarely needed.

Now we need to tell our program what points and areas are. In
C we seem to have the advantage of structures at our disposal. We
would probably set up something like the **typedef**
and declaration in Figure 3.

typedef struct { double x, y; } POINT; POINT *polygon_p;

**Figure 3. Typedef and
Declaration**

A point has **x** and **y**
coordinates and a polygon has three or more points. Every time we
need a polygon we can malloc the space for it and fill it with
points—and free the space when we are done with it. This is where
Perl shines.

Perl has neither explicit memory allocation nor structures. The good news is that we do not have to concern ourselves with memory—it is there when we need it. The bad news is that we have to come up with some system like C's structures. This turns out to be trivial. Let us put everything into strings. Perl takes care of strings of any length, relieving us of the pesky problem of counting characters. You do not even have to add one for the terminating null character—Perl does not use a terminator.

Our point will simply be a string containing two doubles joined by a comma; our line will be a string containing two points joined by a colon; and our polygon will be a string containing three or more points joined by colons, like so:

$point = "1.0,2.0"; $line = "0.0,0.0:0.0,1.0"; $polygon = "1.0,2.0:6.0,5.0:0.0,3.1";

These “structures” turn out to be easy to scan visually and
very easy to manage, thanks to Perl's join and split functions.
Figure 4 is the **point_on_line** subroutine I
developed using the above scheme.

$epsilon = 1.0e-10; sub point_on_line #(point, line) returns 0 or 1 { # if on - sum of distance to ends should be distance between ends local($point, $line) = @_; local($p_x, $p_y) = split(",", $point); local($l_b, $l_e) = split(":", $line); local($l_b_x, $l_b_y) = split(",", $l_b); local($l_e_x, $l_e_y) = split(",", $l_e); &fabs(&hypot($p_x - $l_b_x, $p_y - $l_b_y) + &hypot($p_x - $l_e_x, $p_y - $l_e_y) - &hypot($l_e_x - $l_b_x, $l_e_y - $l_b_y)) < $epsilon; } sub fabs # (x) returns absolute value of x { local($rv) = @_; $rv = -$rv if $rv < 0.0; $rv; }

**Figure 4. point-on-line
Subroutine**

This subroutine is called with two strings, i.e.

&point_on_line($point, $line);

and returns one if the point is on the line, and zero
otherwise. Note that in this case I put the calling parameters into
local arrays for safety and ease of
maintenance—**$point** and **$line**
are easier to remember next week than **$_[0]** and
**$_[1]**. The **split** function's
use should be obvious. The **point** string gets
split into two coordinate strings by the comma, and the
**line** string gets broken into two point strings
by the colon. Then the points at the beginning and end of the line,
**$l_b** and **$l_e**, get broken
down into their respective coordinates. Finally the coordinates are
passed to our **hypot** function. Note how the
**&** sign is used to prefix a subroutine for
the call. Perl has no equivalent to C's **fabs** so
I quickly rolled my own in Figure 4. I used the
**$epsilon** variable to avoid floating point
roundoff problems.

We have been building this thing from the bottom up. So far we have a routine to test whether a point is on a line. Our goal is a routine to test whether a point is within a polygon. Let me give you an overview of how we are going to solve the problem, provide you with two more Perl routines (which you should be able to read now), and then show you the external “workhorse” routine that does the final inside/on/outside determination.

Our polygon is a closed figure and every point is either inside, outside, or on the boundary (there is actually a mathematical statement called Jordan's Curve Theorem that proves this!). If we can establish a point that is guaranteed to be outside of our polygon, we can test the “insideness” of any point by connecting this test point to our “outside” point by a straight line and counting the number of times the line crosses the polygon. Clearly if the line never crosses the polygon, the point is outside; if it crosses once the point is inside; and, in general, if the line crosses the polygon's boundary an odd number of times, the point is inside. An even number of crossings means the point is outside. To put it another way, every time the line crosses the polygon it changes from being either inside to outside or vice-versa. Starting from the outside point our first crossing (if any) puts us inside; the next crossing (if any) puts us back outside; etc., odd-inside, even-outside. So, it looks like we will need a routine to test whether two lines intersect.

One way of testing for line intersection is to make sure both
endpoints of each line are on opposite sides of the other line.
(Reread and make a few pictures, as before.) This leads to our
final routine. Now we will test whether, when we walk along a line
from the beginning point to the opposite end and then turn to go
straight to another point, we turn counterclockwise or clockwise.
This discussion and the resulting routines are modeled after
Sedgewick's in “Algorithms in C”, pages 349-354 (Robert Sedgewick,
Addison Wesley, 1990). Let us start with the counterclockwise
subroutine, **ccw** in Figure 5.

sub ccw # (three points) return -1, 0, or 1 { local(@points) = @_; local($rv) = 0; local($dx1, $dx2, $dy1, $dy2, $p0x, $p0y, $p1x, $p1y, $p2x, $p2y); ($p0x, $p0y) = split(",", $points[0]); ($p1x, $p1y) = split(",", $points[1]); ($p2x, $p2y) = split(",", $points[2]); $dx1 = $p1x - $p0x; $dy1 = $p1y - $p0y; $dx2 = $p2x - $p0x; $dy2 = $p2y - $p0y; switch: { $rv = 1, last if $dx1 * $dy2 > $dy1 * $dx2; $rv = -1, last if $dx1 * $dy2 < $dy1 * $dx2; $rv = -1, last if ($dx1 * $dx2 < 0.0) || ($dy1 * $dy2 < 0.0); $rv = 1, last if ($dx1 * $dx1 + $dy1 * $dy1) < ($dx2 * $dx2 + $dy2 * $dy2); } $rv; }

**Figure 5. Counterclockwise (ccw
Subroutine)**

The **ccw** routine accepts three points and
compares the slope of the line from the second to the third with
the slope of the line from the first to the second. It is carefully
constructed to handle vertical lines and even the case of collinear
points. Note how the input points are collected into a local array,
**@points**, avoiding side effects and making the
program easy to maintain and understand. The points each get split
into their respective coordinates with the test being carried out
in a block labeled **switch** (for obvious reasons).
By now you have probably guessed that Perl allows labels, just like
C. There is no specific **switch** construct in
Perl, however. The block in **ccw** above is Perl's
way of building a switch. The **last** keyword
immediately exits the block and sometimes the Perl interpretor is
smart enough to convert the block to a C switch statement
internally (although not in this case).

sub intersect # (two lines) returns 0 or 1 { local($l1, $l2) = @_; local($l1_b, $l1_e) = split(":", $l1); local($l2_b, $l2_e) = split(":", $l2); &ccw($l1_b, $l1_e, $l2_b) * &ccw($l1_b, $l1_e, $l2_e) <= 0 && &ccw($l2_b, $l2_e, $l1_b) * &ccw($l2_b, $l2_e, $l1_e) <= 0; }

**Figure 6. Intersection
Subroutine**

Figure 6 is the intersection routine. It is very straightforward. The lines are split into endpoints and the endpoints are tested for “sideness”.

We now have all the building blocks to construct our “inside” subroutine. First we connect our test point with an “outside” point via a straight line. Then we walk around the polygon, testing each polygon side in turn, accumulating the intersections of the sides with the test line. If the number of intersections is odd the point is inside, and vice-versa.

$big_float = 1.0e7; sub lower_left_index # (polygon) returns index of lower left corner { local($polygon) = @_; local($index) = 0; local(@vertices) = split(":", $polygon); local($x_min, $y_min) = split(",", $vertices[0]); local($i, $x, $y); for($i = 1; $i <= $#vertices; $i++) { ($x, $y) = split(",", $vertices[$i]); if(($y < $y_min) || (($y == $y_min) && ($x < $x_min))) { $x_min = $x; $y_min = $y; $index = $i; } } $index; } sub inside # (point, polygon) returns 0 or 1 { local($point, $polygon) = @_; local(@vertices) = split(":", $polygon); local($index) = &lower_left_index($polygon); local($last_index) = $index ? $index - 1 : $#vertices; local($count, $holding_point) = (0, 0); local($i, $lp, $lt, $vertex, $x, $y, $big_x_point); local($check_index) = $index; # true if index is not zero OUTER: for(;;) { # one pass loop for($i = 0, $vertex = $vertices[$#vertices]; $i <= $#vertices; $i++) { $lp = join(":", $vertex, $vertices[$i]); $vertex = $vertices[$i]; if(&point_on_line_2($point, $lp)) { $count = 1; print "Point on boundary\n" if defined $verbose; last OUTER; } } ($x, $y) = split(",", $point); $big_x_point = join(",", $big_float, $y); $lt = join(":", $point, $big_x_point); for($i = 0; $i <= $#vertices; $i++) { if(&point_on_line_2($vertices[$index], $lt)) { $holding_point = 1; } else { if(!$holding_point) { $lp = join(":", $vertices[$last_index], $vertices[$index]); $count++ if &intersect($lp, $lt); } elsif($holding_point && (&ccw($point, $big_x_point, $vertices[$index]) != &ccw($point, $big_x_point, $vertices[$last_index]))) { $count++; } $last_index = $index; $holding_point = 0; } $index++; if($check_index && ($index == @vertices)) { $check_index = 0; $index = 0; } } last; } # one pass "loop" $count & 1; }

**Figure 7. lower-left-index and inside
Subroutines**

The pair of subroutines in Figure 7 are Perl implementations
of functions suggested by Sedgewick in the section “Inclusion in a
Polygon” (pages 353-355), although more complete. The
**lower_left_index** subroutine returns the index of
the polygon point with the smallest x coordinate among all points
with the smallest y coordinate. This is necessary to account for
the special cases when a polygon vertex is on the test line. Note
how, in the third line within the block, the
**@vertices** array is automatically constructed by
splitting the **$polygon** string with colons. Each
element of the **@vertices** array is a pair of
coordinates separated by a comma, one for each vertex. Whenever we
need individual x, y pairs the split function is called, as we have
seen before. This occurs before the **for** loop to
initialize **$x_min** and **$y_min**,
and inside the loop to generate a new test pair
**$x**, **$y**. The upper limit in
the loop variable **$i** is
**$#vertices**, which is the index of the last
member of **@vertices**. Perl automatically keeps
one of these variables for each array. The last statement in this
routine, **$index;** simply establishes the return
value.

The inside subroutine is our “workhorse” function. It is
admittedly fairly complicated, but you should be able to follow the
logic if you have read this far. Here is some help. The variable
**$index** is used to walk from vertex to vertex
around the polygon starting from the index returned by
**lower_left_index**. If this index is anything
other than zero, it will need to be reset after the vertex with the
largest index is encountered. The variable
**$check_index** keeps track of both whether this
resetting will be necessary and, if so, whether it has been done
yet. The variable **$last_index** is the index of
the last vertex that was not on the test line. Generally this is
the index of the vertex “behind” **$index**.

The **OUTER** label takes advantage of one of
Perl's handiest features, the ability to exit effortlessly from
nested loops. You can do this in C if you like
**goto**. In the present example the polygon sides
are created via the **join** function using the
colon as the separator. The sides are stored in
**$lp**. If the test point is on a polygon edge,
there is no need to test further, so the **OUTER**
loop is exited immediately. Note that setting
**$count** to one in this case is equivalent to
counting the boundary as inside the polygon, since one is an odd
number. It would be trivial to count the boundary as outside or
even as a special case by modifying the if block containing the
statement: **last OUTER;**

If the point is not on an edge, a horizontal line from the
test point to an outside point with an x coordinate equal to
**$big_float** is constructed and stored in
**$lt**. The remainder of this function tests for
either line intersection or “sideness” depending on whether the
previous vertex was on the test line, incrementing
**$count** as appropriate. The return value of the
inside function is 1, if the point is inside, and 0 if the point is
outside.

#!/usr/local/bin/perl while(<DATA>) { chop; $polygon .= $_ . ":"; } chop $polygon; for(;;) { print "Enter x and y separated by a comma (q to quit): "; chop($point = <STDIN>); last if $point =~ /^[qQ]/; print("No comma! Try again.\n"), redo if $point !~ /,/; $point =~ s/ +//g; print "Checking point: $point\n"; printf "%s\n", &inside($point, $polygon) ? "inside" : "outside"; } __END__ 5.0,4.0 0.0,0.0 10.0,5.0 0.0,10.0

**Figure 8. Driver Routine for Inside
Subroutine**

A simple driver routine for the inside subroutine is presented in Figure 8. This routine reads its data from the end of the perl program itself. Anything following the line:

__END__

is considered data and is accessed through the (automatically
opened) **DATA** file handle. Two new operators
introduced in this driver are “**.**” which
concatenates two strings and “**.=**” which appends
one string to another. That is, “**.=**” is to
“**.**” as “**+=**” is to
“**+**”. The **chop** function
removes the last character from each element of its argument list.
Note how the line:

chop $polygon;

trims the final colon from the polygon string, so that it is a legitimate polygon. Replace my data with your own if you want to run this driver, but be sure to put a comma between the x and y coordinates.

You have seen two examples of how Perl can be used for prototyping. I hope that from these examples you have gained a feel for Perl's syntax. More importantly, I hope that you have seen how using Perl can free you from concentrating on programming specific details, like memory allocation. Instead, you can direct your efforts toward getting your algorithm up and running. I have discovered that, in many cases, the Perl prototype was sufficient for my purposes, saving me the time of coding the program in C or C++ at all!

When I do recode a prototyped algorithm from Perl to another language, I have found that it is easy to change gears. The logic is behind me, freeing me to concentrate on C specifics, memory allocation/deallocation, input/output, error reporting, etc.

My suggestion to the reader is to program a simple
application in Perl and see for yourself how this very elegant and
powerful language works. You may not save any time with the first
program or two, but it will not be long before the benefits of Perl
appear. If you feel ambitious, try writing a routine to replace my
**point_on_line**. I mentioned earlier that my
algorithm for testing whether a point is on a line is not very
efficient. Another, more efficient scheme, is to first check
whether the point's x coordinate is within the x range of the line
and, if so, whether the point's y coordinate satisfies the equation
of the line. Vertical lines are special cases.

Among the many algorithms I have prototyped in Perl are LZW
data compression (the same as used in the UNIX compress utility),
RSA encryption, many matrix operations including
eigenvalue/eigenvector determination and a code generator that
outputs C code from a database. I even have a little program called
“perls” that reads a database of perl programming tips and prints a
random tip to the screen. [I can provide this program to The *Linux Journal* and/or its audience via Internet. Let me know if you are
interested.]

[Yes, we are. We would like to put it on our web site, perhaps even in a cgi script.]

Jim Shapiro is a consultant specializing in programming mathematical algorithms. He is presently developing a GIS system for a telecommunications company. When he isn't on his Linux system hacking away in C or Perl he can often be found on the squash courts. Jim is a founding member of LUGOR, the Linux User's Group Of the Rockies.

*Programming Perl* by Larry Wall and
Randal L. Schwartz, O'Reilly & Associates, Inc., 1992. If you
are serious about learning Perl, this is the book to read. It is
all here, including some very sophisticated examples. Not
recommended for beginners, however.

*Learning Perl* by Randal L. Schwartz,
O'Reilly & Associates, Inc., 1993. A tutorial divided into
lesson sized chapters.

*Teach Yourself Perl in 21 Days* by David
Till, SAMS publishing, 1995. My personal favorite. Looks more
daunting (841 pages) than it is. I got so excited I read it in
seven days. Read this one, then “Programming perl”, and you will
soon be an expert.

The “man” pages. Not bad if you want to get the flavor of the language, but mine seem dated.

Copyright © 1994 - 2018 Linux Journal. All rights reserved.