Tcl/Tk: The Swiss Army Knife of Web Applications

Bill Schongar

Issue #55, November 1998

Tcl/Tk offers many uses to the web programmer. Mr. Schongar describes a few of them.

While many people think of Tcl/Tk (Tool Command Language, pronounced “tickle”; Tk stands for Tcl toolkit) as a great tool for cross-platform GUI (graphical user interface) applications, not as many consider it for web programming tasks. The truth is, its ease of use and flexibility make it a natural choice for CGI (Common Gateway Interface), server-parsed embedded scripting, applications delivered through a plug-in, and even as a tool for creating your own web server from the ground up.

I will present examples of Tcl at work in these situations and also some ideas and additional ways Tcl can be used for whatever web programming task is needed.

CGI Basics in Tcl

Reading environment variables and printing to standard output are the building blocks of CGI. Sure, you want to deal with incoming data, run fun processes and format the output, but first things first. By understanding how any given language performs the basic tasks, you can start building things that actually do something without spending a lot of time mired in details.

Listing 1 is a simple script in Tcl that reads the environment variables set by the web server and sends them back as HTML output to the user. The first thing to note is the choice of shell for interpreting the script. Tcl/Tk distributions include two shells: wish and tclsh. wish is the “windowing” shell, and it carries the overhead of initializing the GUI functions, consuming more than double the system resources of its command-line-only counterpart. For this article, I'm assuming you are running version 8.0 or later of Tcl. (Check by running tclsh and then typing info tclversion.)

Next, a standard header is sent back to let the browser know what type of information is coming. In the Listing 1 script, the browser learns the data is HTML text rather than plain text, an image or something else entirely. puts is the Tcl equivalent of Perl print or C printf, complete with \n for newlines (puts also appends a newline of its own unless given the -nonewline option), and the output can be redirected to a file instead of the default standard out (stdout). So, once the script has told the browser that the data is HTML, it provides the HTML, in this case a title tag and text.

To print out every environment variable, we need to generate a list of which ones exist, then loop through and write out the names and their values. Doing this requires a better sense of how Tcl treats variables and executes commands. First, Tcl doesn't care too much about what type of data is stored in a variable—text or numbers of any length are just fine. Second, variables in Tcl come in three forms: a single variable, an array and a formatted list. To see how these work, look at the following lines of code:

set foo "123abc"
set junk(1) "a"
set junk(2) "b"
set bar "a b c d e f g h i j"

The set command is used to create or modify the value of a variable. In this case, foo is a single variable while junk is an array and bar is a list. What makes bar a list is that its elements are separated from one another by spaces, allowing certain Tcl commands to parse the data automatically.
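As a small sketch of the three forms in use, the lines below read each variable back. The commands array size, llength and lindex are standard Tcl; the variable names are the ones from the text.

```tcl
set foo "123abc"
set junk(1) "a"
set junk(2) "b"
set bar "a b c d e f g h i j"

# A single variable is read back with $
puts "foo holds $foo"

# An array element is addressed as name(index)
puts "junk has [array size junk] elements; junk(2) is $junk(2)"

# List commands see bar as ten space-separated elements
puts "bar has [llength $bar] elements; the first is [lindex $bar 0]"
```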

When you see something in square brackets in a Tcl program, it is normally being used to execute what is contained in those brackets, and to substitute the return value for the entire expression. For example, if I have a function called countdown which returns the number of seconds remaining until the year 2000, I can easily display that result by writing puts [countdown] in my Tcl code. If it returns 300, the number 300 is printed. There are exceptions to this general rule, of course, and we'll run into one of them when we get to parsing user input. So, the line

set mylist [array names env]

means “create a variable named mylist and set it to the result of the command array names env”.
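To make the substitution rule concrete, here is a minimal sketch. The countdown procedure is a hypothetical stand-in for the one mentioned above; its optional now argument is an assumption added so the result is predictable.

```tcl
# Hypothetical stand-in for the countdown procedure described in the text.
# "now" defaults to the current time; passing it makes the result repeatable.
proc countdown {{now ""}} {
    if {[string length $now] == 0} {
        set now [clock seconds]
    }
    # 946684800 is 2000-01-01 00:00:00 UTC as a Unix timestamp
    return [expr {946684800 - $now}]
}

# The bracketed command runs first; its result replaces the brackets.
puts [countdown 946684500]

# The same substitution works on the right-hand side of set.
set remaining [countdown 946684500]
puts "Seconds remaining: $remaining"
```

With the fixed timestamp shown, both lines report 300.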

Arrays are useful for related groups of information, such as environment variables. The array family of commands lets you search through the names of array elements and perform many other helpful operations. Executing the command array names junk returns a list of all the element names in the array, which right now would be “1 2” (the order of the names is not guaranteed). By telling Tcl you want to look through the env array, you obtain a list of names for all the environment variables set by the server, and you store that list in the variable mylist.

Looping through each item in mylist is made easy by the foreach command. It automatically parses a Tcl list and assigns the value of the current element to the variable name you specify. In Listing 1, the name of each environment variable will be stored in foo when its time comes, and the puts line will show the name of the variable as it originally appeared, followed by its interpreted value. The “$” sign indicates to Tcl that this is a variable and its value should be substituted.
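The steps described so far can be put together roughly as follows. This is a sketch in the spirit of Listing 1, not a copy of it: the envReport procedure, the sorting of names and the exact HTML are assumptions, and values are not HTML-escaped.

```tcl
#!/usr/bin/tclsh
# Build the HTML body: one line per environment variable.
# envReport is a name chosen for this sketch.
proc envReport {} {
    global env
    set html "<title>Environment Variables</title>\n"
    foreach foo [lsort [array names env]] {
        # $foo is the variable's name, $env($foo) its value
        append html "$foo = $env($foo)<br>\n"
    }
    return $html
}

# puts adds one newline; the explicit \n supplies the blank
# line that ends the header section.
puts "Content-type: text/html\n"
puts [envReport]
```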

Processing User Data

Since you'll want to do more than just print out some environment variables or static data, here's how to process incoming data in Tcl. The first step involves getting access to the environment variables, which we've covered. This will tell us whether the data is coming in by the GET method, stored in the QUERY_STRING environment variable, or as a POST, stored in standard input. Let's find out using the slightly modified version of our previous program shown in Listing 2. Yes, this program is a lot longer. The extra length, however, comes from dealing with getting the user input, formatting it and storing it in an array. Once I explain how those parts work, I'll show you a much shorter way of writing this program.

The parse (cgiParse) and decoding (urlDecode) procedures are taken from Brent Welch's Practical Programming in Tcl/Tk, with minor modifications. The parse routine is reasonably straightforward. It determines whether the data is stored in QUERY_STRING or standard input, then stores it into a variable called text.

Special characters can cause problems for CGI programs, so the browser encodes characters such as percent signs and slashes as a percent sign followed by their hexadecimal codes, and spaces as plus signs, before the data reaches the server. Your program must convert the data to its original form. Doing that in C and other languages can be difficult, but Tcl makes it fairly easy. As you can see, each time data is being processed, either for the name or value of the variable, that data is sent to urlDecode. There, the regsub command works its magic. To use it, specify (in this order) a search pattern, the original data, its replacement and the variable in which to store the result.
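A decoder along these lines might look like the sketch below. It follows a commonly seen Tcl idiom and is not a copy of Listing 2. Note how the bracketed format command is first inserted as plain text by regsub and only executed at the end by subst. Each %XX is treated as a single character code, so multi-byte encodings are outside the scope of this sketch.

```tcl
proc urlDecode {data} {
    # Plus signs stand for spaces
    regsub -all {\+} $data " " data
    # Protect characters that subst would otherwise interpret
    regsub -all {([][$\\])} $data {\\\1} data
    # Turn each %XX into a format command that yields the character
    regsub -all {%([0-9a-fA-F][0-9a-fA-F])} $data {[format %c 0x\1]} data
    # Run the inserted format commands
    return [subst $data]
}

puts [urlDecode "Tcl%2FTk+is+100%25+fun"]
```

The sample call prints Tcl/Tk is 100% fun.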

Notice that the foreach loop at the end of the cgiParse procedure is doing two things that our previous foreach loop didn't do. First, it is specifying more than one temporary value for use in each loop. That is, the first time through the loop, element 1 of the list will be stored in name, and element 2 will be stored in value. The next loop will use elements 3 and 4, and so on. Any number of variables can be specified in this way, which makes processing long lists a breeze. The second thing it is doing differently is directly using the results of a command as the list, rather than creating a variable to hold the list first. You can do it either way, but in this example you save a fractional bit of processing time and memory.
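A short sketch of both points, using a made-up query string. The decoding step is left out here to keep the example focused on foreach, and a field with no = sign would throw the name/value pairing off.

```tcl
# Sample data for illustration only
set query "name=Bill&lang=Tcl&issue=55"

# split on both & and = yields: name Bill lang Tcl issue 55
# foreach then consumes the list two elements at a time.
foreach {name value} [split $query "&="] {
    set cgi($name) $value
}

foreach key [lsort [array names cgi]] {
    puts "$key -> $cgi($key)"
}
```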

Once the parsing and decoding libraries have been laid out, the program starts its run. The content header is sent to the browser, and cgiParse is run in order to store all user-entered values (from a form or some other way) into the variable array cgi. Then it loops through each element in the cgi array and prints out the names and values of all the elements.

One benefit to the way the parsing functions are set up is that you can test user-input values on the command line. Since the parser doesn't rely on finding a GET or POST method, it will get the data wherever possible, defaulting to the command line. So, you could easily test your CGI script before uploading it to the server, without having to create an elaborate wrapper to set environment variables.
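The fallback order might be sketched like this. The procedure name cgiRawInput is hypothetical, and the exact checks in Listing 2 may differ; REQUEST_METHOD, CONTENT_LENGTH and QUERY_STRING are the standard CGI environment variables.

```tcl
# Return the raw, still-encoded form data from wherever it can be found.
proc cgiRawInput {} {
    global env argv
    set method ""
    if {[info exists env(REQUEST_METHOD)]} {
        set method $env(REQUEST_METHOD)
    }
    # POST: the data arrives on standard input
    if {$method == "POST" && [info exists env(CONTENT_LENGTH)]} {
        return [read stdin $env(CONTENT_LENGTH)]
    }
    # GET: the data is in QUERY_STRING
    if {[info exists env(QUERY_STRING)] && [string length $env(QUERY_STRING)] > 0} {
        return $env(QUERY_STRING)
    }
    # Neither: fall back to the first command-line argument
    if {[info exists argv]} {
        return [lindex $argv 0]
    }
    return ""
}

puts "Raw input: [cgiRawInput]"
```

Run from a shell as tclsh script.tcl "name=Bill&lang=Tcl", the sketch reports the argument as its input.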

Function Libraries—Yours and Everyone Else's

Tcl procedures, or procs, are your subroutines. If you have created some procs, you can easily put them in their own Tcl script, then use the source command to load those scripts so they are ready for use. To keep your code to a minimum, you may want to use the cgiParse and urlDecode routines shown in Listing 2. If you saved them as “cgistuff.tcl”, you could rewrite the script in Listing 2 as:

#!/usr/bin/tclsh
source cgistuff.tcl
puts "Content-type: text/html\n"
cgiParse
foreach foo [array names cgi] {
   puts "Variable: $foo Value: $cgi($foo)"
}

The source command loads and executes a Tcl script, so be careful that you don't have any unwanted commands hiding in that script outside of a procedure.

Before you go off writing too many of your own procedures, though, you'll want to take a look at what is already available. A lot of talented people have put time and effort into writing well-documented, very functional procedure libraries, such as Don Libes' cgi.tcl library, which covers everything from basic parsing to cookies and file uploads. (See Resources.)

Data Handling

Sometimes the data you need doesn't come from the user. Product catalogs, maps, schedules and more all come from some sort of external data file, whether real databases like Oracle or Informix or something as simple as a delimited ASCII file. No matter what your data needs are, Tcl can help you out.

Flat files are the easiest to deal with. Open a file, read through line by line until you reach the end or find what you are looking for, then close the file. In Tcl, reading in a file line by line looks like this:

set f [open foo.txt r]
while {[gets $f stuff] != -1} {
  # Do something with the line
  # of data (stored in stuff)
}
close $f

Just as in Perl or C, you create a file handle from which all subsequent operations work. The gets command grabs one line at a time from a file and stores it to a variable. If the return value from gets is -1, you've reached the end of the file. So what do you do with the data once you have it? For the most part, you're going to become fast friends with the split and lindex commands.

split breaks up a string, either character-by-character or at every occurrence of specified characters and returns a new list of the elements. If you want to access specific elements of the list, lindex allows you to specify the list and the element's position and returns that element's value. Note that elements are numbered starting at 0, so an index value of 1 points to the second element in a list.
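As an illustration, the sketch below looks up a price in a few pipe-delimited records. The record layout (id|name|price), the sample data and the lookup procedure are all invented for this example; in practice the lines would come from gets as shown earlier.

```tcl
# Hypothetical records: id|name|price
set catalog {
    "1001|Widget|4.95"
    "1002|Gadget|12.50"
}

# Return the price for the given id, or an empty string if not found.
proc lookup {records id} {
    foreach line $records {
        set fields [split $line "|"]
        # Element 0 is the id, element 2 the price
        if {[string compare [lindex $fields 0] $id] == 0} {
            return [lindex $fields 2]
        }
    }
    return ""
}

puts [lookup $catalog 1002]
```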

A bit higher on the effort scale is processing a special database format, such as a dBASE file or some other defined database format. You may be fortunate enough to find existing filters for this kind of file (two different filters exist for dBASE files), but if you need to write your own, Tcl 8.0 handles binary data quite well. Use the read command to grab whatever size byte blocks you want, then use binary scan to quickly break up and format it.
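A minimal sketch of binary scan follows. The 10-byte header layout is made up for illustration and is not the dBASE format; binary format is used to build the sample bytes so the example runs without an external file.

```tcl
# Hypothetical header: 4-byte tag, 16-bit big-endian count,
# 32-bit little-endian year
set data [binary format "a4Si" "DEMO" 3 1998]

# With a real file you would read the bytes instead, for example:
#   set f [open records.dat r]
#   fconfigure $f -translation binary
#   set data [read $f 10]
#   close $f

# binary scan returns the number of variables it filled in
set found [binary scan $data "a4Si" tag count year]
puts "$found fields: tag=$tag count=$count year=$year"
```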

If you're concerned about speed or already have a C routine to parse your external data, Tcl makes it easy to create new Tcl commands encapsulated in loadable libraries. For most functions, it's as easy as cutting and pasting into the library framework provided by Tcl and adding some Tcl-specific commands to create or set variables.

When you get to the top of the database world and are dealing with Oracle or Informix, you're already covered. Tcl extensions have been made for Oracle, Informix and probably others by the time you read this article. Most of them provide access to the SQL layer for the database, but you can also access the lower-level functions of the system. All of them are available on-line, although compiling them sometimes requires access to the commercial libraries shipped with your RDBMS.

Client-Server CGI

One problem with basic CGI is that it doesn't provide for real persistence. Sure, you can use cookies, file-based data on the server side or append horribly long strings to the URL, but none of those is an ideal solution. In addition, if you're loading things like inventory data from a database, you have to account for initialization time and overhead every time the script is run.

In some situations, the best solution is to have a secondary server process running that shares data with Tcl through sockets in a true client-server fashion. In that way, your server could load the needed information and become a persistent data store, for whatever purpose. In Tcl, that's an easier task than you might expect. While the actual code is too lengthy to include in this article, I'll include an overview here and will be happy to provide additional details by e-mail (bills@multimedia.com).

Sockets in Tcl are designed to be easy to use. The socket command is used by applications wanting to establish a listening post on a port, as well as clients that want to connect to any server, Tcl or otherwise. How would you listen on a port? Just use:

socket -server sayHello 9999

Now you have a server listening on port 9999 that will execute the Tcl procedure sayHello whenever a new client connects. When clients want to connect to you, they just point to your IP address and port using the socket command:

socket 10.0.0.1 9999

What if you want an asynchronous connection? The -async option belongs to the client form of the command, not the -server form. It makes socket return immediately while the connection completes in the background:

socket -async 10.0.0.1 9999

When sayHello executes, it receives three arguments: the socket channel your Tcl server has opened to the client, the IP address of the client and the client's port. You can configure the socket channel for the type of buffering and blocking you want, and you'll normally set up a fileevent for the channel. A fileevent is used to generate notification when the channel becomes either readable or writable (your choice or use both), so that you don't have to poll the socket for new data all the time. Now you and the client are ready to exchange information.
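Put together, a small line-echo server could be sketched as below. The handleLine procedure, the use of port 0 (any free port, instead of 9999) and the in-script client are assumptions made so the example can run on its own. The vwait call matters: under tclsh, connections and fileevents are serviced only while the event loop is running.

```tcl
# Called for each new connection with the channel,
# the client's address and the client's port.
proc sayHello {chan addr port} {
    fconfigure $chan -buffering line -blocking 0
    fileevent $chan readable [list handleLine $chan]
}

# Runs whenever the channel has data, so no polling is needed.
proc handleLine {chan} {
    if {[gets $chan line] < 0} {
        if {[eof $chan]} { close $chan }
        return
    }
    puts $chan "Hello, you said: $line"
}

# Port 0 asks the system for any free port
set server [socket -server sayHello -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# A client in the same script, purely for demonstration
set client [socket 127.0.0.1 $port]
fconfigure $client -buffering line
puts $client "ping"
fileevent $client readable {set reply [gets $client]}
after 2000 {set reply "timed out"}

# Enter the event loop until a reply (or the timeout) arrives
vwait reply
puts $reply
close $client
close $server
```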

So, once you've decided on what your server will do, your CGI program can parse the data as usual, quickly establish a socket connection, and then let the server process the information.

Extending the Client—Tcl/Tk Plug-in

For some projects, you may want to do more than the browser is able to support. By providing the end user with a plug-in, you get the benefits of being able to run a real application right inside their existing browser without too much of a hassle. One drawback to most plug-ins is that they run only under Microsoft Windows, making them unsuitable for real cross-platform work. Tcl's plug-in doesn't have this problem—you can download precompiled binaries for Linux, Solaris, SunOS and yes, even MS Windows. You may also find that by the time you read this, it has been ported to other platforms as well, such as the Macintosh OS.

Using the plug-in, you can run Tclets, which are small Tcl/Tk scripts that run in a restricted (for security reasons) Tcl environment. You and your users can define just how much access you want the plug-in to provide, eliminating or rerouting commands and situations which could be hazardous to your machine's health.

Once you have a Tclet created and your users have the Tcl plug-in, reference it in an HTML page using the <EMBED SRC> tag. So, if your Tclet is called foo.tcl, the tag would look like this:

<embed src="foo.tcl" width=400 height=300>

If you're wondering what kinds of things have already been made to take advantage of the plug-in, look no further than http://www.tcltk.com/tclets/, which contains everything from Tetris clones to Adaptive Optics demonstrations and VRML editors.

Extending the Server with Tcl—Server-Parsed Tcl and More

Server-parsed HTML has been around for a while, ranging from basic server-side includes (SSI) to integrated environments complete with database access. It provides dynamically generated HTML pages without the overhead of calling an external CGI program, and makes it easy even for non-programmers to access all the functionality it provides.

Typically, when a file with a special extension such as .foo is referenced, the server scans through the HTML and looks for special tags. When those tags are found, it executes whatever instructions they contain, then replaces those sections in the document with the output from the command. Those tags could be anything from the current date to a dynamically generated HTML table with a product price list.
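The scan-and-replace cycle can be sketched in a few lines of Tcl. The <?tcl ... ?> tag and the expandPage procedure are invented for this illustration and are not the syntax of any product; the sketch also evaluates whatever the page contains, so it suits only pages you wrote yourself. It relies on non-greedy regular expressions, which need Tcl 8.1 or later.

```tcl
# Replace each <?tcl script ?> section with the result of the script.
proc expandPage {html} {
    set out ""
    while {[regexp -indices {<\?tcl\s(.*?)\?>} $html whole body]} {
        set start [lindex $whole 0]
        set end [lindex $whole 1]
        # Copy the plain HTML that precedes the tag
        append out [string range $html 0 [expr {$start - 1}]]
        # Run the embedded script and keep its result
        set script [string range $html [lindex $body 0] [lindex $body 1]]
        append out [uplevel #0 $script]
        # Continue scanning after the tag
        set html [string range $html [expr {$end + 1}] end]
    }
    append out $html
    return $out
}

set page {<p>Six times seven is <?tcl expr {6 * 7} ?>.</p>}
puts [expandPage $page]
```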

Several solutions exist for using Tcl as a server-parsed scripting language. Two of the most powerful commercial products are NeoWebScript from NeoSoft and Velocigen for Tcl from Binary Evolution. Both products extend the Apache web server with an in-process module, so that they are running all the time in wait mode, ready to do their work. One big difference between the two is that while Velocigen follows the common trend of using a special file extension to identify a file which needs parsing, NeoWebScript follows the more traditional SSI structure of embedding the command in comments. Examples are shown in Listing 3 and Listing 4.

With these more advanced server-side parsers, you can also obtain a level of data persistence through internal variables. For example, you could make a web scavenger hunt on your site to keep a list of the visited pages, and when all the required ones have been seen by a particular user, that user wins. Wins what? I don't know—let marketing worry about it.

Web Servers in Tcl

You won't be able to go out and compete with Apache for market share, but web servers created in Tcl are easy to write, extensible and portable across all platforms. As we saw earlier, sockets are easy to implement in Tcl, which gives you more time to focus on customizing the server to meet your needs, rather than spending it on getting the basics to work.
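To give a feel for how little is needed, here is a toy server that answers every request with one line of text. It is only a sketch under simplifying assumptions: the procedure names are invented, headers are read and ignored, no files are served, and port 0 lets the system pick a free port. The in-script client stands in for a browser so the example runs by itself.

```tcl
proc httpAccept {chan addr port} {
    fconfigure $chan -blocking 0
    fileevent $chan readable [list httpRead $chan]
}

proc httpRead {chan} {
    global request
    if {[gets $chan line] < 0} {
        if {[eof $chan]} {
            catch {unset request($chan)}
            close $chan
        }
        return
    }
    if {![info exists request($chan)]} {
        # First line, for example: GET /hello HTTP/1.0
        set request($chan) $line
        return
    }
    if {[string length $line] > 0} {
        return   ;# header lines are ignored in this sketch
    }
    # A blank line ends the request; send the reply
    set path [lindex [split $request($chan) " "] 1]
    unset request($chan)
    puts $chan "HTTP/1.0 200 OK"
    puts $chan "Content-Type: text/plain"
    puts $chan ""
    puts $chan "You asked for $path"
    close $chan
}

set server [socket -server httpAccept -myaddr 127.0.0.1 0]
set port [lindex [fconfigure $server -sockname] 2]

# In-process client standing in for a browser
set client [socket 127.0.0.1 $port]
fconfigure $client -blocking 0
puts $client "GET /hello HTTP/1.0"
puts $client ""
flush $client
set response ""
fileevent $client readable {
    append response [read $client]
    if {[eof $client]} { set done 1 }
}
after 2000 {set done timeout}
vwait done
puts $response
close $client
close $server
```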

If you want to see a nice implementation of this concept, take a look at Tcl-HTTPD, freely available from Scriptics. It has CGI support, server-parsed scripting and a host of dynamic configuration options, just to name a few aspects. More basic examples are also available from a variety of Tcl sources on the web, as well as an excellent article by Steve Ball and a white paper by Brent Welch. (See Resources.)

Conclusion

Tcl provides an easy way of addressing almost any web programming issue. With a large development community, a wide selection of extensions and freely available function libraries, it is a web power tool waiting to be discovered. Whether client- or server-side, you get a lot of options without a lot of hassle.

Resources