
Chapter 20. The LWP Library

Contents:

LWP Overview
The LWP Modules
The HTTP Modules
The HTML Modules
The URI Module

The library for web access in Perl (LWP) is a bundle of modules that provide a consistent, object-oriented approach to creating web applications. The library, downloaded as a single distribution named libwww-perl, contains the following classes:

File
Parses directory listings.

Font
Handles Adobe Font Metrics.

HTML
Parses HTML files and converts them to printable or other forms.

HTTP
Provides client requests, server responses, and protocol implementation.

LWP
The core of all web client programs. It creates network connections and manages the communication and transactions between client and server.

URI
Creates, parses, and translates URLs.

WWW
Implements standards used for robots (automatic client programs).

Each module in LWP provides building blocks for the tasks a web browser performs, from making the connection and sending the request to handling the response and the returned data. These parts are then encapsulated in objects that give every web program you write a standard interface.

This chapter covers all the modules in the LWP distribution. While some modules are more useful than others, everything you need to know about LWP is included here. To begin with, the following section gives an overview of how LWP works to create a web client.

20.1. LWP Overview

Any web transaction requires an application that can establish a TCP/IP network connection and send and receive messages using the appropriate protocol (usually HTTP). TCP/IP connections are established using sockets, and messages are exchanged via socket filehandles. (See Chapter 15, "Sockets" for information on how to manually create socket applications.) LWP wraps this machinery in objects: LWP::UserAgent for clients and HTTP::Daemon for servers. The UserAgent object acts as the browser: it connects to a server, sends requests, receives responses, and manages the received data. This is how you create a UserAgent object:

use LWP::UserAgent;
$ua = LWP::UserAgent->new( );
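The constructor also accepts key/value options. The following is a minimal sketch, assuming a reasonably recent libwww-perl; the agent string and the 10-second timeout are illustrative values, not recommendations.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Illustrative values: the agent string and timeout are assumptions.
my $ua = LWP::UserAgent->new(
    agent   => 'MegaBrowser/1.0',
    timeout => 10,                 # seconds to wait for a response
);

# The same attributes can be read (or changed) through accessor methods.
print "Agent:   ", $ua->agent,   "\n";
print "Timeout: ", $ua->timeout, "\n";
```

No network connection is made at this point; the object only holds the settings that later requests will use.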

The UserAgent now needs to send a message to a server requesting a Uniform Resource Locator (URL). It does this with the request method, which forms an HTTP request from the object given as its argument. This request object is created by HTTP::Request.

An HTTP request message contains three parts. The first line always contains an HTTP command called a method; a Uniform Resource Identifier (URI), which identifies the file or resource the client is querying; and the HTTP version number. The lines that follow contain header information, which describes the client and any data it is sending to the server. The third part is the entity body, which is the data sent to the server (for the POST method, for example). The following is a sample HTTP request:

GET /index.html HTTP/1.0
User-Agent: Mozilla/1.1N (Macintosh; I; 68K)
Accept: */*
Accept: image/gif
Accept: image/jpeg
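To see how a request object maps onto the message parts described above, the request can be rendered as text without sending it. This is a sketch only: the header values are illustrative, and the protocol is set by hand so that the version appears on the first line.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Headers;

# Header lines: information about the client (illustrative values).
my $hdrs = HTTP::Headers->new(
    'User-Agent' => 'MegaBrowser/1.0',
    'Accept'     => '*/*',
);

# Request line: method, URI, and (once set) the protocol version.
my $req = HTTP::Request->new('GET', '/index.html', $hdrs);
$req->protocol('HTTP/1.0');

# as_string renders the request line, the headers, a blank line,
# and the entity body (empty for this GET).
my $text = $req->as_string;
print $text;
```

The order in which the header lines are printed is chosen by HTTP::Headers and may differ from the order in which they were supplied.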

LWP::UserAgent->request forms this message from an HTTP::Request object. A request object requires a method for the first argument. The GET method asks for a file, while the POST method supplies information such as form data to a server application. There are other methods, but these two are most commonly used.

The second argument is the URL for the request. The URL must be absolute, including the scheme (such as http://) and the server name, because this is how the UserAgent knows where and how to connect. The URL argument can be represented as a string or as a URI::URL object, which allows more complex URLs to be formed and managed. Optional parameters for an HTTP::Request include your own headers, in the form of an HTTP::Headers object, and any POST data for the message. The following example creates a request object:

use HTTP::Request;

$req = HTTP::Request->new('GET', $url, $hdrs);
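A POST request uses the same constructor, with the data for the entity body passed as a fourth argument. The sketch below is illustrative: the endpoint URL and the form fields are hypothetical, and the body is assumed to be already URL-encoded.

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Headers;

# Hypothetical endpoint and form fields, used only for illustration.
my $url  = 'http://www.example.com/cgi-bin/search';
my $hdrs = HTTP::Headers->new(
    'Content-Type' => 'application/x-www-form-urlencoded',
);
my $body = 'query=perl&max=10';

# Arguments: method, URL, headers object, entity body.
my $post = HTTP::Request->new('POST', $url, $hdrs, $body);

print $post->method, " ", $post->uri, "\n";
print $post->content, "\n";
```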

The URL object is created like this:

use URI::URL;

$url = URI::URL->new('http://www.ora.com/index.html');
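A URL object can be taken apart again through accessor methods, which is one reason to prefer it over a plain string. A short sketch, assuming an absolute URL with an explicit scheme:

```perl
use strict;
use warnings;
use URI::URL;

my $url = URI::URL->new('http://www.ora.com/index.html');

print "Scheme: ", $url->scheme, "\n";   # protocol part of the URL
print "Host:   ", $url->host,   "\n";   # server the UserAgent connects to
print "Path:   ", $url->path,   "\n";   # resource requested from the server
```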

And a header object can be created like this:

use HTTP::Headers;

$hdrs = HTTP::Headers->new(Accept => 'text/plain',
                          'User-Agent' => 'MegaBrowser/1.0');
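Once created, a header object can be queried and extended. The sketch below uses illustrative values. Note that a field name containing a hyphen has to be quoted, because Perl's => operator only auto-quotes simple identifiers.

```perl
use strict;
use warnings;
use HTTP::Headers;

# Field names containing a hyphen must be quoted before =>.
my $hdrs = HTTP::Headers->new(
    'Accept'     => 'text/plain',
    'User-Agent' => 'MegaBrowser/1.0',
);

# Read a field back; names are matched case-insensitively.
print $hdrs->header('user-agent'), "\n";

# Add a second value to an existing field instead of replacing it.
$hdrs->push_header('Accept' => 'text/html');

# as_string renders one "Name: value" line per value.
print $hdrs->as_string;
```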

Then you can put them all together to make a request:

use LWP::UserAgent;  # Also loads HTTP::Request and HTTP::Headers
use URI::URL;

$hdrs = HTTP::Headers->new(Accept => 'text/plain',
                          'User-Agent' => 'MegaBrowser/1.0');

$url = URI::URL->new('http://www.ora.com/index.html');
$req = HTTP::Request->new('GET', $url, $hdrs);
$ua = LWP::UserAgent->new( );
$resp = $ua->request($req);
if ($resp->is_success) {
        print $resp->content;
} else {
        print $resp->message;
}

Once the user agent has made the request, the server's response is returned as another object, described by HTTP::Response. This object contains the status code of the request, the returned headers, and, if the request succeeded, the content you requested. In the example, is_success checks whether the request was fulfilled without problems; if so, the content is printed. Otherwise, a message describing the server's response code is printed.
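The parts of a response object can be explored without a network connection by building one by hand. This sketch stands in for what a server might return; the 404 status and the body text are made up for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;
use HTTP::Headers;

# A hand-built response standing in for what a server might return.
my $resp = HTTP::Response->new(
    404, 'Not Found',
    HTTP::Headers->new('Content-Type' => 'text/html'),
    '<html><body>No such page</body></html>',
);

print "Code:    ", $resp->code, "\n";
print "Message: ", $resp->message, "\n";
print "Type:    ", $resp->header('Content-Type'), "\n";

if ($resp->is_success) {
    print $resp->content;
} else {
    # status_line combines the code and the message.
    print "Request failed: ", $resp->status_line, "\n";
}
```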

There are other modules and classes that create useful objects for web clients in LWP, but the above examples show the most basic ones. For server applications, many of the objects used above become pieces of a server transaction, which you either create yourself (such as response objects) or receive from a client (such as request objects).
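On the server side the roles are reversed: the program receives a request object and builds the response. The sketch below shows only that step. The handle_request function is hypothetical, and the request is constructed by hand here, whereas a real server would obtain it from a client connection (for example through HTTP::Daemon).

```perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Response;
use HTTP::Status qw(RC_OK RC_NOT_FOUND);

# Hypothetical handler: takes a request object, returns a response object.
sub handle_request {
    my ($req) = @_;
    if ($req->method eq 'GET' && $req->uri->path eq '/index.html') {
        my $resp = HTTP::Response->new(RC_OK, 'OK');
        $resp->header('Content-Type' => 'text/plain');
        $resp->content("Hello from the server\n");
        return $resp;
    }
    return HTTP::Response->new(RC_NOT_FOUND, 'Not Found');
}

# Simulate a request arriving from a client.
my $req  = HTTP::Request->new('GET', '/index.html');
my $resp = handle_request($req);
print $resp->status_line, "\n", $resp->content;
```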

Additional functionality for both client and server applications is provided by the HTML module. This module provides many classes for both the creation and interpretation of HTML documents.
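As one example of interpreting HTML, the sketch below pulls the links out of a small document with HTML::LinkExtor, assuming that module (part of the HTML-Parser distribution) is installed. The sample markup is inline for illustration; in a client it would typically come from a response's content.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Inline sample document; in a client this would be $resp->content.
my $html = <<'END_HTML';
<html><body>
<a href="http://www.ora.com/">O'Reilly</a>
<img src="/images/logo.gif">
</body></html>
END_HTML

my $parser = HTML::LinkExtor->new;
$parser->parse($html);
$parser->eof;

# Each link is an array reference: tag name, then attribute/value pairs.
my @links = $parser->links;
for my $link (@links) {
    my ($tag, %attr) = @$link;
    print "$tag: $_ = $attr{$_}\n" for sort keys %attr;
}
```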

The rest of this chapter provides information for the LWP, HTTP, HTML, and URI modules.




Copyright © 2002 O'Reilly & Associates. All rights reserved.