Application-level access to most web client activities is provided by the urllib and urllib2 modules (Section 42.6). urllib is the simpler web interface; it provides basic functions for opening and retrieving web resources via their URLs.
The primary functions in urllib are urlopen( ), which opens a URL and returns a file-like object, and urlretrieve( ), which copies the entire web resource at the given URL to a local file. The file-like object returned by urlopen( ) supports the following methods: read( ), readline( ), readlines( ), fileno( ), close( ), info( ), and geturl( ). The first five methods work just like their file counterparts. info( ) returns a mimetools.Message object, which for HTTP requests contains the HTTP headers associated with the URL. geturl( ) returns the real URL of the resource, which can differ from the one you requested because the web server may have redirected the client before delivering the actual content.
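The methods described above can be tried without a network connection. The following is only a minimal sketch: it assumes a temporary local file (with made-up content) standing in for a web resource and opens it through a file: URL. The try/except import is also an assumption, added so the same lines run on newer Python versions, where these functions live in urllib.request.

```python
import os
import tempfile

try:
    # Python 2, as used in the text
    from urllib import urlopen, pathname2url
except ImportError:
    # Python 3 moved these functions to urllib.request
    from urllib.request import urlopen, pathname2url

# Hypothetical stand-in for a web page: a small temporary file.
fd, path = tempfile.mkstemp(suffix=".html")
os.write(fd, b"<html><body>spam</body></html>\n")
os.close(fd)

url = "file:" + pathname2url(path)   # build a file: URL for the local path

f = urlopen(url)
body = f.read()          # works like a file object's read()
headers = f.info()       # header information for the resource
final_url = f.geturl()   # the URL that was actually opened
f.close()

os.remove(path)          # clean up the temporary file

print(final_url)
print(headers.get("Content-Length"))
print(body)
```

For an HTTP URL the same calls apply; info( ) would then hold the HTTP response headers, and geturl( ) would reflect any redirects.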
urlretrieve( ) returns a tuple (filename, info), where filename is the local file to which the web resource was copied and info is the same as the return value from urlopen's info( ) method.
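As a rough illustration of the (filename, info) return value, the sketch below copies a resource to a named local file. It assumes a temporary file (with invented content) in place of a real web resource, and the destination name is hypothetical; the fallback import is an assumption that lets the example run on newer Python versions as well.

```python
import os
import tempfile

try:
    # Python 2, as used in the text
    from urllib import urlretrieve, pathname2url
except ImportError:
    # Python 3 moved these functions to urllib.request
    from urllib.request import urlretrieve, pathname2url

# Hypothetical source "resource": a small temporary file.
data = b"spam, eggs, and bacon\n"
fd, src = tempfile.mkstemp()
os.write(fd, data)
os.close(fd)

dest = src + ".copy"                 # hypothetical destination filename
filename, info = urlretrieve("file:" + pathname2url(src), dest)

# filename is the local file that now holds a copy of the resource.
with open(filename, "rb") as copy:
    copied = copy.read()

os.remove(src)                       # clean up both temporary files
os.remove(dest)

print(filename)
print(copied)
```

If the second argument is omitted for a remote URL, urlretrieve( ) chooses a temporary filename itself and returns it as the first element of the tuple.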
If the result from either urlopen( ) or urlretrieve( ) is HTML, you can use htmllib to parse it.
urllib also provides a function urlencode( ), which converts a dictionary or a sequence of two-element tuples into a properly URL-encoded query string. Here is an example session that uses the GET method to retrieve a URL containing parameters:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query?%s" % params)
>>> print f.read()
The following example performs the same query but uses the POST method instead:
>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()
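To see what urlencode( ) actually produces in the two sessions above, it can help to print the encoded string without contacting any server. This is only a sketch: the 'q' parameter and its value are made up for illustration, and the fallback import is an assumption for newer Python versions, where urlencode( ) lives in urllib.parse. A list of tuples is used so that the parameter order is predictable.

```python
try:
    # Python 2, as used in the text
    from urllib import urlencode
except ImportError:
    # Python 3 moved urlencode to urllib.parse
    from urllib.parse import urlencode

# A sequence of two-element tuples keeps the parameters in the given order.
params = urlencode([('spam', 1), ('eggs', 2), ('bacon', 0)])
print(params)        # spam=1&eggs=2&bacon=0

# Spaces and reserved characters are escaped (hypothetical parameter).
quoted = urlencode({'q': 'green eggs & ham'})
print(quoted)        # q=green+eggs+%26+ham

# For GET, the encoded string is appended to the URL after a '?'.
get_url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
print(get_url)
```

For POST, the same encoded string is passed as the second argument to urlopen( ) rather than appended to the URL, as in the second session above.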
-- DJPH
Copyright © 2003 O'Reilly & Associates. All rights reserved.