Unix Power Tools

42.6. urllib2

urllib2 provides an extended, extensible interface to web resources. urllib2's application-level interface is essentially identical to urllib's urlopen( ) function (Section 42.5). Underneath, however, urllib2 explicitly supports proxies, caching, basic and digest authentication, and so forth.
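Proxy support is one of the features that sit underneath urlopen( ). The following is a minimal sketch that builds an Opener routing requests through a proxy. The proxy address http://proxy.example.com:3128 is a hypothetical placeholder. The import shim is an addition so the code also runs on Python 3, where urllib2 was split into urllib.request and urllib.error.

```python
# Import shim (an assumption for portability): urllib2 on Python 2,
# urllib.request on Python 3; both provide ProxyHandler and build_opener.
try:
    import urllib2 as request
except ImportError:
    import urllib.request as request

# Hypothetical proxy address; replace it with your own proxy.
PROXY = "http://proxy.example.com:3128"

# Map each URL scheme to the proxy that should carry its requests.
proxy_handler = request.ProxyHandler({"http": PROXY, "https": PROXY})

# Passing a ProxyHandler replaces the default one that build_opener()
# would otherwise create from environment variables such as http_proxy.
opener = request.build_opener(proxy_handler)

# opener.open("http://www.domain.com/") would now go through the proxy
# (not executed here, since the proxy is hypothetical).
```

Passing ProxyHandler({}) instead, with an empty mapping, is a common way to make an Opener ignore any proxy settings in the environment.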

urllib2 opens a URL with an Opener, which is built from a series of Handlers. If you know you want a particular set of features, you tell urllib2 which Handlers to use (with build_opener( ), and optionally install_opener( )) before you call urlopen( ). Much of urllib2's extensibility comes from this design: if you need to deal with some unusual set of interactions, you can write a Handler object that deals with just those interactions and combine it with the existing Handlers in an Opener. Complex behavior is thus assembled from small, simple pieces of code.
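To see what an Opener is made of, the sketch below lists the Handlers that build_opener( ) installs by default and then assembles an OpenerDirector by hand from just two Handlers. The exact default list varies between Python versions (for example, HTTPSHandler appears only when SSL support is available). The import shim is an assumption added so the code runs on both Python 2 and 3.

```python
# Import shim (an assumption for portability across Python 2 and 3).
try:
    import urllib2 as request
except ImportError:
    import urllib.request as request

# build_opener() starts from a default set of Handlers (HTTP, FTP,
# redirects, error processing, ...) and adds any you pass in.
opener = request.build_opener()
names = sorted(type(h).__name__ for h in opener.handlers)
print(names)

# The same composition done by hand: an OpenerDirector containing only
# the Handlers you choose and nothing else.
minimal = request.OpenerDirector()
minimal.add_handler(request.HTTPHandler())
minimal.add_handler(request.HTTPDefaultErrorHandler())
print([type(h).__name__ for h in minimal.handlers])
```

The hand-built Opener cannot follow redirects or open FTP URLs, because no Handler for those interactions was added; that is the trade-off of composing the chain yourself.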

For example, to retrieve a web resource that requires basic authentication over a secure socket connection:

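>>> import urllib2
>>> # The realm name ("private") must match the one the server sends
>>> authHandler = urllib2.HTTPBasicAuthHandler()
>>> authHandler.add_password("private", "https://www.domain.com/private",
...                          "user", "password")
>>> # Build an Opener that includes the auth Handler; make it the default
>>> opener = urllib2.build_opener(authHandler)
>>> urllib2.install_opener(opener)
>>> # urlopen() now goes through the installed Opener
>>> resource = urllib2.urlopen("https://www.domain.com/private/foo.html")
>>> print resource.read()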
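The session above has to name the realm ("private") exactly as the server announces it. The following variant is a sketch for the case where you would rather not hard-code the realm. HTTPPasswordMgrWithDefaultRealm falls back to credentials registered under the realm None, and one password manager can be shared by the basic and digest Handlers. The URL and credentials are the same placeholders as above. The import shim is an assumption so the code also runs on Python 3.

```python
# Import shim (an assumption for portability across Python 2 and 3).
try:
    import urllib2 as request
except ImportError:
    import urllib.request as request

# Credentials registered under realm None are used for any realm
# the server names, as long as the URL falls under this prefix.
password_mgr = request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "https://www.domain.com/private",
                          "user", "password")

# Basic and digest Handlers can share one password manager, so either
# challenge scheme can be answered with the same credentials.
basic_handler = request.HTTPBasicAuthHandler(password_mgr)
digest_handler = request.HTTPDigestAuthHandler(password_mgr)

# Keep the Opener local instead of calling install_opener(), so other
# code that calls urlopen() is unaffected.
opener = request.build_opener(basic_handler, digest_handler)

# opener.open("https://www.domain.com/private/foo.html") would answer a
# 401 challenge with these credentials (not executed here).
```

Calling opener.open( ) directly, rather than installing the Opener globally, keeps the credentials scoped to the code that needs them.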

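To implement a new Handler, subclass urllib2.BaseHandler and implement the methods for the behavior you want to handle. The Opener finds these methods by name: a method named protocol_open( ), such as http_open( ), opens URLs of that scheme, and a method such as http_error_401( ) handles that HTTP error code.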
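A minimal sketch of a new Handler, assuming an invented echo: URL scheme (not a real protocol). Because the method is named echo_open( ), the Opener calls it for any echo: URL and returns the response object it builds. EchoHandler and the echo: scheme are hypothetical. addinfourl is the response wrapper that the standard FileHandler also returns (urllib.addinfourl on Python 2, urllib.response.addinfourl on Python 3), and the import shim is an assumption for portability.

```python
import email.message
import io

# Import shim (an assumption for portability across Python 2 and 3).
try:
    import urllib2 as request
    from urllib import addinfourl
except ImportError:
    import urllib.request as request
    from urllib.response import addinfourl


class EchoHandler(request.BaseHandler):
    """Hypothetical Handler for an invented 'echo:' scheme.

    The Opener dispatches on method names, so echo_open() is called
    for every URL whose scheme is 'echo'.
    """

    def echo_open(self, req):
        url = req.get_full_url()
        # Everything after "echo:" becomes the response body.
        body = url.split(":", 1)[1].encode("utf-8")
        headers = email.message.Message()
        headers["Content-Type"] = "text/plain"
        # Wrap the body in the same file-like response type that the
        # standard FileHandler returns.
        return addinfourl(io.BytesIO(body), headers, url)


# Combine the new Handler with the default ones.
opener = request.build_opener(EchoHandler())
response = opener.open("echo:hello")
print(response.read())
```

The default Handlers are still present, so this Opener also opens ordinary http: URLs; only echo: URLs reach the new code.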

-- DJPH

Copyright © 2003 O'Reilly & Associates. All rights reserved.