How to create a useful new Web service by tapping in to the power of two other freely available Web services.
Last month, we looked at the latest incarnation of Web services offered by on-line giant Amazon. Amazon was one of the first companies to embrace Web services, and although some of its newer offerings require payments on a monthly or per-query basis, basic catalog searches are still available free of charge.
If we think of each individual Web service as a function call, we can think of a collection of Web services, such as Amazon Web Services (AWS), as a software library. And although we can certainly create interesting applications with such libraries, it is often useful to create new libraries that sit on top of the existing ones. In many ways, the history of software is the history of creating increasingly powerful abstractions by stacking libraries on top of one another. Outside of the classroom, most of us haven't ever had to implement a sort algorithm or create a buffered I/O library, simply because such things have been written and optimized by previous generations of programmers.
I thus believe that it's useful for us to consider AWS not as a set of routines that we can incorporate into end-user programs, but rather as a set of low-level libraries on top of which we can (and should) create new libraries appropriate for our specific needs.
This month, we look at a simple example of what I mean. The project will reflect my love of books. The Internet has made it difficult for me to stop buying used books, because so many are available at low prices. But, I'm fortunate to be spending several years in Skokie, Illinois, which has an excellent public library. Skokie's library has not only an extensive collection, but it also has a Web-based interface to the book catalog. Our project for this month, thus, is to create a Web service that integrates Amazon's catalog with the information from the Skokie public library. In other words, we're going to write a Web service that itself relies upon another Web service. The input to our service will be an International Standard Book Number (ISBN); the output will be an indication of the book's availability and price at Amazon and the Skokie library.
In some ways, this Web service will duplicate the excellent Book Burro plugin for the Firefox Web browser, which I often use to find the best bargains. And indeed, Book Burro looks at both bookstores and public libraries in order to find books. I recommend Book Burro to everyone who uses Firefox. But, I believe that building your own simple Web service, even if it duplicates the functionality of another program, is a worthwhile endeavor.
Moreover, Web services have the advantage of being available from any programming language and any application. I can implement my Web service using Ruby, and people will still be able to access it from Java, Python, Perl or virtually any other language. In many ways, this achieves what object broker middleware services like CORBA had promised, only without the baggage that made CORBA a more complex (but arguably more secure and rich) programming platform. It makes a Web service more powerful than a simple software library, because it can be accessed from any platform or language, so long as the requesting computer is connected to the Internet.
In order to integrate an ISBN search for the Skokie library, we're going to need a way to query the library for information about book availability. Unfortunately, my library doesn't have a Web services API for querying its database. But, it does have the next-best thing, namely a simple Web interface that we can query.
There are several ways to look through the output from a Web page. Because many sites now use HTML that can be parsed as if it were XML, we might want to use an XML-parsing library to read through the response from the library's Web site, looking for particular text in specific places.
Much as I might like the idea of such an approach, I'm probably not the only Web developer who takes a more practical, quick-and-dirty look. I have used my library's Web site enough times to know that there is a limited number of responses it might send back to me. As a result, I'll use the reliable, if somewhat stupid, approach of looking for particular cues in the HTTP response.
Our program (skokie-lookup.rb, Listing 1) is written in Ruby, a language I have grown to enjoy more and more over the past few months. We begin by importing the included Net::HTTP module, which defines classes and methods that provide HTTP-based communication.
We then check to make sure that we have at least one command-line argument, by looking at the built-in ARGV array. If the length of ARGV is 0, we know we weren't passed any arguments, and we should give the user a brief indication of how the program should be used.
Then, we set up a number of variables that will be needed later on. The output variable is a string to which we will add any output we need to send to the user. We also create three Regexp (regular expression) objects, which we will use in our loop.
Next comes the meat of the program. We iterate over each element of ARGV, first checking that it is a ten-character ISBN containing only numbers and the letter X. We then query the Skokie library's Web site for that ISBN, passing Net::HTTP.get_response the hostname and path to the program we want. The HTTP response, including its headers and body, is then available in our response variable.
Now we compare the response body against our three regular expressions, checking which it matches. Using Ruby's << operator for concatenation, we add an appropriate message to the output variable for each ISBN. Finally, just before the program exits, it gives a full report of ISBNs.
The above program works just fine, and it provides an easier way to query the Skokie library catalog than the standard Web pages. But, I'm interested in knowing how much the book would cost if I were to buy it from Amazon, as well as whether it's available from the library. With all of this information, I can then decide if I want to buy the book, check it out of the library or neither.
Last month, we saw how we could use a REST-style request (that is, HTTP GET with arguments) to retrieve information from Amazon. Now we will write a program that performs that retrieval and then pulls out the relevant XML data.
As you might remember, we can retrieve Web services information from Amazon by sending an HTTP request to webservices.amazon.com, asking for the document /onca/xml, and then specifying the Service, Operation and AWSAccessKeyId name-value pairs. If we are interested in learning about new and used prices for that ISBN, we then pass the ItemId parameter, and indicate that we want the ResponseGroup known as OfferSummary.
Because Amazon returns XML in all of its responses, including those invoked with REST, we can parse through the XML to find the lowest prices for our book. Ruby comes with the REXML-parsing library, which works with XML in a number of different ways; we will use it to scan through Amazon's response for the appropriate code.
Finally, we can rework our existing code, such that it will search the Skokie library for the ISBN and produce a textual summary. Listing 2 contains a program (combined-lookup.rb) that produces such combined output.
combined-lookup.rb begins in almost the same way as skokie-lookup.rb, although it imports the rexml/document module along with the net/http module. It then iterates through ISBNs that were passed on the command line, ignoring those that don't fit the strict definition.
The main addition to this program begins with the creation of a string named amazon_params. In theory, we could have built this string in a number of different ways, many of them less complicated than the combination of methods I chose. But, I felt that using a hash in this way would make it easier to modify the code later on, even if it requires a bit more time to understand at first.
The basic idea is as follows: we create a hash, in which the keys are the AWS REST parameter names, and the values are the corresponding parameter values. In order to get these parameters into the standard format of param1=value1¶m2=value2, we use map to create an array from the keys and values of the hash. Our array will contain strings, each of which is in the param=value format, joined together with an equal sign. Finally, we use join to combine all of those pairs with & signs between them, producing a string that we assign to amazon_params.
With our parameters in place, we use Net::HTTP.get_response, just as we did before in skokie-lookup.rb. The hostname will be different, and the requested URL on that host will also be quite different, incorporating the parameters that we just assigned to amazon_params. But, the request is sent in the same way, and we retrieve the response in the same way as well.
However, whereas the Skokie library sends its response in HTML, Amazon replies using XML. So, we fire up REXML, creating a new instance of REXML::Document with the contents of the Amazon response. We then use the elements method on the response's root node to find the lowest new, used and collectible prices. (Amazon provides each of these prices separately, which I admit is a bit annoying.) If the text within that node is nil, no such price exists, and we indicate that to the user. Otherwise, we can assume we got a price back—and a price formatted with a dollar sign and decimal point, at that—and we display it for the user.
Now that we have created a combined lookup tool, how can we turn it into a Web service? (For the purposes of simplicity, I'm going to use XML-RPC. It would be equally valid to use SOAP or even to look for REST parameters.)
The answer is easier than you might think. We will need to modify the program to take its inputs from the Web instead of ARGV. We also will need to send the output over the XML-RPC, back to the client that sent the original request.
But the end result, as you can see in Listing 3, is not terribly different from what we had in Listing 2. And because it operates as a Web service, we can now incorporate its results into new programs that we might write. Better yet, we can create new Web services that use this service as an underlying foundation, thus stacking the functions even deeper, into even more useful libraries.
Listing 3 begins by creating a new instance of XMLRPC::Server on port 8080. It then adds a new handler, which we call atf.books, and which both accepts an array as input and returns one as output. Using Ruby's block notation, the handler then iterates over each ISBN that it receives via the XML-RPC method call.
The rest of the program is largely the same as combined-lookup.rb, with the exception of the output. Output to an XML-RPC method call, at least in this Ruby library, is accomplished by placing the output in the final line of the block. Because we plan to return an array, we need to create and populate the array. We thus define output variable as an empty array and add one element to it for each ISBN we check. Each element of that array then will be a hash (known as a struct in XML-RPC jargon), with the ISBN key pointing to the book's ISBN, and the New, Used and Collectible keys pointing to the prices retrieved from Amazon.
The server program then concludes with a call to server.serve, starting an infinite listener loop for a simple HTTP server.
To test this program, you need an RPC client; a simple one is shown in Listing 4 and takes its arguments from the command line. You'll notice that we use Ruby's exception-handling mechanism to watch for potential problems. If there is an error on the server, we can trap it and print a useful debugging message.
Seasoned programmers rarely implement everything themselves. The days in which every application needed its own video and printer drivers, to say nothing of a filesystem or operating system, are long behind us. Instead, we now have hierarchies of software libraries, with each library making use of lower-level data and functions and also performing similar tasks for higher-level libraries.
Web services haven't changed the need for building new libraries on top of old ones. Indeed, we can expect to see an explosion of such new libraries in the future. The difference is that new libraries will often be based on Web services, which provide platform and language independence. We will see basic, middleware and high-level Web services, available from anywhere on the Internet and callable from any operating system or language. This month, we looked at one way in which we can create a new Web service out of an old one. Each call to our xmlrpc-lookup server fired off a query to Amazon's Web services. Information from Amazon was then combined with another data set, with results that are useful to anyone living in Skokie, Illinois. We can expect to see similar aggregating Web services in the future, both free of charge and for pay.
Resources for this article: /article/8828.