Perl keeps track of online orders

Watching the Delivery Man


A distributed database based on the distributed Git version control system relies on a Perl script to help users track Internet orders. When the goods arrive, purchasers update their stock counts, wherever they may be at the time.

By Michael Schilli


If you are like me, and really enjoy buying cheap goodies on the Internet, you might feel uncertain at times as to whether the things you ordered really will arrive just three days later. The thrill of spending can cause bargain hunters to lose track. It seems only natural to store the orders you place in a database and update it when the goods arrive. Of course, you need the database to be available from wherever you spend your money.

And that could be anywhere: in the office, at home, or maybe on your laptop in a cheap motel room. Maybe you don't even have Internet access, and after recovering from the shock of being inundated by mysterious parcels, you might have to update the database locally, only to synchronize it later when access to the net has been re-established.

The Git distributed version control system [2] seems like a perfect choice for the job. Put together in no more than two weeks by the kernel guru Linus Torvalds to replace the proprietary BitKeeper product, Git manages the Linux kernel, patching and merging thousands of files in the blink of an eye. Of course, speed isn't an issue for the application I have in mind, but it's good to know that Git can synchronize various distributed filesystem trees without breaking a sweat, thanks to its integrated push/pull replication mechanism (Figure 1).

Figure 1: The central Git repository on a hosted server is the point of exchange between the individual local repositories at home, at work, and on the road.

Being Prepared

Information concerning Internet purchases and their estimated times of arrival is cached on the local hard disk. The CPAN module used for this creates a separate file for each item. Git versions this information in a local repository, which is the typical approach for a distributed versioning system. This approach gives developers full functionality without Internet access and without burdening the central repositories with temporary developments. They can check in new versions, check out old ones, create parallel development branches, merge branches with others, and much more.

To synchronize the local repository, the local user issues a "push" to another instance someplace else. I've chosen a "central" repository on a hosting service as the point of contact for all clients: They push their changes to it and pull their updates from it. In reality, of course, no such thing as a centralized Git repository exists, and it is up to you which instance you contact to download patches or new features.

If you are working on a new laptop that doesn't know about the marvels of this tracking system yet, you can create a clone of the centralized instance with shop clone. After doing so, you do not need Internet access to query the clone or feed new data to it; instead, the changes are simply synchronized later with the centralized instance once you have reestablished the connection.

By Your Own Bootstraps

If you are interested in implementing this solution, you will need to create an empty Git repository on a hosting service with SSH access. As you can see in Figure 2, you need git init for this. Just create a new directory, change to it, and run the git init command. Now you might think that the client could simply clone this repository locally, but you need to think again: For some obscure reason, it has to jump through a burning hoop first.

Figure 2: Creating an empty repository, dubbed buy.git, server-side.

As you can see in Figure 3, the client also needs to run git init to create an empty repository. Next, add a testfile for test purposes, run git add to insert it, and complete the process by running the commit command. Then, with remote add, define a remote branch with an origin alias pointing to the central repository on the server. The push origin master command then synchronizes the master (default) branch on the client with the similarly named branch on the server. It is a good idea to have the client's public key in the server's ~/.ssh/authorized_keys file to avoid having to type the password each time you access the repository via the network.

Figure 3: A remote branch "origin" is added to a local repository to point to the repository on the server. "git push" then feeds local changes to the remote repository.

If another client wants to retrieve the data from the server-based repository, it just clones it, as shown in Figure 4. Once on the local machine, it is a full copy of the server repository that also has the ability to git push changes checked in locally to the server.

Figure 4: Other clients can now clone the remote repository and then use "git push" to upload their changes to the server.
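The sequence from Figures 2 through 4 can be replayed without a hosting service. The following is only a sketch under a few assumptions of my own: all three repositories live in throwaway directories on one machine, the commit identity is made up, and the "server" repository is created with --bare (a common choice for a repository that is only pushed to) rather than with the plain git init described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base   = tempdir( CLEANUP => 1 );
my $server = "$base/buy.git";    # stands in for the hosted repository
my $first  = "$base/first";      # first client
my $second = "$base/second";     # a client that joins later

# Made-up identity so "git commit" works on an unconfigured machine
my @ident = (
    "-c", "user.name=Shopper",
    "-c", "user.email=shopper\@example.com",
);

# Run a git command inside $dir and die if it fails
sub git {
    my ( $dir, @args ) = @_;
    system( "git", "-C", $dir, @ident, @args ) == 0
        or die "git @args failed in $dir\n";
}

for my $dir ( $server, $first ) {
    mkdir $dir or die "mkdir $dir: $!";
}

# "Server" side: an empty repository (assumption: bare)
git( $server, "init", "--quiet", "--bare" );
git( $server, "symbolic-ref", "HEAD", "refs/heads/master" );

# First client: init, add a test file, commit, define origin, push
git( $first, "init", "--quiet" );
git( $first, "symbolic-ref", "HEAD", "refs/heads/master" );
open my $fh, ">", "$first/testfile" or die "testfile: $!";
print $fh "test\n";
close $fh;
git( $first, "add", "testfile" );
git( $first, "commit", "--quiet", "-m", "first commit" );
git( $first, "remote", "add", "origin", $server );
git( $first, "push", "--quiet", "origin", "master" );

# Any further client simply clones
git( $base, "clone", "--quiet", $server, $second );

print "clone contains testfile\n" if -f "$second/testfile";
```

The two symbolic-ref calls only make sure the branch is called master even where Git has been configured with a different default branch name.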

Wrapped in Perl

Listing 1 shows the Perl script, which accepts the commands listed in Table 1 and issues the corresponding Git commands. It uses Sysadm::Install from CPAN to jump quickly back and forth between various directories (cd and cdback) and run various Git commands at the command line.

The order data is stored in a cache implemented by the Cache::FileCache CPAN module. The value 0 used for cache_depth in line 32 sends every entry to a file directly in the local ~/data/shop directory. Line 24 uses mkd from Sysadm::Install to create the directory if it does not already exist. In contrast to Perl's mkdir() function, mkd does some error checking and logs its activities, assuming you enabled Log4perl.
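To see the one-file-per-entry layout for yourself, a small experiment along the following lines should do. It is a sketch, not part of the shop script: The cache root is a throwaway temp directory instead of ~/data, and the two records are simplified placeholders.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cache::FileCache;
use File::Temp qw(tempdir);

my $root = tempdir( CLEANUP => 1 );    # throwaway stand-in for ~/data

my $cache = Cache::FileCache->new(
    {
        cache_root  => $root,
        cache_depth => 0,         # no intermediate subdirectories
        namespace   => "shop",    # becomes the directory $root/shop
    }
);

# Placeholder records; the real script stores DateTime objects, too
$cache->set( "iPod",         { item => "iPod" } );
$cache->set( "dell netbook", { item => "dell netbook" } );

# With cache_depth 0, the entries should sit directly in $root/shop
my @files = grep { -f } glob "$root/shop/*";
my @keys  = sort $cache->get_keys();

print scalar(@files), " file(s) for ", scalar(@keys), " key(s)\n";
print "  $_\n" for @keys;
```

A higher cache_depth would spread the files over nested subdirectories, which is handy for large caches but would only get in the way of the git rm call later on.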

The cache's set() and get() methods accept the product name (e.g., "iPod") as a key and create or retrieve entries in the format defined by the record_new() function (line 120). Besides the product name, a record also includes two date fields of the DateTime type. The first field, bought, stores the order date and uses the today() method in line 132 to set this to the current date.

Users can specify the expected arrival date of an item with a buy command,

shop buy 'dell netbook' 30

which specifies a delivery period of 30 days for a netbook ordered from Dell. Lines 133ff. convert this day value into a DateTime::Duration type object, which, with a bit of operator magic, can later be added to a DateTime object to calculate the expected delivery date. The latter is then stored in the second DateTime field, aptly named expected.

Both DateTime objects contain a formatter, DateTime::Format::Strptime, which defines the expected date format as "%F", thus expecting the object to be represented as YYYY-MM-DD in a string context.
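The date arithmetic and the formatter can be tried in isolation. In this sketch, the order date is fixed to a made-up day (January 15, 2009) instead of today(), so that the result is reproducible; everything else follows the calls used in record_new().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DateTime;
use DateTime::Duration;
use DateTime::Format::Strptime;

my $fmt = DateTime::Format::Strptime->new( pattern => "%F" );

# Made-up order date; the shop script uses DateTime->today() here
my $bought = DateTime->new( year => 2009, month => 1, day => 15 );

# The overloaded + operator returns a new DateTime object
my $expected = $bought + DateTime::Duration->new( days => 30 );

# Without a formatter, the objects would stringify with a time part
$_->set_formatter($fmt) for $bought, $expected;

print "bought:$bought exp:$expected\n";
# prints "bought:2009-01-15 exp:2009-02-14"
```

Because the formatter is attached to the object itself, record_list() can simply interpolate the fields into a string without caring about date formats.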

Cache::FileCache has no trouble storing this deeply nested data structure in a file; it flattens the structure internally before doing so, then, when reading it later, converts it back into Perl objects. After the cache file has made its way into the local repository workspace, the shop script makes the changes permanent by running git add and git commit.
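What flattening a nested structure and reviving it looks like can be shown with the core Storable module. As far as I know, Cache::FileCache relies on Storable under the hood, but treat that as an assumption; the Stamp class below is a hypothetical stand-in for DateTime that keeps the example free of CPAN dependencies.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);

{
    # Hypothetical miniature stand-in for a DateTime object
    package Stamp;
    sub new { my ( $class, $ymd ) = @_; return bless { ymd => $ymd }, $class }
    sub ymd { return $_[0]->{ymd} }
}

my $rec = {
    item     => "iPod",
    bought   => Stamp->new("2009-01-15"),
    expected => Stamp->new("2009-02-14"),
};

my $flat = nfreeze($rec);    # one flat byte string, ready for a file
my $copy = thaw($flat);      # nested structure, objects blessed again

print ref( $copy->{expected} ), " ", $copy->{expected}->ymd, "\n";
# prints "Stamp 2009-02-14"
```

The frozen string is binary data, which matters later on: Git can store such files without trouble but cannot merge them line by line.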

Listing 1: shop.pl
001 #!/usr/local/bin/perl -w
002 use strict;
003 use Sysadm::Install qw(:all);
004 use Cache::FileCache;
005 use DateTime;
006 use
007   DateTime::Format::Strptime;
008 use File::Basename;
009
010 my ($H)       = glob "~";
011 my $data_dir  = "data";
012 my $repo_name = "shop";
013 my $repo_dir =
014   "$H/$data_dir/$repo_name";
015
016 my $repo_url =
017 'mschilli@box.goofhost.com:repos/shop.git';
018
019 my ($action) = shift;
020 die
021 "usage: $0 buy|got|list ..."
022   unless defined $action;
023
024 mkd $repo_dir
025   unless -d $repo_dir;
026
027 my $CACHE =
028   Cache::FileCache->new(
029   {
030     cache_root =>
031       "$H/$data_dir",
032     cache_depth => 0,
033     namespace => $repo_name,
034   }
035   );
036
037 if ( $action eq "buy" ) {
038   my ( $item, $days ) =
039     @ARGV;
040   die
041     "usage: $0 buy item days"
042     if !defined $days
043       or $days =~ /\D/;
044
045   my $rec =
046     record_new( $item,
047     $days );
048   if ( $CACHE->get($item) ) {
049     die
050      "$item already exists.";
051   }
052   $CACHE->set( $item, $rec );
053   git_commit(
054     "Added item $item");
055 }
056 elsif ( $action eq "got" ) {
057   my ($key) = @ARGV;
058   die "usage: $0 got item"
059     unless defined $key;
060   my $path =
061     path_to_key($key);
062   git_cmd( "git", "rm", "-f",
063     basename($path) );
064   git_cmd( "git", "commit",
065     "-a", "-m$key deleted" );
066
067 }
068 elsif ( $action eq "list" ) {
069   record_list();
070
071 }
072 elsif ( $action eq "push" ) {
073   git_cmd(
074     "git",    "push",
075     "origin", "master"
076   );
077
078 }
079 elsif ( $action eq "pull" ) {
080   git_cmd(
081     "git",    "pull",
082     "origin", "master"
083   );
084
085 }
086 elsif ( $action eq "clone" )
087 {
088   cd "$H/$data_dir";
089   rmdir $repo_name;
090   cmd_run( "git", "clone",
091     $repo_url );
092   cdback;
093
094 }
095 elsif ( $action eq "init" ) {
096   git_cmd( "git", "init" );
097   git_cmd(
098     "git", "remote",
099     "add", "origin",
100     $repo_url
101   );
102 }
103 else {
104   die
105 "Unknown action '$action'";
106 }
107
108 #############################
109 sub path_to_key {
110 #############################
111   my ($key) = @_;
112
113   return
114     $CACHE->_get_backend()
115     ->_path_to_key(
116     $repo_name, $key );
117 }
118
119 #############################
120 sub record_new {
121 #############################
122   my ( $item, $days ) = @_;
123
124   my $df =
125     DateTime::Format::Strptime
126     ->new(
127     pattern   => "%F",
128     time_zone => "local",
129     );
130
131   my $now =
132     DateTime->today();
133   my $exp =
134     $now +
135     DateTime::Duration->new(
136     days => $days );
137
138   $now->set_formatter($df);
139   $exp->set_formatter($df);
140
141   return {
142     item     => $item,
143     bought   => $now,
144     expected => $exp,
145   };
146 }
147
148 #############################
149 sub record_list {
150 #############################
151
152   for my $key (
153     $CACHE->get_keys() )
154   {
155     my $r =
156       $CACHE->get($key);
157     print "$r->{item} ",
158       "bought:$r->{bought} ",
159       "exp:$r->{expected} ",
160       "\n";
161   }
162 }
163
164 #############################
165 sub git_commit {
166 #############################
167   my ($msg) = @_;
168
169   cd $repo_dir;
170   cmd_run( "git", "add",
171     "." );
172   cmd_run(
173     "git", "commit",
174     "-a",  "-m$msg"
175   );
176   cdback;
177 }
178
179 #############################
180 sub git_cmd {
181 #############################
182   cd $repo_dir;
183   cmd_run(@_);
184   cdback;
185 }
186
187 #############################
188 sub cmd_run {
189 #############################
190   my ( $stdout, $stderr,
191     $rc ) = tap @_;
192   if ( $rc != 0 ) {
193     die $stderr;
194   }
195 }

Tearing the Drapes

The script needs to peek behind the drapes of the cache to delete the entry after the goods have arrived: To delete a file from a Git repository, you need to know its name, but the cache generates a 40-character hash as the filename for each key (as in d549f860476c...). If the user tries to delete the entry for the iPod by issuing a shop got iPod command, the path_to_key function defined in lines 109ff. peeks behind the drapes of the cache abstraction and retrieves the matching pathname. Line 62 then issues a git rm -f command to remove the file from both the local workspace and the index of the local repository. A subsequent commit makes this permanent.
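The lookup can be tried on its own, as in the following sketch. It uses the same two private methods as path_to_key() in Listing 1; because they start with an underscore, they are not part of the module's documented interface and could change in a future release. The cache root is again a throwaway temp directory.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cache::FileCache;
use File::Basename;
use File::Temp qw(tempdir);

my $root      = tempdir( CLEANUP => 1 );    # stand-in for ~/data
my $namespace = "shop";

my $cache = Cache::FileCache->new(
    {
        cache_root  => $root,
        cache_depth => 0,
        namespace   => $namespace,
    }
);
$cache->set( "iPod", { item => "iPod" } );

# Private calls, as in Listing 1; not a documented interface
my $path =
  $cache->_get_backend()->_path_to_key( $namespace, "iPod" );
my $file = basename($path);

# This is the name the script would hand to "git rm -f"
print "key 'iPod' lives in $file\n";
```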

The shop list command tells the record_list() function (lines 149ff.) to call the cache implementation's get_keys() method, which returns all the keys that exist in the cache as a list. The function then passes each element of the list to the get() method, which retrieves the cache entry for a key from disk.

The Git commands are all issued through the cmd_run function defined in lines 188ff., which internally calls tap from the Sysadm::Install module. The tap function runs the command line, intercepts STDOUT and STDERR, and returns them neatly to the caller along with the return code.
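A quick way to get a feel for tap is to point it at a command with known behavior. In this sketch, the command is a Perl one-liner of my own invention that writes to both output channels and then exits with a non-zero status.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Sysadm::Install qw(:all);

# $^X is the path to the running perl binary; the one-liner is made up
my ( $stdout, $stderr, $rc ) =
  tap( $^X, "-e", "print STDOUT 42; print STDERR 7; exit 3" );

print "stdout: '$stdout'\n";
print "stderr: '$stderr'\n";

# Only zero vs. non-zero matters here, just as in cmd_run()
print "command ", ( $rc == 0 ? "succeeded" : "failed" ), "\n";
```

Like cmd_run() in the listing, the sketch only distinguishes between zero and non-zero return codes and does not rely on the exact value.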

Logging

The local repository logs all the transactions and can therefore easily restore earlier states. If you are interested in the transactions that have occurred in the repository, you can simply query the repository's log by issuing the git log command in the ~/data/shop directory, as Figure 5 shows.

Figure 5: The "git log" command issued in the ~/data/shop directory shows the latest transactions in the repository.

Conflicts Happen

If two clients independently store the same product in their local repositories, a conflict occurs as soon as the second client attempts to shop push its changes to the centralized server.

Figure 6 shows the nasty error message issued by Git when a second client attempts to shop push. Ideally, when a shop pull follows, Git should notice that the changes to the remote branch and the local file are identical, but because the object here is a binary file created by Cache::FileCache, it doesn't trust its own judgment and complains instead. Text files, on the other hand, are handled perfectly by Git in this respect.

Figure 6: Two clients enter the same product independently, and the central server reports a conflict.

If a conflict occurs, Git enters a merge state and waits for the user to resolve it. In this case, you need to delete the product locally (shop got) and then create it again (shop buy). The next push is accepted, and the server repository is happy again.

If the client were to delete the product, not create it again, but issue a push instead, the chainsaw ordered in Figure 6 would disappear from the server repository. After a shop pull on the first client, it would disappear from its local repository, too.
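The conflict is easy to provoke in a sandbox. The following sketch rests on several assumptions of mine: Both "clients" and the "server" are throwaway directories on one machine, the server repository is bare, the commit identity is made up, and a file named chainsaw with a NUL byte in it stands in for the binary cache file. The --no-rebase option merely tells newer Git versions to merge on pull.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $base  = tempdir( CLEANUP => 1 );
my @ident = (
    "-c", "user.name=Shopper",
    "-c", "user.email=shopper\@example.com",
);

# Run git in $dir; return exit status and combined output
sub git {
    my ( $dir, @args ) = @_;
    my $pid = open( my $out, "-|" );
    die "fork: $!" unless defined $pid;
    if ( $pid == 0 ) {    # child: send STDERR down the pipe, too
        open STDERR, ">&", \*STDOUT or die "dup: $!";
        exec "git", "-C", $dir, @ident, @args;
        die "exec git: $!";
    }
    my $text = do { local $/; <$out> };
    close $out;
    return ( $? >> 8, defined $text ? $text : "" );
}

# Stand-in for "shop buy": write a binary file and commit it
sub add_item {
    my ( $dir, $content ) = @_;
    open my $fh, ">", "$dir/chainsaw" or die "chainsaw: $!";
    binmode $fh;
    print $fh $content;
    close $fh;
    git( $dir, "add", "chainsaw" );
    git( $dir, "commit", "-m", "Added item chainsaw" );
}

# Server plus two clients that share a common first commit
git( $base, "init", "--bare", "server.git" );
git( "$base/server.git", "symbolic-ref", "HEAD", "refs/heads/master" );
git( $base, "init", "a" );
git( "$base/a", "symbolic-ref", "HEAD", "refs/heads/master" );
git( "$base/a", "commit", "--allow-empty", "-m", "seed" );
git( "$base/a", "remote", "add", "origin", "$base/server.git" );
git( "$base/a", "push", "origin", "master" );
git( $base, "clone", "server.git", "b" );

# Both clients enter the same product independently
add_item( "$base/a", "A\0version" );
add_item( "$base/b", "B\0version" );

my ($push_a) = git( "$base/a", "push", "origin", "master" );
my ( $push_b, $push_msg ) = git( "$base/b", "push", "origin", "master" );
my ($pull_b) = git( "$base/b", "pull", "--no-rebase", "origin", "master" );
my ( undef, $status ) = git( "$base/b", "status", "--porcelain" );

print "first push: $push_a, second push: $push_b, pull: $pull_b\n";
print $status;
```

A non-zero status for the second push and for the pull, plus a line starting with AA in the status output, indicates that the second client is stuck in the merge state described above.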

Installation

The script needs the Sysadm::Install, Cache::FileCache, DateTime, and DateTime::Format::Strptime modules from CPAN, which are best installed by way of a CPAN shell to resolve dependencies automatically. Alternatively, you can use your Linux package manager if its repository has all of the required modules on hand. Their names might be slightly different in this case, though: libsysadm-install-perl, for example, is Debian's idea of Sysadm::Install.

Before you start, you will need to set up an empty server-side repository manually with the use of git init (Figure 2). Perl modules are not required on the hosted machine. You do need the git program there, but it is part of most of today's Linux systems anyway.

Figure 7 shows how the first client initializes its local repository, uses shop buy to enter a couple of purchases, runs shop list to query the local database, and ultimately transfers data to the server with shop push. The shop commands do not produce any output if they run successfully.

Figure 7: The first client creates the local repository, populates it with data, and then updates the previously empty server repository.

The second client then uses shop clone to create a local clone of the server repository, as shown in Figure 8. That client also makes a couple of purchases and tags the iPod Nano transaction with Amazon as complete. It then issues a shop push to push the local changes to the server.

Figure 8: The second client retrieves the data from the server repository, adds new data, and pushes new entries to the server.

The same procedure applies to all further clients; again, they first clone the server repository, make some changes to the local repository, push the new data to the server, and receive the latest updates from the other clients via a server pull. The feast of orders can go on and on, and thankfully, Git will immediately notice any deliveries that fail to reach their destination because of mail problems or supplier sloppiness.

INFO
[1] Listings for this article: http://www.linux-magazine.com/resources/article_code
[2] Swicegood, Travis. Pragmatic Version Control Using Git. Pragmatic Bookshelf, 2008