By Michael Schilli
If you are like me, and really enjoy buying cheap goodies on the Internet, you might feel uncertain at times as to whether the things you ordered really will arrive just three days later. The thrill of spending can cause bargain hunters to lose track. It seems only natural to store the orders you place in a database and update it when the goods arrive. Of course, you need the database to be available from wherever you spend your money.
And that could be anywhere: in the office, at home, or maybe on your laptop in a cheap motel room. Maybe you don't even have Internet access; after recovering from the shock of being inundated by mysterious parcels, you might have to update the database locally and synchronize it later, once access to the net has been re-established.
The Git distributed version control system [2] seems like a perfect choice for the job. Put together in no more than two weeks by kernel guru Linus Torvalds to replace the proprietary BitKeeper product, Git manages the Linux kernel, patching and merging thousands of files in the blink of an eye. Of course, speed isn't an issue for the application I have in mind, but it's good to know that Git can synchronize distributed filesystem trees without breaking a sweat, thanks to its integrated push/pull replication mechanism (Figure 1).
Information about Internet purchases and their estimated arrival dates is cached on the local hard disk; the CPAN module used for this creates a separate file for each item. Git versions this information in a local repository, which is the typical approach for a distributed version control system. It gives developers full functionality without Internet access and without burdening the central repository with work in progress: They can check in new versions, check out old ones, create parallel development branches, merge branches with others, and much more.
To synchronize the local repository, the user issues a "push" to another instance someplace else. I've chosen a "central" repository on a hosting service as the point of contact for all clients, to which they push their changes and from which they pull updates. In reality, of course, Git has no built-in notion of a central repository, and it is up to you which instance you contact to download patches or new features.
If you are working on a new laptop that doesn't know about the marvels of this tracking system yet, you can create a clone of the centralized instance with shop clone. After doing so, you do not need Internet access to query the clone or feed new data to it; instead, the changes are simply synchronized later with the centralized instance once you have reestablished the connection.
If you want to implement this solution, you first need to create an empty Git repository on a hosting service with SSH access. As Figure 2 shows, git init does this: Just create a new directory, change into it, and run git init. Now you might think that the client could simply clone this repository, but think again: For some obscure reason, it has to jump through a burning hoop first.
As Figure 3 shows, the client also needs to run git init to create an empty repository. Next, create a test file, add it with git add, and complete the process with git commit. Then, git remote add defines a remote named origin that points to the central repository on the server. The git push origin master command then synchronizes the master (default) branch on the client with the branch of the same name on the server. It is a good idea to put the client's public key into the server's ~/.ssh/authorized_keys file to avoid having to type the password each time you access the repository over the network.
If another client wants to retrieve the data from the server repository, it simply clones it, as shown in Figure 4. The result on the local machine is a full copy of the server repository, which can also git push locally committed changes back to the server.
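The steps from Figures 3 and 4 can be replayed in a scratch directory to see them work end to end. The following Perl sketch is an illustration, not part of the shop script, and rests on several assumptions: git is installed locally, a local bare repository (created with git init --bare) stands in for the hosted server, and a made-up committer identity is passed on the command line. The helper git_in is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper: run a git subcommand inside $dir, die on failure.
sub git_in {
    my ( $dir, @args ) = @_;
    system( 'git', '-C', $dir, @args ) == 0
        or die "git @args failed in $dir (exit ", $? >> 8, ")\n";
}

my $tmp    = tempdir( CLEANUP => 1 );
my $server = File::Spec->catdir( $tmp, 'shop.git' );  # stands in for the host
my $client = File::Spec->catdir( $tmp, 'client1' );
my $clone  = File::Spec->catdir( $tmp, 'client2' );

# Server side (assumption: a bare repository, which accepts pushes).
system( 'git', 'init', '-q', '--bare', $server ) == 0
    or die "cannot create $server\n";

# First client: init, add a test file, commit, define origin, push.
mkdir $client or die "mkdir $client: $!";
git_in( $client, 'init', '-q' );
# Pin the branch name in case the local git defaults to something else.
git_in( $client, 'symbolic-ref', 'HEAD', 'refs/heads/master' );
open my $fh, '>', File::Spec->catfile( $client, 'testfile' )
    or die "cannot write testfile: $!";
print $fh "test\n";
close $fh;
git_in( $client, 'add', 'testfile' );
git_in( $client, '-c', 'user.name=shop', '-c', 'user.email=shop@example.com',
    'commit', '-q', '-m', 'initial commit' );
git_in( $client, 'remote', 'add', 'origin', $server );
git_in( $client, 'push', '-q', 'origin', 'master' );

# Second client: a plain clone is a full copy of the server repository.
system( 'git', 'clone', '-q', '-b', 'master', $server, $clone ) == 0
    or die "clone failed\n";
my $copied = File::Spec->catfile( $clone, 'testfile' );
print( ( -e $copied ) ? "clone contains testfile\n" : "testfile missing\n" );
```

Using git -C instead of changing directories keeps the sketch short; the shop script achieves the same with cd and cdback.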
Listing 1 shows the Perl script, which accepts the commands listed in Table 1 and issues the corresponding Git commands. It uses Sysadm::Install from CPAN to jump quickly back and forth between directories (cd and cdback) and to run Git commands at the command line.
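For readers curious how a cd/cdback pair can work, here is a core-Perl sketch built on a directory stack. The names cd_push and cd_back are hypothetical, and Sysadm::Install's actual implementation may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my @DIR_STACK;    # directories to return to, most recent last

# Remember the current directory, then change into $dir.
sub cd_push {
    my ($dir) = @_;
    my $here = getcwd();
    chdir $dir or die "Cannot cd to $dir: $!\n";
    push @DIR_STACK, $here;    # recorded only if the chdir succeeded
}

# Return to the directory that was current before the last cd_push.
sub cd_back {
    die "cd_back without a matching cd_push\n" unless @DIR_STACK;
    my $dir = pop @DIR_STACK;
    chdir $dir or die "Cannot cd back to $dir: $!\n";
}

# Usage: hop into a scratch directory and back, as git_cmd does with $repo_dir.
my $start = getcwd();
my $tmp   = tempdir( CLEANUP => 1 );
cd_push($tmp);
# ... run commands inside $tmp here ...
cd_back();
print( ( getcwd() eq $start ) ? "back in $start\n" : "lost!\n" );
```

Because the stack holds one entry per call, nested hops (cd_push twice, cd_back twice) unwind in the right order.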
The order data is stored in a cache implemented by the Cache::FileCache CPAN module; the value 0 for cache_depth in line 32 puts every entry into its own file directly in the local ~/data/shop directory. Line 24 uses mkd from Sysadm::Install to create the directory if it does not already exist. In contrast to Perl's mkdir() function, mkd does some error checking and logs its activities if you have enabled Log4perl.
The cache's set() and get() methods accept the product name (e.g., "iPod") as a key and create or retrieve entries in the format defined by the record_new() function (line 120). Besides the product name, a record includes two date fields of the DateTime type. The first field, bought, stores the order date and is set to the current date by the today() method in line 132.
Users can specify the expected arrival date of an item with a buy command,
shop buy 'dell netbook' 30
which specifies a delivery period of 30 days for a netbook ordered from Dell. Lines 133ff. convert this number of days into a DateTime::Duration object, which, with a bit of operator overloading magic, can be added to a DateTime object to calculate the expected delivery date. The result is stored in the second DateTime field, aptly named expected.
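The arithmetic behind the expected field boils down to "order date plus N days." As a sketch using only core modules, Time::Piece and Time::Seconds can stand in for DateTime and DateTime::Duration. The function name expected_date is hypothetical, the order date is passed in explicitly (instead of using today()) so the result is reproducible, and the digit check is a slightly stricter version of the one in lines 40ff. of the listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Time::Piece;
use Time::Seconds;    # exports ONE_DAY (86,400 seconds)

# Return the expected delivery date (YYYY-MM-DD) for an item ordered
# on $bought (YYYY-MM-DD) with a delivery period of $days days.
sub expected_date {
    my ( $bought, $days ) = @_;
    die "days must be a non-negative integer\n"
        unless defined $days and $days =~ /\A\d+\z/;

    # strptime() as a class method yields a UTC-based object, so adding
    # whole days is not disturbed by daylight saving time changes.
    my $t = Time::Piece->strptime( $bought, '%Y-%m-%d' );
    return ( $t + $days * ONE_DAY )->ymd;
}

print expected_date( '2009-03-01', 30 ), "\n";    # 2009-03-31
print expected_date( '2009-02-15', 30 ), "\n";    # 2009-03-17
```

Adding a duration to a point in time and getting a new point in time back is the same operator-overloading idea the listing relies on in line 134.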
Both DateTime objects are given a DateTime::Format::Strptime formatter (lines 138 and 139) with the pattern "%F", so in string context they render as YYYY-MM-DD.
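The point of set_formatter is that the object itself decides how it looks when interpolated into a string. Perl's overload pragma provides the same mechanism; the following sketch wraps a Time::Piece object in a hypothetical ShopDate class whose string form is YYYY-MM-DD, comparable to what the "%F" formatter does for DateTime in the listing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

package ShopDate;    # hypothetical class, for illustration only
use Time::Piece;
use overload
    '""'     => sub { $_[0]->as_string },    # called in string context
    fallback => 1;    # derive eq, concatenation etc. from the string form

sub new {
    my ( $class, $ymd ) = @_;
    my $tp = Time::Piece->strptime( $ymd, '%Y-%m-%d' );
    return bless { tp => $tp }, $class;
}

# Render as YYYY-MM-DD, like a "%F" pattern.
sub as_string { return $_[0]{tp}->ymd }

package main;

my $exp = ShopDate->new('2009-03-31');
print "exp:$exp\n";    # exp:2009-03-31
```

This is why record_list() in the listing can simply interpolate $r->{expected} into its print statement.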
Cache::FileCache has no trouble storing this deeply nested data structure in a file: It serializes the structure before writing it and, when reading it later, converts it back into Perl objects. After the cache file has landed in the local repository's workspace, the shop script makes the change permanent by running git add and git commit (function git_commit, lines 165ff.).
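Flattening a nested structure that contains objects and reading it back can be demonstrated with Perl's core Storable module. The sketch below uses Storable as a stand-in (this article does not say what Cache::FileCache uses internally) and Time::Piece objects in place of DateTime; the record mirrors the shape returned by record_new().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Storable qw(nfreeze thaw);
use Time::Piece;

# A record shaped like the one record_new() returns; Time::Piece objects
# stand in for the DateTime objects of the real script.
my $rec = {
    item     => 'dell netbook',
    bought   => Time::Piece->strptime( '2009-03-01', '%Y-%m-%d' ),
    expected => Time::Piece->strptime( '2009-03-31', '%Y-%m-%d' ),
};

# Flatten the nested structure, objects included, into a binary string ...
my $frozen = nfreeze($rec);

# ... and rebuild it: the date fields come back as blessed Time::Piece objects.
my $copy = thaw($frozen);

printf "%s bought:%s exp:%s (%s)\n",
    $copy->{item}, $copy->{bought}->ymd, $copy->{expected}->ymd,
    ref $copy->{expected};
# prints: dell netbook bought:2009-03-01 exp:2009-03-31 (Time::Piece)
```

Note that the frozen string is binary data, not text, which matters once Git has to compare two versions of such a file.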
Listing 1: shop.pl
001 #!/usr/local/bin/perl -w
002 use strict;
003 use Sysadm::Install qw(:all);
004 use Cache::FileCache;
005 use DateTime;
006 use
007   DateTime::Format::Strptime;
008 use File::Basename;
009
010 my ($H) = glob "~";
011 my $data_dir  = "data";
012 my $repo_name = "shop";
013 my $repo_dir =
014   "$H/$data_dir/$repo_name";
015
016 my $repo_url =
017   'mschilli@box.goofhost.com:repos/shop.git';
018
019 my ($action) = shift;
020 die
021   "usage: $0 buy|got|list ..."
022   unless defined $action;
023
024 mkd $repo_dir
025   unless -d $repo_dir;
026
027 my $CACHE =
028   Cache::FileCache->new(
029   {
030     cache_root =>
031       "$H/$data_dir",
032     cache_depth => 0,
033     namespace   => $repo_name,
034   }
035   );
036
037 if ( $action eq "buy" ) {
038   my ( $item, $days ) =
039     @ARGV;
040   die
041     "usage: $0 buy item days"
042     if !defined $days
043     or $days =~ /\D/;
044
045   my $rec =
046     record_new( $item,
047     $days );
048   if ( $CACHE->get($item) ) {
049     die
050       "$item already exists.";
051   }
052   $CACHE->set( $item, $rec );
053   git_commit(
054     "Added item $item");
055 }
056 elsif ( $action eq "got" ) {
057   my ($key) = @ARGV;
058   die "usage: $0 got item"
059     unless defined $key;
060   my $path =
061     path_to_key($key);
062   git_cmd( "git", "rm", "-f",
063     basename($path) );
064   git_cmd( "git", "commit",
065     "-a", "-m$key deleted" );
066
067 }
068 elsif ( $action eq "list" ) {
069   record_list();
070
071 }
072 elsif ( $action eq "push" ) {
073   git_cmd(
074     "git", "push",
075     "origin", "master"
076   );
077
078 }
079 elsif ( $action eq "pull" ) {
080   git_cmd(
081     "git", "pull",
082     "origin", "master"
083   );
084
085 }
086 elsif ( $action eq "clone" )
087 {
088   cd "$H/$data_dir";
089   rmdir $repo_name;
090   cmd_run( "git", "clone",
091     $repo_url );
092   cdback;
093
094 }
095 elsif ( $action eq "init" ) {
096   git_cmd( "git", "init" );
097   git_cmd(
098     "git", "remote",
099     "add", "origin",
100     $repo_url
101   );
102 }
103 else {
104   die
105     "Unknown action '$action'";
106 }
107
108 #############################
109 sub path_to_key {
110 #############################
111   my ($key) = @_;
112
113   return
114     $CACHE->_get_backend()
115     ->_path_to_key(
116     $repo_name, $key );
117 }
118
119 #############################
120 sub record_new {
121 #############################
122   my ( $item, $days ) = @_;
123
124   my $df =
125     DateTime::Format::Strptime
126     ->new(
127     pattern   => "%F",
128     time_zone => "local",
129     );
130
131   my $now =
132     DateTime->today();
133   my $exp =
134     $now +
135     DateTime::Duration->new(
136     days => $days );
137
138   $now->set_formatter($df);
139   $exp->set_formatter($df);
140
141   return {
142     item     => $item,
143     bought   => $now,
144     expected => $exp,
145   };
146 }
147
148 #############################
149 sub record_list {
150 #############################
151
152   for my $key (
153     $CACHE->get_keys() )
154   {
155     my $r =
156       $CACHE->get($key);
157     print "$r->{item} ",
158       "bought:$r->{bought} ",
159       "exp:$r->{expected} ",
160       "\n";
161   }
162 }
163
164 #############################
165 sub git_commit {
166 #############################
167   my ($msg) = @_;
168
169   cd $repo_dir;
170   cmd_run( "git", "add",
171     "." );
172   cmd_run(
173     "git", "commit",
174     "-a", "-m$msg"
175   );
176   cdback;
177 }
178
179 #############################
180 sub git_cmd {
181 #############################
182   cd $repo_dir;
183   cmd_run(@_);
184   cdback;
185 }
186
187 #############################
188 sub cmd_run {
189 #############################
190   my ( $stdout, $stderr,
191     $rc ) = tap @_;
192   if ( $rc != 0 ) {
193     die $stderr;
194   }
195 }
To delete an entry after the goods have arrived, the script needs to peek behind the drapes of the cache: Git can only remove a file from the repository if you know the file's name, but the cache generates a 40-character hash as the filename for each key (as in d549f860476c...). When the user deletes the entry for the iPod by issuing a shop got iPod command, the path_to_key function defined in lines 109ff. calls the cache's internal (underscore-prefixed) methods to retrieve the matching pathname. Line 62 then issues a git rm -f command to remove the file from both the local workspace and the repository index, and a subsequent commit makes this permanent.
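To see what such a hashed filename looks like, here is a sketch that assumes, as the 40-character names suggest, a SHA-1 hex digest of the key, computed with the core Digest::SHA module. The real Cache::FileCache layout is an internal detail that could change between versions, which is exactly why the listing asks the module via _path_to_key instead of recomputing the name. The helpers entry_path and git_rm_cmd are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Spec;
use File::Basename qw(basename);

# Hypothetical: map a cache key to a file in a flat (cache_depth 0)
# directory, ASSUMING the filename is the SHA-1 hex digest of the key.
sub entry_path {
    my ( $dir, $key ) = @_;
    return File::Spec->catfile( $dir, sha1_hex($key) );
}

# Build the command that removes the entry file from workspace and index;
# like git_cmd in the listing, it must run inside the repository directory.
sub git_rm_cmd {
    my ($path) = @_;
    return ( 'git', 'rm', '-f', basename($path) );
}

my $path = entry_path( File::Spec->catdir( 'data', 'shop' ), 'iPod' );
print basename($path), "\n";    # 40 hex characters
print join( ' ', git_rm_cmd($path) ), "\n";
```

Different keys (say, "iPod" and "iPod Nano") map to unrelated names, so there is no way to guess the file from the product name without applying the same hash.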
The shop list command calls the record_list() function (lines 149ff.), which uses the cache's get_keys() method to obtain a list of all keys in the cache. It then passes each key to the get() method, which reads the corresponding entry from disk, and prints the item along with its order date and expected delivery date.
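The listing loop can be tried out without a cache. In this sketch, a plain hash stands in for Cache::FileCache (its keys play the role of get_keys(), a hash lookup plays get()), the dates are already strings, and the sample records are made up; format_records is a hypothetical name. Unlike the listing, it sorts the keys, because Perl hash keys come back in no particular order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the cache: key => record, shaped like record_new() output.
my %cache = (
    'dell netbook' => {
        item     => 'dell netbook',
        bought   => '2009-03-01',
        expected => '2009-03-31',
    },
    'iPod' => {
        item     => 'iPod',
        bought   => '2009-03-02',
        expected => '2009-03-05',
    },
);

# Return one line per record, in the format printed by record_list().
sub format_records {
    my ($store) = @_;
    my @lines;
    for my $key ( sort keys %$store ) {    # sorted for stable output
        my $r = $store->{$key};
        push @lines, "$r->{item} bought:$r->{bought} exp:$r->{expected}";
    }
    return @lines;
}

print "$_\n" for format_records( \%cache );
```

Returning the lines instead of printing them directly keeps the formatting testable; the shop script prints straight away, which is all a command-line tool needs.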
All Git commands run through the cmd_run function defined in lines 188ff., which calls tap from the Sysadm::Install module. The tap function runs the command, captures STDOUT and STDERR, and returns them, along with the exit code, to the caller. If the exit code is nonzero, cmd_run dies with the captured error output.
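Without Sysadm::Install, capturing STDOUT, STDERR, and the exit code can be sketched with the core IPC::Open3 module. Here, run_cmd is a hypothetical stand-in for tap, and run_or_die adds the die-on-failure logic of cmd_run. The sketch reads STDOUT to the end before STDERR, which is fine for the short output of Git commands but could block if a command wrote a very large amount to STDERR.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use Symbol qw(gensym);

# Run a command (list form, no shell); return ($stdout, $stderr, $rc).
sub run_cmd {
    my @cmd = @_;
    my $err = gensym;    # open3 needs a real glob for a separate STDERR
    my $pid = open3( my $in, my $out, $err, @cmd );
    close $in;           # the command gets no input
    my $stdout = do { local $/; <$out> };
    my $stderr = do { local $/; <$err> };
    waitpid( $pid, 0 );
    $stdout //= '';
    $stderr //= '';
    return ( $stdout, $stderr, $? >> 8 );
}

# Like cmd_run in the listing: die with the error output on failure.
sub run_or_die {
    my ( $stdout, $stderr, $rc ) = run_cmd(@_);
    die $stderr if $rc != 0;
    return $stdout;
}

my ( $so, $se, $code ) =
    run_cmd( $^X, '-e', 'print "ok"; print STDERR "warn"; exit 3' );
print "stdout=$so stderr=$se rc=$code\n";    # stdout=ok stderr=warn rc=3
```

Passing the command as a list rather than a single string avoids the shell, so product names with spaces, such as 'dell netbook', reach Git unmangled.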
The local repository logs all transactions and can therefore easily restore earlier states. To see which transactions have occurred, simply run git log in the ~/data/shop directory, as Figure 5 shows.
If two clients independently store the same product in their local repositories, a conflict occurs as soon as the second client runs shop push to send its changes to the central server.
Figure 6 shows the nasty error message Git issues when the second client attempts a shop push. Ideally, on a subsequent shop pull, Git would notice that the changes on the remote branch and in the local file are identical; but because the object here is a binary file created by Cache::FileCache, Git doesn't trust its own judgment and complains instead. Text files, on the other hand, are handled perfectly by Git in this respect.
If a conflict occurs, Git enters a merge state and waits for the user to resolve it. In this case, you need to delete the product locally (shop got) and then create it again (shop buy). The next push is accepted, and the server repository is happy again.
If the client deleted the product and then pushed without creating it again, the chainsaw ordered in Figure 6 would disappear from the server repository, and after a shop pull on the first client, it would disappear from that client's local repository, too.
The script needs the Sysadm::Install, Cache::FileCache, DateTime, and DateTime::Format::Strptime modules from CPAN, which are best installed with a CPAN shell, because it resolves dependencies automatically. Alternatively, you can use your Linux package manager if its repository has all of the required modules on hand. Their names might be slightly different in this case, though: libsysadm-install-perl, for example, is Debian's idea of Sysadm::Install.
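Before running the script, it can help to check which of the required modules are already installed. This small sketch (missing_modules is a hypothetical helper) tries to load each module by name at runtime and reports the ones that fail:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the subset of the given module names that cannot be loaded.
sub missing_modules {
    my @missing;
    for my $mod (@_) {
        # Cache::FileCache -> Cache/FileCache.pm, the form require expects
        ( my $file = "$mod.pm" ) =~ s{::}{/}g;
        push @missing, $mod unless eval { require $file; 1 };
    }
    return @missing;
}

my @missing = missing_modules(
    qw(Sysadm::Install Cache::FileCache DateTime DateTime::Format::Strptime)
);
if (@missing) {
    print "Still to install: @missing\n";
}
else {
    print "All required modules are installed.\n";
}
```

Anything reported here can then be installed from the CPAN shell or, under its distribution-specific name, with the package manager.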
Before you start, you need to set up an empty server-side repository manually with git init (Figure 2). The hosted machine does not need any Perl modules, but it does need the git program, which most current Linux systems include anyway.
Figure 7 shows how the first client initializes its local repository, uses shop buy to enter a couple of purchases, runs shop list to query the local database, and ultimately transfers data to the server with shop push. The shop commands do not produce any output if they run successfully.
The second client then uses shop clone to create a local clone of the server repository, as shown in Figure 8. That client also makes a couple of purchases and marks the iPod Nano ordered from Amazon as delivered. It then issues a shop push to send the local changes to the server.
The same procedure applies to all further clients: They first clone the server repository, make changes to their local repositories, push the new data to the server, and receive the latest updates from the other clients with shop pull. The feast of orders can go on and on, and thanks to Git, shop list on every client immediately shows which deliveries are still outstanding, whether because of mail problems or supplier sloppiness.
INFO
[1] Listings for this article: http://www.linux-magazine.com/resources/article_code
[2] Swicegood, Travis. Pragmatic Version Control Using Git. Pragmatic Bookshelf, 2008.