By Michael Schilli
During early phases of a project, developers tend to experiment with various options, and sometimes it is too early to save prototypes in the version control system. If you haven't set up a repository, or if you haven't been able to agree on its structure, you might find yourself working without a safety net. In this case, good code might fall victim to an over-zealous rm * or your editor's delete command.
This month's Perl script, noworries, can give you automatic version control. Whenever you save a file with your editor, and whenever you use the shell to manipulate files using commands like rm or mv, a daemon hidden in the background receives a message. When it does, it picks up the new or modified file and uses RCS to version it. All of this is transparent to the user. Figure 1 shows a user creating and then deleting a new file in the shell. Without some Perl wizardry, the file myfile would have been gone for good, but calling noworries -l myfile tells us that the versioner created a backup copy just 17 seconds earlier. noworries -r 1.1 myfile retrieves the file and writes its content to STDOUT.
The script does not use manipulated shell functions or any other dirty tricks. Of course, an instance of the script needs to be running in the background; the -w (for "watch") option handles this. The daemon relies on the File Alteration Monitor (FAM) utility [2], which in turn subscribes to the operating system kernel's Dnotify interface. FAM tells Dnotify which directories it is interested in, and whenever the file system creates, moves, or deletes a directory or file in one of them, or manipulates file content, the kernel sends FAM a notification. CPAN has a Perl module (SGI::FAM) that wraps FAM's C interface for Perl. It is event-based and does not require CPU-intensive polling. Calling the next_event() method blocks the daemon until the next event occurs.
Figure 2 shows another example. In this case a file is created, and then modified twice in a row. The daemon receives a message for each event and creates three versions of the file in RCS (1.1, 1.2, and 1.3). Calling noworries -l myfile displays these versions, even if the file has been deleted in the meantime.
Asking for revision 1.2 by specifying -r 1.2 and the filename file lets noworries retrieve the version after the first modification and prints its content to STDOUT. The shell command shown in Figure 2 redirects the output back to a file named file, which is immediately versioned again by the daemon. Figure 3 shows the daemon's activity: just to be on the safe side, the daemon logs its activities in the file /tmp/noworries.log.
The noworries script takes care of files and directories, no matter how deeply they are nested, below ~/noworries in the user's home directory. This is where you would typically set up new directories, or extract tarballs if you wanted the protection of a safety net. The daemon creates a structure below ~/.noworries.rcs to record the changes behind the scenes. Each subdirectory contains an RCS directory with the versioned files, whose names traditionally end in ,v. RCS has been a Unix tool from day one and is still used today by version control systems such as CVS or Perforce. The following set of commands checks in a version of file:
echo "Data!" >file
mkdir RCS
ci file
co -l file
The program ci from the RCS command set creates RCS/file,v in the delta format used by RCS. The co command at the end, in combination with the -l (for "lock") option, restores the current version to the current directory. If you then modify file, and follow this up with another ci/co command sequence, you end up with two versions in RCS/file,v, which can be retrieved separately using co. The rlog program, another member of the RCS family, lets you view the meta-data for file versions you have checked in.
The noworries listing (Listing 1) defines the names of these tools in Lines 25 through 27. If you leave them as bare command names, as shown, make sure they reside in your PATH to allow noworries to call them. If needed, you can hard code the full paths.
noworries uses the mkd (make directory), cp (file copy), cd (change directory), cdback (go back to original directory), and tap (execute a program and collect output) functions exported by Sysadm::Install. Regular readers of my Perl column may recall them from [4].
Listing 1: noworries
001 #!/usr/bin/perl -w
002 #############################
003 # noworries -
004 # m@perlmeister.com
005 #############################
006 use strict;
007 use Sysadm::Install qw(:all);
008 use File::Find;
009 use SGI::FAM;
010 use Log::Log4perl qw(:easy);
011 use File::Basename;
012 use Getopt::Std;
013 use File::Spec::Functions
014   qw(rel2abs abs2rel);
015 use DateTime;
016 use
017   DateTime::Format::Strptime;
018 use Pod::Usage;
019
020 my $RCS_DIR =
021   "$ENV{HOME}/.noworries.rcs";
022 my $SAFE_DIR =
023   "$ENV{HOME}/noworries";
024
025 my $CI = "ci";
026 my $CO = "co";
027 my $RLOG = "rlog";
028
029 getopts( "dr:wl",
030   \my %opts );
031
032 mkd $RCS_DIR
033   unless -d $RCS_DIR;
034
035 Log::Log4perl->easy_init({
036   category => 'main',
037   level => $opts{d}
038     ? $DEBUG
039     : $INFO,
040   file => $opts{w} &&
041     !$opts{d}
042     ? "/tmp/noworries.log"
043     : "stdout",
044   layout => "%d %p %m%n"
045   }
046 );
047
048 if ( $opts{w} ) {
049   INFO "$0 starting up";
050   watcher();
051
052 } elsif(
053   $opts{r} or $opts{l} ) {
054
055   my ($file) = @ARGV;
056   pod2usage("No file given")
057     unless defined $file;
058
059   my $filename =
060     basename $file;
061
062   my $absfile =
063     rel2abs($file);
064   my $relfile =
065     abs2rel( $absfile,
066       $SAFE_DIR );
067
068   my $reldir =
069     dirname($relfile);
070   cd "$RCS_DIR/$reldir";
071
072   if ( $opts{l} ) {
073     rlog($filename);
074   } else {
075     sysrun(
076       $CO, "-r$opts{r}",
077       "-p", $filename
078     );
079   }
080   cdback;
081
082 } else {
083   pod2usage(
084     "No valid option given");
085 }
086
087 #############################
088 sub watcher {
089 #############################
090   cd $SAFE_DIR;
091
092   my $fam = SGI::FAM->new();
093   watch_subdirs( ".", $fam );
094
095   while (1) {
096     # Block until next event
097     my $event =
098       $fam->next_event();
099
100     my $dir =
101       $fam->which($event);
102     my $fullpath =
103       $dir . "/" .
104       $event->filename();
105
106     # Emacs temp files
107     next
108       if $fullpath =~ /~$/;
109
110     # Vi temp files
111     next if $fullpath =~
112       /\.sw[px]x?$/;
113
114     DEBUG "Event: ",
115       $event->type, "(",
116       $event->filename, ")";
117
118     if ( $event->type eq
119       "create"
120       and -d $fullpath ) {
121       DEBUG "Adding monitor",
122         " for directory ",
123         $fullpath, "\n";
124       $fam->monitor(
125         $fullpath);
126     }
127     elsif ( $event->type =~
128       /create|change/
129       and -f $fullpath ) {
130       check_in($fullpath);
131     }
132   }
133 }
134
135 #############################
136 sub watch_subdirs {
137 #############################
138   my ($start_dir, $fam) = @_;
139
140   $fam->monitor($start_dir);
141
142   for my $dir (
143     subdirs($start_dir) ) {
144     DEBUG "Adding monitor ",
145       "for $dir";
146     $fam->monitor($dir);
147   }
148
149   return $fam;
150 }
151
152 #############################
153 sub subdirs {
154 #############################
155   my ($dir) = @_;
156
157   my @dirs = ();
158
159   find sub {
160     return unless -d;
161     return if /^\.\.?$/;
162     push @dirs,
163       $File::Find::name;
164   }, $dir;
165
166   return @dirs;
167 }
168
169 #############################
170 sub check_in {
171 #############################
172   my ($file) = @_;
173
174   if ( !-T $file ) {
175     DEBUG "Skipping non-",
176       "text file $file";
177     return;
178   }
179
180   my $rel_dir =
181     dirname($file);
182   my $rcs_dir =
183     "$RCS_DIR/$rel_dir/RCS";
184
185   mkd $rcs_dir
186     unless -d $rcs_dir;
187
188   cd "$RCS_DIR/$rel_dir";
189   cp "$SAFE_DIR/$file", ".";
190   my $filename =
191     basename($file);
192
193   INFO "Checking $filename",
194     " into RCS";
195   my ($stdout, $stderr,
196     $exit_code) = tap(
197     $CI, "-t-",
198     "-m-", $filename
199   );
200   INFO "Check-in result: ",
201     "rc=$exit_code ",
202     "$stdout $stderr";
203
204   ($stdout, $stderr,
205     $exit_code) = tap(
206     $CO, "-l", $filename);
207   cdback;
208 }
209
210 #############################
211 sub time_diff {
212 #############################
213   my ($dt) = @_;
214
215   my $dur =
216     DateTime->now() - $dt;
217
218   for (
219     qw(weeks days hours
220     minutes seconds)) {
221     my $u =
222       $dur->in_units($_);
223     return "$u $_" if $u;
224   }
225 }
226
227 #############################
228 sub rlog {
229 #############################
230   my ($file) = @_;
231
232   my ( $stdout, $stderr,
233     $exit_code )
234     = tap( $RLOG, $file );
235
236   my $p =
237     DateTime::Format::Strptime
238     ->new( pattern =>
239       '%Y/%m/%d %H:%M:%S' );
240
241   while ($stdout =~
242     /^revision\s(\S+).*?
243     date:\s(.*?);
244     (.*?)$/gmxs) {
245
246     my ($rev, $date, $rest)
247       = ($1, $2, $3);
248
249     my ($lines) = ($rest =~
250       /lines:\s+(.*)/);
251     $lines ||=
252       "first version";
253
254     my $dt =
255       $p->parse_datetime(
256         $date);
257
258     print "$rev ",
259       time_diff($dt),
260       " ago ($lines)\n";
261   }
262 }
263
264 __END__
265
266 =head1 NAME
267
268 noworries - Dev Safety Net
269
270 =head1 SYNOPSIS
271
272   # Print previous version
273   noworries -r revision file
274
275   # List all revisions
276   noworries -l file
277
278   # Start the watcher
279   noworries -w
Before SGI::FAM can receive messages about modified files below a directory, FAM first has to let the kernel know that it is interested in doing so. Events start to roll in after calling $fam->monitor(...) with ~/noworries as its argument, whenever a new directory or file is created directly in ~/noworries. However, this does not apply to any subdirectories. For this reason, noworries immediately sets up another monitor whenever it notices that a new subdirectory has been created. A similar trick applies if noworries starts up when a deeply nested directory structure below ~/noworries already exists. (We'll get to that in a moment.)
Setting the -w option launches noworries in daemon mode and runs the infinite loop defined in the watcher function in Line 88 of Listing 1. The call to the next_event() method in Line 98 blocks the execution flow until one of four FAM-monitored events occurs. To find out which one of potentially many active directory monitors has triggered, the SGI::FAM object's which() method, which is called in Line 101, returns the directory that triggered the event. The event's filename() method returns the name of the new, existing, modified, or deleted object, which can be a directory or a file.
The type() method gives us the event type. The types that noworries is interested in are create and change. The monitor() method adds new directories to the list of things to watch, while the check_in() function defined in Line 170 handles new or modified files. Directories that already exist when the daemon starts up are handled in a similar way: assuming that ~/noworries already exists, the daemon uses find to locate them. The subdirs() helper function in Line 153 digs deeper and deeper into the directory tree and returns every directory it finds, no matter how deeply nested. The watch_subdirs() function iterates over all of them and passes their relative pathnames to FAM for surveillance.
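The directory scan can be tried out in isolation. The following is a minimal sketch, not part of noworries itself: it rebuilds the subdirs() helper from Listing 1 and runs it against a throwaway tree created with File::Temp. The directory names a, a/b, and a/b/c are made up for the demonstration.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Same logic as subdirs() in Listing 1: collect every directory
# below $dir, skipping the "." and ".." entries.
sub subdirs {
    my ($dir) = @_;

    my @dirs = ();

    find sub {
        return unless -d;
        return if /^\.\.?$/;
        push @dirs, $File::Find::name;
    }, $dir;

    return @dirs;
}

# Hypothetical test tree: <tmp>/a/b/c
my $root = tempdir( CLEANUP => 1 );
make_path("$root/a/b/c");

# Assign first: "sort subdirs(...)" would be parsed as a sort
# with subdirs as the comparison routine.
my @found = subdirs($root);
print "$_\n" for sort @found;
```

In the daemon, each path returned this way would be handed to $fam->monitor().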
The documentation section in Line 266 does more than provide a nicely formatted manual page whenever a user calls perldoc noworries. The pod2usage() function also prints it if the user fails to provide the required command line options. It does not make much sense to version temporary vi or Emacs files, so they are filtered out in Lines 107 through 112.
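Taken together, the temp file filters and the event type tests in the watcher loop amount to a small decision function. The sketch below isolates that logic so it can be run without FAM; the function name action_for and its return strings are invented for this illustration and do not appear in Listing 1.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Decide what the watcher loop would do with one event.
# Returns "skip", "monitor", "check_in", or "ignore"
# (hypothetical labels, used only in this sketch).
sub action_for {
    my ($type, $path) = @_;

    return "skip" if $path =~ /~$/;            # Emacs temp files
    return "skip" if $path =~ /\.sw[px]x?$/;   # Vi temp files

    return "monitor"  if $type eq "create" and -d $path;
    return "check_in" if $type =~ /create|change/ and -f $path;
    return "ignore";                           # e.g. delete events
}

# Set up a real directory and file, because -d and -f look at
# the file system.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/notes.txt";
open my $fh, ">", $file or die "Cannot write $file: $!";
print $fh "Data!\n";
close $fh;

print action_for( "create", $dir ),       "\n";
print action_for( "change", $file ),      "\n";
print action_for( "create", "$file~" ),   "\n";
print action_for( "delete", "$dir/gone" ), "\n";
```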
When a file needs to be checked into the version control system, check_in in Line 170 first checks if the file is a text file; binary files are discarded in Line 174. The function is called with a pathname relative to ~/noworries, as this is where watcher() jumps to in Line 90. Line 189 copies the original file to the RCS tree, and Line 195 calls the ci tool with the -t and -m options. It passes a value of - to both, as the first (and all following) check-in comments are meaningless, but you have to give ci something to chew on to avoid an interactive prompt. Line 204 checks the file out, as described earlier on. The next time a change occurs, the checked out copy is overwritten, and the new version is checked in by ci.
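To see where a given file ends up in the shadow tree, the path arithmetic can be sketched on its own. This example follows the lookup branch of Listing 1 (abs2rel plus dirname) and appends the RCS/…,v naming described earlier; the function name archive_for and the paths are hypothetical, and nothing is read from or written to disk.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname);
use File::Spec::Functions qw(abs2rel);

# Map a file below the watched tree to its RCS archive.
# $safe_dir and $rcs_dir play the roles of $SAFE_DIR and
# $RCS_DIR in Listing 1.
sub archive_for {
    my ($safe_dir, $rcs_dir, $absfile) = @_;

    my $relfile = abs2rel( $absfile, $safe_dir );
    my $reldir  = dirname($relfile);

    return "$rcs_dir/$reldir/RCS/" . basename($relfile) . ",v";
}

# Hypothetical paths, for illustration only.
print archive_for(
    "/home/mschilli/noworries",
    "/home/mschilli/.noworries.rcs",
    "/home/mschilli/noworries/proj/src/main.c"
), "\n";
```

A file directly below the watched directory maps to a path containing /./, which the shell and RCS treat like any other path.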
noworries calls the RCS rlog program to find out which versions of a file are available. rlog returns the version numbers with the date (formatted as yyyy/mm/dd hh:mm:ss) and also reveals the number of lines that have changed in comparison to the previous version. Of course, it can't give us this information for the initial version, but if you are told that version 1.2 has lines: +10 -0, this means there are 10 new lines in comparison to 1.1, and that no lines have been deleted.
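The extraction step can be illustrated without RCS installed. The sketch below applies the same pattern that rlog() in Listing 1 uses to a hand-written sample in the style of rlog output; real output contains more header lines and may differ in detail, and the helper name parse_rlog is made up for this example.

```perl
use strict;
use warnings;

# Hand-written sample, not captured from a real rlog run.
my $sample = <<'EOT';
total revisions: 2;     selected revisions: 2
description:
----------------------------
revision 1.2
date: 2005/12/19 04:01:23;  author: mschilli;  state: Exp;  lines: +10 -0
-
----------------------------
revision 1.1
date: 2005/12/19 04:00:57;  author: mschilli;  state: Exp;
-
EOT

# Return a list of [revision, date, line summary] triples.
sub parse_rlog {
    my ($text) = @_;

    my @revs;
    while ( $text =~ /^revision\s(\S+).*?
                      date:\s(.*?);
                      (.*?)$/gmxs ) {
        my ($rev, $date, $rest) = ($1, $2, $3);

        # The initial revision has no "lines:" field
        my ($lines) = ( $rest =~ /lines:\s+(.*)/ );
        $lines ||= "first version";

        push @revs, [ $rev, $date, $lines ];
    }
    return @revs;
}

printf "%s  %s  (%s)\n", @$_ for parse_rlog($sample);
```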
The DateTime module from CPAN helps tremendously with date calculations. The DateTime::Format::Strptime module parses the RCS date information. To do this, the constructor expects a format string with the pattern "%Y/%m/%d %H:%M:%S", and the call to parse_datetime() returns a fully initialized DateTime object if successful. The while loop that starts in Line 241 navigates the slightly overwhelming output of the rlog program, using a multi-line regular expression to do so.
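DateTime::Format::Strptime is not part of the Perl core. As a rough illustration of the same parsing step, the sketch below swaps in the core module Time::Piece, which is not what noworries uses. It applies the same format string; the two timestamps are made-up values in the rlog date format, and parse_rcs_date is a hypothetical helper name.

```perl
use strict;
use warnings;
use Time::Piece;

# Parse an rlog-style timestamp into a Time::Piece object.
# Time::Piece->strptime() treats the value as UTC.
sub parse_rcs_date {
    my ($date) = @_;
    return Time::Piece->strptime( $date, "%Y/%m/%d %H:%M:%S" );
}

my $first  = parse_rcs_date("2005/12/19 04:00:57");
my $second = parse_rcs_date("2005/12/19 04:01:23");

printf "Year: %d, gap between check-ins: %d seconds\n",
    $first->year, $second->epoch - $first->epoch;
```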
The time_diff() function in Line 211 expects a DateTime object and calculates how old a version is in seconds, minutes, hours, days, or weeks. This is easier to read for the heavy noworries user.
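The idea of reporting only the largest fitting unit can be shown with plain integer arithmetic. This is a simplified sketch that works on a number of seconds rather than on the DateTime::Duration object used in Listing 1, so it is an approximation of time_diff(), not a copy of it; the name age_string is invented. Unlike time_diff(), it returns "0 seconds" for an age of zero.

```perl
use strict;
use warnings;

# Return the age as a count of the largest unit that fits,
# for example "17 seconds" or "2 hours".
sub age_string {
    my ($seconds) = @_;

    my @units = (
        [ weeks   => 7 * 24 * 3600 ],
        [ days    => 24 * 3600 ],
        [ hours   => 3600 ],
        [ minutes => 60 ],
        [ seconds => 1 ],
    );

    for my $unit (@units) {
        my ($name, $size) = @$unit;
        my $count = int( $seconds / $size );
        return "$count $name" if $count;
    }
    return "0 seconds";
}

print age_string($_), "\n" for 17, 7200, 3 * 24 * 3600;
```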
Unfortunately, Dnotify, the mechanism used by FAM, doesn't scale well and bows out at around two hundred subdirectories. To solve this problem, Dnotify has been replaced by inotify in more recent kernels. inotify makes better use of resources and scales more easily. FAM is also obsolete, and Gamin [3] is its designated successor.
The kernel's Dnotify mechanism does not use file system inodes, but filenames, so that mv file1 file2 triggers two events: a delete type and a create type event. This does not bother noworries, as the script ignores delete events, and if the same file appears some time later, it is just checked in as the latest version.
The script should only be used on your local hard disk, and not with NFS, as FAM can only be efficient if the NFS target is also running a FAM. If not, it polls the target at regular intervals, and this makes the whole thing somewhat inefficient.
You need to install the SGI::FAM, Sysadm::Install, Log::Log4perl, DateTime, DateTime::Format::Strptime, and Pod::Usage CPAN modules; the CPAN shell will help to quickly resolve the dependencies. If you see a FAM.c:813: error: storage size of `RETVAL' isn't known error when building SGI::FAM, change Line 813 in FAM.c from enum FAMCodes RETVAL; to FAMCodes RETVAL; and re-run make, which should then give you the goodies.
To make sure that the daemon is always running, add a line such as x777:3:respawn:su mschilli -c "/home/mschilli/bin/noworries -w" to /etc/inittab, and then let the Init daemon know by running init q. The process has to run with the ID of the current user (mschilli in this case) to ensure that $ENV{HOME} in the script points to the right home directory. In this case, the init process launches the noworries daemon when you boot your machine, and the respawn option ensures that the process restarts immediately if for some reason it is inadvertently terminated. But before you do all of this, test the daemon on the command line to see if everything is working properly.
The -d (for "debug") option may help if you are experiencing problems; it displays detailed status information on standard output rather than logging to /tmp/noworries.log.
INFO
[1] Listing for this article: http://www.linux-magazine.com/Magazine/Downloads/63/Perl
[2] FAM Homepage: http://oss.sgi.com/projects/fam/
[3] Gamin Homepage: http://www.gnome.org/~veillard/gamin/
[4] "Perl Shell Scripts," Michael Schilli: http://www.linux-magazine.com/issue/52/Perl_Shell_Scripts.pdf
THE AUTHOR
Michael Schilli works as a Software Developer at Yahoo!, Sunnyvale, California. He wrote "Perl Power" for Addison-Wesley and can be contacted at mschilli@perlmeister.com. His homepage is at http://perlmeister.com.