Unix Power Tools

22.8. Sorting a List of People by Last Name

It's hard to sort an arbitrary list of people's names because some people have one-word first and last names, like Joe Smith, while others have multipart names, like Mary Jo Appleton. This program sorts on the last word in each name. That won't handle every naming convention used around the world, but it might give you some ideas.

Go to http://examples.oreilly.com/upt3 for more information on: namesort

The script reads from files or its standard input; it writes to standard output.

#! /bin/sh
# Print last field (last name), a TAB, then whole name:
awk '{print $NF "\t" $0}' "$@" |
# sort (by last name: the temporary first field)
sort |
# strip off first field and print the names:
cut -f2-
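
To see what the three stages do, here is a minimal sketch that wraps the same pipeline in a shell function and runs it on a made-up list. The function name namesort and the sample names are assumptions for illustration only; they are not part of the original script.

```sh
#!/bin/sh
# Sketch: the namesort pipeline wrapped in a function (illustrative only).
namesort() {
  # Stage 1: prefix each line with its last word and a TAB.
  awk '{print $NF "\t" $0}' "$@" |
  # Stage 2: sort on that temporary first field.
  sort |
  # Stage 3: cut the temporary field away again.
  cut -f2-
}

# Hypothetical sample input; with no filename arguments, awk reads stdin.
printf '%s\n' 'Mary Jo Appleton' 'Joe Smith' 'Ann Marie Baker' | namesort
```

After the first stage the lines begin with Appleton, Smith, and Baker, so the output lists Mary Jo Appleton, then Ann Marie Baker, then Joe Smith.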

If you want more control over the sorting or you're interested in pulling apart names in general, there's a Perl module you might want to look at called Lingua::EN::NameParse. Below is a Perl script that also sorts a list of names by surname.

#!/usr/bin/perl

use strict;
use warnings;
use Lingua::EN::NameParse;

# One shared parser object, used by the comparison subroutine below
my $Name_Obj = Lingua::EN::NameParse->new(auto_clean  => 1);
chomp(my @names = <STDIN>);   # read every line; strip newlines once
for my $line (sort by_lastname @names){
  print $line, "\n";
}

# Comparison subroutine for sort: $a and $b are the two names to compare
sub by_lastname {
  my @names;
  for my $name ($a, $b) {
    if( my $err = $Name_Obj->parse($name) ){
      warn "WARN: Unparsable name ($name): $err";
    }
    my %tmp = $Name_Obj->components;
    push @names, \%tmp;
  }
  return lc $names[0]->{surname_1} cmp lc $names[1]->{surname_1};
}

The script starts by loading the Lingua::EN::NameParse module. Then all lines from standard input are read and stored in an array. Perl's sort function is particularly flexible in that it can use a user-defined subroutine to determine the collating sequence. Here, the subroutine by_lastname receives the two items currently being compared in the "magical" global variables $a and $b. Each name is parsed by the global Lingua::EN::NameParse object, and a reference to its components is stored in the array @names. It's then a simple matter to compare the lowercased surnames alphabetically and return the result to sort.

Although this script may be a little more Perl than you wanted to know, the problem of sorting by last name is complex. Fortunately, the Lingua::EN::NameParse module from CPAN does the heavy lifting for us. In fact, one of the most compelling reasons to learn Perl is the large collection of free library modules stored on the Comprehensive Perl Archive Network (CPAN), which is mirrored throughout the world. For more about CPAN, see Section 41.11.
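
The case-insensitive comparison of surnames can be roughly approximated in the shell version as well. The sketch below is an assumption-laden variation, not the authors' script: the function name namesort_nocase and the sample names are made up, and it still treats the last word as the surname without parsing the name the way Lingua::EN::NameParse does. It restricts the sort key to the temporary first field and folds its case with the f modifier; LC_ALL=C is set only to make the ordering predictable across systems.

```sh
#!/bin/sh
# Sketch: namesort with a case-insensitive surname key (illustrative only).
namesort_nocase() {
  tab=$(printf '\t')               # a literal TAB for sort's -t option
  awk '{print $NF "\t" $0}' "$@" |
  # Compare field 1 only, folding lowercase to uppercase (f modifier).
  LC_ALL=C sort -t "$tab" -k1,1f |
  cut -f2-
}

# Hypothetical sample input with mixed-case surnames.
printf '%s\n' 'bob smith' 'Al Jones' 'Cy adams' | namesort_nocase
```

With these three names the function prints Cy adams, Al Jones, and bob smith, whereas a plain byte-order sort in the C locale would place Jones ahead of adams.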

--JP and JJ




Copyright © 2003 O'Reilly & Associates. All rights reserved.