Unix Power Tools

22.2. Sort Fields: How sort Sorts

Unless you tell it otherwise, sort divides each line into fields at whitespace (blanks or tabs), and sorts the lines by field, from left to right.

That is, it sorts on the basis of field 0 (leftmost), but when the leftmost fields are the same, it sorts on the basis of field 1, and so on. This is hard to put into words, but it's really just common sense. Suppose your office inventory manager created a file like this:

supplies     pencils  148
furniture    chairs   40
kitchen      knives   22
kitchen      forks    20
supplies     pens     236
furniture    couches  10
furniture    tables   7
supplies     paper    29

You'd want all the supplies sorted into categories, and within each category, you'd want them sorted alphabetically:

% sort supplies
furniture    chairs   40
furniture    couches  10
furniture    tables   7
kitchen      forks    20
kitchen      knives   22
supplies     paper    29
supplies     pencils  148
supplies     pens     236

Of course, you don't always want to sort from left to right. The command-line option +n tells sort to start sorting on field n; -n tells sort to stop sorting on field n. Remember (again) that sort counts fields from left to right, starting with 0.[66] Here's an example. We want to sort a list of telephone numbers of authors, presidents, and blues singers:

[66]I harp on this because I always get confused and have to look it up in the manual page.

Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434

According to standard "telephone book rules," we want these names sorted by last name, first name, and middle initial. We don't want the phone number to play a part in the sorting. So we want to start sorting on field 2, stop sorting on field 3, continue sorting on field 0, sort on field 1, and (just to make sure) stop sorting on field 2 (the last name). We can code this as follows:

% sort +2 -3 +0 -2 phonelist
Lyndon B Johnson      933-1423
Robert M Johnson      344-0909
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Timothy F O'Reilly    443-2434
Jerry O Peek          267-2345
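The +n and -n notation is the historical one, and some current versions of sort (GNU sort, for instance) may not accept it without special settings. The POSIX -k option expresses the same idea, but it counts fields from 1 and names the first and last field of each key. Here is a minimal sketch of the phone-list sort in that form. The scratch file and the sorted variable are only there for illustration, and LC_ALL=C is an assumption made so that the comparison order doesn't depend on your locale:

```shell
# Scratch copy of the phone list (the temp file is just for illustration).
phonelist=$(mktemp)
cat > "$phonelist" <<'EOF'
Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434
EOF

# -k 3,3 : sort on the last name only          (old style: +2 -3)
# -k 1,2 : then first name and middle initial  (old style: +0 -2)
# LC_ALL=C forces plain byte-order comparison so the result is repeatable.
sorted=$(LC_ALL=C sort -k 3,3 -k 1,2 "$phonelist")
printf '%s\n' "$sorted"

rm -f "$phonelist"
```

The lines should come out in the same order as in the listing above, with the three Johnsons first.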

A few notes:

There are a couple of variations that are worth mentioning. You may never need them unless you're really serious about sorting data files, but it's good to keep them in the back of your mind. First, you can add any "collation" operations (discard blanks, numeric sort, etc.) to the end of a field specifier to describe how you want that field sorted. Using our previous example, let's say that if two names are identical, you want them sorted in numeric phone number order. The following command does the trick:

% sort +2 -3 +0 -2 +3n phonelist

The +3n option says "do a numeric sort on the fourth field." If you're worried about initial blanks (perhaps some of the phone numbers have area codes), use +3nb.
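To see the numeric tiebreaker do something, you need two entries with the same name. The sketch below uses the POSIX -k form (fields counted from 1) and a made-up short list in which Robert M Johnson appears twice. The number 99-0001 is invented so that text order and numeric order disagree; the scratch file, the sorted variable, and LC_ALL=C are likewise assumptions for the sake of a repeatable example:

```shell
# Hypothetical list with a duplicated name; the phone numbers are made up.
phonelist=$(mktemp)
cat > "$phonelist" <<'EOF'
Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Robert M Johnson      99-0001
EOF

# -k 3,3 : last name
# -k 1,2 : first name and middle initial
# -k 4n  : numeric sort on the fourth field (old style: +3n).
#          Numeric comparison skips leading blanks and reads only the
#          leading digits, so it compares 99 with 344 here.
sorted=$(LC_ALL=C sort -k 3,3 -k 1,2 -k 4n "$phonelist")
printf '%s\n' "$sorted"

rm -f "$phonelist"
```

Compared as text, 344-0909 would come before 99-0001 because "3" sorts before "9". With the numeric key, 99 is smaller than 344, so the 99-0001 entry is listed first of the two.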

Second, you can specify individual columns within any field for sorting, using the notation +n.c, where n is a field number, and c is a character position within the field. Likewise, the notation -n.c says "stop sorting at the character before character c." If you're counting characters, be sure to use the -b (ignore leading blanks) option; otherwise, it will be very difficult to figure out what character you're counting.
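As a rough illustration of character positions, here is a sketch that sorts the phone list by the last four digits of each number. It uses the POSIX -k field.character form, in which both fields and characters are counted from 1, with the b modifier so that counting starts at the first non-blank character of the field. The choice of key, the scratch file, the sorted variable, and LC_ALL=C are all assumptions made for this example:

```shell
# Scratch copy of the phone list (the temp file is just for illustration).
phonelist=$(mktemp)
cat > "$phonelist" <<'EOF'
Robert M Johnson      344-0909
Lyndon B Johnson      933-1423
Samuel H Johnson      754-2542
Michael K Loukides    112-2535
Jerry O Peek          267-2345
Timothy F O'Reilly    443-2434
EOF

# -k 4.5b,4.8b : the key runs from character 5 to character 8 of field 4.
#                The b modifier skips the blanks in front of the number,
#                so in "344-0909" characters 5 through 8 are "0909".
sorted=$(LC_ALL=C sort -k 4.5b,4.8b "$phonelist")
printf '%s\n' "$sorted"

rm -f "$phonelist"
```

Without the b modifier, the run of blanks before each number would count as part of the field, and character 5 would land somewhere in the padding.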

-- ML




Copyright © 2003 O'Reilly & Associates. All rights reserved.