Unix Power ToolsUnix Power ToolsSearch this book

22.5. Alphabetic and Numeric Sorting

sort performs two fundamentally different kinds of sorting operations: alphabetic sorts and numeric sorts. An alphabetic sort is performed according to the traditional "dictionary order," using the ASCII collating sequence. Uppercase letters come before lowercase letters (unless you specify the -f option, which "folds" uppercase and lowercase together), with numerals and punctuation interspersed. The -l (lowercase L) option sorts by the current locale instead of the default US/ASCII order.

This is all fairly trivial and common sense. However, it's worth belaboring the difference, because it's a frequent source of bugs in shell scripts. Say you sort the numbers 1 through 12. A numeric sort gives you these numbers "in order," just like you'd expect. An alphabetic sort gives you:

1
11
12
2
...

Of course, this is how you'd sort the numbers if you applied dictionary rules to the list. Numeric sorts can handle decimal numbers (for example, numbers like 123.44565778); they can't handle floating-point numbers (for example, 1.2344565778E+02). The GNU sort does provide the -g flag for sorting numbers in scientific notation. Unfortunately, it is significantly slower than plain old decimal sorting.

What happens if you include alphabetic characters in a numeric sort? Although the results are predictable, I would prefer to say that they're "undefined." Including alphabetic characters in a numeric sort is a mistake, and there's no guarantee that different versions of sort will handle them the same way. As far as I know, there is no provision for sorting hexadecimal numbers.

One final note: your version of numeric sort may treat initial blanks as significant, sorting numbers with additional spaces before them ahead of numbers without the additional spaces. This is an incredibly stupid misfeature. There is a workaround: use the -b (ignore leading blanks) and always specify a sort field.[67] That is, sort -nb +0 will do what you expect; sort -n won't.

[67]Stupid misfeature number 2: -b doesn't work unless you specify a sort field explicitly, with a +n option.

-- ML



Library Navigation Links

Copyright © 2003 O'Reilly & Associates. All rights reserved.