Saturday, March 01, 2008

Tips for unix sort command

You might have used the sort command many times in the past for simple sort requirements.
But did you know that one of the most frustrating and hard-to-diagnose problems with the sort command is a missing -n option? This option tells sort to compare numerically; in other words, the field being used for sorting is treated as a number. Without it, values are compared as text, so 10 sorts before 9.
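Here is a small sketch of the difference. The four numbers are made-up sample data, and LC_ALL=C is set only so that the text comparison behaves the same on every system:

```shell
# Hypothetical sample data: one number per line.
numbers=$(printf '%s\n' 10 9 100 2)

# Default comparison is textual, character by character.
lexical=$(printf '%s\n' "$numbers" | LC_ALL=C sort)

# With -n the same lines are compared as numbers.
numeric=$(printf '%s\n' "$numbers" | LC_ALL=C sort -n)

printf 'without -n:\n%s\n\nwith -n:\n%s\n' "$lexical" "$numeric"
```

Without -n the order comes out as 10, 100, 2, 9; with -n it is 2, 9, 10, 100.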
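Another commonly used option of sort is -k. This option specifies the field on which the sorting is done. For example, -k2,2 sorts on the second field only; the start,end form matters because a bare -k2 makes the key run from field 2 to the end of the line.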
Recently, I used sort in one of my scripts and was irritated that I did not know how to do a nested sort (sort first by key1 and then by key2...). What was more irritating was that even the life-saving man page did not indicate any option for this.
After some searching on the net, I found a reference saying that the -k option may be given more than once in the same sort command. And that saved my day:
sort -k1,1 -k3,3 filename
sorts the lines of filename first by field 1 (because of 1,1) and, when two lines tie on that field, by field 3 (3,3), and writes the output to stdout.
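To see the nested sort at work, here is a minimal sketch. The names, numbers and fruits are invented sample data, fed through a pipe instead of a file, and LC_ALL=C is assumed only to keep the ordering predictable:

```shell
# Hypothetical sample data: name, an unrelated number, a fruit.
rows=$(printf '%s\n' 'bob 2 pear' 'alice 9 fig' 'bob 7 apple' 'alice 1 kiwi')

# Primary key: field 1. Tie-breaker: field 3. Field 2 plays no part
# in the ordering here because no two lines tie on both keys.
nested=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1,1 -k3,3)

printf '%s\n' "$nested"
```

The two alice lines come first, ordered fig then kiwi, followed by the two bob lines, ordered apple then pear. Field 2 ends up as 9, 1, 7, 2, which shows it was not used as a key.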
This can be combined with numeric sorting by appending n to a key, which marks just that field as numeric. For example,
sort -k1,1n -k3,3 filename
The above command sorts on the first field numerically and on the second key (actually the third field in the file) alphabetically, and writes the output to stdout.
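A quick sketch of the mixed case, again with invented sample data and LC_ALL=C assumed for a predictable text ordering:

```shell
# Hypothetical sample data: a number, a filler column, a fruit.
rows=$(printf '%s\n' '10 x pear' '9 x fig' '10 x apple' '100 x kiwi')

# Field 1 is compared as a number (n modifier on that key only);
# field 3 is compared as text and breaks ties on field 1.
mixed=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1,1n -k3,3)

printf '%s\n' "$mixed"
```

The line starting with 9 comes first, then the two lines starting with 10 (apple before pear), then the line starting with 100.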
There is a caveat though. The following command is different from the one above:
sort -n -k1,1 -k3,3 filename
Here -n is a global option, so it applies to every key that does not carry modifiers of its own. In other words, the sorting happens on field 1 numerically and then on field 3 numerically as well.
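The difference only shows up when the second key also holds numbers. A minimal sketch, using made-up data in which field 1 is always the same so that field 3 decides the order (LC_ALL=C is assumed for a predictable text comparison):

```shell
# Hypothetical sample data: field 1 ties on every line.
rows=$(printf '%s\n' '1 x 10' '1 x 9' '1 x 100')

# n on the first key only: field 3 is compared as text.
per_key=$(printf '%s\n' "$rows" | LC_ALL=C sort -k1,1n -k3,3)

# Global -n: both keys are compared as numbers.
global=$(printf '%s\n' "$rows" | LC_ALL=C sort -n -k1,1 -k3,3)

printf 'per-key n:\n%s\n\nglobal -n:\n%s\n' "$per_key" "$global"
```

With the per-key modifier, field 3 comes out as 10, 100, 9; with the global -n it comes out as 9, 10, 100.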