I am trying to sort a large (10 MB) file on its first two columns. The file is delimited by byte 241 (0xF1, which is ± in code page 437). The problem is that after sorting on the first two fields correctly, sort keeps sorting on the rest of the line, even though I included the -s option.
Command: sort -k1bn -k2n -s -t$'\xF1' -o sorted_file file_to_sort
Sample data (already mostly sorted, so the issue is easy to see):
6033718±2± 0±20817742
6033718±3±20817742
6033718±3±20862761
6033718±3±SRDV408BC
6033718±3±KFT474
6033718±3±941764
6033718±4±20817742
6033718±4±20862761
6033718±4±SRDV408BC
6033718±4±KFT474
6033718±4±941764
6033718±5±21501-0-06 ±D13 * TIMING
6033718±5±17003-0-01 ±VEHICLE OPER
6033718±6±21501-0-06 ±10 ±0±
6033718±6±17003-0-01 ±10 ±0±
6033718±9±I± === Applicable Coverage
6033718±9±I±Volvo D11/13/16 / TIMING
6033718±9±E±check for oil leak, insp
After running the command, I get:
6033718±2± 0±20817742
6033718±3±20817742
6033718±3±20862761
6033718±3±941764
6033718±3±KFT474
6033718±3±SRDV408BC
6033718±4±20817742
6033718±4±20862761
6033718±4±941764
6033718±4±KFT474
6033718±4±SRDV408BC
6033718±5±17003-0-01 ±VEHICLE OPER
6033718±5±21501-0-06 ±D13 * TIMING
6033718±6±17003-0-01 ±10 ±0±
6033718±6±21501-0-06 ±10 ±0±
6033718±9±E±check for oil leak, insp
6033718±9±I± === Applicable Coverage
6033718±9±I±Volvo D11/13/16 / TIMING
As you can see from the ‘3’, ‘4’ and ‘9’ records, the data following the second field has been sorted, even though the sort manual states that the -s option prevents sorting on the rest of the line once the keys have been exhausted.
Where am I going wrong here?
BTW, it seems to work fine on a smaller file.
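For reference, here is a minimal sketch of what -s does on its own: it disables the last-resort whole-line comparison that sort falls back to when every key ties. It assumes bash (for $'...' quoting) and GNU coreutils sort run under LC_ALL=C so ordering is byte-wise. It uses single-field keys (-k1,1 and -k2,2), and the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: what -s changes when every key ties.
# Assumptions: bash and GNU coreutils sort; LC_ALL=C gives byte-wise ordering.
set -euo pipefail
export LC_ALL=C

d=$'\xF1'   # single-byte delimiter, as in the question

# Three rows that tie on both keys; field 3 is deliberately out of order.
input=$(printf '%s\n' "6033718${d}3${d}SRDV408BC" "6033718${d}3${d}KFT474" "6033718${d}3${d}941764")

# Without -s, tied rows fall back to a whole-line comparison.
fallback=$(printf '%s\n' "$input" | sort -t"$d" -k1,1n -k2,2n)

# With -s, that last-resort comparison is skipped and ties keep input order.
stable=$(printf '%s\n' "$input" | sort -s -t"$d" -k1,1n -k2,2n)

# Show the delimiter as '|' so the output is readable on any terminal.
echo "without -s:"; printf '%s\n' "$fallback" | tr "$d" '|'
echo "with -s:";    printf '%s\n' "$stable"   | tr "$d" '|'
```

Without -s the rows come out as 941764, KFT474, SRDV408BC. With -s they stay in input order.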
The sorting keys are specified as -k <start>[,<end>]. If <end> is not specified, the key runs from <start> to the end of the line, which is somewhat unintuitive. You probably want something more like this:
sort -k1,1bn -k2,2n -s -t$'\xF1' -o sorted_file file_to_sort
Note that these keys each specify a single field, rather than the default of “all fields starting at … until the end of the line”.
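To see what difference the end of a key makes, here is a minimal sketch that sorts three rows from the sample data both ways, with -s in both cases. It assumes bash and GNU coreutils sort run under LC_ALL=C. The keys are compared as plain text (no n) so that the effect of the key's end is easy to see, and the variable names are only illustrative.

```shell
#!/usr/bin/env bash
# Sketch: an open-ended key (-k2) versus a single-field key (-k2,2), both with -s.
# Assumptions: bash and GNU coreutils sort; LC_ALL=C for byte-wise ordering.
# Keys are compared as plain text (no n) so the effect of the key's end is visible.
set -euo pipefail
export LC_ALL=C

d=$'\xF1'   # single-byte delimiter, as in the question

input=$(printf '%s\n' "6033718${d}4${d}SRDV408BC" "6033718${d}4${d}KFT474" "6033718${d}4${d}941764")

# -k2: the key runs from field 2 to the end of the line, so field 3 is compared too.
open_key=$(printf '%s\n' "$input" | sort -s -t"$d" -k2)

# -k2,2: the key is field 2 only; every row ties and -s preserves input order.
single_field=$(printf '%s\n' "$input" | sort -s -t"$d" -k2,2)

# Show the delimiter as '|' so the output is readable on any terminal.
echo "with -k2:";   printf '%s\n' "$open_key"     | tr "$d" '|'
echo "with -k2,2:"; printf '%s\n' "$single_field" | tr "$d" '|'
```

With -k2 the rows are reordered to 941764, KFT474, SRDV408BC even though -s is set, because the trailing field is part of the key. With -k2,2 they keep their input order.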