I have a question about sorting data by multiple columns. I’m definitely a beginner

Asked: June 17, 2026 (2026-06-17T08:48:10+00:00)

I have a question about sorting data by multiple columns. I’m definitely a beginner at this, and I’m wondering how I can sort by one column and then by another without losing the ordering of the first column. I have a file of tab-separated data with three columns: an ID (first column) and a start and end position (second and third columns). Most IDs have a single entry. Occasionally, however, there are multiple entries for the same ID. These need to remain grouped together (with no blank line between them; a blank line should appear only before an entry with a different ID). The data is already sorted by the first column, but I need to sort it numerically by starting position (second column) while preserving that original ordering. Like this:

Current format:

PITG_00129  606 1436

PITG_00130  1   987

PITG_00132  2   1321

PITG_00133 4464 11708
PITG_00133 1 2946
PITG_00133 4081 4515

Desired format:

PITG_00129  606 1436

PITG_00130  1   987

PITG_00132  2   1321

PITG_00133 1 2946
PITG_00133 4081 4515
PITG_00133 4464 11708

1 Answer

Editorial Team · Answer 1 · 2026-06-17T08:48:11+00:00

You can do this pretty easily in Python. First, you need to parse each line into a form that sorts correctly:

def line_to_tuple(line):
    data = line.split()
    return (data[0],int(data[1]),int(data[2]))

This will turn each line into a tuple which will sort lexicographically. Since your strings (the first column) are set up in an easily sorted manner, we don’t need to worry about them. The second and third columns just need to be converted to integers to make them sort properly.

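inputfile, outputfile = 'test.dat', 'output.dat'  # same file names as the second example

with open(inputfile) as fin, open(outputfile, 'w') as fout:
    # Skip blank lines; they cannot be parsed into a tuple.
    non_blank_lines = (line for line in fin if line.strip())
    sorted_lines = sorted(non_blank_lines, key=line_to_tuple)
    fout.writelines(sorted_lines)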
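
To see why the integer conversion matters, here is a small sketch that sorts the start positions from the sample data both as text and as numbers. The variable names are only for illustration; the values are the second-column entries from the question.

```python
# Start positions taken from the sample data, kept as strings first.
starts = ["606", "4464", "1", "4081"]

# Sorted as text, "606" lands last because "6" sorts after "4".
text_order = sorted(starts)

# Sorted as integers, the positions come out in numeric order.
number_order = sorted(int(s) for s in starts)

print(text_order)
print(number_order)
```

The same thing happens inside a tuple key: tuples are compared element by element, so the ID decides first and the integer start position breaks ties within an ID.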

Here’s another implementation that preserves the blank lines between groups:

import itertools

def field1(line):
    # Grouping key: the ID, or None for a blank line.
    data = line.split()
    try:
        return data[0]
    except IndexError:
        return None

def fields(line):
    # Sort key: ID as text, start and end as integers.
    data = line.split()
    return data[0], int(data[1]), int(data[2])

with open('test.dat') as fin, open('output.dat', 'w') as fout:
    for k, v in itertools.groupby(fin, key=field1):
        if k is None:
            # A run of blank lines is written back as a single blank line.
            fout.write('\n')
        else:
            fout.writelines(sorted(v, key=fields))

This uses itertools.groupby to split the file into runs of consecutive lines that share the same ID (blank lines form their own runs) and sorts each run individually before writing it back out.

Here’s the output:

temp $ cat output.dat 
PITG_00129  606 1436

PITG_00130  1   987

PITG_00132  2   1321

PITG_00133 1 2946
PITG_00133 4081 4515
PITG_00133 4464 11708
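
If you want to check the grouping logic without touching any files, the same idea can be sketched on an in-memory list of lines. This is only a sketch: the helper names (first_field, numeric_key, sort_within_groups) are made up for this example, and unlike the file version above it keeps every blank line as-is rather than collapsing a run of them into one.

```python
import itertools

def first_field(line):
    # Blank lines have no fields, so they get the key None.
    parts = line.split()
    return parts[0] if parts else None

def numeric_key(line):
    # Assumes exactly three whitespace-separated columns per data line.
    ident, start, end = line.split()
    return ident, int(start), int(end)

def sort_within_groups(lines):
    result = []
    for ident, group in itertools.groupby(lines, key=first_field):
        if ident is None:
            result.extend(group)  # keep blank lines untouched
        else:
            result.extend(sorted(group, key=numeric_key))
    return result

sample = [
    "PITG_00132  2   1321\n",
    "\n",
    "PITG_00133 4464 11708\n",
    "PITG_00133 1 2946\n",
    "PITG_00133 4081 4515\n",
]

for line in sort_within_groups(sample):
    print(line, end="")
```

Because groupby only merges consecutive lines with the same key, this relies on the file already being grouped by ID, which is the case in the question.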
