I have a large file which I want to format in a certain manner.

Asked: June 7, 2026 (2026-06-07T18:15:29+00:00)

I have a large file which I want to format in a certain manner. File input example:

DVL1    03220   NP_004412.2 VANGL2  02758   Q9ULK5  in vitro    12490194
PAX3    09421   NP_852124.1 MEOX2   02760   NP_005915.2 in vitro;yeast 2-hybrid 11423130
VANGL2  02758   Q9ULK5  MAGI3   11290   NP_001136254.1  in vitro;in vivo    15195140

And this is how I want it to become:

DVL1    03220   NP_004412   VANGL2  02758   Q9ULK5
PAX3    09421   NP_852124   MEOX2   02760   NP_005915
VANGL2  02758   Q9ULK5  MAGI3   11290   NP_001136254

To summarize:

if the line has 1 dot, that dot is deleted along with the number after it, and the output line keeps only the first 6 tab-separated values
if the line has 2 dots, both dots are deleted along with the numbers after them, and the output line keeps only the first 6 tab-separated values
if the line has no dots, the first 6 tab-separated values are kept unchanged

My idea is currently something like this:

for line in infile:
    if "." in line: # thought about this and a line.count('.') might be better, just wasn't capable to make it work
        transformed_line = line.replace('.', '\t', 2) # only replaces the dot; want to replace dot plus next first character
        columns = transformed_line.split('\t')
        outfile.write('\t'.join(columns[:8]) + '\n') # if i had a way to know the position of the dot(s), i could join only the desired columns
    else:
        columns = line.split('\t')
        outfile.write('\t'.join(columns[:5]) + '\n') # this is fine

I hope I explained myself clearly.
Thanks for your effort, guys.

1 Answer

Editorial Team · Answer 1 · 2026-06-07T18:15:30+00:00

You can try something like this:

    with open('data1.txt') as f:
        for line in f:
            line = line.split()[:6]  # keep the first 6 whitespace-separated columns
            # if an element contains '.', drop the dot and everything after it;
            # otherwise keep the element as it is
            line = map(lambda x: x[:x.index('.')] if '.' in x else x, line)
            print('\t'.join(line))

Output:

DVL1    03220   NP_004412   VANGL2  02758   Q9ULK5
PAX3    09421   NP_852124   MEOX2   02760   NP_005915
VANGL2  02758   Q9ULK5  MAGI3   11290   NP_001136254

Edit:

As @mgilson suggested, the line line = map(lambda x: x[:x.index('.')] if '.' in x else x, line) can be replaced by the simpler line = map(lambda x: x.split('.')[0], line), since split('.')[0] returns the whole string when it contains no dot.
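Since the question writes to an output file rather than printing, here is a minimal sketch of the same idea wrapped in two small functions. The names trim_line and convert, and the keep parameter, are made up for illustration and are not part of the original answer. It also assumes the columns really are tab-separated, as the question states, so it splits on '\t' instead of on any whitespace.

```python
def trim_line(line, keep=6):
    """Keep the first `keep` tab-separated fields and drop any '.N' suffix.

    Hypothetical helper: assumes fields are separated by tabs.
    """
    fields = line.rstrip('\n').split('\t')[:keep]
    # split('.')[0] returns the whole field when it contains no dot
    return '\t'.join(field.split('.')[0] for field in fields)


def convert(in_path, out_path):
    """Read in_path line by line and write the trimmed lines to out_path."""
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        for line in infile:
            if line.strip():  # skip blank lines
                outfile.write(trim_line(line) + '\n')
```

Reading line by line keeps memory use flat for a large file. Calling convert('data1.txt', 'data1_trimmed.txt') would process the input from the answer above; the output file name is only a placeholder.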
