
The Archive Base


Editorial Team
Asked: May 26, 2026


I need help in parsing a very long text file which looks like:

NAME         IMP4   
DESCRIPTION  small nucleolar ribonucleoprotein 
CLASS        Genetic Information Processing
             Translation
             Ribosome biogenesis in eukaryotes
DBLINKS      NCBI-GI: 15529982
             NCBI-GeneID: 92856
             OMIM: 612981
///
NAME         COMMD9
DESCRIPTION  COMM domain containing 9
ORGANISM     H.sapiens
DBLINKS      NCBI-GI: 156416007
             NCBI-GeneID: 29099
             OMIM: 612299
///
.....

I want to obtain a structured CSV file with the same number of columns in every row, so that I can easily extract the information I need.

First I tried this:

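# a is assumed to be the open input file and b the open output file
for line in a:
    if '///' not in line:
        # turn each line break inside a record into a tab
        b.write(line.replace('\n', '\t'))
    else:
        # the '///' separator ends the record, so start a new row
        b.write('\n')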

obtaining a csv like this:

NAME         IMP4\tDESCRIPTION  small nucleolar ribonucleoprotein\tCLASS        Genetic Information Processing\t             Translation\t             Ribosome biogenesis in eukaryotes\tDBLINKS      NCBI-GI: 15529982\t             NCBI-GeneID: 92856\t             OMIM: 612981
NAME         COMMD9\tDESCRIPTION  COMM domain containing 9\tORGANISM     H.sapiens\tDBLINKS      NCBI-GI: 156416007\t             NCBI-GeneID: 29099\t             OMIM: 612299

The main problem is that fields like DBLINKS, which span multiple lines in the original file, end up split into several fields, while I need each of them in a single field.
Moreover, not every field is present in every record, for instance the fields 'CLASS' and 'ORGANISM' in the example.

The file I’d like to obtain should look like:

NAME         IMP4\tDESCRIPTION  small nucleolar ribonucleoprotein\tNA\tCLASS        Genetic Information Processing; Translation; Ribosome biogenesis in eukaryotes\tDBLINKS      NCBI-GI: 15529982; NCBI-GeneID: 92856; OMIM: 612981
NAME         COMMD9\tDESCRIPTION  COMM domain containing 9\tORGANISM     H.sapiens\tNA\tDBLINKS      NCBI-GI: 156416007; NCBI-GeneID: 29099; OMIM: 612299

Could you please help me?

1 Answer

    Editorial Team
    Added an answer on May 26, 2026 at 11:27 pm

    You could use itertools.groupby twice: once to collect lines into records, and a second time to collect the lines of each (possibly multi-line) field into an iterator:

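    import itertools

    def is_end_of_record(line):
        # A line starting with '///' terminates a record.
        return line.startswith('///')

    class FieldClassifier(object):
        """Key function that remembers the field name of the latest header line."""
        def __init__(self):
            self.field = ''
        def __call__(self, row):
            # A non-indented line starts a new field; indented lines continue it.
            if not row[0].isspace():
                self.field = row.split(' ', 1)[0]
            return self.field

    fields = 'NAME DESCRIPTION ORGANISM CLASS DBLINKS'.split()
    with open('data', 'r') as f:
        # First pass: split the file into records at the '///' lines.
        for end_of_record, lines in itertools.groupby(f, is_end_of_record):
            if not end_of_record:
                classifier = FieldClassifier()
                record = {}
                # Second pass: group each field with its continuation lines.
                for fieldname, row in itertools.groupby(lines, classifier):
                    record[fieldname] = '; '.join(r.strip() for r in row)
                # Missing fields become 'NA' so every row has the same columns.
                print('\t'.join(record.get(fieldname, 'NA') for fieldname in fields))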
    

    yields

    NAME         IMP4   DESCRIPTION  small nucleolar ribonucleoprotein  NA  CLASS        Genetic Information Processing; Translation; Ribosome biogenesis in eukaryotes DBLINKS      NCBI-GI: 15529982; NCBI-GeneID: 92856; OMIM: 612981
    NAME         COMMD9 DESCRIPTION  COMM domain containing 9   ORGANISM     H.sapiens  NA  DBLINKS      NCBI-GI: 156416007; NCBI-GeneID: 29099; OMIM: 612299
    

    Above is the output as you would see it printed. It matches the desired output you posted, assuming you are showing the repr of that output.
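
If you would rather end up with an actual tab-separated file than printed lines, here is a rough sketch of one way to do it. It is a variation, not the code above: it drops the repeated field names from each cell and writes them once as a header row instead. The names parse_records, write_tsv, FIELDS and SAMPLE are made up for this illustration, and the sample text is copied from the question so the sketch runs without a data file.

```python
import csv
import io
import itertools

# Column order is an assumption taken from the field list used in the answer.
FIELDS = ['NAME', 'DESCRIPTION', 'ORGANISM', 'CLASS', 'DBLINKS']


def is_end_of_record(line):
    return line.startswith('///')


def parse_records(lines):
    """Yield one dict per record; multi-line values are joined with '; '."""
    for end_of_record, group in itertools.groupby(lines, is_end_of_record):
        if end_of_record:
            continue
        record = {}
        field = None
        for line in group:
            if not line.strip():
                continue  # ignore blank lines
            if not line[0].isspace():
                # A header line looks like "KEY   value".
                parts = line.split(None, 1)
                field = parts[0]
                value = parts[1] if len(parts) > 1 else ''
            else:
                value = line  # continuation of the current field
            if field is not None:
                record.setdefault(field, []).append(value.strip())
        if record:
            yield {key: '; '.join(values) for key, values in record.items()}


def write_tsv(records, out):
    """Write a header row plus one row per record, using 'NA' for missing fields."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(FIELDS)
    for record in records:
        writer.writerow([record.get(name, 'NA') for name in FIELDS])


SAMPLE = """\
NAME         IMP4
DESCRIPTION  small nucleolar ribonucleoprotein
CLASS        Genetic Information Processing
             Translation
             Ribosome biogenesis in eukaryotes
DBLINKS      NCBI-GI: 15529982
             NCBI-GeneID: 92856
             OMIM: 612981
///
NAME         COMMD9
DESCRIPTION  COMM domain containing 9
ORGANISM     H.sapiens
DBLINKS      NCBI-GI: 156416007
             NCBI-GeneID: 29099
             OMIM: 612299
///
"""

buffer = io.StringIO()
write_tsv(parse_records(SAMPLE.splitlines(keepends=True)), buffer)
tsv_text = buffer.getvalue()
print(tsv_text)
```

To write to disk, you could pass a file opened with open('out.tsv', 'w', newline='') in place of the StringIO buffer (the file name is only an example).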


    References to tools used:

    • itertools.groupby
    • a class with a __call__ method
    • str.join with a generator expression (it helps to first
      understand list comprehensions)
    • dict.get method with a default value
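
In case it helps to see the tools listed above in isolation, here is a small toy sketch. The class LastHeader and the sample lines are invented for illustration and are not part of the parsing code.

```python
import itertools


class LastHeader:
    """Callable key: remembers the first word of the latest non-indented line."""

    def __init__(self):
        self.current = ''

    def __call__(self, line):
        if not line[0].isspace():
            self.current = line.split(' ', 1)[0]
        return self.current


lines = ['A  one', '   two', 'B  three', '   four', '   five']

# groupby calls the key on each item and groups consecutive items with equal keys.
grouped = {key: list(group)
           for key, group in itertools.groupby(lines, LastHeader())}

# str.join consumes a generator expression directly, no intermediate list needed.
joined = '; '.join(item.strip() for item in grouped['B'])

# dict.get returns the default instead of raising KeyError for a missing key.
missing = grouped.get('C', 'NA')

print(grouped)
print(joined)
print(missing)
```

Because the key object keeps state between calls, indented lines inherit the key of the header line before them, which is what lets groupby keep a field together with its continuation lines.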
