The Archive Base

Editorial Team
Asked: May 17, 2026

I have some data (text files) that is formatted in the most uneven manner


I have some data (text files) that is formatted in the most uneven manner one could think of. I am trying to minimize the amount of manual work on parsing this data.

Sample data:

Name        Degree      CLASS       CODE        EDU     Scores
--------------------------------------------------------------------------------------
John Marshall       CSC   78659944   89989        BE   900
Think Code DB I10   MSC  87782  1231  MS            878
Mary 200 Jones    CIVIL      98993483  32985        BE       898
John G. S  Mech 7653 54 MS 65
Silent Ghost  Python Ninja 788505  88448  MS Comp  887

Conditions:

  • Runs of more than one space should be compressed to a delimiter (a pipe, perhaps? The end goal is to store these files in a database).
  • Except for the first column, the columns won’t have any spaces in them, so all those spaces can be compressed to a pipe.
  • Only the first column can have multiple words separated by spaces (Mary K Jones). The rest of the columns are mostly numbers and some letters.
  • The first and second columns are both strings. They almost always have more than one space between them, which is how we can tell the two columns apart. (If there is only a single space, that is a risk I am willing to take given the horrible formatting!)
  • The number of columns varies, so we don’t have to worry about column names. All we want is to extract each column’s data.

Hope I made sense! I have a feeling that this task can be done in a one-liner. I don’t want to loop, loop, loop 🙁

Muchas gracias “Pythonistas” for reading all the way and not quitting before this sentence!

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 5:12 pm

    It still seems to me that there’s some format in your files:

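    >>> import re
    >>> # s holds the sample text from the question, header lines included
    >>> regex = r'^(.+)\b\s{2,}\b(.+)\s+(\d+)\s+(\d+)\s+(.+)\s+(\d+)'
    >>> for line in s.splitlines():
        # findall yields one tuple of groups per match; strip padding from each group
        lst = [i.strip() for j in re.findall(regex, line) for i in j if j]
        print(lst)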
    
    
    []
    []
    ['John Marshall', 'CSC', '78659944', '89989', 'BE', '900']
    ['Think Code DB I10', 'MSC', '87782', '1231', 'MS', '878']
    ['Mary 200 Jones', 'CIVIL', '98993483', '32985', 'BE', '898']
    ['John G. S', 'Mech', '7653', '54', 'MS', '65']
    ['Silent Ghost', 'Python Ninja', '788505', '88448', 'MS Comp', '887']
    

    The regex is fairly straightforward; the only things you need to pay attention to are the delimiters (\s) and the word boundaries (\b) around the first delimiter. Note that when a line doesn’t match, lst is an empty list. That would be a red flag to bring up the user interaction described below. You could also skip the header lines by doing:

    >>> file = open(fname)
    >>> [next(file) for _ in range(2)]
    >>> for line in file:
        ...  # here empty lst indicates issues with regex
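A minimal sketch of how the two ideas above (skip the header lines, treat a non-matching line as a red flag) might be combined. The pattern is the one from this answer; the helper name `parse_rows` and its `skip` parameter are hypothetical, and the sketch assumes the header always occupies the first two lines.

```python
import re

# Same pattern as in the answer, compiled once.
ROW = re.compile(r'^(.+)\b\s{2,}\b(.+)\s+(\d+)\s+(\d+)\s+(.+)\s+(\d+)')

def parse_rows(lines, skip=2):
    """Return (parsed, flagged).

    parsed  -- one list of stripped fields per matching line
    flagged -- (line_number, text) for non-blank lines the regex rejected
    """
    parsed, flagged = [], []
    for line_no, line in enumerate(lines, start=1):
        if line_no <= skip:              # assumed: header row and dashed rule
            continue
        match = ROW.match(line)
        if match:
            parsed.append([group.strip() for group in match.groups()])
        elif line.strip():               # blank lines are ignored, not flagged
            flagged.append((line_no, line.rstrip('\n')))
    return parsed, flagged
```

The flagged list can then be handed to whatever manual review step you prefer, while the parsed rows go straight on to the database load.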
    

    Previous variants:

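    >>> import re
    >>> for line in open(fname):
        # strip the trailing newline so it doesn't stick to the last field
        lst = re.split(r'\s{2,}', line.strip())
        l = len(lst)
        if l in (2,3):
            # the last chunk still holds columns separated by single spaces
            lst[l-1:] = lst[l-1].split()
        print(lst)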
    
    ['Name', 'Degree', 'CLASS', 'CODE', 'EDU', 'Scores']
    ['--------------------------------------------------------------------------------------']
    ['John Marshall', 'CSC', '78659944', '89989', 'BE', '900']
    ['Think Code DB I10', 'MSC', '87782', '1231', 'MS', '878']
    ['Mary 200 Jones', 'CIVIL', '98993483', '32985', 'BE', '898']
    ['John G. S', 'Mech', '7653', '54', 'MS', '65']
    

    Another option is simply to let the user decide what to do with questionable entries:

    if l < 3:
        lst = line.split()
        print(lst)
        iname = input('enter the indexes of the name elements: ')     # use raw_input in py2k
        idegr = input('enter the indexes of the degree elements: ')
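One possible way to turn the entered indexes into field values, as a rough sketch. It assumes the user types zero-based indexes separated by spaces (for example `0 1`); the helper `pick` is hypothetical and does no validation of bad input.

```python
def pick(tokens, index_text):
    """Join the tokens found at the whitespace-separated indexes in index_text."""
    return ' '.join(tokens[int(i)] for i in index_text.split())

# A questionable line, split on every space as in the snippet above.
tokens = 'Silent Ghost Python Ninja 788505 88448 MS Comp 887'.split()

name = pick(tokens, '0 1')      # as if the user had typed "0 1"
degree = pick(tokens, '2 3')    # as if the user had typed "2 3"
```

An out-of-range or non-numeric entry raises an exception here, so a real prompt loop would probably want to catch `ValueError` and `IndexError` and ask again.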
    

    Uhm, I was under the impression the whole time that the second element might contain spaces. Since that’s not the case, you could just do:

    >>> for line in open(fname):
        name, _, rest = line.partition('  ')
        lst = [name] + rest.split()
        print(lst)
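Since the stated end goal is a pipe-delimited file for a database load, the partition approach could be wrapped roughly like this. The function name `to_pipe` and the `delimiter` parameter are hypothetical; the sketch assumes no field contains the delimiter itself.

```python
def to_pipe(line, delimiter='|'):
    """Split on the first run of two spaces, then on any whitespace, and re-join."""
    name, _, rest = line.strip().partition('  ')
    return delimiter.join([name.strip()] + rest.split())

row = to_pipe('Mary 200 Jones    CIVIL      98993483  32985        BE       898\n')
```

A line with no double space comes back as a single field, which is an easy condition to check for before loading.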
    