The Archive Base

Editorial Team
Asked: June 1, 2026

I am new to Python and stuck on how to do this.
I have a very large text file (about 4 GB) containing error messages. Each line in the file is one message. I need to pick out several columns and join them with | instead of spaces.
Example:

input:
83b14af0-949b-71e0-18d5-0ad781020000 40ba8352-8dd2-71dc-12b8-0ad781020000 1 -1407714483 20 COLG-GRA-617-RD1.oss 1 181895426 12 oss-ap-1.oss 0 0 48 0 0 0 1307845644 1307845647 0 2 12 0 0 0  0 0 12 0 0 0  0 0 1307845918 3 OpC 6 opcecm 9 SNMPTraps 8 IBB_COLG 4 ATM0 0  0  0  69 Cisco Agent Interface Up (linkUp Trap) on interface ATM0 --Sev Normal 372 Generic: 3; Specific: 0; Enterprise: .1.3.6.1.4.1.9.1.569;
output:
83b14af0-949b-71e0-18d5-0ad781020000 | 40ba8352-8dd2-71dc-12b8-0ad781020000 | COLG-GRA-617-RD1.oss | 1307845644 | 1307845647 |1307845918 | Cisco Agent Interface Up (linkUp Trap) on interface ATM0 | Normal 372 | Generic: 3 | Specific: 0 | Enterprise: .1.3.6.1.4.1.9.1.569

I really appreciate any help.

Thank you

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 2:29 pm

    Your input file format is annoying. We could split the input on white space, but some of the fields you want to capture contain white space themselves. We could split the input on column positions, but I am not certain that every field is always the same length; it seems likely that the numbers will vary in number of digits. So the best solution is a regular expression.
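To see why a plain whitespace split is not enough, here is a small sketch using two hypothetical lines in a simplified version of the question's layout (a few fixed fields, a free-text message, `--Sev`, then the severity). The message is chopped into separate tokens, so the position of everything after it depends on how many words the message has.

```python
# Two hypothetical lines in a simplified version of the question's layout:
# fixed fields, free-text message, '--Sev', then the severity.
short_msg = "id1 id2 1 Link Down --Sev Critical 10"
long_msg = "id1 id2 1 Cisco Agent Interface Up (linkUp Trap) --Sev Normal 372"

# str.split() with no argument splits on every run of white space,
# so the free-text message is chopped into separate "fields".
short_fields = short_msg.split()
long_fields = long_msg.split()

print(len(short_fields), len(long_fields))  # 8 12
print(short_fields.index("--Sev"), long_fields.index("--Sev"))  # 5 9

# Naively joining with '|' breaks the message apart as well.
naive = "|".join(long_fields)
print(naive)
```

The field count and the index of `--Sev` both change with the message length, which is exactly what makes fixed column indexes unreliable here.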

    A single regular expression to parse this whole line would be pretty mind-numbing to write and to understand. But we can build up the pattern from shorter patterns. I think the result is pretty easy to understand. Also, if the file format changes or the fields you want to capture ever change, I think you can pretty easily change this.
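One practical benefit of building the pattern from pieces is that each piece can be checked on its own against a fragment of the sample line before everything is glued together. A minimal sketch, reusing the `mesg` and `w2semi` sub-patterns from the answer's code:

```python
import re

# Sub-patterns copied from the answer's code.
mesg = r'(\S.*\S)\s+--Sev\s+'   # message text terminated by '--Sev'
w2semi = r'(\S+\s+\S+)\s*;\s+'  # two words terminated by a semicolon

# Try each piece on a fragment of the sample line from the question.
m1 = re.match(mesg, "Cisco Agent Interface Up (linkUp Trap) on interface ATM0 --Sev Normal")
print(m1.group(1))  # Cisco Agent Interface Up (linkUp Trap) on interface ATM0

m2 = re.match(w2semi, "Generic: 3; Specific: 0;")
print(m2.group(1))  # Generic: 3
```

If the full pattern ever stops matching, testing the pieces this way narrows down which part of the format changed.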

    Note that we use the Python “string repetition” operator, *, to repeat the shorter patterns. If we have 2 words we want to recognize and capture, we can use c*2 to repeat the capture pattern twice.
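Because `*` and `+` here operate on ordinary strings, the result is just a longer pattern string. A small sketch with the answer's `c` and `s` pieces:

```python
import re

c = r'(\S+)\s+'  # a word to capture (same as in the answer)
s = r'\S+\s+'    # a word to skip (same as in the answer)

# '*' repeats a string and '+' concatenates, so this is plain text building.
pattern = c * 2 + s
print(pattern)  # (\S+)\s+(\S+)\s+\S+\s+

m = re.match(pattern, "alpha beta gamma delta")
print(m.groups())  # ('alpha', 'beta')
```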

    In your example of the desired output, you had some extra white space. I wrote the patterns to not capture any white space, but if you actually want the white space you can edit the patterns as you like.
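If spaces around the separator are all you want, an alternative to editing the patterns is to keep the captures clean and change the join string instead. A sketch using a few of the fields from the answer's expected output:

```python
# A few of the captured fields from the answer's expected output.
fields = ("COLG-GRA-617-RD1.oss", "1307845644", "Normal 372")

compact = "|".join(fields)   # the separator the answer's code uses
spaced = " | ".join(fields)  # closer to the layout shown in the question
print(compact)  # COLG-GRA-617-RD1.oss|1307845644|Normal 372
print(spaced)   # COLG-GRA-617-RD1.oss | 1307845644 | Normal 372
```

Keeping white space out of the captured groups and adding it only at output time means the same pattern serves both formats.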

    If you don’t know about regular expressions, you should read the documentation for the Python re module. Briefly, the part of the pattern enclosed in parentheses will be captured, and other parts will match but not be captured. \s matches white space, \S matches non-white space, and . matches any character except a newline. + in a pattern means “1 or more” and * means “0 or more”. ^ and $ match the beginning and end of the string, respectively.
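A few one-line experiments illustrating those rules (the inputs are made up for illustration):

```python
import re

# Parentheses capture; the unparenthesized middle word must match but is not returned.
groups = re.match(r'(\S+)\s+\S+\s+(\S+)', "keep skip keep2").groups()
print(groups)  # ('keep', 'keep2')

# ^ and $ anchor the match to the start and end of the string.
anchored_ok = re.match(r'^\S+$', "oneword") is not None
anchored_fail = re.match(r'^\S+$', "two words") is not None
print(anchored_ok, anchored_fail)  # True False

# + needs at least one character; * also accepts zero.
plus_none = re.match(r'a+', "bbb")
star_empty = re.match(r'a*', "bbb").group(0)
print(plus_none, repr(star_empty))  # None ''
```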

    import re

    # Define patterns we want to recognize.

    c = r'(\S+)\s+'  # a word we want to capture
    s = r'\S+\s+'  # a word we want to skip
    mesg = r'(\S.*\S)\s+--Sev\s+'  # message to capture; terminated by the string '--Sev'
    w2 = r'(\S+\s+\S+)\s+'  # two words separated by some white space
    w2semi = r'(\S+\s+\S+)\s*;\s+'  # two words terminated by a semicolon
    tail = r'(.*\S)\s*;'  # everything up to the final semicolon

    # Join together the above patterns to make one giant pattern that parses
    # the input.
    s_pat = (r'^\s*' +
        c*2 + s*3 + c*1 + s*10 + c*2 + s*14 + c*1 + s*14 +
        mesg + w2 + w2semi*2 + tail +
        r'\s*$')

    # Compile the pattern once, up front, and reuse it for every line.
    pat = re.compile(s_pat)

    # Test string and the expected output result.
    s_input = "83b14af0-949b-71e0-18d5-0ad781020000 40ba8352-8dd2-71dc-12b8-0ad781020000 1 -1407714483 20 COLG-GRA-617-RD1.oss 1 181895426 12 oss-ap-1.oss 0 0 48 0 0 0 1307845644 1307845647 0 2 12 0 0 0  0 0 12 0 0 0  0 0 1307845918 3 OpC 6 opcecm 9 SNMPTraps 8 IBB_COLG 4 ATM0 0  0  0  69 Cisco Agent Interface Up (linkUp Trap) on interface ATM0 --Sev Normal 372 Generic: 3; Specific: 0; Enterprise: .1.3.6.1.4.1.9.1.569;"
    s_correct = "83b14af0-949b-71e0-18d5-0ad781020000|40ba8352-8dd2-71dc-12b8-0ad781020000|COLG-GRA-617-RD1.oss|1307845644|1307845647|1307845918|Cisco Agent Interface Up (linkUp Trap) on interface ATM0|Normal 372|Generic: 3|Specific: 0|Enterprise: .1.3.6.1.4.1.9.1.569"

    # pat.match() returns a match object, or None if the string doesn't match.
    m = pat.match(s_input)
    # m.groups() returns a tuple of the captured strings; join them with '|'.
    s_output = '|'.join(m.groups())

    # Sanity check.
    if s_correct == s_output:
        print("excellent")
    else:
        print("bogus")

    # excellent
    

    With the pattern written, tested, and debugged, it’s very simple to write the program to actually process the file.

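    # Use the pattern defined above, named "pat".
    # Hypothetical file names; replace them with your own paths.
    input_file = "messages.txt"
    output_file = "messages_out.txt"

    with open(input_file, "r") as f_in, open(output_file, "w") as f_out:
        for line_num, line in enumerate(f_in, 1):
            m = pat.match(line)
            if m is None:
                # Report lines that don't fit the expected layout and keep going.
                print("unable to parse line %d: %s" % (line_num, line.rstrip("\n")))
                continue
            f_out.write('|'.join(m.groups()) + '\n')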
    

    This reads the file one line at a time, processes the line, and writes the processed line to the output file, so the whole 4 GB file is never held in memory at once.
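Since the loop above only needs something that yields lines and something with a `write` method, it can be tried on in-memory text via `io.StringIO` before running it on the real 4 GB file. A minimal sketch; the `convert` function and the two-word pattern are hypothetical stand-ins, not the answer's full pattern:

```python
import io
import re

# Hypothetical stand-in pattern: capture the first two words of each line.
pat = re.compile(r'^\s*(\S+)\s+(\S+)')

def convert(f_in, f_out):
    """Write '|'-joined captures for each parseable line; return bad line numbers."""
    bad = []
    for line_num, line in enumerate(f_in, 1):
        m = pat.match(line)
        if m is None:
            bad.append(line_num)  # line doesn't fit the expected layout
            continue
        f_out.write('|'.join(m.groups()) + '\n')
    return bad

src = io.StringIO("a b c\nlonely\nx y\n")
dst = io.StringIO()
bad_lines = convert(src, dst)
print(repr(dst.getvalue()))  # 'a|b\nx|y\n'
print(bad_lines)             # [2]
```

Collecting the unparseable line numbers, instead of only printing them, makes it easy to check afterwards how many lines the pattern missed.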

    Note that I’m opening both files in a single with statement. That form requires Python 2.7 or 3.1 and later; it doesn’t work on 2.6, 3.0, or earlier versions.
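On those older versions, the same thing can be written with one with statement nested inside another (on 2.5 you would also need `from __future__ import with_statement`). A sketch that runs on its own: the file names are hypothetical temporary files, and `replace(" ", "|")` merely stands in for the real per-line processing.

```python
import os
import shutil
import tempfile

# Hypothetical input/output files in a temporary directory, just for the demo.
tmp_dir = tempfile.mkdtemp()
input_file = os.path.join(tmp_dir, "in.txt")
output_file = os.path.join(tmp_dir, "out.txt")
with open(input_file, "w") as f:
    f.write("a b\n")

# Nested with statements: equivalent to opening both files in one statement.
with open(input_file, "r") as f_in:
    with open(output_file, "w") as f_out:
        for line in f_in:
            f_out.write(line.replace(" ", "|"))

with open(output_file) as f:
    result = f.read()
print(repr(result))  # 'a|b\n'

shutil.rmtree(tmp_dir)  # clean up the demo files
```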

Related Questions

New to Python, have a simple, situational question: Trying to use BeautifulSoup to parse
I'm new to python and have hit a problem with an SQL query I'm
I am new to Python and Numpy, and I am facing a problem, that
I am new to C++/Python mixed language programming and do not have much idea
I'm new in Django and Python and I'm stuck! It's complicated to explain but
I'm very new to PHP so far and have been trying to learn it.
I am new to Python programming. I am stuck on what should be a
I'm new to Python and learning through the O'Reilly Learning Python series. I'm stuck
Is there a way to have the parent that spawned a new thread catch
I have a log file that has lines that look like this: 1,2546857-23541,f_last,user,4:19 P.M.,11/02/2009,START,27,27,3,c2546857-23541,
