The Archive Base

Editorial Team
Asked: May 25, 2026 (2026-05-25T16:10:08+00:00)


Python 2.4.3

I need to read through some files (they can be as large as 10GB). The script should go through a file until a line matches a pattern, then print that line and every line after it until a line matches another pattern. At that point, it should resume reading through the file until the next pattern match.

For example, the file contains:

---- Alpha ---- Zeta
...(text lines)

---- Bravo ---- Delta
...(text lines)

etc.

If matching on ---- Alpha ---- Zeta, it should print ---- Alpha ---- Zeta and every line after that until it encounters ---- Bravo ---- Delta (or any header other than ---- Alpha ---- Zeta), which it will read right past until it matches ---- Alpha ---- Zeta again.

The following matches what I’m looking for, but it only prints the matching line and not the text that follows it.

Any idea where I’m going wrong on it?

import re
fh = open('text.txt', 'r')

re1='(-)'   # Any Single Character 1
re2='(-)'   # Any Single Character 2
re3='(-)'   # Any Single Character 3
re4='(-)'   # Any Single Character 4
re5='( )'   # White Space 1
re6='(Alpha)'  # Word 1
re6a='((?:[a-z][a-z]+))'   # Word 1 alternate
re7='( )'   # White Space 2
re8='(-)'   # Any Single Character 5
re9='(-)'   # Any Single Character 6
re10='(-)'  # Any Single Character 7
re11='(-)'  # Any Single Character 8
re12='(\\s+)'  # White Space 3
re13='(Zeta)'  # Word 2
re13a='((?:[a-z][a-z]+))'  # Word 2 alternate


rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10+re11+re12+re13,re.IGNORECASE|re.DOTALL)
rga =     re.compile(re1+re2+re3+re4+re5+re6a+re7+re8+re9+re10+re11+re12+re13a,re.IGNORECASE|re.DOTALL)


for line in fh:
    if re.match(rg, line):
        print line
        fh.next()
        while not re.match(rga, line):
            print fh.next()

fh.close()

And my example text file:

---- Pappa ---- Oscar
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Mauris eleifend imperdiet 
lacus quis imperdiet. Nulla erat neque, laoreet vel fermentum a, dapibus in sem. 
Maecenas elementum nisi nec neque pellentesque ac rutrum urna cursus. Nam non purus 
sit amet dolor fringilla venenatis. Integer augue neque, scelerisque ac dictum at, 
venenatis elementum libero. Etiam nec ante in augue porttitor laoreet. Aenean ultrices
pellentesque erat, id porta nulla vehicula id. Cras eu ante nec diam dapibus hendrerit
in ac diam. Vivamus velit erat, tincidunt id tempus vitae, tempor vel leo. Donec 
aliquam nibh mi, non dignissim justo.

---- Alpha ---- Zeta
Sed molestie tincidunt euismod. Morbi ultrices diam a nibh varius congue. Nulla velit
erat, luctus ac ornare vitae, pharetra quis felis. Sed diam orci, accumsan eget 
commodo eu, posuere sed mi. Phasellus non leo erat. Mauris turpis ipsum, mollis sed 
ismod nec, aliquam non quam. Vestibulum sem eros, euismod ut pharetra sit amet, 
dignissim eget leo.

---- Charley ---- Oscar
Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. 
Aliquam commodo, metus at vulputate hendrerit, dui justo tempor dui, at posuere    
ante vitae lorem. Fusce rutrum nibh a erat condimentum laoreet. Nullam eu hendrerit 
sapien. Suspendisse id lobortis urna. Maecenas ut suscipit nisi. Proin et metus at 
urna euismod sollicitudin eu at mi. Aliquam ac egestas magna. Quisque ac vestibulum 
lectus. Duis ac libero magna, et volutpat odio. Cras mollis tincidunt nibh vel rutrum.
Curabitur fringilla, ante eget scelerisque rhoncus, libero nisl porta leo, ac
vulputate mi erat vitae felis. Praesent auctor fringilla rutrum. Aenean sapien ligula,
imperdiet sodales ullamcorper ut, vulputate at enim.


---- Bravo ---- Delta
Donec cursus tincidunt pellentesque. Maecenas neque nisi, dignissim ac aliquet ac,
vestibulum ut tortor. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Aenean ullamcorper dapibus accumsan. Aenean eros
tortor, ultrices at adipiscing sed, lobortis nec dolor. Fusce eros ligula, posuere
quis porta nec, rhoncus et leo. Curabitur turpis nunc, accumsan posuere pulvinar eget,
sollicitudin eget ipsum. Sed a nibh ac est porta sollicitudin. Pellentesque ut urna ut 
risus pharetra mollis tincidunt sit amet sapien. Sed semper sollicitudin eros quis 
pellentesque. Curabitur ac metus lorem, ac malesuada ipsum. Nulla turpis erat, congue 
eu gravida nec, egestas id nisi. Praesent tellus ligula, pretium vitae ullamcorper 
vitae, gravida eu ipsum. Cras sed erat ligula.


---- Alpha ---- Zeta
Cras id condimentum lectus. Sed sit amet odio eros, ut mollis sapien. Etiam varius 
tincidunt quam nec mattis. Nunc eu varius magna. Maecenas id ante nisl. Cras sed augue 
ipsum, non mollis velit. Fusce eu urna id justo sagittis laoreet non id urna. Nullam 
venenatis tincidunt gravida. Proin mattis est sit amet dolor malesuada sagittis. 
Curabitur in lacus rhoncus mi posuere ullamcorper. Phasellus eget odio libero, ut 
lacinia orci. Pellentesque iaculis, ligula at varius vulputate, arcu leo dignissim 
massa, non adipiscing lectus magna nec dolor. Quisque in libero nec orci vestibulum 
dapibus. Nulla turpis massa, varius quis gravida eu, bibendum et nisl. Fusce tincidunt 
laoreet elit, sed egestas diam pharetra eget. Maecenas lacus velit, egestas nec tempor 
eget, hendrerit et massa.

+++++++++++++++++++++ Update ++++++++++++++++++++++++++++++++

The following code does work: it matches the header row, prints that row and every line after it until the next header row, and, if that header doesn’t match, skips ahead until the next matching header.

The only problem is that it’s really, really slow. It takes about a minute to go through 10 million lines.

import re

re1='(-)'   # Any Single Character 1
re2='(-)'   # Any Single Character 2
re3='(-)'   # Any Single Character 3
re4='(-)'   # Any Single Character 4
re5='( )'   # White Space 1
re6='(Alpha)'  # Word 1
re6a='((?:[a-z][a-z]+))'   # Word 1 alternate
re7='( )'   # White Space 2
re8='(-)'   # Any Single Character 5
re9='(-)'   # Any Single Character 6
re10='(-)'  # Any Single Character 7
re11='(-)'  # Any Single Character 8
re12='(\\s+)'  # White Space 3
re13='(Zeta)'  # Word 2
re13a='((?:[a-z][a-z]+))'  # Word 2 alternate


rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8+re9+re10+re11+re12+re13,re.IGNORECASE|re.DOTALL)
rga = re.compile(re1+re2+re3+re4+re5+re6a+re7+re8+re9+re10+re11+re12+re13a,re.IGNORECASE|re.DOTALL)



linestop = 0
fh = open('test.txt', 'r')

for line in fh:
    if linestop == 0:
        if re.match(rg, line):
            print line
            linestop = 1
    else:
        if re.match(rga, line):
            linestop = 0
        else:
            print line

fh.close()
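
One possible way to reduce the per-line cost is sketched below. It assumes every header starts with four dashes at the beginning of the line, as in the example file. The names HEADER, WANTED and wanted_sections are made up for illustration. The regex only runs on lines that pass a cheap startswith test, and it is called through the compiled pattern's own match method. Whether this helps on a given file would need to be measured.

```python
import re

# Hypothetical names for illustration; not part of the original script.
# Any line shaped like "---- Word ---- Word" is treated as a header.
HEADER = re.compile(r'---- ([a-z]+) ----\s+([a-z]+)', re.IGNORECASE)
WANTED = ('alpha', 'zeta')


def wanted_sections(lines, wanted=WANTED):
    """Yield the header and body lines of each section whose header is `wanted`."""
    printing = False
    for line in lines:
        # Cheap string test first; the regex only runs on candidate headers.
        if line.startswith('----'):
            m = HEADER.match(line)
            if m:
                # Every header switches printing on or off for what follows.
                printing = (m.group(1).lower(), m.group(2).lower()) == wanted
        if printing:
            yield line


# Usage sketch (file name taken from the script above):
#   import sys
#   fh = open('test.txt', 'r')
#   for line in wanted_sections(fh):
#       sys.stdout.write(line)
#   fh.close()
```

Unlike the loop above, this sketch re-checks the header that ends a section, so two wanted sections in a row are both kept.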

+++++++++ If I add a grep step first, I’m thinking that’ll speed things up tremendously, i.e. grep first, then run the above regex script.

I got os.system to work well, but I can’t see how to pass a regex match via Popen.

**** Final Update **********

I’m calling this completed. What I ended up doing was:

  • Grep through the file using os.system, writing the results out.
  • Read that file in and use the re.match code from above, printing out only the necessary items.

The net result: reading through a 10 million line file and printing out the necessary items went from about 65 seconds to about 3.5 seconds. I wish I could have figured out how to call grep other than through os.system, but maybe it’s just not well implemented in Python 2.4.
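
For the open question of passing a pattern to grep without os.system, here is a rough sketch using subprocess.Popen, which has been in the standard library since Python 2.4. The helper name command_lines is made up, and the grep pattern and file name in the usage comment are placeholders, not the command actually used above. Because the command is given as a list, the pattern reaches grep as a single argument and needs no shell quoting.

```python
import subprocess


def command_lines(argv):
    """Run `argv` without a shell and yield its standard output line by line."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                            universal_newlines=True)
    for line in proc.stdout:
        yield line
    proc.stdout.close()
    proc.wait()


# Usage sketch; the pattern and file name are placeholders:
#   for line in command_lines(['grep', '-E', '^---- [A-Za-z]+ ---- ', 'test.txt']):
#       ...  # apply the re.match logic to each line here
```

Reading from the pipe this way also avoids writing grep's output to an intermediate file.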

1 Answer

  1. Editorial Team
    Added an answer on May 25, 2026 at 4:10 pm

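    You’re still matching against line, which doesn’t change because you’re still in the same iteration of the for loop. The result of fh.next() is never assigned to line, so line is still the ---- Alpha ---- Zeta header. That header also matches rga, so the while loop body never runs and only the header gets printed.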
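
A minimal sketch of one way to apply this, using shortened stand-ins for the question's rg and rga patterns (the shortened patterns and the function name print_sections are assumptions for illustration). The inner for loop rebinds line on every step, so the end-of-section test sees each new line.

```python
import re

# Hypothetical, shortened stand-ins for the question's rg and rga.
rg = re.compile(r'---- Alpha ----\s+Zeta', re.IGNORECASE)
rga = re.compile(r'---- [a-z]+ ----\s+[a-z]+', re.IGNORECASE)


def print_sections(fh, out):
    """Write each section that starts with an rg header to `out`."""
    it = iter(fh)  # one shared iterator for both loops
    for line in it:
        if rg.match(line):
            out.write(line)
            # Rebinding `line` here is the fix: each new line is tested.
            for line in it:
                if rga.match(line):
                    break  # any header ends the section
                out.write(line)


# Usage sketch (file name from the question):
#   import sys
#   fh = open('text.txt', 'r')
#   print_sections(fh, sys.stdout)
#   fh.close()
```

Note that the header ending a section is consumed by the inner loop, so a wanted section that directly follows another wanted section would be skipped in this sketch.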


© 2021 The Archive Base. All Rights Reserved