Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 1067935
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 16, 20262026-05-16T20:13:16+00:00 2026-05-16T20:13:16+00:00

Hey I’ve encountered an issue where my program stops iterating through the file at

  • 0

Hey I’ve encountered an issue where my program stops iterating through the file at the 57802 record for some reason I cannot figure out. I put a heartbeat section in so I would be able to see which line it is on and it helped but now I am stuck as to why it stops here. I thought it was a memory issue but I just ran it on my 6GB memory computer and it still stopped.

Is there a better way to do anything I am doing below?
My goal is to read the file (if you need me to send it to you I can 15MB text log)
find a match based on the regex expression and print the matching line. More to come but that’s as far as I have gotten. I am using python 2.6

Any ideas would help and code comments also! I am a python noob and am still learning.

import sys, os, os.path, operator
import re, time, fileinput

infile = os.path.join("C:\\","Python26","Scripts","stdout.log")

start = time.clock()

filename  = open(infile,"r")

match = re.compile(r'(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} +\w+ +\[([\w.]+)\] ((\w+).?)+:\d+ - (\w+)_SEARCH:(.+)')

count = 0
heartbeat = 0
for line in filename:
    heartbeat = heartbeat + 1
    print heartbeat
    lookup = match.search(line)
    if lookup:
        count = count + 1
        print line
end = time.clock()
elapsed = end-start
print "Finished processing at:",elapsed,"secs. Count of records =",count,"."

filename.close()

This is line 57802 where it fails:

2010-08-06 08:15:15,390 DEBUG [ah_admin] com.thg.struts2.SecurityInterceptor.intercept:46 - Action not SecurityAware; skipping privilege check.

This is a matching line:

2010-08-06 09:27:29,545 INFO  [patrick.phelan] com.thg.sam.actions.marketmaterial.MarketMaterialAction.result:223 - MARKET_MATERIAL_SEARCH:{"_appInfo":{"_appId":21,"_companyDivisionId":42,"_environment":"PRODUCTION"},"_description":"symlin","_createdBy":"","_fieldType":"GEO","_geoIds":["Illinois"],"_brandIds":[2883],"_archived":"ACTIVE","_expired":"UNEXPIRED","_customized":"CUSTOMIZED","_webVisible":"VISIBLE_ONLY"}

Sample data just the first 5 lines:

2010-08-06 00:00:00,035 DEBUG [] com.thg.sam.jobs.PlanFormularyLoadJob.executeInternal:67 - Entered into PlanFormularyLoadJob: executeInternal
2010-08-06 00:00:00,039 DEBUG [] com.thg.ftpComponent.service.JScapeFtpService.open:153 - Opening FTP connection to sdrive/hibbert@tccfp01.hibbertnet.com:21
2010-08-06 00:00:00,040 DEBUG [] com.thg.sam.email.EmailUtils.sendEmail:206 - org.apache.commons.mail.MultiPartEmail@446e79
2010-08-06 00:00:00,045 DEBUG [] com.thg.sam.services.OrderService.getOrdersWithStatus:121 - Orders list size=13
2010-08-06 00:00:00,045 DEBUG [] com.thg.ftpComponent.service.JScapeFtpService.open:153 - Opening FTP connection to sdrive/hibbert@tccfp01.hibbertnet.com:21
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-16T20:13:17+00:00Added an answer on May 16, 2026 at 8:13 pm

    Your problem is definitely the part @paulrubel pointed out:

    ((\w+).?)+:\d+
    

    Now that you’ve added sample data, it’s obvious that the . is supposed to match a literal dot, which means you should have escaped it (\.). Also, you don’t need the inner set of parentheses, and the outer set should be non-capturing, but it’s the basic structure that’s killing you; there are too many arrangements of word characters and dots it has to try before giving up. The other lines all fail before that part of the regex is attempted, which is why you don’t have any problem with them.

    When I try it in RegexBuddy, your regex matches the good line in 186 steps, and gives up trying on line 57802 after 1,000,000 steps. When I escape the dot, the good line only takes 90 steps to match, but it still times out on line 57802. But now I know that part of the regex can only match word characters and dots. Once it has consumed all of those it can, the next bit has to match :\d+; if it doesn’t, I know there’s no point trying other arrangements. I can use an atomic group to tell it not to bother:

    (?>(?:\w+\.?)+):\d+
    

    With that change, the good line matches in 83 steps, and line 57802 only takes 66 steps to report failure. But it’s not always feasible to use atomic groups, so you should try to make your regex conform to the actual structure of the text it’s matching. In this case you’re matching what looks like a Java class name (some word characters, followed by zero or more instances of (a dot and some more word characters)) followed by a colon and and a line number:

    \w+(?:\.\w+)*:\d+
    

    When I plug that into the regex, it matches the good line in 80 steps, and rejects line 57802 in 67 steps–the atomic group isn’t even needed.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Hey all i am trying to search through a CSV file using the following
hey guys having this really simple problem but cant seem to figure out have
Hey guys I have about 8 fieldSets and Im iterating over a list. I
Hey I can't figure out an error I got. I had my App working
Hey everyone I am a newbie and would love some help. I got the
Hey all i have an XML file that i am trying to loop though.
Hey there, I'm writing a python plugin for a geo referencing program called QGIS
Hey right now I'm using jQuery and I have some global variables to hold
Hey having some trouble trying to maintain transparency on a png when i create
Hey I would like a magnifying glass or some image to pop over another

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.