The Archive Base


Editorial Team
Asked: May 31, 2026 (2026-05-31T02:39:28+00:00)

I run performance tests for a few Java applications. The applications produce very large log files (7-10 GB) during a test. I need to trim these log files to the entries between specific dates and times. Currently I use a Python script that parses each log timestamp into a Python datetime object and prints only the matching lines, but this solution is very slow: a 5 GB log takes about 25 minutes to parse.
The entries in the log file are obviously in chronological order, so I shouldn't need to read the whole file line by line.
I thought about reading the file from the start and from the end until the condition is matched, and then printing the lines between the two matched positions. But I don't know how to read a file backwards without loading it all into memory.

Can you please suggest a suitable solution for this case?

Here is part of the Python script:

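      from datetime import datetime

      lfmt = '%Y-%m-%d %H:%M:%S'
      # filename, str and end are set earlier in the script;
      # str and end are the datetime bounds of the window to keep
      file = open(filename, 'r')
      normal_line = ''
      for line in file:
        if line[0] == '[':
          ltimestamp = datetime.strptime(line[1:20], lfmt)

          if ltimestamp >= str and ltimestamp <= end:
            normal_line = 'True'
          else:
            normal_line = ''

        # lines without a timestamp keep the state of the last timestamped line
        if normal_line:
          print(line, end='')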
1 Answer

  1. Editorial Team
    Added an answer on May 31, 2026 at 2:39 am (2026-05-31T02:39:29+00:00)

    Because the data is sequential, reading backwards from the end of the file (to find the matching end point) is still a poor solution whenever the start and end of the region of interest are both near the beginning of the file!

    I've written some code that quickly finds the start and end points you need. The approach is called binary search, and it is similar to the classic children's "higher or lower" guessing game!

    The script reads a trial line midway between lower_bound and upper_bound (initially the start and end of the file) and checks the match criteria. If the sought line is earlier, it guesses again by reading a line halfway between lower_bound and the previous trial (if the sought line is later, it splits between its guess and the upper bound). It keeps iterating between the upper and lower bounds, halving the search range each time, so the number of reads grows only logarithmically with the size of the file.

    This should be a really quick solution (about log to the base 2 of the number of lines!). For example, in the worst case (finding line 999 out of 1000 lines), binary search takes only about 10 line reads! (Finding a line among a billion takes only about 30...)
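
As a rough check on those figures, the worst-case number of probes can be counted directly. This is only an illustrative sketch: `probes_to_find` and `worst_case_probes` are hypothetical helper names, and the search runs over a range of integers standing in for line numbers rather than over a real log file.

```python
def probes_to_find(target, n):
    """Count the probes a binary search needs to locate target in range(n)."""
    low, high = 0, n - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if mid == target:
            return probes
        if mid < target:
            low = mid + 1   # the sought item is higher
        else:
            high = mid - 1  # the sought item is lower
    raise ValueError("target is not in range(n)")


def worst_case_probes(n):
    """Worst case for n sorted items: floor(log2(n)) + 1 probes."""
    return n.bit_length()
```

Under these assumptions, `probes_to_find(999, 1000)` returns 10 and `worst_case_probes(10**9)` returns 30.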

    Assumptions for the below code:

    • Every line starts with time information.
    • The times are unique. If they are not, then when a match is found you'll have to check backwards or forwards to include or exclude all entries with the matching time, as appropriate (if required).
    • Both target times appear in the log exactly. The search reports no match for a time that falls between two entries.
    • Amusingly, this is a recursive function, so with Python's default recursion limit of 1000 the number of lines in your file is limited to 2**1000 (luckily this allows for quite a large file...)
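
If the times are not unique, or a target time falls between two entries, one option is to look for the first line whose timestamp is at or after the target instead of an exact match. The sketch below is an illustration under assumptions, not part of the original script: `parse_time`, `next_record` and `first_offset_at_or_after` are hypothetical names, the file is assumed to be opened in binary mode, and lines without a leading timestamp (such as stack-trace continuations) are assumed to belong to the preceding entry.

```python
import datetime

LFMT = '%Y-%m-%d %H:%M:%S'


def parse_time(line):
    """Timestamp of a b'[YYYY-mm-dd HH:MM:SS...' line, or None if absent."""
    if line.startswith(b'['):
        try:
            return datetime.datetime.strptime(line[1:20].decode('ascii'), LFMT)
        except ValueError:  # also covers undecodable bytes
            return None
    return None


def next_record(f, pos):
    """(offset, timestamp) of the first timestamped line starting at or after
    byte pos, or (end_of_file_offset, None) if there is none."""
    if pos > 0:
        f.seek(pos - 1)
        f.readline()  # finish the line that contains byte pos - 1
    else:
        f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            return offset, None
        stamp = parse_time(line)
        if stamp is not None:
            return offset, stamp


def first_offset_at_or_after(f, target):
    """Byte offset of the first entry with timestamp >= target (EOF if none)."""
    f.seek(0, 2)
    low, high = 0, f.tell()
    while low < high:
        mid = (low + high) // 2
        _, stamp = next_record(f, mid)
        if stamp is None or stamp >= target:
            high = mid      # the answer starts at or before mid
        else:
            low = mid + 1   # everything up to mid is too early
    return next_record(f, low)[0]
```

Calling it once with the start time and once with the end time plus one second (the timestamps have one-second resolution) would give the byte range to keep, including every entry that shares a boundary time.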

    Further:

    • This could be adapted to read in arbitrary blocks, rather than by line, if preferred, as suggested by J.F. Sebastian.
    • In my original answer I suggested this approach but using linecache.getline. While that is possible, it's inappropriate for large files because it reads the whole file into memory (thus file.seek() is superior). Thanks to TerryE and J.F. Sebastian for pointing that out.
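
A block-based variant might look roughly like the following. It is a minimal sketch, not the code from this answer: `first_full_line_in_block` is a hypothetical helper, the 4096-byte default block size is an arbitrary assumption, and the file object is assumed to be opened in binary mode.

```python
def first_full_line_in_block(f, offset, block_size=4096):
    """Read one block at offset and return (line_offset, line) for the first
    complete line found after the first newline in the block (or at the very
    start of the file when offset is 0). Return None if the block does not
    contain a complete line."""
    f.seek(offset)
    block = f.read(block_size)
    if offset == 0:
        start = 0
    else:
        newline = block.find(b'\n')
        if newline == -1:
            return None  # no line boundary inside this block
        start = newline + 1
    end = block.find(b'\n', start)
    if end == -1:
        return None  # the line runs past the block; retry with a bigger block
    return offset + start, block[start:end + 1]
```

The returned line can then be parsed for its timestamp exactly as a line from readline() would be; a caller that gets None back can retry with a larger block.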

    import datetime

    def match(line):
        # return the timestamp at the start of a log line, or None if there is none
        lfmt = '%Y-%m-%d %H:%M:%S'
        if line.startswith(b'['):
            return datetime.datetime.strptime(line[1:20].decode('ascii'), lfmt)

    def retrieve_test_line(position):
        file.seek(position, 0)
        file.readline()  # avoids reading partial line, which will mess up match attempt
        new_position = file.tell()  # gets start of line position
        return file.readline(), new_position

    def scan_range(target, lower_bound, upper_bound):
        # linear check of the few lines left once the range can no longer be halved
        file.seek(lower_bound, 0)
        position = lower_bound
        while position < upper_bound:
            if match(file.readline()) == target:
                return position
            position = file.tell()
        return  # no match for target within range

    def find_line(target, lower_bound, upper_bound):
        trial = (lower_bound + upper_bound) // 2
        inspection_text, position = retrieve_test_line(trial)
        if position >= upper_bound:
            # the trial fell in the last line of the range, so check what is left directly
            return scan_range(target, lower_bound, upper_bound)
        matched_position = match(inspection_text)
        if matched_position is None:
            return  # line without a timestamp: the assumptions above do not hold
        if matched_position == target:
            return position
        elif matched_position < target:
            return find_line(target, position, upper_bound)
        else:
            return find_line(target, lower_bound, position)

    lfmt = '%Y-%m-%d %H:%M:%S'
    # first line you are trying to find:
    start_target = datetime.datetime.strptime("2012-02-01 13:10:00", lfmt)
    # last line you are trying to find:
    end_target = datetime.datetime.strptime("2012-02-01 13:39:00", lfmt)
    # binary mode, so that seek() and tell() work with plain byte offsets
    file = open("log_file.txt", "rb")
    lower_bound = 0
    file.seek(0, 2)  # find upper bound
    upper_bound = file.tell()

    sequence_start = find_line(start_target, lower_bound, upper_bound)
    sequence_end = None

    if sequence_start is not None:  # offset 0 is a valid match
        sequence_end = find_line(end_target, sequence_start, upper_bound)
        if sequence_end is None:
            print("start_target match: ", sequence_start)
            print("end match is not present in the current file")
    else:
        print("start match is not present in the current file")

    if sequence_start is not None and sequence_end is not None:
        print("start_target match: ", sequence_start)
        print("end_target match: ", sequence_end)
        print()
        print(start_target, 'target')
        file.seek(sequence_start, 0)
        print(file.readline().decode(errors='replace'))
        print(end_target, 'target')
        file.seek(sequence_end, 0)
        print(file.readline().decode(errors='replace'))
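
Once the two offsets are known, the region between them still has to be written out to produce the trimmed log. The following is a minimal sketch of that last step, assuming byte offsets such as `sequence_start` and `sequence_end`; `copy_range` and its `chunk_size` parameter are hypothetical names, not part of the script above.

```python
def copy_range(src_path, dst, start, end, chunk_size=1024 * 1024):
    """Copy bytes [start, end) of the file at src_path to the binary file
    object dst, one chunk at a time so the whole range is never held in
    memory. Returns the number of bytes copied."""
    copied = 0
    with open(src_path, 'rb') as src:
        src.seek(start)
        remaining = end - start
        while remaining > 0:
            chunk = src.read(min(chunk_size, remaining))
            if not chunk:
                break  # reached end of file early
            dst.write(chunk)
            copied += len(chunk)
            remaining -= len(chunk)
    return copied
```

Note that `sequence_end` is the offset at which the last wanted line starts, so to include that line the `end` argument should be the offset just past it (for example, the value of file.tell() after reading that line).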
    

© 2021 The Archive Base. All Rights Reserved