I’m working on writing a small log scraping program in Python, which processes a

Question

0

Asked: May 23, 20262026-05-23T08:10:57+00:00 2026-05-23T08:10:57+00:00

I’m working on writing a small log scraping program in Python, which processes a

0

I’m working on writing a small log scraping program in Python, which processes a rolling logfile and stores the offsets in the file for lines of interest.

My original solution was running rather quickly on large files but I didn’t have a method for clearing the storage which means that if the program were to continue running, the memory usage would steadily increase until the program consumed all of the memory available to it. My solution was to use collections.deque with maxlen set so that the list would operate as a circular buffer, discarding the oldest loglines as more came in.

While this fixes the memory issue, I’m faced with a major performance loss in calling items from the deque by index. As an example, this code runs much slower than the old equivalent, where self.loglines was not a deque. Is there a way to improve it’s speed, or make a circular buffer where random-access is a constant time operation (instead of, I assume, O(n))?

    def get_logline(self, lineno):
            """Gets the logline at the given line number.

            Arguments:
            lineno - The index of the logline to retrieve

            Returns: A string containing the logline at the given line number
            """

            start = self.loglines[lineno].start
            end = self.loglines[lineno+1].start
            size = end - start
            if self._logfile.closed:
                    self._logfile = open(self.logpath, "r")
            self._logfile.seek(start)
            logline = self._logfile.read(size)
            return logline

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-23T08:10:58+00:00

Editorial Team

2026-05-23T08:10:58+00:00Added an answer on May 23, 2026 at 8:10 am

As with all double-linked lists, random access in a collections.deque is O(n). Consider using a list of bounded lists so that clearing of old entries (del outer[0]) can still proceed in a timely manner even with hundreds of thousands of entries.

0

Reply
Share
Share

- Report

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I’m working on writing a small log scraping program in Python, which processes a

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply