I’m currently parsing a log file that has the following structure: 1) timestamp, preceded

Question

Editorial Team

Asked: June 11, 2026 (2026-06-11T14:45:16+00:00)

I’m currently parsing a log file that has the following structure:

1) timestamp, preceded by # character and followed by \n

2) an arbitrary number of events that happened after that timestamp, each followed by \n

3) repeat..

Here is an example:

#100
04!
03!
02!
#1299
0L
0K
0J
0E
#1335
06!
0X#
0[#
b1010 Z$
b1x [$
...

Please forgive the seemingly cryptic values, they are encodings representing certain “events”.

Note: Event encodings may also use the # character.

What I am trying to do is to count the number of events that happen at a certain time.

In other words, at time 100, 3 events happened.

I am trying to match all text between two timestamps – and count the number of events by simply counting the number of newlines enclosed in the matched text.

I’m using Python’s regex engine, and I’m using the following expression:

pattern = re.compile('(#[0-9]{2,}.*)(?!#[0-9]+)')

Note: The {2,} is because I want timestamps with at least two digits.

My intent is to match a timestamp, then keep matching any other characters until the next timestamp, where the match should end.

What this returns is:

#100
#1299
#1335

So, I get the timestamps – but none of the events data – what I really care about!

I’m thinking the reason for this is that the negative lookahead is “greedy” – but I’m not completely sure.

There may be an entirely different regex that makes this much simpler – open to any suggestions!

Any help is much appreciated!

-k

1 Answer

Editorial Team · Answer 1 · 2026-06-11T14:45:17+00:00

If you insist on a regex-based solution, I propose this:

>>> import re
>>> # s holds the log text from the question
>>> pat = re.compile(r'(^#[0-9]{2,})\s*\n((?:[^#].*\n)*)', re.MULTILINE)
>>> for t, e in pat.findall(s):
...     print(t, e.count('\n'))
...
#100 3
#1299 4
#1335 6

Explanation:

(              
  ^            anchor to start of line in multiline mode
  #[0-9]{2,}   line starting with # followed by numbers
)
\s*            skip whitespace just in case (eg. Windows line separator)
\n             new line
(
  (?:          repeat non-capturing group inside capturing group to capture 
               all repetitions
    [^#].*\n   line not starting with #
  )*
)
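
The annotated layout above maps closely onto Python's re.VERBOSE flag, which lets the comments live inside the pattern itself. The following is only a sketch: the `log` string is the sample from the question typed in by hand (with the literal "..." line kept as one more event line), and the names `log`, `pat` and `counts` are illustrative. Note that in verbose mode an unescaped # outside a character class starts a comment, so the leading # is written as \#.

```python
import re

# Sample log from the question, reproduced by hand for illustration.
log = (
    "#100\n04!\n03!\n02!\n"
    "#1299\n0L\n0K\n0J\n0E\n"
    "#1335\n06!\n0X#\n0[#\nb1010 Z$\nb1x [$\n...\n"
)

pat = re.compile(
    r"""
    (^\#[0-9]{2,})     # timestamp line; the hash is escaped because of verbose mode
    \s*\n              # optional trailing whitespace, then the newline
    ((?:[^#].*\n)*)    # every following line that does not start with a hash
    """,
    re.MULTILINE | re.VERBOSE,
)

# Map each timestamp to the number of event lines captured after it.
counts = {t: e.count("\n") for t, e in pat.findall(log)}
print(counts)
```

Inside the character class [^#] the hash needs no escaping, since verbose mode leaves character classes untouched.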

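You seem to have misunderstood what negative lookahead does. When it follows .*, the regex engine first lets .* consume as many characters as possible and only then checks the lookahead. If the lookahead assertion fails, the engine backtracks character by character until it succeeds. In your pattern, .* (without re.DOTALL) stops at the end of the timestamp line; the next character is a newline, not a timestamp, so the negative lookahead succeeds immediately and the match ends there.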
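
To see this behaviour directly, here is a small sketch that runs the pattern from the question on a shortened, hand-typed piece of the sample log. The variable names are illustrative only; the second pattern adds re.DOTALL purely to show what changes when .* is allowed to cross newlines.

```python
import re

# Shortened piece of the sample log, typed in by hand.
log = "#100\n04!\n03!\n02!\n#1299\n0L\n0K\n"

# Pattern exactly as written in the question.
original = re.compile('(#[0-9]{2,}.*)(?!#[0-9]+)')

# Without re.DOTALL, .* stops at the end of the timestamp line. The next
# character is a newline, so the negative lookahead succeeds at once.
line_only = original.findall(log)
print(line_only)

# With re.DOTALL, .* runs to the end of the string, where the lookahead
# also succeeds, so a single match swallows the whole log.
dotall = re.compile('(#[0-9]{2,}.*)(?!#[0-9]+)', re.DOTALL)
swallowed = dotall.findall(log)
print(len(swallowed))
```

Either way the negative lookahead never forces the match to stop just before the next timestamp, which is why a different construction is needed.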

You could, however, use positive lookahead together with the non-greedy .*?. Here the .*? will consume characters until the lookahead sees either a # at start of a line, or the end of the whole string:

re.compile(r'(^#[0-9]{2,})\s*\n(.*?)(?=^#|\Z)', re.DOTALL | re.MULTILINE)
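
As a rough usage sketch, the lazy pattern can be applied in the same way as the first one. The `log` string is again the sample from the question typed in by hand, and `lazy` and `lazy_counts` are illustrative names.

```python
import re

# Sample log from the question, reproduced by hand for illustration.
log = (
    "#100\n04!\n03!\n02!\n"
    "#1299\n0L\n0K\n0J\n0E\n"
    "#1335\n06!\n0X#\n0[#\nb1010 Z$\nb1x [$\n...\n"
)

lazy = re.compile(r'(^#[0-9]{2,})\s*\n(.*?)(?=^#|\Z)', re.DOTALL | re.MULTILINE)

# The second group holds everything up to the next line starting with '#'
# (or the end of the string); each event line contributes one newline.
lazy_counts = {t: body.count("\n") for t, body in lazy.findall(log)}
print(lazy_counts)
```

Both patterns treat any line beginning with # as the end of the current block. If an event encoding could itself start a line with #, the patterns would need tightening; whether that can happen is not stated in the question.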
