I’m struggling with a regex for splitting log files into log sequences, so that I can match patterns inside those sequences.
The log format is:
timestamp fieldA fieldB fieldn log message1
timestamp fieldA fieldB fieldn log message2
log message2bis
timestamp fieldA fieldB fieldn log message3
The timestamp regex is known.
I want to extract every log sequence (potentially multiline) between timestamps, and I want to keep the timestamp.
At the same time, I want to keep an exact count of lines.
What I need to know is how to decorate the timestamp pattern so that it splits my log file into log sequences. I cannot split the whole file as a String, since the file content is provided in a CharBuffer.
Here is a sample method that will use this log sequence matcher:
    private void matches(File f, CharBuffer cb) {
        Matcher sequenceBreak = sequencePattern.matcher(cb); // sequence matcher
        int lines = 1;
        int sequences = 0;
        while (sequenceBreak.find()) {
            sequences++;
            String sequence = sequenceBreak.group();
            if (filter.accept(sequence)) {
                System.out.println(f + ":" + lines + ":" + sequence);
            }
            // count lines
            Matcher lineBreak = LINE_PATTERN.matcher(sequence);
            while (lineBreak.find()) {
                lines++;
            }
            if (sequenceBreak.end() == cb.limit()) {
                break;
            }
        }
    }
It sounds like you want the regex to match the entire log sequence, from the timestamp to the end of the last line, including the line separator. Assuming every log sequence but the last one is followed immediately by another log sequence, you should be able to use a lookahead for a timestamp to find the end of the sequence.
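As a rough sketch of the lookahead idea (not necessarily the exact pattern intended here), something like the following could work. The TIMESTAMP format, the class name and the split helper are all assumptions made up for illustration; substitute your real timestamp regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogSequenceSplitter {
    // Hypothetical timestamp format (e.g. 2024-01-31 12:00:00); replace with the real one.
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";

    // MULTILINE lets ^ match at the start of every line; DOTALL lets .*? cross line breaks.
    // The lookahead ends the match just before the next timestamp, or at the end of input.
    static final Pattern SEQUENCE = Pattern.compile(
            "^" + TIMESTAMP + ".*?(?=^" + TIMESTAMP + "|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    static final Pattern LINE_BREAK = Pattern.compile("\\r?\\n");

    /** Returns every sequence, prefixed with the line number it starts on and a colon. */
    static List<String> split(CharSequence input) {
        List<String> result = new ArrayList<>();
        Matcher sequenceBreak = SEQUENCE.matcher(input);
        int line = 1;
        while (sequenceBreak.find()) {
            String sequence = sequenceBreak.group();
            result.add(line + ":" + sequence);
            // Each sequence includes its trailing line separator,
            // so counting line breaks keeps the running line number exact.
            Matcher lineBreak = LINE_BREAK.matcher(sequence);
            while (lineBreak.find()) {
                line++;
            }
        }
        return result;
    }
}
```

Since CharBuffer implements CharSequence, you can pass your buffer straight to split. Two caveats: any lines before the first timestamp are skipped (and not counted), and a continuation line that happens to start with something timestamp-like will be treated as a new sequence.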
If that’s not fast or accurate enough, this should work better:
Of course, I’m assuming you’ll replace timestamp with the real timestamp regex. Just out of curiosity, have you considered using Scanner’s findWithinHorizon method for this? It seems to me it could save you a lot of work.
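To give an idea of what the Scanner route might look like, here is a minimal sketch. It assumes the same made-up timestamp format as a placeholder, and the class and method names are hypothetical. A horizon of 0 tells findWithinHorizon to search without bound, which means the scanner may end up buffering a large part of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import java.util.regex.Pattern;

final class ScannerLogSequences {
    // Hypothetical timestamp format; replace with the real timestamp regex.
    static final String TIMESTAMP = "\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}";

    // Same idea as with a plain Matcher: match from one timestamp up to
    // (but not including) the next timestamp, or up to the end of input.
    static final Pattern SEQUENCE = Pattern.compile(
            "^" + TIMESTAMP + ".*?(?=^" + TIMESTAMP + "|\\z)",
            Pattern.MULTILINE | Pattern.DOTALL);

    /** Reads all log sequences from any Readable (a Reader, a CharBuffer, ...). */
    static List<String> read(Readable source) {
        List<String> sequences = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            String sequence;
            // findWithinHorizon ignores delimiters and returns null when nothing more matches.
            while ((sequence = scanner.findWithinHorizon(SEQUENCE, 0)) != null) {
                sequences.add(sequence);
            }
        }
        return sequences;
    }
}
```

The line counting from your method would carry over unchanged: count the line breaks in each returned sequence. The main attraction is that the Scanner pulls input from the source as needed, so you would not have to load the file into a CharBuffer yourself.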