I’m parsing a log file to identify and retrieve information about failures. Regular Expressions seem to be the right way to go about this.
Here’s my initial pattern: \d{4}-\d{2}-\d{2} \d{2}.*
This works well for single lines like this:
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|StackLine:0:0
This doesn’t work for information that spans multiple lines.
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |StackLine:0:0
Here is what a couple of lines in the log look like:
2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
2011-02-06 02:19:04.4087|FATAL|ClassName|Message
Failure data
Additional message |7th StackLine:0:0
6th StackLine:0:0
5th StackLine:0:0
4th StackLine:0:0
3rd StackLine:0:0
2nd StackLine:0:0
1st StackLine:0:0
The phrase “StackLine” represents a method signature in the dumped call stack. For example, here are two different “StackLine” examples:
ExecuteCodeWithGuaranteedCleanup at offset 0 in file:line:column <filename unknown>:0:0
and
OnXmlMsgReceived at offset 128 in file:line:column d:\buildserver\source\svnroot\DepotManager\trunk\src\DepotManager.Core\Gating\AutoGate\Wherenet\Zla\EventSink.cs:115:17
In an ideal world, I would get just the text from the timestamp through the first line:column notation (which is frequently 0:0).
How would I go about creating a pattern that would match both?
This will match a line starting with a date and all lines following it that do not start with a date.
Here is a Rubular example:
http://www.rubular.com/r/1BIoLZ5tfs
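For readers who cannot open the Rubular link, here is a rough Python sketch of the idea described above: match a line that starts with a date, then keep consuming following lines as long as they do not start with a date. The pattern is my own reconstruction of that idea, not necessarily the exact regex from the link, and the names `DATE`, `ENTRY`, `split_entries` and `LOG` are made up for illustration.

```python
import re

# Shortened sample based on the log excerpt in the question.
LOG = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0",
    "4th StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
    "6th StackLine:0:0",
])

# Date prefix taken from the question's initial pattern.
DATE = r"\d{4}-\d{2}-\d{2} \d{2}"

# A line starting with a date, followed by any number of lines that
# do NOT start with a date (enforced by the negative lookahead).
# re.MULTILINE lets ^ match at the start of every line.
ENTRY = re.compile(
    "^" + DATE + r".*(?:\n(?!" + DATE + r").*)*",
    re.MULTILINE,
)

def split_entries(text):
    """Return each log entry, including its continuation lines."""
    return [m.group(0).rstrip() for m in ENTRY.finditer(text)]

for entry in split_entries(LOG):
    print(repr(entry))
```

Note that `.` is deliberately left matching only within a line here, so the newline handling is done entirely by the explicit `\n` plus lookahead.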
edit 2: If you want to stop at the first :0:0, you can use the following regex, as long as you have a multi-line option enabled so that the . character will also match newlines:
And here is a new Rubular: http://www.rubular.com/r/rfR1wqDHR8
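A possible Python version of this second approach, again as a sketch rather than the exact regex from the link. In Python the option that lets `.` match newlines is `re.DOTALL` (on Rubular, which uses Ruby, it is the `m` flag). Two assumptions of mine: I generalised `:0:0` to `:\d+:\d+` so that positions like `:115:17` also count, and I require it to sit at the end of a line so that the `:17:54` inside the timestamp is not mistaken for it. The names `FIRST_STACK_LINE`, `first_stack_lines` and `LOG` are illustrative only.

```python
import re

# Shortened sample based on the log excerpt in the question.
LOG = "\n".join([
    "2011-02-06 02:17:54.9886|FATAL|ClassName|Failure data|5th StackLine:0:0",
    "4th StackLine:0:0",
    "2011-02-06 02:19:04.4087|FATAL|ClassName|Message",
    "Failure data",
    "Additional message |7th StackLine:0:0",
    "6th StackLine:0:0",
])

# From the date, lazily (.*?) up to the first line:column pair that
# ends a line. re.DOTALL lets . cross newlines; re.MULTILINE lets
# ^ and $ match at line boundaries.
FIRST_STACK_LINE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}.*?:\d+:\d+$",
    re.MULTILINE | re.DOTALL,
)

def first_stack_lines(text):
    """Return each entry cut off after its first line:column notation."""
    return [m.group(0) for m in FIRST_STACK_LINE.finditer(text)]

for head in first_stack_lines(LOG):
    print(repr(head))
```

The lazy quantifier is what makes the match stop at the first stack line; a greedy `.*` with `re.DOTALL` would run on to the last line:column pair in the whole file.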