I have a log file that is written using a format defined in an slf4j XML configuration. Is there a way to write a script that takes the XML configuration file as input and then parses the messages in the log?
Example output:
2012-10-11 16:53:25.895 [main] {} INFO org.mortbay.log - jetty-6.1.11
2012-10-11 16:53:26.097 [main] {} INFO / - Initializing Spring root WebApplicationContext
I want to create an output file (like a CSV) that splits each line into columns based on the definition in the XML:
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] {%mdc} %-5level %logger{36} - %msg%n</pattern>
</encoder>
Any help/pointers would be GREATLY appreciated!
Thank you!
Sure, there are plenty of ways of reading an XML file in Perl, including XML::Parser and XML::LibXML.
I would start with XML::Parser. XML::LibXML seems to be better in the long run, but I feel way more comfortable with XML::Parser.
EDIT: Now that you have edited your question, I see that my answer is not adequate. Extracting the pattern will clearly not be a problem (you might use the above-mentioned XML modules, or just a simple regex). Unfortunately, I don't know all the formatting options the pattern supports, and they seem to be complex.
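If the config is as simple as the one in the question, the simple-regex route could look like the sketch below. It assumes a single <pattern> element and no XML entities such as &amp; inside it (a real parser like XML::LibXML would decode those for you); the config is inlined as a heredoc so the snippet runs on its own.

```perl
use strict;
use warnings;

# The <encoder> section from the question, inlined so the sketch runs on its own;
# in practice you would read your logback XML config file into $xml instead.
my $xml = <<'XML';
<encoder>
<pattern>%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] {%mdc} %-5level %logger{36} - %msg%n</pattern>
</encoder>
XML

# Non-greedy match on the first <pattern> element; /s lets . span newlines.
my ($pattern) = $xml =~ m{<pattern>(.*?)</pattern>}s
    or die "no <pattern> element found\n";

print "$pattern\n";
```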
What you want is to build a regex from the pattern and then apply it to each line. In this specific case, that regex needs one capture group per field: the date, thread, MDC, level, logger and message.
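For the example pattern, a hand-written regex could look like the following sketch. The group names (date, thread, and so on) are my own labels, not anything slf4j or logback defines, and the sketch assumes the level may be followed by extra spaces, as %-5level pads it to five characters.

```perl
use strict;
use warnings;

# Hand-written regex for the one pattern in the question:
#   %d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] {%mdc} %-5level %logger{36} - %msg%n
# Under /x whitespace is ignored, so literal spaces are written as "\ ".
my $line_re = qr/
    \A (?<date> \d{4}-\d{2}-\d{2} \ \d{2}:\d{2}:\d{2}\.\d{3} )  # %d{...}
    \ \[ (?<thread> [^\]]* ) \]                                 # [%thread]
    \ \{ (?<mdc> [^}]* ) \}                                     # {%mdc}
    \ (?<level> \S+ ) \s+                                       # %-5level may add padding
    (?<logger> \S+ )                                            # %logger{36}
    \ -\ (?<msg> .* ) \z                                        # - %msg, then %n ends the line
/x;

my @sample = (
    '2012-10-11 16:53:25.895 [main] {} INFO org.mortbay.log - jetty-6.1.11',
    '2012-10-11 16:53:26.097 [main] {} INFO / - Initializing Spring root WebApplicationContext',
);

my @rows;
for my $line (@sample) {
    next unless $line =~ $line_re;
    my %row = %+;    # copy the named captures before the next match overwrites them
    push @rows, \%row;
    print join(' | ', map { $_ . '=' . $row{$_} } qw(date thread mdc level logger msg)), "\n";
}
```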
Since I do know Perl but do not know the message format, I can only make guesses. I assume that a formatting atom in slf4j follows the pattern %-?\w+(\{.*?\}|): a percent sign, an optional minus, one or more word characters, and then, optionally, some additional formatting options in curly braces.
Given that you managed to parse the XML formatting instructions and extract the pattern into the variable $pattern, you now escape the literal text between the atoms and replace each atom with a capture group. Applied to your example pattern, this produces a regex with one group per atom, from %d through %msg and %n.
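A minimal sketch of that conversion follows. pattern_to_regex is a hypothetical helper, not part of any library; it relies on the atom regex assumed above, turns every atom into a non-greedy (.*?) group, escapes literal text with quotemeta, and already treats %n as the end of the line rather than a field, since logback writes %n as the line separator. Anything it doesn't recognize, such as %% or %.-1level, makes it die rather than silently produce a wrong regex.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex from a logback-style pattern, assuming every
# conversion atom looks like %-?\w+(\{.*?\}|) and everything else is literal text.
# Returns the compiled regex and the atom names (in order), e.g. for CSV headers.
sub pattern_to_regex {
    my ($pattern) = @_;
    my $re = '';
    my @names;
    # \G plus /gc walks the pattern left to right, one atom or literal run at a time.
    while ($pattern =~ /\G(?:(%-?\w+(?:\{.*?\}|))|([^%]+))/gc) {
        if (defined $1) {
            (my $name = $1) =~ s/\A%-?\d*//;    # strip "%", "-" and a width such as 5
            $name =~ s/\{.*\}\z//;              # strip "{...}" options
            next if $name eq 'n';               # %n is the line separator, not a field
            push @names, $name;
            $re .= '(.*?)';                     # non-greedy: stop at the next literal
        }
        else {
            $re .= quotemeta $2;                # literal text must match exactly
        }
    }
    # If the loop stopped early, part of the pattern used syntax we don't handle.
    my $stop = pos($pattern) // 0;
    die "unsupported pattern syntax at offset $stop\n" if $stop != length($pattern);
    return (qr/\A$re\z/, \@names);
}

my $pattern = '%d{yyyy-MM-dd HH:mm:ss.SSS} [%thread] {%mdc} %-5level %logger{36} - %msg%n';
my ($line_re, $names) = pattern_to_regex($pattern);

my $line = '2012-10-11 16:53:25.895 [main] {} INFO org.mortbay.log - jetty-6.1.11';
my @fields = $line =~ $line_re or die "line does not match\n";
print "$names->[$_] = $fields[$_]\n" for 0 .. $#fields;
```

Because the last group is followed only by the end-of-line anchor, %msg takes the rest of the line; the earlier groups stop at the first occurrence of the literal text that follows them.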
You can then match that regex against every line of your log file and write the captured groups out as CSV rows.
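Putting it together, a sketch of that loop that writes CSV to standard output is below. It uses a hand-written regex for the example pattern (swap in the one built from $pattern), does its own minimal CSV quoting so it needs only core Perl (Text::CSV from CPAN is the sturdier choice), and simply skips lines that don't match, such as stack-trace continuation lines. The log is read from an in-memory string here so the snippet runs on its own.

```perl
use strict;
use warnings;

# Regex for the example pattern; replace with the one generated from $pattern.
my $line_re = qr/^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(.*?)\] \{(.*?)\} (\S+)\s+(\S+) - (.*)$/;

# Quote a field for CSV: wrap it in double quotes if it contains a comma,
# quote or line break, doubling any embedded quotes.
sub csv_field {
    my ($s) = @_;
    return $s unless $s =~ /[",\r\n]/;
    $s =~ s/"/""/g;
    return qq{"$s"};
}

# Sample log inlined for the sketch; normally you would open your log file.
my $log = <<'LOG';
2012-10-11 16:53:25.895 [main] {} INFO org.mortbay.log - jetty-6.1.11
2012-10-11 16:53:26.097 [main] {} INFO / - Initializing Spring root WebApplicationContext
LOG

open my $fh, '<', \$log or die "cannot open in-memory log: $!";
my @csv = (join ',', qw(date thread mdc level logger msg));
while (my $line = <$fh>) {
    chomp $line;
    my @fields = $line =~ $line_re or next;   # skip non-matching lines, e.g. stack traces
    push @csv, join ',', map { csv_field($_) } @fields;
}
close $fh;

print "$_\n" for @csv;
```

For a real log you would open the file by name (or read from <> and pass the file on the command line) and redirect the output to a .csv file.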
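This is not perfect yet and will fail in places: you should recognize the date format, and distinguish between %n and %msg. (Note that in the logback pattern syntax %n is the platform line separator, not a field of its own, so it just marks the end of the line and %msg should take everything before it.) However, you can see where this is going. Hope that helps.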
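For recognizing the date format, one option is to translate the spec inside %d{...} into a regex of its own, as in the sketch below. date_spec_to_regex is hypothetical and assumes java.text.SimpleDateFormat letter semantics (the number of letters is the minimum number of digits); it only handles the numeric fields y, M/MM, d, H, m, s and S and dies on anything else, such as MMM month names.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the SimpleDateFormat-style spec inside %d{...}
# into a regex string. Each run of n numeric-field letters becomes \d{n,};
# other letters (MMM, E, a, z, ...) are rejected rather than guessed at.
sub date_spec_to_regex {
    my ($spec) = @_;
    my $re = '';
    # Each step consumes either a run of one repeated letter or one other character.
    while ($spec =~ /\G(?:(([A-Za-z])\2*)|(.))/gs) {
        if (defined $1) {
            my ($run, $letter) = ($1, $2);
            die "unsupported date field '$run'\n"
                if $letter !~ /\A[yMdHmsS]\z/ or ($letter eq 'M' and length($run) > 2);
            $re .= '\d{' . length($run) . ',}';
        }
        else {
            $re .= quotemeta $3;   # separators such as "-", ":" and "." are literal
        }
    }
    return $re;
}

my $date_re = date_spec_to_regex('yyyy-MM-dd HH:mm:ss.SSS');
print "$date_re\n";
for my $s ('2012-10-11 16:53:25.895', '2012-10-11') {
    print $s, ': ', ($s =~ /\A$date_re\z/ ? 'matches' : 'no match'), "\n";
}
```

Inside a pattern-to-regex converter, a %d{...} atom could then become '(' . date_spec_to_regex($spec) . ')' instead of (.*?), so lines that don't start with a timestamp, such as stack-trace continuation lines, no longer match. A bare %d without braces would need its default format filled in first.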