I need to parse a WebLogic log file with ANTLR. Here is an example:
Tue Aug 28 09:39:09 MSD 2012 [test] [[ACTIVE] ExecuteThread: '0' for queue: 'weblogic.kernel.Default (self-tuning)'] Alert - There is no user password credential mapper provider configured in your security realm. Oracle Service Bus service account management will be disabled. Configure a user password credential mapper provider if you need OSB service account support.
Sun Sep 02 23:13:00 MSD 2012 [test] [[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'] Warning - Timer (Checkpoint) has been triggered with a tick (205 873) that is less than or equal to the last tick that was received (205 873). This could happen in a cluster due to clock synchronization with the timer authority. The current trigger will be ignored, and operation will be skipped.
Mon Sep 03 10:35:54 MSD 2012 [test] [[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'] Info -
[OSB Tracing] Inbound request was received.
Service Ref = Some/URL
URI = Another/URL
Message ID = u-u-i-d
Request metadata =
<xml-fragment>
<tran:headers xsi:type="http:HttpRequestHeaders" xmlns:http="http://www.bea.com/wli/sb/transports/http" xmlns:tran="http://www.bea.com/wli/sb/transports" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<http:Accept-Encoding>gzip, deflate,gzip, deflate</http:Accept-Encoding>
<http:Connection>Keep-Alive</http:Connection>
<http:Content-Length>666</http:Content-Length>
<http:Content-Type>text/xml; charset=utf-8</http:Content-Type>
<http:Host>some.host.name</http:Host>
<http:SOAPAction>""</http:SOAPAction>
</tran:headers>
<tran:encoding xmlns:tran="http://www.bea.com/wli/sb/transports">utf-8</tran:encoding>
<http:client-host xmlns:http="http://www.bea.com/wli/sb/transports/http">1.2.3.4</http:client-host>
<http:client-address xmlns:http="http://www.bea.com/wli/sb/transports/http">1.2.3.4</http:client-address>
<http:http-method xmlns:http="http://www.bea.com/wli/sb/transports/http">POST</http:http-method>
</xml-fragment>
Payload =
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><XMLHere/></s:Envelope>
I am only interested in this part of the log; everything else must be ignored. The date, the Service Ref value and the Envelope XML should be parsed:
Sun Sep 02 23:13:00 MSD 2012 [test] [[ACTIVE] ExecuteThread: '5' for queue: 'weblogic.kernel.Default (self-tuning)'] Warning - Timer (Checkpoint) has been triggered with a tick (205 873) that is less than or equal to the last tick that was received (205 873). This could happen in a cluster due to clock synchronization with the timer authority. The current trigger will be ignored, and operation will be skipped.
Mon Sep 03 10:35:54 MSD 2012 [test] [[ACTIVE] ExecuteThread: '19' for queue: 'weblogic.kernel.Default (self-tuning)'] Info -
[OSB Tracing] Inbound request was received.
Service Ref = Some/URL
URI = Another/URL
Message ID = u-u-i-d
Request metadata =
<xml-fragment>
<tran:headers xsi:type="http:HttpRequestHeaders" xmlns:http="http://www.bea.com/wli/sb/transports/http" xmlns:tran="http://www.bea.com/wli/sb/transports" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<http:Accept-Encoding>gzip, deflate,gzip, deflate</http:Accept-Encoding>
<http:Connection>Keep-Alive</http:Connection>
<http:Content-Length>666</http:Content-Length>
<http:Content-Type>text/xml; charset=utf-8</http:Content-Type>
<http:Host>some.host.name</http:Host>
<http:SOAPAction>""</http:SOAPAction>
</tran:headers>
<tran:encoding xmlns:tran="http://www.bea.com/wli/sb/transports">utf-8</tran:encoding>
<http:client-host xmlns:http="http://www.bea.com/wli/sb/transports/http">1.2.3.4</http:client-host>
<http:client-address xmlns:http="http://www.bea.com/wli/sb/transports/http">1.2.3.4</http:client-address>
<http:http-method xmlns:http="http://www.bea.com/wli/sb/transports/http">POST</http:http-method>
</xml-fragment>
Payload =
<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"><XMLHere/></s:Envelope>
Here is my lexer:
lexer grammar LogLexer;
options {filter=true;}
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
LOGDATE : DAY ' ' MONTH ' ' NUMDAY ' ' NUMTIME ' ' TIMEZONE ' ' NUMYEAR;
METAINFO : '[' .* ']' ' [[' .* ']' .* ']' .* '-' .* '[OSB Tracing] Inbound request was received.';
SERVICE_REF : 'Service Ref = ';
URI : (SYMBOL | '/')+;
ENVELOPE_TAG : '<' ENVELOPE_TAGNAME .* '>' .* '</' ENVELOPE_TAGNAME '>';
fragment
ENVELOPE_TAGNAME : SYMBOL+ ':Envelope';
fragment
NUMTIME : NUM NUM ':' NUM NUM ':' NUM NUM;
fragment
TIMEZONE : SYMBOL SYMBOL SYMBOL;
fragment
DAY : 'Sun' | 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' | 'Sat';
fragment
MONTH : 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' | 'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec';
fragment
NUMYEAR : NUM NUM NUM NUM;
fragment
NUMDAY : NUM NUM;
fragment
NUM : '0'..'9';
fragment
SYMBOL : ('a'..'z' | 'A'..'Z');
And here is the parser (not finished yet):
grammar LogParser;
options {
tokenVocab = LogLexer;
}
@header {
import java.util.List;
import java.util.ArrayList;
}
parse
returns [List<List<String>> entries]
@init {
$entries = new ArrayList<List<String>>();
}
: ( requestLogEntry
{
// Add each entry inside the loop; an action placed after the loop runs only once.
$entries.add($requestLogEntry.logEntry);
}
)+;
requestLogEntry
returns [List<String> logEntry]
@init {
$logEntry = new ArrayList<String>();
}
: LOGDATE METAINFO .* serviceRef .* ENVELOPE_TAG
{
$logEntry.add($LOGDATE.getText());
$logEntry.add($serviceRef.serviceURI);
$logEntry.add($ENVELOPE_TAG.getText());
};
serviceRef
returns [String serviceURI]
: SERVICE_REF URI
{
$serviceURI = $URI.getText();
};
The problem is that it parses the log incorrectly. My code does not ignore the unwanted records, so I get the wrong date in the resulting list: Tue Aug 28 09:39:09 MSD 2012 (the first one in the example) instead of Mon Sep 03 10:35:54 MSD 2012 (the correct one). Could anyone help me?
Thanks in advance for your answers.
UPDATE
I have updated my code, but now I get generation errors and I can’t see what is wrong.
Updated lexer:
lexer grammar LogLexer;
options {
filter=true;
}
TRASH : LOGDATE ' ' METAINFO (' ' | '\n')* { skip(); };
LOGDATE : DAY ' ' MONTH ' ' NUMDAY ' ' NUMTIME ' ' TIMEZONE ' ' NUMYEAR;
METAINFO : ('[' | ']' | SYMBOL | NUM | ' ' | SPECIAL)+;
OSB_METAINFO : (' ' | '\n')* '[OSB Tracing] Inbound request was received.';
SERVICE_REF : 'Service Ref = ';
URI : (SYMBOL | '/')+;
ENVELOPE_TAG : '<' ENVELOPE_TAGNAME .* '>' .* '</' ENVELOPE_TAGNAME '>';
fragment
OSB_TRACING : '[OSB Tracing] Inbound request was received.';
fragment
ENVELOPE_TAGNAME : SYMBOL+ ':Envelope';
fragment
NUMTIME : NUM NUM ':' NUM NUM ':' NUM NUM;
fragment
TIMEZONE : SYMBOL SYMBOL SYMBOL;
fragment
DAY : 'Sun' | 'Mon' | 'Tue' | 'Wed' | 'Thu' | 'Fri' | 'Sat';
fragment
MONTH : 'Jan' | 'Feb' | 'Mar' | 'Apr' | 'May' | 'Jun' | 'Jul' | 'Aug' | 'Sep' | 'Oct' | 'Nov' | 'Dec';
fragment
NUMYEAR : NUM NUM NUM NUM;
fragment
NUMDAY : NUM NUM;
fragment
NUM : '0'..'9';
fragment
SYMBOL : ('a'..'z' | 'A'..'Z');
fragment
SPECIAL : ( ~'\n' | '\'' | '.' | '(' | ')' | '-');
Updated parser:
parser grammar LogParser;
options {
tokenVocab = LogLexer;
}
@header {
import java.util.List;
import java.util.ArrayList;
}
parse returns [List<List<String>> entries]
@init {
$entries = new ArrayList<List<String>>();
}
: ( requestLogEntry
{
// Add each entry inside the loop; an action placed after the loop runs only once.
$entries.add($requestLogEntry.logEntry);
}
)+;
requestLogEntry
returns [List<String> logEntry]
@init {
$logEntry = new ArrayList<String>();
}
: LOGDATE ' ' METAINFO OSB_METAINFO .* serviceRef .* ENVELOPE_TAG
{
$logEntry.add($LOGDATE.getText());
$logEntry.add($serviceRef.serviceURI);
$logEntry.add($ENVELOPE_TAG.getText());
};
serviceRef
returns [String serviceURI]
: SERVICE_REF URI
{
$serviceURI = $URI.getText();
};
Lexer generation errors:
[14:18:12] error(204): LogLexer.g:56:21: duplicate token type '\'' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:28: duplicate token type '.' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:34: duplicate token type '(' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:40: duplicate token type ')' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:46: duplicate token type '-' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:21: duplicate token type '\'' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:28: duplicate token type '.' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:34: duplicate token type '(' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:40: duplicate token type ')' when collapsing subrule into set
[14:18:12] error(204): LogLexer.g:56:46: duplicate token type '-' when collapsing subrule into set
Those errors seem to appear at random and disappear just as randomly (for example, after renaming the file). ANTLR also generates another lexer from my parser file (this happens randomly too). I am using the latest available ANTLR 3 and ANTLRWorks on Windows 7 (x64).
I’m not completely sure I’m tracking what is a valid versus an invalid service request, but I’ll plow ahead anyway.
Your parser is looking for
LOGDATE METAINFO .* serviceRef .* ENVELOPE_TAG
Before the parser can start, the lexer is looking for that LOGDATE + METAINFO + some stuff + serviceRef.
The lexer doesn’t know that you want it to discard the first two LOGDATEs that don’t have a serviceRef and only consider the third entry that has the serviceRef. Therefore, it will parse the first line as the beginning of a full-fledged entry.
Without giving you the answer and robbing you of the joy that comes from a deep understanding of ANTLR, I would suggest that you have your lexer “understand” more of how a proper entry is built. The lexer should also understand how an incorrect entry is built.
In other words, how would you rewrite the lexer so that it handles some lexemes and says that the first two lines are “just a date” and the third is the real deal?
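To make that question concrete without handing over a grammar, here is a rough sketch in plain Java of the decision the lexer has to make for each record. It is not ANTLR and not a drop-in solution: the class name OsbEntrySketch, the extract method and the regular expressions are all made up for illustration, and it assumes that every record starts on a line beginning with a date stamp of the shape your LOGDATE rule describes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not generated by ANTLR: it only mimics the
// "just a date" versus "real deal" decision for each record.
class OsbEntrySketch {
    // Same shape as the LOGDATE rule, e.g. "Mon Sep 03 10:35:54 MSD 2012".
    // Assumption: day and month names are matched loosely as three letters.
    private static final Pattern LOGDATE = Pattern.compile(
        "(?m)^[A-Z][a-z]{2} [A-Z][a-z]{2} \\d{2} \\d{2}:\\d{2}:\\d{2} [A-Za-z]{3} \\d{4}");
    private static final String OSB_MARKER =
        "[OSB Tracing] Inbound request was received.";
    private static final Pattern SERVICE_REF =
        Pattern.compile("Service Ref = ([A-Za-z/]+)");
    private static final Pattern ENVELOPE =
        Pattern.compile("(?s)<([A-Za-z]+):Envelope\\b.*?</\\1:Envelope>");

    static List<List<String>> extract(String log) {
        // Every date stamp at the start of a line opens a new record.
        List<Integer> starts = new ArrayList<>();
        Matcher stamps = LOGDATE.matcher(log);
        while (stamps.find()) {
            starts.add(stamps.start());
        }

        List<List<String>> entries = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : log.length();
            String record = log.substring(starts.get(i), end);

            // "Just a date": no tracing marker, so the whole record is dropped.
            if (!record.contains(OSB_MARKER)) {
                continue;
            }

            // "The real deal": keep the date, the Service Ref and the envelope.
            Matcher date = LOGDATE.matcher(record);
            Matcher ref = SERVICE_REF.matcher(record);
            Matcher envelope = ENVELOPE.matcher(record);
            if (date.find() && ref.find() && envelope.find()) {
                List<String> entry = new ArrayList<>();
                entry.add(date.group());
                entry.add(ref.group(1));
                entry.add(envelope.group());
                entries.add(entry);
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // Shortened, made-up sample in the shape of the log from the question.
        String log =
            "Tue Aug 28 09:39:09 MSD 2012 [test] Alert - nothing to trace here.\n"
          + "Mon Sep 03 10:35:54 MSD 2012 [test] Info -\n"
          + "[OSB Tracing] Inbound request was received.\n"
          + "Service Ref = Some/URL\n"
          + "Payload =\n"
          + "<s:Envelope xmlns:s=\"urn:example\"><XMLHere/></s:Envelope>\n";
        System.out.println(extract(log));
    }
}
```

The point of the sketch is the order of the checks: a record is first recognised as a whole (date stamp up to the next date stamp), then either thrown away or mined for the three values. A lexer can be taught the same distinction, for instance by giving it one rule for a dated line that is followed by the tracing marker and another rule for any other dated line that it simply skips.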