I need a little guidance in writing a grammar to parse the log file

Asked: May 14, 2026

I need a little guidance in writing a grammar to parse the log file of the game Aion. I’ve decided upon using Antlr3 (because it seems to be a tool that can do the job and I figured it’s good for me to learn to use it). However, I’ve run into problems because the log file is not exactly structured.

The log file I need to parse looks like the one below:

2010.04.27 22:32:22 : You changed the connection status to Online. 
2010.04.27 22:32:22 : You changed the group to the Solo state. 
2010.04.27 22:32:22 : You changed the group to the Solo state. 
2010.04.27 22:32:28 : Legion Message: www.xxxxxxxx.com (forum)



ventrillo: 19x.xxx.xxx.xxx

Port: 3712

Pass: xxxx (blabla) 

 4/27/2010 7:47 PM 
2010.04.27 22:32:28 : You have item(s) left to settle in the sales agency window.

As you can see, most lines start with a timestamp, but there are exceptions. What I’d like to do in Antlr3 is write a parser that uses only the lines starting with the timestamp while silently discarding the others.

This is what I’ve written so far (I’m a beginner with these things so please don’t laugh :D)

grammar Antlr;

options {
  language = Java;
}

logfile: line* EOF;

line : dataline | textline;

dataline: timestamp WS ':' WS text NL ;
textline: ~DIG text NL;

timestamp: four_dig '.' two_dig '.' two_dig WS two_dig ':' two_dig ':' two_dig ;

four_dig: DIG DIG DIG DIG;
two_dig: DIG DIG;

text: ~NL+;

/* Whitespace */ 
WS: (' ' | '\t')+;

/* New line goes to \r\n or EOF */
NL: '\r'? '\n' ;

/* Digits */
DIG : '0'..'9';

So what I need is an example of how to parse this without generating errors for lines without the timestamp.

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-05-14T21:40:11+00:00

No one is going to laugh. In fact, you did a pretty good job for a first try. Of course, there’s room for improvement! 🙂

First, some remarks. Inside a lexer rule you can only negate single characters. Since your NL rule can consist of two characters ('\r' followed by '\n'), you can’t negate it there. Also, when you negate inside a parser rule, you are not negating characters: you are negating tokens, that is, lexer rules. This may sound a bit confusing, so let me clarify with an example. Take the combined (parser & lexer) grammar T:

grammar T;

// parser rule
foo
  :  ~A
  ;

// lexer rules
A
  :  'a'
  ;

B
  :  'b'
  ;

C
  :  'c'
  ;

As you can see, I’m negating the A lexer rule in the foo parser rule. The foo rule does not match "any character except 'a'"; it matches any token except A. In other words, it will only match a 'b' or a 'c' character, because those are the only other tokens the lexer can produce.
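To make the difference between negating characters and negating tokens more concrete, here is a small hand-written Java sketch of what grammar T does. It is not the code ANTLR generates: the class NegatedTokenDemo and its methods are hypothetical names used only for illustration, and error handling is simplified (a real ANTLR lexer would report an error for an unknown character and try to recover).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of grammar T (not ANTLR-generated code).
class NegatedTokenDemo {
    // The only token types the lexer rules A, B and C can produce.
    enum TokenType { A, B, C }

    // Lexer step: turn characters into tokens; any other character has no rule.
    static List<TokenType> tokenize(String input) {
        List<TokenType> tokens = new ArrayList<>();
        for (char ch : input.toCharArray()) {
            switch (ch) {
                case 'a': tokens.add(TokenType.A); break;
                case 'b': tokens.add(TokenType.B); break;
                case 'c': tokens.add(TokenType.C); break;
                default:
                    throw new IllegalArgumentException("no lexer rule matches '" + ch + "'");
            }
        }
        return tokens;
    }

    // Parser step for "foo : ~A ;": accept exactly one token whose type is not A.
    static boolean matchesFoo(String input) {
        List<TokenType> tokens;
        try {
            tokens = tokenize(input);
        } catch (IllegalArgumentException e) {
            // The character never became a token, so ~A has nothing to match.
            return false;
        }
        return tokens.size() == 1 && tokens.get(0) != TokenType.A;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "b", "c", "d"}) {
            System.out.println("'" + s + "' matches foo: " + matchesFoo(s));
        }
    }
}
```

In this model 'b' and 'c' are accepted, 'a' is rejected because it is the negated token, and 'd' is rejected because no lexer rule ever turns it into a token.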

Also, you don’t need to put:

options {
  language = Java;
}

in your grammar: the default target is Java (it does not hurt to leave it in there of course).

Now, in your grammar, you can already make the distinction between data lines and text lines in the lexer. Here’s a possible way to do so:

logfile
  :  line+
  ;

line
  :  dataline 
  |  textline
  ;

dataline
  :  DataLine
  ;

textline
  :  TextLine
  ;

DataLine
  :  TwoDigits TwoDigits '.' TwoDigits '.' TwoDigits Space+ TwoDigits ':' TwoDigits ':' TwoDigits Space+ ':' TextLine
  ;

TextLine
  :  ~('\r' | '\n')* (NewLine | EOF)
  ;

fragment
NewLine
  :  '\r'? '\n'
  |  '\r'
  ;

fragment
TwoDigits
  :  '0'..'9' '0'..'9'
  ;

fragment
Space
  :  ' ' 
  |  '\t'
  ;

Note that the fragment keyword in front of a lexer rule means that no tokens are created from that rule: fragment rules are only used inside other lexer rules. So the lexer will only create two different types of tokens: DataLine and TextLine.
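If it helps to see what the DataLine rule accepts without running ANTLR, the same line shape can be approximated with a plain java.util.regex pattern. This is only a sketch for checking your expectations against sample input, not a replacement for the grammar: the class LogLineFilter and its method names are made up for this example, and the pattern is my own translation of the DataLine rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that mimics the DataLine / TextLine split with a regex.
class LogLineFilter {
    // Same shape as the DataLine lexer rule:
    // yyyy.MM.dd, spaces, HH:mm:ss, spaces, ':' and then the rest of the line.
    private static final Pattern DATA_LINE = Pattern.compile(
        "\\d{4}\\.\\d{2}\\.\\d{2}[ \\t]+\\d{2}:\\d{2}:\\d{2}[ \\t]+:(.*)");

    // True when the whole line has the timestamped DataLine shape.
    static boolean isDataLine(String line) {
        return DATA_LINE.matcher(line).matches();
    }

    // Returns the message part of every data line; all other lines are dropped silently.
    static List<String> dataMessages(String log) {
        List<String> messages = new ArrayList<>();
        // Split on the same line endings as the NewLine fragment: \r\n, \n or \r.
        for (String line : log.split("\\r?\\n|\\r")) {
            Matcher m = DATA_LINE.matcher(line);
            if (m.matches()) {
                messages.add(m.group(1).trim());
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        String log = "2010.04.27 22:32:28 : Legion Message: www.xxxxxxxx.com (forum)\r\n"
                   + "\r\n"
                   + "ventrillo: 19x.xxx.xxx.xxx\r\n"
                   + " 4/27/2010 7:47 PM \r\n"
                   + "2010.04.27 22:32:28 : You have item(s) left to settle in the sales agency window.\r\n";
        for (String message : dataMessages(log)) {
            System.out.println(message);
        }
    }
}
```

With the sample from the question, only the two timestamped lines survive. The lines in between, including the one that starts with " 4/27/2010", fall through as text lines and are dropped.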
