My grammar is producing an unexpected result. I am not sure if it is

Question

Asked: May 26, 2026 (2026-05-26T12:25:44+00:00)

My grammar is producing an unexpected result. I am not sure whether it is just my bug or an issue with ANTLR’s logic for processing ambiguous alternatives.

Here is my grammar :

grammar PPMacro;
options {
  language=Java;
  backtrack=true;

}

file: (inputLines)+ EOF;

inputLines 
:  ( preprocessorLineSet  |  oneNormalInputLine )  ; 

oneNormalInputLine  @after{System.out.print("["+$text+"]");}  
: (any_token_except_crlf)* CRLF ;

preprocessorLineSet 
: ifPart endifLine;

ifPart: ifLine  inputLines*   ;
ifLine  @after{System.out.print("{"+$text+"}" );} 
:  '#' IF (any_token_except_crlf)* CRLF ;

endifLine @after{System.out.print("{"+$text+"}" );} 
:  '#' ENDIF (any_token_except_crlf)* CRLF ;

any_token_except_crlf: (ANY_ID | WS | '#'|IF|ENDIF);
// just matches everything

CRLF: '\r'?  '\n'  ;
WS: (' '|'\t'|'\f' )+;
Hash: '#'  ;
IF     : 'if'    ;
ENDIF  : 'endif' ;
ANY_ID: ( 'a'..'z'|'A'..'Z'|'0'..'9'| '_')+ ;

Explanation:

It is for parsing a C++ #if … #endif block.

I am trying to recognize nested #if … #endif blocks. This is done by my preprocessorLineSet rule, which is defined recursively to support nesting. oneNormalInputLine handles anything not of the #if form. It is a match-anything rule and so also matches a #if line, but I deliberately put it after preprocessorLineSet in inputLines, expecting this ordering to prevent it from matching a #if or #endif line. The reason for using a catch-all rule is that I want a rule that accepts any other C++ syntax and simply echoes it back to the output.

In my test, I just print out everything. Lines matched by preprocessorLineSet should be surrounded by {}, while those matched by oneNormalInputLine should be surrounded by [].

Sample inputs :

#if s
s
#if a
s 
s
#endif
#endif

and this

#if
abc
#endif

The corresponding outputs:

[#if s
][s
][#if a
][s
][s
][#endif
][#endif
]

and this

[#if
][abc
][#endif
]

Problem:

All the output lines, including #if and #endif, are surrounded by [], meaning they are matched ONLY by oneNormalInputLine! I did not expect this: preprocessorLineSet should be able to match the #if lines. Why do I get this result?

This line contains ambiguous alternatives:

inputLines  :  ( preprocessorLineSet  |  oneNormalInputLine );

since both can match the #if and #endif lines. But I expect the first alternative to be used rather than the latter one. Also note that backtracking is on.

EDIT
The reason my oneNormalInputLine rule accepts everything is that it is difficult to write a rule for "anything that does not have a specific pattern", and the #if pattern can be rather complicated:

/***

comments

*/   # /***
comments
*/ if

is a valid pattern. Writing a rule that excludes this pattern seems difficult.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T12:25:45+00:00

Your approach is not really robust. I’d suggest keeping it simple and using the actual language rule: every line that begins with # is a preprocessor directive, and every line that doesn’t is not. A grammar built on this rule has no ambiguity and is much simpler to understand.
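To illustrate the "begins with #" rule outside of ANTLR, here is a minimal Java sketch. The class DirectiveLines and its methods are hypothetical names, not part of the question's grammar. It assumes the input is already split into lines and that comments before the # (as in the question's EDIT) have been stripped beforehand. Like the grammar, it only knows #if and #endif.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the grammar in the question.
final class DirectiveLines {

    // Assumption: comments and line continuations were already removed,
    // so a directive is any line whose first non-blank character is '#'.
    static boolean isDirective(String line) {
        return line.trim().startsWith("#");
    }

    // Returns the word after '#', e.g. "if" or "endif" ("" if there is none).
    // Must only be called on lines for which isDirective() is true.
    static String keyword(String line) {
        String rest = line.trim().substring(1).trim();
        int end = 0;
        while (end < rest.length()
                && (Character.isLetterOrDigit(rest.charAt(end)) || rest.charAt(end) == '_')) {
            end++;
        }
        return rest.substring(0, end);
    }

    // Echoes directives as {line} and everything else as [line].
    // Returns the #if/#endif nesting depth left open at the end (0 = balanced).
    // Only "if" and "endif" change the depth, mirroring the IF and ENDIF tokens.
    static int echo(List<String> lines, StringBuilder out) {
        int depth = 0;
        for (String line : lines) {
            if (isDirective(line)) {
                String kw = keyword(line);
                if (kw.equals("if")) {
                    depth++;
                } else if (kw.equals("endif")) {
                    depth--;
                }
                out.append('{').append(line).append('}');
            } else {
                out.append('[').append(line).append(']');
            }
        }
        return depth;
    }

    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        int depth = echo(Arrays.asList("#if s", "s", "#if a", "s", "#endif", "#endif"), out);
        // prints: {#if s}[s]{#if a}[s]{#endif}{#endif} depth=0
        System.out.println(out + " depth=" + depth);
    }
}
```

Because the decision depends only on the first non-blank character of each line, no alternative ordering or backtracking is involved.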

Now, why doesn’t your grammar work? The problem is that your preprocessorLineSet rule can never match anything.

preprocessorLineSet 
: ifPart endifLine;

ifPart: ifLine  inputLines*   ;

It starts with #if ..., should then match the lines that follow, and should finish as soon as the matching #endif arrives. However, it doesn’t actually do that. inputLines can match pretty much any line (pretty much, because it won’t match e.g. C++ operators and other non-identifiers), including all preprocessor directives. That means the ifPart rule matches all the way to the end of the input, and no endifLine is left. Note that backtracking has no effect on this: once ANTLR has matched a rule (here ifPart, which succeeds on the whole rest of the input, since * is greedy), it never backtracks into it. ANTLR’s rules for backtracking are hairy…

Note that if you made oneNormalInputLine not match preprocessor directives (e.g. something like (nonHash any* | ) CRLF), it would start to work.
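The failure and the suggested fix can both be reproduced with a small hand-written model of the rules. The Java sketch below is only an approximation of the behaviour described in this answer, not ANTLR's real prediction algorithm. The class IfBlockModel and the flag normalLineAcceptsHash are hypothetical names, and each list element stands for one input line without its CRLF.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of the grammar's rules, one list element per input line.
final class IfBlockModel {
    private final List<String> lines;
    private final boolean normalLineAcceptsHash; // true = grammar as posted
    private final StringBuilder out = new StringBuilder();
    private int pos = 0;

    private IfBlockModel(List<String> lines, boolean normalLineAcceptsHash) {
        this.lines = lines;
        this.normalLineAcceptsHash = normalLineAcceptsHash;
    }

    // file: inputLines+ (the model stops at the first line that nothing matches)
    static String parse(List<String> lines, boolean normalLineAcceptsHash) {
        IfBlockModel m = new IfBlockModel(lines, normalLineAcceptsHash);
        while (m.pos < lines.size() && m.inputLines()) { }
        return m.out.toString();
    }

    // inputLines: preprocessorLineSet | oneNormalInputLine, tried in that order
    private boolean inputLines() {
        int startPos = pos;
        int startOut = out.length();
        if (preprocessorLineSet()) return true;
        pos = startPos;            // first alternative failed as a whole:
        out.setLength(startOut);   // rewind, then try the second alternative
        return oneNormalInputLine();
    }

    // preprocessorLineSet: ifLine inputLines* endifLine
    private boolean preprocessorLineSet() {
        if (!directive("if")) return false;
        while (inputLines()) { }   // greedy loop; it never gives lines back
        return directive("endif");
    }

    // oneNormalInputLine: any line, or (with the fix) any line not starting with '#'
    private boolean oneNormalInputLine() {
        if (pos >= lines.size()) return false;
        String line = lines.get(pos);
        if (!normalLineAcceptsHash && line.startsWith("#")) return false;
        out.append('[').append(line).append(']');
        pos++;
        return true;
    }

    // ifLine / endifLine: '#' immediately followed by the keyword
    private boolean directive(String keyword) {
        if (pos >= lines.size()) return false;
        String line = lines.get(pos);
        if (!line.split("\\s+", 2)[0].equals("#" + keyword)) return false;
        out.append('{').append(line).append('}');
        pos++;
        return true;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("#if", "abc", "#endif");
        System.out.println(parse(input, true));   // prints: [#if][abc][#endif]
        System.out.println(parse(input, false));  // prints: {#if}[abc]{#endif}
    }
}
```

With the flag set to true, the loop inside preprocessorLineSet swallows the #endif line as a normal line, the final endifLine check fails at the end of the input, and every line falls back to []. With the flag set to false, the loop stops at #endif, so the block is closed correctly.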
