Editorial Team
Asked: May 26, 2026


My grammar is producing an unexpected result. I am not sure whether it is just my bug or an issue with ANTLR’s logic for processing ambiguous alternatives.

Here is my grammar:

    grammar PPMacro;

    options {
      language=Java;
      backtrack=true;
    }

    file: (inputLines)+ EOF;

    inputLines
    :  ( preprocessorLineSet  |  oneNormalInputLine )  ;

    oneNormalInputLine  @after{System.out.print("["+$text+"]");}
    : (any_token_except_crlf)* CRLF ;

    preprocessorLineSet
    : ifPart endifLine;

    ifPart: ifLine  inputLines*   ;

    ifLine  @after{System.out.print("{"+$text+"}" );}
    :  '#' IF (any_token_except_crlf)* CRLF ;

    endifLine @after{System.out.print("{"+$text+"}" );}
    :  '#' ENDIF (any_token_except_crlf)* CRLF ;

    any_token_except_crlf: (ANY_ID | WS | '#'|IF|ENDIF);
    // just matches everything

    CRLF: '\r'?  '\n'  ;
    WS: (' '|'\t'|'\f' )+;
    Hash: '#'  ;
    IF     : 'if'    ;
    ENDIF  : 'endif' ;
    ANY_ID: ( 'a'..'z'|'A'..'Z'|'0'..'9'| '_')+ ;

Explanation:

It is for parsing a C++ #if … #endif block.

I am trying to recognize nested #if … #endif blocks. This is done by my preprocessorLineSet rule, which is defined recursively to support nested blocks. oneNormalInputLine handles anything that is not of the #if form. It is a match-anything rule, so it can also match an #if line, but I deliberately put it after preprocessorLineSet in inputLines, expecting that this ordering would prevent it from matching an #if or #endif line. I use a catch-all rule because I want to accept any other C++ syntax and simply echo it back to the output.

In my test, I just print out everything. Lines matched by preprocessorLineSet should be surrounded by {}, while those matched by oneNormalInputLine should be surrounded by [].

Sample inputs:

    #if s
    s
    #if a
    s
    s
    #endif
    #endif

and this:

    #if
    abc
    #endif

The corresponding outputs:

    [#if s
    ][s
    ][#if a
    ][s
    ][s
    ][#endif
    ][#endif
    ]

and this:

    [#if
    ][abc
    ][#endif
    ]

Problem:

All the output lines, including the #if and #endif lines, are surrounded by [], meaning they are matched ONLY by oneNormalInputLine! I did not expect this: preprocessorLineSet should be able to match the #if lines. Why do I get this result?

This line contains ambiguous alternatives:

    inputLines  :  ( preprocessorLineSet  |  oneNormalInputLine );

since both can match the #if and #endif lines. But I expected the first alternative to be used rather than the latter one. Also note that backtracking is on.

EDIT
The reason my oneNormalInputLine rule accepts everything is that it is difficult to write a rule for “anything that is not an #if line”, since the #if pattern can be rather complicated:

    /***

    comments

    */   # /***
    comments
    */ if

is a valid pattern. Writing a rule that excludes this pattern seems difficult.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 12:25 pm

    Your approach is not really robust. I’d suggest keeping it simple and using the actual language rule: every line that begins with # is a preprocessor directive, and every line that doesn’t begin with # isn’t. A grammar built on this rule has no ambiguity and is much simpler to understand.

    Now why doesn’t your grammar work? The problem is that your preprocessorLineSet rule can’t match anything.

    preprocessorLineSet 
    : ifPart endifLine;
    
    ifPart: ifLine  inputLines*   ;
    

    It starts with #if ..., then should match other lines, and when the first matching #endif comes, it should match it and finish. However, it doesn’t actually do that. inputLines can match pretty much any line (pretty much: it won’t match e.g. C++ operators and other non-identifier tokens), including all preprocessor directives. That means the inputLines* loop in ifPart will match to the end of the input, and there will be no line left for endifLine. Note that backtracking has no effect on this: once ANTLR has matched a rule (in this case ifPart, which succeeds on the whole rest of the input, since * is greedy), it will never backtrack into it. ANTLR’s rules for backtracking are hairy…
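
The failure can be reproduced outside ANTLR with a small hand-written simulation. This is only a sketch of the behaviour described above, not ANTLR’s actual implementation. The class GreedyLoopDemo and its method names are made up for illustration, whole lines stand in for token sequences, ifPart is folded into preprocessorLineSet, and passing out == null plays the role of a speculative (backtracking) attempt during which no actions run. Line terminators are left out of the printed text.

```java
import java.util.List;

public class GreedyLoopDemo {
    // '#' immediately followed by the keyword, as in the question's ifLine/endifLine.
    static boolean isDirective(String line, String keyword) {
        return line.equals("#" + keyword) || line.startsWith("#" + keyword + " ");
    }

    // oneNormalInputLine: a catch-all that accepts ANY line, directives included.
    static int normalLine(List<String> lines, int i, StringBuilder out) {
        if (i >= lines.size()) return -1;
        if (out != null) out.append('[').append(lines.get(i)).append(']');
        return i + 1;
    }

    // ifLine / endifLine: one directive line with the given keyword.
    static int directiveLine(List<String> lines, int i, String keyword, StringBuilder out) {
        if (i >= lines.size() || !isDirective(lines.get(i), keyword)) return -1;
        if (out != null) out.append('{').append(lines.get(i)).append('}');
        return i + 1;
    }

    // preprocessorLineSet: ifLine inputLines* endifLine
    static int preprocessorLineSet(List<String> lines, int i, StringBuilder out) {
        int pos = directiveLine(lines, i, "if", out);
        if (pos < 0) return -1;
        // Greedy loop: keep taking lines for as long as inputLines can match.
        for (int next; (next = inputLines(lines, pos, out)) >= 0; ) pos = next;
        // endifLine is tried only where the loop stopped; lines are never given back.
        return directiveLine(lines, pos, "endif", out);
    }

    // inputLines: preprocessorLineSet | oneNormalInputLine
    static int inputLines(List<String> lines, int i, StringBuilder out) {
        // Try the first alternative speculatively; commit to it only if it succeeds.
        if (preprocessorLineSet(lines, i, null) >= 0) {
            return preprocessorLineSet(lines, i, out);
        }
        return normalLine(lines, i, out);
    }

    static String parse(List<String> lines) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < lines.size()) pos = inputLines(lines, pos, out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints [#if][abc][#endif]
        System.out.println(parse(List.of("#if", "abc", "#endif")));
    }
}
```

Because normalLine accepts every line, the loop inside preprocessorLineSet always runs to the end of the input, the final "endif" check always fails, and every line ends up in [].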

    Note that if you made oneNormalInputLine not match preprocessor directives (e.g. something like (nonHash any* | ) CRLF), it would start to work.
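
As a rough illustration of that fix, here is a hand-written recursive-descent sketch in which a normal line may not begin with #. It is not generated by ANTLR, and DirectiveAwareDemo and its method names are hypothetical. It assumes comments have already been stripped, works on whole lines rather than tokens, and handles only #if and #endif, as the grammar in the question does; any other directive is reported as an error.

```java
import java.util.List;

public class DirectiveAwareDemo {
    // True if the line is "#", optional whitespace, then exactly the given keyword.
    static boolean isDirective(String line, String keyword) {
        String t = line.trim();
        if (!t.startsWith("#")) return false;
        String rest = t.substring(1).trim();
        if (!rest.startsWith(keyword)) return false;
        return rest.length() == keyword.length()
            || !Character.isJavaIdentifierPart(rest.charAt(keyword.length()));
    }

    // Normal line: anything that does NOT begin with '#'.
    static int normalLine(List<String> lines, int i, StringBuilder out) {
        if (i >= lines.size() || lines.get(i).trim().startsWith("#")) return -1;
        out.append('[').append(lines.get(i)).append(']');
        return i + 1;
    }

    // Block: #if line, then nested blocks or normal lines, then #endif line.
    static int block(List<String> lines, int i, StringBuilder out) {
        if (i >= lines.size() || !isDirective(lines.get(i), "if")) return -1;
        out.append('{').append(lines.get(i)).append('}');
        int pos = i + 1;
        while (true) {
            int next = block(lines, pos, out);
            if (next < 0) next = normalLine(lines, pos, out);
            if (next < 0) break; // stops at #endif, since no rule in the loop matches it
            pos = next;
        }
        if (pos >= lines.size() || !isDirective(lines.get(pos), "endif")) {
            throw new IllegalArgumentException("expected #endif at line " + (pos + 1));
        }
        out.append('{').append(lines.get(pos)).append('}');
        return pos + 1;
    }

    static String parse(List<String> lines) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < lines.size()) {
            int next = block(lines, pos, out);
            if (next < 0) next = normalLine(lines, pos, out);
            if (next < 0) {
                throw new IllegalArgumentException("unexpected directive at line " + (pos + 1));
            }
            pos = next;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints {#if s}[s]{#if a}[s][s]{#endif}{#endif}
        System.out.println(parse(List.of("#if s", "s", "#if a", "s", "s", "#endif", "#endif")));
    }
}
```

Since the first character of a line now decides which rule applies, the loop stops at the matching #endif by itself and no backtracking is needed.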
