I am trying to preprocess my C++ source files by ANTLR. I would like

Asked: May 25, 2026

I am trying to preprocess my C++ source files with ANTLR. I would like to output each input file with all the whitespace formatting of the original source preserved, while inserting some new source code of my own at the appropriate locations.

I know preserving WS requires this lexer rule:

WS: (' '|'\n'| '\r'|'\t'|'\f' )+ {$channel=HIDDEN;};

With this rule, each parser rule's $text attribute contains the hidden WS. The problem is that, for any parser rule, $text only includes the input text starting from the position of the rule's first token. For example, if this is my input (note the whitespace before and between the tokens):

    line   1;     line   2;

And if I have two separate parser rules that match

"line   1;"

and

"line   2;"

separately, but no rule that matches the whole line:

"    line   1;     line   2;"

then the leading WS and the WS between "line 1" and "line 2" are lost (not accessible from any of my rules).

What should I do to preserve all the whitespace while still letting my parser rules decide where to add new code?

EDIT

Let's say that whenever my code contains a call to function(1), with exactly 1 as the argument and nothing else, the preprocessor adds a call to extraFunction() before it:

void myFunction() {
   function();
   function(1);
}

Becomes:

void myFunction() {
   function();
   extraFunction();
   function(1);
}

The preprocessed output should remain human-readable, because people will continue coding on it. For this simple example a text editor could handle the job, but there are more complicated cases that justify the use of ANTLR.

1 Answer

Editorial Team · Answer 1 · 2026-05-25T17:28:48+00:00

Another solution, though maybe not a very practical one: you can collect the whitespace tokens by walking backwards from the first token of each rule, something like this untested pseudocode:

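grammar T;

@members {
    // Prints the hidden-channel tokens (whitespace) that immediately
    // precede the given start token, in their original order.
    public void printWhitespaceBetweenRules(Token start) {
        int index = start.getTokenIndex() - 1;

        // Walk backwards to find where the run of hidden tokens begins.
        while (index >= 0 && input.get(index).getChannel() == Token.HIDDEN_CHANNEL) {
            index--;
        }

        // Print the run forwards so the whitespace is not reversed.
        for (int i = index + 1; i < start.getTokenIndex(); i++) {
            System.out.print(input.get(i).getText());
        }
    }
}

line1: 'line' '1' {printWhitespaceBetweenRules($start); };
line2: 'line' '2' {printWhitespaceBetweenRules($start); };
WS: (' '|'\n'| '\r'|'\t'|'\f' )+ {$channel=HIDDEN;};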

But you would still need to change every rule.
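
To try the backward walk without generating a parser, here is a minimal plain-Java sketch of the same idea. It does not use the ANTLR runtime: the Tok class, the tokenize helper, and the hiddenBefore method are hypothetical stand-ins invented for illustration, and the "tokenizer" simply splits on whitespace, so treat it as a model of the technique rather than a working preprocessor.

```java
import java.util.ArrayList;
import java.util.List;

class HiddenWhitespaceSketch {

    // Hypothetical stand-in for an ANTLR token: its text plus a flag
    // saying whether it would sit on the hidden channel.
    static final class Tok {
        final String text;
        final boolean hidden;

        Tok(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }
    }

    // Splits the input into alternating runs of whitespace (hidden)
    // and non-whitespace (visible) characters.
    static List<Tok> tokenize(String input) {
        List<Tok> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            boolean ws = Character.isWhitespace(input.charAt(i));
            int j = i;
            while (j < input.length() && Character.isWhitespace(input.charAt(j)) == ws) {
                j++;
            }
            tokens.add(new Tok(input.substring(i, j), ws));
            i = j;
        }
        return tokens;
    }

    // Returns the run of hidden tokens directly before tokens[startIndex],
    // concatenated in source order.
    static String hiddenBefore(List<Tok> tokens, int startIndex) {
        int index = startIndex - 1;
        // Walk backwards while the tokens are hidden.
        while (index >= 0 && tokens.get(index).hidden) {
            index--;
        }
        // Emit forwards so the whitespace keeps its original order.
        StringBuilder sb = new StringBuilder();
        for (int k = index + 1; k < startIndex; k++) {
            sb.append(tokens.get(k).text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Tok> tokens = tokenize("    line   1;     line   2;");
        // Visible tokens are at indices 1, 3, 5 and 7; print each one
        // preceded by the hidden whitespace recovered for it.
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).hidden) {
                System.out.print(hiddenBefore(tokens, i) + tokens.get(i).text);
            }
        }
        System.out.println();
    }
}
```

Printing each visible token preceded by its recovered whitespace reproduces the input line unchanged, which is the property the question asks for. Inserted code could be emitted at the same point, just before the visible token.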
