I am developing a lexer grammar for C/C++ source code. The goal of the

Asked: June 10, 2026

I am developing a lexer grammar for C/C++ source code. The goal of the grammar is to fight plagiarism between students at university.

To improve the effectiveness of the grammar, I want ANTLR to create the same token for the 4(?) different ways a student could increment a variable:

i++
++i
i += 1
(i = i + 1) [I doubt that this can be solved with ANTLR]

Each of these expressions should result in the token INCREMENT.

What I have come up with so far (only the necessary parts of the grammar are reproduced here):

options {
    language = CSharp3;
    filter = true;
    k = 2;
}

INCREMENT : IDENTIFIER (PLUSPLUS | ADDEQUAL '1') | PLUSPLUS IDENTIFIER ;
IDENTIFIER 
    :   LETTER (LETTER | DIGIT)*;

/*
 * covers both decimal and hex integer literals
 */
INTEGER_LITERAL
    :   DIGIT+ | '0x' HEX_DIGIT+;

ADDEQUAL            : '+=';
PLUSPLUS            : '++';

fragment
LETTER  :   'A'..'Z' | 'a'..'z';

fragment
HEX_DIGIT : DIGIT | 'a'..'f' | 'A'..'F';

fragment
DIGIT : '0'..'9';

Testing this grammar with i += 1 results in the token sequence IDENTIFIER ADDEQUAL INTEGER_LITERAL instead of INCREMENT.

Why is that?
From my understanding, rules take precedence from top to bottom, and INCREMENT is also the “bigger” (longer-matching) rule.

What adjustments do I need to make to the grammar to get the desired result?

1 Answer

Editorial Team · Answer 1 · 2026-06-10T05:40:36+00:00

Testing this grammar with i += 1 results in the token sequence IDENTIFIER ADDEQUAL INTEGER_LITERAL instead of INCREMENT.

Why is that?

Because "i += 1" contains spaces you didn’t account for inside your INCREMENT rule.

What adjustments do I need to make to the grammar to get the desired result?

Account for the spaces (and line breaks, possibly).
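To make this concrete, here is a small Python emulation of the situation. It is an assumption-laden sketch, not ANTLR: it only roughly mimics an ANTLR 3 lexer with filter = true, where rules are tried in declaration order, the first rule that matches wins, and a character no rule matches is skipped. The helpers make_rules and tokenize are hypothetical. With no whitespace allowed inside INCREMENT, i += 1 falls through to IDENTIFIER; with optional whitespace it becomes a single INCREMENT.

```python
import re

# Sketch: roughly emulates an ANTLR 3 lexer with filter=true in plain Python.
# Rules are tried in declaration order, the first rule that matches at the
# current position wins, and a character no rule matches is skipped.
IDENT = r"[A-Za-z][A-Za-z0-9]*"   # IDENTIFIER : LETTER (LETTER | DIGIT)*
WS = r"[ \t\r\n]*"                # optional spaces, tabs and line breaks


def make_rules(ws):
    """Build the ordered rule list; `ws` is allowed between INCREMENT parts."""
    increment = (
        r"(?:" + IDENT + ws + r"(?:\+\+|\+=" + ws + r"1)"  # i++ or i += 1
        + r"|\+\+" + ws + IDENT + r")"                     # ++i
    )
    return [
        ("INCREMENT", re.compile(increment)),
        ("IDENTIFIER", re.compile(IDENT)),
        # Hex alternative first, so "0x1f" is not cut short at "0" by the
        # decimal alternative (Python tries alternatives left to right).
        ("INTEGER_LITERAL", re.compile(r"0x[0-9A-Fa-f]+|[0-9]+")),
        ("ADDEQUAL", re.compile(r"\+=")),
        ("PLUSPLUS", re.compile(r"\+\+")),
    ]


def tokenize(text, rules):
    """Return the token names produced for `text` under the ordered `rules`."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in rules:
            m = pattern.match(text, pos)
            if m:
                tokens.append(name)
                pos = m.end()
                break
        else:
            pos += 1  # filter mode: skip a character that no rule matches
    return tokens


print(tokenize("i += 1", make_rules("")))  # ['IDENTIFIER', 'ADDEQUAL', 'INTEGER_LITERAL']
print(tokenize("i += 1", make_rules(WS)))  # ['INCREMENT']
```

One caveat the emulation exposes: once whitespace is allowed, i += 10 lexes as INCREMENT followed by INTEGER_LITERAL (the leftover 0), because the rule stops right after the 1. The real grammar would likely need a similar guard against a trailing digit.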

However, a lexer alone does not seem to be the way to go here; IMO you really need a parser. Also, the option k = 2; sets the lookahead for parser rules, not lexer rules, so if you stick to lexing only, you might as well remove it.
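The parser route can be sketched in Python as a pass over the token stream. This is a swapped-in technique, not ANTLR: a regex lexer that drops whitespace, followed by a matcher over fixed token windows. All names here (lex, normalize, matches, and the PLUS, ASSIGN, WS and OTHER token kinds) are hypothetical. Because whitespace is gone before matching, the spacing problem disappears, and the fourth form i = i + 1 becomes checkable by comparing the two identifier texts.

```python
import re

# Sketch of a "parser-side" fix: lex first (dropping whitespace), then
# recognise the four increment spellings on the token stream.
# PLUS, ASSIGN, WS and OTHER are hypothetical token kinds for this example.
TOKEN_SPEC = [
    ("INTEGER_LITERAL", r"0x[0-9A-Fa-f]+|[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z][A-Za-z0-9]*"),
    ("PLUSPLUS", r"\+\+"),   # listed before PLUS so "++" wins
    ("ADDEQUAL", r"\+="),
    ("PLUS", r"\+"),
    ("ASSIGN", r"="),
    ("WS", r"[ \t\r\n]+"),
    ("OTHER", r"."),         # any other single character
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def lex(text):
    """Return (kind, text) pairs, dropping whitespace tokens."""
    return [(m.lastgroup, m.group()) for m in MASTER.finditer(text)
            if m.lastgroup != "WS"]


def matches(tokens, i, pattern):
    """True if tokens[i:] starts with `pattern`: (kind, text-or-None) pairs."""
    if i + len(pattern) > len(tokens):
        return False
    return all(tokens[i + k][0] == kind and (text is None or tokens[i + k][1] == text)
               for k, (kind, text) in enumerate(pattern))


ID = ("IDENTIFIER", None)
ONE = ("INTEGER_LITERAL", "1")


def normalize(tokens):
    """Replace i++, ++i, i += 1 and i = i + 1 with ("INCREMENT", name)."""
    out, i = [], 0
    while i < len(tokens):
        if matches(tokens, i, [ID, ("PLUSPLUS", None)]):            # i++
            out.append(("INCREMENT", tokens[i][1]))
            i += 2
        elif matches(tokens, i, [("PLUSPLUS", None), ID]):          # ++i
            out.append(("INCREMENT", tokens[i + 1][1]))
            i += 2
        elif matches(tokens, i, [ID, ("ADDEQUAL", None), ONE]):     # i += 1
            out.append(("INCREMENT", tokens[i][1]))
            i += 3
        elif (matches(tokens, i, [ID, ("ASSIGN", None), ID, ("PLUS", None), ONE])
              and tokens[i][1] == tokens[i + 2][1]):                # i = i + 1
            out.append(("INCREMENT", tokens[i][1]))
            i += 5
        else:
            out.append(tokens[i])
            i += 1
    return out


src = "i++; ++i; i += 1; i = i + 1; j = i + 1;"
# The first four statements become INCREMENT; j = i + 1 is left alone.
print([kind for kind, _ in normalize(lex(src))])
```

This is only a sketch: it looks at fixed token windows and ignores operator precedence, so something like i = i + 1 * 2 would be misclassified as an increment. A real grammar-based parser would handle that case.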
