I am trying to implement a Python-like, indentation-dependent grammar.
Source example:
ABC QWE
CDE EFG
EFG CDE
ABC
QWE ZXC
As I see it, I need to implement two tokens, INDENT and DEDENT, so I could write something like:
grammar mygrammar;
text: (ID | block)+;
block: INDENT (ID|block)+ DEDENT;
INDENT: ????;
DEDENT: ????;
Is there any simple way to implement this using ANTLR?
(I’d prefer, if it’s possible, to use the standard ANTLR lexer.)
I don’t know what the easiest way to handle it is, but the following is a relatively easy way. Whenever you match a line break in your lexer, optionally match one or more spaces. If there are spaces after the line break, compare the length of these spaces with the current indent size. If it’s more than the current indent size, emit an Indent token; if it’s less than the current indent size, emit a Dedent token; and if it’s the same, don’t do anything.
You’ll also want to emit a number of Dedent tokens at the end of the file so that every Indent has a matching Dedent token.
For this to work properly, you must add a leading and trailing line break to your input source file!
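To make the idea concrete outside of ANTLR, here is a minimal sketch in plain Python. It is not the grammar from this answer: the function name tokenize, the (type, text) tuple format and the token names ID, INDENT and DEDENT are assumptions chosen for illustration. It also assumes indentation is written with spaces only. Unlike the single-comparison description above, it keeps a stack of indent widths, so one line can close several blocks at once.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token type, token text); a hypothetical format


def tokenize(source: str) -> List[Token]:
    """Turn space-indented text into ID / INDENT / DEDENT tokens."""
    tokens: List[Token] = []
    indents = [0]  # stack of indentation widths of the currently open blocks

    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines never change the indentation level

        width = len(line) - len(line.lstrip(" "))
        if width > indents[-1]:
            # Deeper than the current block: open exactly one new block.
            indents.append(width)
            tokens.append(("INDENT", ""))
        else:
            # Shallower: close blocks until the widths match again.
            while width < indents[-1]:
                indents.pop()
                tokens.append(("DEDENT", ""))
            if width != indents[-1]:
                raise ValueError(f"inconsistent dedent: {line!r}")

        tokens.extend(("ID", word) for word in line.split())

    # End of input: give every open INDENT its matching DEDENT.
    while len(indents) > 1:
        indents.pop()
        tokens.append(("DEDENT", ""))
    return tokens
```

Calling tokenize("aaa\n  bbb\n    ccc\naaa\n") yields two DEDENT tokens in a row before the final ID, which is the case a single size comparison would miss.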
ANTLR3
A quick demo:
You can test the parser with the class:
If you now put the following in a file called in.txt:
AAA AAAAA BBB BB B BB BBBBB BB CCCCCC C CC BB BBBBBB C CCC DDD DD D DDD D DDD
(Note the leading and trailing line breaks!)
then you’ll see output that corresponds to the following AST:
Note that my demo wouldn’t produce enough dedents in succession, like dedenting from ccc to aaa (2 dedent tokens are needed):
You would need to adjust the code inside else if(n < previousIndents) { ... } to possibly emit more than 1 dedent token based on the difference between n and previousIndents. Off the top of my head, that could look like this:
ANTLR4
For ANTLR4, do something like this:
Taken from: https://github.com/antlr/grammars-v4/blob/master/python3/Python3.g4
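A lexer that hands out one token per call cannot return several DEDENT tokens at once, so a common workaround is to buffer them in a queue and drain the queue on later calls. The following is a rough Python sketch of that pattern only. It is not code from the linked grammar, and the class name IndentLexer, the method next_token and the token names are hypothetical. It assumes space-only indentation and, to stay short, does not check for inconsistent dedents.

```python
from collections import deque


class IndentLexer:
    """Pull-based lexer sketch: one token per next_token() call."""

    def __init__(self, source: str) -> None:
        self._lines = iter(source.splitlines())
        self._pending = deque()  # tokens produced but not yet handed out
        self._indents = [0]      # stack of open indentation widths
        self._done = False

    def next_token(self):
        # Refill the queue until it has a token or the input is exhausted.
        while not self._pending:
            if self._done:
                return ("EOF", "")
            self._fill()
        return self._pending.popleft()

    def _fill(self) -> None:
        line = next(self._lines, None)
        if line is None:
            # End of input: queue one DEDENT per block that is still open.
            while len(self._indents) > 1:
                self._indents.pop()
                self._pending.append(("DEDENT", ""))
            self._done = True
            return
        if not line.strip():
            return  # blank line: nothing to queue
        width = len(line) - len(line.lstrip(" "))
        if width > self._indents[-1]:
            self._indents.append(width)
            self._pending.append(("INDENT", ""))
        while width < self._indents[-1]:
            # May run several times, queuing several DEDENT tokens.
            self._indents.pop()
            self._pending.append(("DEDENT", ""))
        self._pending.extend(("ID", word) for word in line.split())
```

A parser would simply call next_token() in a loop until it receives the EOF token; the queue hides the fact that a single input line can produce several tokens.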