I’m having a bit of trouble manually emitting a token with a lexer rule

Question


Asked: May 13, 2026


I’m having a bit of trouble manually emitting a token from a lexer rule in ANTLR. I know that the emit() function needs to be used, but I can’t find much documentation about it. Does anybody have a good example of how to do this?

The ANTLR book gives a good example of where this is needed: parsing Python’s indentation-based nesting. If a line’s leading whitespace is greater than the previous line’s, emit an INDENT token; if it’s less, emit a DEDENT token. Unfortunately, the book glosses over the actual syntax that’s required.
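To make the INDENT/DEDENT bookkeeping concrete, here is a minimal sketch in plain Python. It is not ANTLR code and does not use emit(); it only illustrates the logic such a lexer rule would need. The function name `indent_tokens` and the token names LINE, INDENT and DEDENT are hypothetical, and the sketch assumes indentation uses spaces only and does not validate inconsistent dedents.

```python
def indent_tokens(lines):
    """Yield (type, text) pairs, adding INDENT/DEDENT when leading whitespace changes."""
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        if not line.strip():
            continue  # blank lines do not affect nesting in this sketch
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # deeper than the enclosing block: open exactly one new level
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # shallower: one line may close several levels at once
            stack.pop()
            yield ("DEDENT", "")
        yield ("LINE", line.strip())
    # close any blocks still open at end of input
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")
```

The point to notice is that a single input line can produce several tokens (for example, two DEDENTs followed by the line itself), so whatever mechanism is used in the lexer has to be able to hand more than one token back per rule match.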

EDIT: Here’s an example of what I’m trying to parse. It’s Markdown’s nested blockquotes:

before blockquote

> text1
>
> > text2
>
> text3

outside blockquote

Now, my approach so far is essentially to count the > symbols on each line. For example, the input above seems like it should emit (roughly) PARAGRAPH_START, CDATA, PARAGRAPH_END, BQUOTE_START, CDATA, BQUOTE_START, CDATA, BQUOTE_END, CDATA, BQUOTE_END, PARAGRAPH_START, CDATA, PARAGRAPH_END. The difficulty is the final BQUOTE_END, which I think should be an imaginary token emitted once a non-blockquote element is found while the nesting level is >= 1.
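The counting approach described above can be sketched in plain Python as follows. This is only an illustration of the depth tracking, not an ANTLR lexer: the function name `blockquote_tokens` is hypothetical, the PARAGRAPH_START/PARAGRAPH_END tokens are left out for brevity, and blank lines and bare ">" lines are assumed not to change the nesting level.

```python
import re

# Leading run of ">" markers, each optionally followed by one space.
_PREFIX = re.compile(r"^(?:>[ ]?)*")

def blockquote_tokens(lines):
    """Yield token names, opening or closing blockquotes as the '>' count changes."""
    depth = 0  # current blockquote nesting level
    for line in lines:
        prefix = _PREFIX.match(line).group(0)
        level = prefix.count(">")
        text = line[len(prefix):].strip()
        if not text:
            continue  # blank or bare ">" line: no nesting change in this sketch
        while depth < level:
            depth += 1
            yield "BQUOTE_START"
        while depth > level:
            # imaginary token: no input text corresponds to it
            depth -= 1
            yield "BQUOTE_END"
        yield "CDATA"
    # close anything still open at end of input
    while depth > 0:
        depth -= 1
        yield "BQUOTE_END"
```

In this sketch the final BQUOTE_END falls out of the same comparison as the others: when the "outside blockquote" line arrives with zero ">" markers while the depth is still 1, the closing token is produced before the CDATA for that line. In a real lexer, the extra tokens would typically have to be buffered and returned one at a time.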