I’m trying to write a simple interactive language (reading from System.in) using ANTLR, and I have a few problems with it. The examples I’ve found on the web all use a per-line cycle, e.g.:
while(readline)
    result = parse(line)
    doStuff(result)
But what if I’m writing something like Pascal/SMTP/etc., with a “the first line must look like X” requirement? I know it can be checked in doStuff, but I think logically it is part of the syntax.
Or what if a command is split across multiple lines? I could try:
while(readline)
    lines.add(line)
    try
        result = parse(lines)
        lines = []
        doStuff(result)
    catch
        nop
But with this I’m also hiding real errors.
Or I could reparse all lines every time, but:
- it will be slow
- there are instructions I don’t want to run twice
Can this be done with ANTLR, or if not, with something else?
Yes, ANTLR can do this. Perhaps not out of the box, but with a bit of custom code, it sure is possible. You also don’t need to re-parse the entire token stream for it.
Let’s say you want to parse a very simple language line by line, where each line is either a program declaration, a uses declaration, or a statement. It should always start with a program declaration, followed by zero or more uses declarations, followed by zero or more statements. A uses declaration cannot come after a statement, and there can’t be more than one program declaration. For simplicity, a statement is just a simple assignment such as a = 4 or b = a.
An ANTLR grammar for such a language could look like this:
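To make the three kinds of lines concrete, here is a small plain-Java sketch (no ANTLR involved) that classifies a single line the way the grammar's three alternatives would. The exact syntax of program and uses lines ("program Name", "uses Name") and the class name LineKinds are assumptions for illustration; the statement form follows the a = 4 / b = a examples. Ordering rules are deliberately not checked here.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for the grammar's three line alternatives.
// Assumed syntax: "program Name", "uses Name", "<id> = <id or int>".
public class LineKinds {

    public enum Kind { PROGRAM, USES, STATEMENT }

    private static final String ID = "[A-Za-z_]\\w*";
    private static final Pattern PROGRAM = Pattern.compile("\\s*program\\s+" + ID + "\\s*");
    private static final Pattern USES = Pattern.compile("\\s*uses\\s+" + ID + "\\s*");
    // A statement is a simple assignment such as "a = 4" or "b = a".
    private static final Pattern STATEMENT =
            Pattern.compile("\\s*" + ID + "\\s*=\\s*(" + ID + "|\\d+)\\s*");

    /** Classifies a single line; the order of lines is not checked here. */
    public static Kind classify(String line) {
        if (PROGRAM.matcher(line).matches()) return Kind.PROGRAM;
        if (USES.matcher(line).matches()) return Kind.USES;
        if (STATEMENT.matcher(line).matches()) return Kind.STATEMENT;
        throw new IllegalArgumentException("not a program, uses or statement line: " + line);
    }

    public static void main(String[] args) {
        for (String line : new String[] {"program Test", "uses Lib", "a = 4", "b = a"}) {
            System.out.println(line + " -> " + classify(line));
        }
    }
}
```

Running main prints each sample line followed by its kind, e.g. "program Test -> PROGRAM".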
But we’ll of course need to add a couple of checks. Also, by default a parser takes a token stream in its constructor, but since we’re planning to trickle tokens into the parser line by line, we’ll need to create a new constructor in our parser. You can add custom members to your lexer or parser classes by putting them in a @parser::members { ... } or @lexer::members { ... } section, respectively. We’ll also add a couple of boolean flags to keep track of whether the program declaration has happened already and whether uses declarations are allowed. Finally, we’ll add a process(String source) method which, for each new line, creates a lexer that gets fed to the parser.
All of that would look like:
Now, inside our grammar, we’re going to use a couple of gated semantic predicates to check whether we’re parsing declarations in the correct order. And after parsing a certain declaration or statement, we’ll want to flip certain boolean flags to allow or disallow declarations from then on. The flipping of these boolean flags is done in each rule’s @after { ... } section, which gets executed (not surprisingly) after the tokens of that parser rule are matched.
Your final grammar file now looks like this (including some System.out.println calls for debugging purposes):
The grammar can be tested with the following class:
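As a rough, hand-written illustration of the same mechanism without ANTLR, here is a plain Java sketch: one long-lived object holds the two flags, each call to process handles only the newest line, the checks made before accepting a line play the role of the gated semantic predicates, and the flag updates made after accepting it play the role of the @after actions. LineChecker, programDeclDone and usesDeclAllowed are hypothetical names, and the "program Name" / "uses Name" line syntax is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical plain-Java sketch (not ANTLR): ordering flags survive between
// lines, so earlier lines are never parsed or executed a second time.
public class LineChecker {

    private boolean programDeclDone = false; // has a program line been accepted?
    private boolean usesDeclAllowed = true;  // becomes false after the first statement

    /** Accepts one line or throws; a rejected line leaves the flags untouched. */
    public String process(String source) {
        String line = source.trim();
        if (line.matches("program\\s+[A-Za-z_]\\w*")) {
            // Check before accepting (the gated predicate's job).
            if (programDeclDone) throw new IllegalStateException("only one program declaration allowed");
            programDeclDone = true; // flag flip after accepting (the @after action's job)
            return "program";
        }
        if (!programDeclDone) throw new IllegalStateException("must start with a program declaration");
        if (line.matches("uses\\s+[A-Za-z_]\\w*")) {
            if (!usesDeclAllowed) throw new IllegalStateException("uses cannot come after statements");
            return "uses";
        }
        if (line.matches("[A-Za-z_]\\w*\\s*=\\s*([A-Za-z_]\\w*|\\d+)")) {
            usesDeclAllowed = false; // no more uses declarations from now on
            return "statement";
        }
        throw new IllegalArgumentException("syntax error: " + source);
    }

    public static void main(String[] args) throws IOException {
        LineChecker checker = new LineChecker();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            if (line.trim().isEmpty()) continue;
            try {
                System.out.println("ok: " + checker.process(line));
            } catch (RuntimeException e) {
                // Report the error but keep the session (and its flags) alive.
                System.out.println("error: " + e.getMessage());
            }
        }
    }
}
```

Because a rejected line throws before any flag changes, a mistyped line does not corrupt the session state. Feeding it program P, a = 4 and uses Q, for example, accepts the first two lines and reports "error: uses cannot come after statements" for the third.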
To run this test class, do the following:
As you can see, you can only declare a program once:
A uses declaration cannot come after statements:
And you must start with a program declaration: