The ANTLR website describes two approaches to implementing “include” directives. The first approach is to recognize the directive in the lexer and include the file lexically (by pushing the CharStream onto a stack and replacing it with one that reads the new file); the second is to recognize the directive in the parser, launch a sub-parser to parse the new file, and splice in the AST generated by the sub-parser. Neither of these is quite what I need.
In the language I’m parsing, recognizing the directive in the lexer is impractical for a few reasons:
- There is no self-contained character pattern that always means “this is an include directive”. For example, Include "foo"; at top level is an include directive, but in Array bar --> Include "foo"; or Constant Include "foo"; the word Include is an identifier.
- The name of the file to include may be given as a string or as a constant identifier, and such constants can be defined with arbitrarily complex expressions.
So I want to trigger the inclusion from the parser. But to perform the inclusion, I can’t launch a sub-parser and splice the AST together; I have to splice the tokens. It’s legal for a block to begin with { in the main file and be terminated by } in the included file. A file included inside a function can even close the function definition and start a new one.
It seems like I’ll need something like the first approach but at the level of TokenStreams instead of CharStreams. Is that a viable approach? How much state would I need to keep on the stack, and how would I make the parser switch back to the original token stream instead of terminating when it hits EOF? Or is there a better way to handle this?
==========
Here’s an example of the language, demonstrating that blocks opened in the main file can be closed in the included file (and vice versa). Note that the # before Include is required when the directive is inside a function, but optional outside.
main.inf:
[ Main;
print "This is Main!";
if (0) {
#include "other.h";
print "This is OtherFunction!";
];
other.h:
} ! end if
]; ! end Main
[ OtherFunction;
One possibility is to have your parser, on each Include statement, create a new instance of your lexer and insert the tokens that lexer produces at the index the parser is currently at (see the insertTokens(...) method in the parser’s @members block). Here’s a quick demo:
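Separately from the demo files, the splicing idea itself can be sketched in plain Java without any ANTLR dependency. This is only an illustration under stated assumptions: the class name IncludeSplice, the whitespace-splitting lex method, the in-memory file map, and the toy directive form "#include name ;" are all hypothetical stand-ins, not part of ANTLR or of the grammar in the demo. In a real parser, the decision that a token starts a directive would come from the parser rule that matched it, not from a bare string comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IncludeSplice {

    // Toy lexer (assumption): splits on whitespace. Stands in for a generated lexer.
    static List<String> lex(String source) {
        String trimmed = source.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Walks the token list and, at each "#include name ;", replaces the three
    // directive tokens with the tokens of the named file at the same index.
    // No guard against recursive includes; a real implementation would need one.
    static List<String> expand(String mainFile, Map<String, String> files) {
        List<String> tokens = lex(lookup(mainFile, files));
        int i = 0; // plays the role of the parser's current token index
        while (i < tokens.size()) {
            boolean isDirective = tokens.get(i).equals("#include")
                    && i + 2 < tokens.size()
                    && tokens.get(i + 2).equals(";");
            if (isDirective) {
                String name = tokens.get(i + 1);
                tokens.subList(i, i + 3).clear();            // drop the directive
                tokens.addAll(i, lex(lookup(name, files)));  // splice at the cursor
                // i is not advanced, so the spliced tokens are scanned next
                // and nested includes are expanded too.
            } else {
                i++;
            }
        }
        return tokens;
    }

    static String lookup(String name, Map<String, String> files) {
        String source = files.get(name);
        if (source == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return source;
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("main.inf", "[ Main ; if ( 0 ) { #include other.h ; print B ; ] ;");
        files.put("other.h", "} ] ; [ OtherFunction ;");
        // prints: [ Main ; if ( 0 ) { } ] ; [ OtherFunction ; print B ; ] ;
        System.out.println(String.join(" ", expand("main.inf", files)));
    }
}
```

Because the splice happens on tokens rather than on subtrees, the brace opened in main.inf is closed by the brace from other.h, and the included file can end one function and begin the next.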
Inform6.g
main.inf
other.h
Main.java
To run the demo, do the following on the command line:
The output you’ll see corresponds to the following AST: