I have an AST derived from the ANTLR Parser Generator for Java. What I want to do is somehow construct a control flow graph of the source code, where each statement or expression is a unique Node. I understand there must be some recursiveness to this identification, I was wondering what you would suggest as the best option and if ANTLR has a toolset I can use for this job. Cheers, Chris
EDIT – My main concern is to get a control flow graph(CFG) from the AST. This way I can get a tree representation of the source. To clarify, both the source code and the implementation language is Java.
Usually CFGs are computed on a lower-level representation (e.g. JVM bytecode). Someone did a thesis on such things a few years ago. There might be a helpful way described in there for how to get at that representation.
Since your source and target languages are the same, there’s no code generation step — you’re already done! However, now you get to walk the AST. At each node of the AST, you have to ask yourself: is this a ‘jumping’ instruction or not? Method calls and if statements are examples of jumping instructions. So are loop constructs (such as
forandwhile). Instructions such as addition and multiplication are non-jumping.First associate with each java statement a node in the CFG, along with an entry and exit node. As a first approximation, walk the tree and:
This will give you some kind of CFG. The procedure is slightly hairy in step 2 because the method called may be declared in a library, and not elsewhere in the AST — if so, either don’t make an edge or make an edge to a special node representing the entry to that library method.
Does this make sense?