Do you know of a C++ library (open source, or free for non-commercial use) that can parse Java source code, for example from a jar file or a given classpath? I want to extract classes, class members, methods, method calls, and the relations between these artifacts.
I’ve spent all day googling for a solution. Either I’m blind or I can’t read! 🙂
You can’t get source code from a jar file, since a jar is really just an archive of (binary) class files. Assuming you mean the source code that might have been used to produce a jar file, there’s a decent answer.
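To see why, here is a minimal C++11 sketch that inspects the header of a single class file. It assumes the .class entry has already been pulled out of the jar (a jar is a zip archive, so that step needs a zip library, not shown), and the helper names are made up for illustration. Per the JVM specification, a class file starts with the magic number 0xCAFEBABE followed by big-endian minor and major version numbers. In other words, what you find inside a jar is compiled bytecode, not Java source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Class files are big-endian binary and begin with:
// u4 magic (0xCAFEBABE), u2 minor_version, u2 major_version.
std::uint32_t read_u4(const std::vector<unsigned char>& b, std::size_t off) {
    assert(off + 4 <= b.size());
    return (std::uint32_t(b[off]) << 24) | (std::uint32_t(b[off + 1]) << 16) |
           (std::uint32_t(b[off + 2]) << 8) | std::uint32_t(b[off + 3]);
}

std::uint16_t read_u2(const std::vector<unsigned char>& b, std::size_t off) {
    assert(off + 2 <= b.size());
    return static_cast<std::uint16_t>((b[off] << 8) | b[off + 1]);
}

struct ClassHeader {
    bool valid = false;
    std::uint16_t minor = 0;
    std::uint16_t major = 0;  // e.g. 49 = Java 5, 52 = Java 8
};

// Returns a header with valid == false if the bytes are not a class file.
ClassHeader read_class_header(const std::vector<unsigned char>& bytes) {
    ClassHeader h;
    if (bytes.size() < 8 || read_u4(bytes, 0) != 0xCAFEBABEu) return h;
    h.valid = true;
    h.minor = read_u2(bytes, 4);
    h.major = read_u2(bytes, 6);
    return h;
}

// Loads a whole file, e.g. a .class file already extracted from the jar.
std::vector<unsigned char> load_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```

For example, read_class_header(load_file("Foo.class")).major would be 49 for a class compiled for Java 5.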
If you want an open source solution, you can try ANTLR, which has a Java 1.5 grammar and, AFAIK, will build an AST. From that you can “extract” the subtrees for the items you want, or at least the line numbers of the subtree of interest; from there, you can extract the code you want.
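The last step, cutting the code out once you know a subtree’s line range, is plain string handling. A minimal C++11 sketch, assuming 1-based, inclusive line numbers (the usual convention for parser tokens) that you have already read off the start and stop tokens of, say, a method declaration; extract_lines is a hypothetical helper, not part of ANTLR:

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Returns lines [first_line, last_line] (1-based, inclusive) of `source`,
// each terminated by '\n'. The range is assumed to come from an AST subtree.
std::string extract_lines(const std::string& source, std::size_t first_line,
                          std::size_t last_line) {
    assert(first_line >= 1 && first_line <= last_line);
    std::istringstream in(source);
    std::string line, out;
    std::size_t n = 0;
    while (std::getline(in, line)) {
        ++n;
        if (n < first_line) continue;  // before the subtree
        if (n > last_line) break;      // past the subtree
        out += line;
        out += '\n';
    }
    return out;
}
```

Given the source of a class and the line range of one of its methods, this returns just that method’s text.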
I believe ANTLR can be configured to produce a C++-based parser.
To capture the relations between these, you need full name and type resolution, so you know which definition an identifier actually references. For this, ANTLR, being just a parser, won’t do the trick; you need to live a Life After Parsing.
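To illustrate what “resolution” means beyond parsing, here is a minimal C++17 sketch of the core idea: a chain of nested scopes where looking up a name walks outward, so the innermost definition wins. The Scope and Definition types are hypothetical and only show the shape of the problem. Real Java resolution is much harder, since imports, packages, inherited members, overloads and the classpath all take part, and that is exactly the work a bare parser leaves to you.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Something an identifier can refer to: a field, parameter, local, ...
struct Definition {
    std::string kind;  // illustrative labels, e.g. "field", "param", "local"
    std::string type;  // declared type name, e.g. "int" or "java.util.List"
    int line;          // where it was declared
};

// One lexical scope (class body, method body, block) linked to its parent.
class Scope {
public:
    explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

    void define(const std::string& name, Definition def) {
        assert(!name.empty());
        symbols_[name] = std::move(def);
    }

    // Walk outward through enclosing scopes; the innermost match wins,
    // which is how a Java local variable shadows a field of the same name.
    std::optional<Definition> resolve(const std::string& name) const {
        for (const Scope* s = this; s != nullptr; s = s->parent_) {
            auto it = s->symbols_.find(name);
            if (it != s->symbols_.end()) return it->second;
        }
        return std::nullopt;
    }

private:
    const Scope* parent_;
    std::unordered_map<std::string, Definition> symbols_;
};
```

With a field count in the class scope and a local count in a nested block, resolving count from the block yields the local, while resolving it from the method body yields the field.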
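An alternative might be the Java compiler itself: javac exposes its syntax trees, along with the symbols it resolves, through the Compiler Tree API (the com.sun.source packages, used together with javax.tools). Note that this is a Java API, not a C++ one.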