What are the tradeoffs of performing static analysis on different levels of code? For instance, for Java, why would someone perform static analysis on Java source code vs. Jasmin code vs. Java bytecode? Does the choice restrict or expand the types of analyses that can be done? Does the choice influence the correctness of the analyses? Thanks.
From a user perspective, I’d say that unless you have very specific, easy-to-formalize properties to analyze (such as pure safety properties), go with a tool that supports Java source code.
From a tool-developer perspective, it may be easier to work with one level or another. Here are the differences that come to mind. (Note that with a compiler and/or a decent decompiler, a tool can, for instance, operate on one layer and present the results on another.)
Pros for Java source code:
Pros for Bytecode:
Pros for machine code:
State-of-the-art tools such as Spec# (a formal-methods dialect of C#) usually go through an intermediate language specifically designed for formal analysis (BoogiePL in the Spec# case, which is neither MSIL nor C#).
In the end… no, not really. You face the same fundamental problems regardless of which (Turing complete) language you choose to analyze. Depending on what properties you analyze, YMMV though.
If you’re into formal methods and thinking about implementing an analysis yourself, I suspect you’ll find better tool support for bytecode. If you’re a user or developer and want to analyze your own code base, I suspect you’ll benefit more from tools operating at the Java source-code level.
Depends on what you mean by correctness. A static analysis is most often “defensive” in the sense that you don’t assume anything that you don’t know is true. If you restrict your attention to sound verification systems, all of them will be “equally correct”.
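To make “defensive” a bit more concrete, here is a minimal sketch of how a conservative nullness analysis might merge facts from two branches. The class, enum, and method names are made up for illustration and are not taken from any particular tool; real analyses are considerably richer, but the idea of falling back to “unknown” when facts disagree is the same whether the analysis runs on source or on bytecode.

```java
public class DefensiveNullness {
    /** Hypothetical abstract values: what the analysis knows about a variable. */
    enum Nullness { NULL, NON_NULL, UNKNOWN }

    /** Merge facts from two control-flow branches; any disagreement yields UNKNOWN. */
    static Nullness join(Nullness a, Nullness b) {
        return a == b ? a : Nullness.UNKNOWN;
    }

    /** A dereference is reported as safe only when non-nullness has been proven. */
    static boolean provenSafeToDereference(Nullness n) {
        return n == Nullness.NON_NULL;
    }

    public static void main(String[] args) {
        // Models: if (cond) { x = new Object(); } else { x = null; }
        Nullness afterIf = join(Nullness.NON_NULL, Nullness.NULL);
        System.out.println(afterIf);                          // prints UNKNOWN
        System.out.println(provenSafeToDereference(afterIf)); // prints false
    }
}
```

Because the analysis never claims more than it can prove, it may raise false alarms (an UNKNOWN value that is in fact never null at run time), but it will not report a dereference as safe when it is not. That is the sense in which sound tools are “equally correct” regardless of the level they operate on.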