I’m writing a concolic engine for Python using the sys.settrace() functionality.
The main task during such kind of execution is to record the constraints on the input variables. The constraints are nothing else than the conditions of the if statements, that create two branches (the ‘then’ and the ‘else’ branch).
When an execution is complete, the engine chooses a constraint and finds appropriate values for the inputs so that the execution will go the down along the other branch (at execution x it goes the ‘then’ branch, at execution x+1 it goes along the ‘else’ branch).
This is to have a bit of context on why I doing what I’m trying to do…
By combining settrace() and the dis module, I get to see the bytecode of each source line, just before it is executed. This way I can easily record the if conditions as they appear during execution.
But then I have the big problem. I need to know which way the if went, which branch the execution took. So if my code is something like:
if x > a:
print x
else:
print a
at a certain point my tracing thing will see:
t: if x > 0:
then the python interpreter will execute the if and jump (or not) somewhere. And I will see:
t + 1: print x
So is the instruction t + 1 in the "then" branch or in the "else" one? Keep in mind the trace function sees only some bytecode in the current block.
I know of two way to do this. One is to evaluate the condition to see exactly whether it is true or false. This works only if there are no side-effects.
The other way is to try to look and the instruction pointer at t + 1 and try to understand where we are in the code. This is the way I am using right now, but it very delicate because at t + 1 I could find myself somewhere completely different (another module, a builtin function, etc).
So finally, the question I have is this: is there a way to get from Python itself, or from a C module/extension/whatever, the result of the last conditional jump?
In alternative, are there more fine-grained tracing options? Something like execute bytecode one opcode at a time. With the settrace() functionality the maximum resolution I get is whole source code lines.
In the worst case, I think I can modify the Python interpreter to expose such information, but I would leave that as last resort, for obvious reasons.
In the end this is what I did. I implemented the AST instrumentation and it works pretty well.
By playing with the AST, you need to move all function calls (also attributes and subscriptions, due to
getattr()and friends, out of the if conditions by creating temporary variables. Also you need to split theandandoroperators.Then add a call to your own function at the beginning of each branch, with a boolean parameter,
Truefor the then branch andFalsefor the else branch.After that I wrote an AST to source converter (there is one somewhere on the net, but does not work on current Python versions).
Working with the AST is very easy and quite simple, I ended up doing three transformation passes, adding also some
importstatements.This is the first pass, as an example. It splits if conditions if they contain
ororandoperators:Probably it is not very useful for code coverage applications because it messes up with the code quite a lot.