I am trying to build an abstract interpreter for C, probably for just a subset of the grammar rather than the whole language. I have previously asked which language to use. Before I proceed any further, I would like to know how abstract interpretation actually works.
I have gone through the Wiki links and the lecture note links. I have understood the rationale and the theory behind it, and I have my analysis worked out. The part I am not able to understand at all is how to interpret the code. That is, I had the initial code and I now have it preprocessed. I have also performed some normalization on the code, which is required by my analysis. Now, how do I execute the code line by line and extract data out of it as I keep executing it? (Please tell me if this is impossible, or if there is some other way to execute the program that will achieve my objective.) I am looking to collect information such as the memory addresses of dynamically allocated space and the return addresses of function calls.
CIL was suggested to me earlier. CIL is mostly a transformation tool: it transforms the code into a normalized form and takes care of many anomalies, but I was not able to find any information pertaining to my problem.
My question is how to extract the information line by line, and which kind of language is preferable: imperative or functional? I have been Googling for quite a few days for information on this, but to no avail. Any links are also highly appreciated. Thanks.
EDIT: I still have some doubts. I got the part where we try to build a virtual environment. Let me explain what I am trying to do, so that it helps the discussion. I am basically trying to do pointer analysis that concentrates mainly on pointer arithmetic. Suppose I have an integer pointer and I do pointer arithmetic on it; then I cannot be sure whether the pointer still points to valid data.
From what you are saying, I understand we need to allocate space for the variables, but what about the values? Suppose I have something like the following:
int a=10;
int *p = &a;
p = p+4;
Here the value of a and the constant ‘4’ are known. But what if I get the value from the user or from a file? In such a case I need to execute the actual program. At the same time, I need to capture data such as the address, as below:
int *p =(int *) malloc (sizeof(int));
*p= 15;
cout<<*p;
p = p + ino; // some user input value
cout<<*p;
So basically the code has to be executed, but the later part of the solution sounded more like parsing the C file. Please correct me if I am wrong.
Assuming you are really talking about abstract interpretation rather than merely interpreting C…
Abstract interpretation relies on two things: an abstract domain, which is a lattice of finite height, and an abstract semantics. Applying the semantics of a line to the domain value from the line before must produce a new value in the domain that sits at the same height or higher in the lattice, so the values only ever move upwards and, because the height is finite, the analysis has to terminate.
i.e. if your domain is the powerset of {1,2,3,4} and the input is {1,2,3}, the only valid outputs are {1,2,3} or {1,2,3,4} (assuming the usual set ordering).
You then proceed by performing fixed-point iteration over the lines, storing the output of the semantics with each line, and the result at the end of each function with the function definition. How you choose the domain and interpret the set you end up with depends enormously on the analysis you are trying to do, but that is the outline as I understand it…
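As a rough sketch of that outline (one possible encoding, not a definitive implementation), the C fragment below represents the powerset of {1,2,3,4} as a 4-bit mask and iterates to a fixed point for a hypothetical toy program, x = 1; while (x < 4) x = x + 1;. Every name in it (AbsVal, join, leq, analyse_loop and so on) is an assumption made up for illustration.

```c
#include <assert.h>

/* Abstract value: bit (v-1) is set <=> x may hold the value v, for v in 1..4. */
typedef unsigned AbsVal;
#define ABS_BOTTOM 0x0u   /* empty set: point not reached yet      */
#define ABS_TOP    0xFu   /* {1,2,3,4}: nothing is known about x   */

AbsVal singleton(int v) {
    assert(v >= 1 && v <= 4);
    return 1u << (v - 1);
}

/* Lattice operations for the usual subset ordering. */
AbsVal join(AbsVal a, AbsVal b) { return a | b; }
int    leq(AbsVal a, AbsVal b)  { return (a & ~b) == 0; }

/* Abstract semantics of the three kinds of "line" in the toy program. */
AbsVal assume_lt4(AbsVal a) { return a & 0x7u; }            /* branch x < 4   */
AbsVal assume_ge4(AbsVal a) { return a & 0x8u; }            /* branch x >= 4  */
AbsVal add_one(AbsVal a)    { return (a << 1) & ABS_TOP; }  /* x = x + 1      */

/* Analyse:  x = 1; while (x < 4) x = x + 1;
   Returns the stable abstract value at the loop head, and stores the
   value after the loop in *exit_state and the iteration count in *rounds. */
AbsVal analyse_loop(AbsVal *exit_state, int *rounds) {
    AbsVal head = ABS_BOTTOM;
    *rounds = 0;
    for (;;) {
        /* One pass: the loop head is reached from the entry (x = 1)
           and from the end of the loop body. */
        AbsVal body_out = add_one(assume_lt4(head));
        AbsVal next = join(head, join(singleton(1), body_out));
        ++*rounds;
        if (leq(next, head))   /* nothing new was added: fixed point */
            break;
        head = next;           /* moved strictly up the lattice */
    }
    *exit_state = assume_ge4(head);
    return head;
}
```

Calling analyse_loop gives {1,2,3,4} at the loop head and {4} after the loop. The value at the head only ever grows, and the lattice has height 4, so the loop cannot run forever. A real analysis for pointer arithmetic would swap in a different domain (for example possible offsets into an allocated block), but the iteration scheme would look much the same.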
I must say I am not an expert in this, but some of my research colleagues have talked to me about it in the past, and this is the understanding I have come away with…
Also, you can just as easily run the analysis backwards, starting at the end of the function and moving towards the beginning, and this will be more appropriate for some kinds of analysis…
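To illustrate the backward direction, here is a small sketch of live-variable analysis, a standard example of an analysis that runs from the end of a function towards its start. It is only an illustration under simplifying assumptions: the code is straight-line (with loops you would iterate to a fixed point as in the forward case), and the names Stmt, VAR_X, VAR_Y and liveness are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Abstract value: a set of variables, one bit per variable. */
enum { VAR_X = 1u << 0, VAR_Y = 1u << 1 };

/* Each statement is summarised by what it writes (def) and reads (use). */
typedef struct {
    unsigned def;
    unsigned use;
} Stmt;

/* Walk the statements from last to first.
   live_in[i] = use[i] | (live_out[i] & ~def[i]),
   where live_out[i] is live_in[i+1], or live_at_exit for the last one. */
void liveness(const Stmt *prog, size_t n, unsigned live_at_exit,
              unsigned *live_in) {
    assert(prog != NULL && live_in != NULL);
    unsigned live = live_at_exit;
    for (size_t i = n; i-- > 0; ) {
        live = prog[i].use | (live & ~prog[i].def);
        live_in[i] = live;
    }
}
```

For a hypothetical four-line function x = input; y = x + 1; x = 5; return y; the statements would be described as {VAR_X, 0}, {VAR_Y, VAR_X}, {VAR_X, 0}, {0, VAR_Y}. Running liveness over them with nothing live at exit shows that x is not live after x = 5, so that assignment is dead.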