I have an interesting problem. I need to analyse source code and determine the types of its variables before it is compiled, so Reflection cannot be used!
There are only five types:
double x = 1.23;
long x = 3;
string s='Hello World!';
bool b=true;
object[] A = [1, 1+2, 'Hello', s];
An example of the source code:
for (i=0; i < 5; i++)
{
a=2;
b=4;
c=6;
tesstClass.Str = 'sss';
}
I decided to use regular expressions to solve the problem.
First, I find all the pieces of code that assign to the desired variable:
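// Assignments to the variable (=, +=, -=, *=, /=) up to the terminating ';'
// Regex.Escape keeps a dotted name such as tesstClass.Str from treating '.' as a wildcard
string pattern = Regex.Escape(variable) + @"[\w.]*\s*[-*+/]?=\s*[\w\s+'*/-]*\s*;";
MatchCollection mc = Regex.Matches(code, pattern);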
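As a sketch of this first step: the right-hand-side class `[\w\s+'*/-]*` in the pattern above excludes `.`, `[`, `]`, `,` and `!`, so assignments such as `x = 1.23;` or `s='Hello World!';` never match. The hypothetical helper `FindAssignments` below instead captures everything up to the next `;` and uses lookarounds so that `a` does not also match `ab` or the tail of `tesstClass.Str`. It assumes a `;` never appears inside a string literal on the right-hand side.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AssignmentFinder
{
    // Returns the right-hand side of every assignment to `variable`
    // (=, +=, -=, *=, /=), read up to the next ';'.
    // Assumption: ';' never occurs inside a string literal on the right-hand side.
    public static List<string> FindAssignments(string code, string variable)
    {
        string pattern =
            @"(?<![\w.])" + Regex.Escape(variable) +  // whole name, not the end of a longer one
            @"(?![\w.])\s*[-+*/]?=(?!=)" +           // assignment operator, but not '=='
            @"\s*(?<value>[^;]*?)\s*;";              // lazily capture the value up to ';'

        var values = new List<string>();
        foreach (Match m in Regex.Matches(code, pattern))
            values.Add(m.Groups["value"].Value);
        return values;
    }

    public static void Main()
    {
        string code = "for (i=0; i < 5; i++)\n{\na=2;\nb=4;\nc=6;\ntesstClass.Str = 'sss';\n}";
        Console.WriteLine(string.Join(", ", FindAssignments(code, "a")));
        Console.WriteLine(string.Join(", ", FindAssignments(code, "tesstClass.Str")));
    }
}
```

Each captured value can then be handed to the per-type checks.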
Second, I analyse each Match using five regular expressions, one for each type:
string stringPattern = @"'[^'\r\n]*'"; // String
string doublePattern = @"\b[0-9]+\.[0-9]+\b"; // Double
string longPattern = @"[-+]?\b\d+\b"; // Integer with an optional sign
string boolPattern = @"\b(false|true)\b"; // Boolean
string arrayPattern = @"\[([\w']*\s*,?\s*)*\]"; // Array
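A sketch of how the five checks could be combined, assuming the right-hand side has already been extracted. Unanchored, the patterns overlap: `longPattern` also finds `1` and `23` inside `1.23`, and `boolPattern` finds `true` inside `'true'`. Anchoring each one with `^...$` and testing the trimmed value means at most one can match. The array check is reduced to "starts with `[` and ends with `]`", because the original `arrayPattern` rejects elements such as `1+2` or `1.5`, and its nested `(...)*` over optional parts can backtrack heavily on input that almost matches. `TypeClassifier` and `Classify` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypeClassifier
{
    // Each pattern is anchored with ^...$, so it must describe the WHOLE value.
    // The five shapes start differently ('[', quote, letter, digit/sign) and
    // double requires a '.', so at most one rule matches a trimmed value.
    static readonly (string Type, Regex Pattern)[] Rules =
    {
        ("object[]", new Regex(@"^\[.*\]$", RegexOptions.Singleline)), // [1, 1+2, 'Hello', s]
        ("string",   new Regex(@"^'[^'\r\n]*'$")),                    // 'Hello World!'
        ("bool",     new Regex(@"^(?:true|false)$")),                 // true
        ("double",   new Regex(@"^[-+]?\d+\.\d+$")),                  // 1.23
        ("long",     new Regex(@"^[-+]?\d+$")),                       // 3
    };

    // Returns the type of a right-hand side, or "unknown" for anything else
    // (expressions such as 1+2, or other variables), which needs further analysis.
    public static string Classify(string value)
    {
        string v = value.Trim();
        foreach (var (type, pattern) in Rules)
            if (pattern.IsMatch(v))
                return type;
        return "unknown";
    }

    public static void Main()
    {
        foreach (string v in new[] { "1.23", "3", "'Hello World!'", "true", "[1, 1+2, 'Hello', s]", "'true'", "1+2" })
            Console.WriteLine($"{v} -> {Classify(v)}");
    }
}
```

Expressions like `1+2`, or references to other variables such as `s`, come back as `"unknown"`; resolving them needs the types of their operands, which is where a real parser starts to pay off.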
I am very bad at regular expressions, so I’ve defined a set of very simple ones. Can you help me refine them?
The normal way of doing this would be to build the AST of your program and then simply search it for the variable declarations you need. Grammars, as suggested, are a nice way of generating such an AST.
But if you need to analyse your program on the fly, you can’t use this option, because your code might contain parse errors. In that case, I feel your pain…
Your only option then is to scan the source code yourself, and regular expressions might help a bit.
First, I would begin with a regex similar to this:
Note: YOUR_VARIABLE_TOKEN is left as a placeholder, because each language has its own strict rules about how a variable name can be constructed.
I didn’t test this regex and it certainly isn’t perfect; it is just meant to give you an idea.
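A hypothetical sketch in the same spirit, assuming explicit declarations of the form `type name;` or `type name = value;` using the five type names from the question, and assuming a C-style identifier `[A-Za-z_]\w*` stands in for YOUR_VARIABLE_TOKEN. `DeclarationFinder` and `Find` are illustrative names, not an established API.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DeclarationFinder
{
    // Hypothetical stand-in for YOUR_VARIABLE_TOKEN: a C-style identifier.
    const string Identifier = @"[A-Za-z_]\w*";

    // Matches "type name;" or "type name = value;" for the five types.
    // \b stops "long" from matching inside a word such as "belong".
    static readonly Regex Declaration = new Regex(
        @"\b(?<type>double|long|string|bool|object\[\])\s+" +
        @"(?<name>" + Identifier + @")\s*(?:=\s*[^;]*?\s*)?;");

    // Maps each declared variable name to its declared type.
    public static Dictionary<string, string> Find(string code)
    {
        var result = new Dictionary<string, string>();
        foreach (Match m in Declaration.Matches(code))
            result[m.Groups["name"].Value] = m.Groups["type"].Value;
        return result;
    }

    public static void Main()
    {
        string code = "double x = 1.23; long n = 3; string s='Hello World!'; bool b;";
        foreach (var pair in Find(code))
            Console.WriteLine($"{pair.Key}: {pair.Value}");
    }
}
```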
Second, you would have to validate these matches against certain exception cases, such as a declaration that appears inside a string literal or a comment:
"bool a;"
/* bool a; */
Also, this is not a very strange request. Eclipse does this kind of evaluation too in some cases, such as indentation.
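For the string and comment exception cases, one option is to mask string literals and comments before running any declaration regex. A minimal sketch, assuming C-style comments (`//` and `/* */`) and quoted strings with backslash escapes; `CodeMasker` and `Mask` are hypothetical names. Each masked character becomes a space, so match positions in the masked text still point at the right places in the original source.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CodeMasker
{
    // Assumption: C-style comments and quoted strings with backslash escapes.
    // The engine scans left to right, so whichever construct starts first wins:
    // "//" inside a string stays part of the string, a quote inside a comment
    // stays part of the comment.
    static readonly Regex StringsAndComments = new Regex(
        @"/\*.*?\*/" +                   // block comment (may span lines)
        @"|//[^\r\n]*" +                 // line comment
        @"|""(?:[^""\\\r\n]|\\.)*""" +   // "double-quoted" string
        @"|'(?:[^'\\\r\n]|\\.)*'",       // 'single-quoted' string
        RegexOptions.Singleline);

    // Replaces every character except line breaks with a space,
    // so offsets and line numbers stay the same.
    static string Blank(string s) => Regex.Replace(s, @"[^\r\n]", " ");

    // Comments are blanked completely; strings keep their quotes but lose their
    // contents, so a value such as 'Hello' is still recognisable as a string.
    public static string Mask(string code) =>
        StringsAndComments.Replace(code, m =>
            m.Value[0] == '/'
                ? Blank(m.Value)
                : m.Value[0] + Blank(m.Value.Substring(1, m.Value.Length - 2)) + m.Value[m.Value.Length - 1]);

    public static void Main()
    {
        string code = "string t = \"bool a;\"; /* bool a; */\nbool b = true; // bool c;";
        var boolDecl = new Regex(@"\bbool\s+\w+");
        Console.WriteLine(boolDecl.Matches(code).Count);       // 4
        Console.WriteLine(boolDecl.Matches(Mask(code)).Count); // 1
    }
}
```

Character literals, verbatim or raw strings, nested comments and other language-specific syntax are not handled and would each add more exception cases.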
This is not an easy task, though, especially finding all those exception cases. Good luck.