Right now, I have a script which uses PHP’s tokenizer to look for certain functions within a PHP source code file. The pattern I am currently looking for is:
T_STRING + T_WHITESPACE (optional) + “(“
This seems to match all of my test cases so far except variable functions, which I am ignoring for the purposes of this question.
The obvious problem here is that this pattern produces a lot of false positives, like matching function definitions:
public function foo() { // foo() should not be matched
My question is, is there a more reliable/accurate method for looking at source code and plucking out all the function invocations? Maybe a better method than using the tokenizer at all?
Edit:
In particular, I’m looking to emulate the functionality of the disable_functions PHP directive within a class file. So, if exec() should be disallowed, I’m trying to find any uses of that function within the analyzed file. I do realize that variable functions make this terribly difficult, so I am detecting these and disallowing them as well.
You first run the tokenizer (available in PHP). Then you run a parser on top of the tokens. The parser needs to read the tokens and should be able to tell your what a specific token has been used for. It depends on the reliability of your parser how reliable the outcome is.
If your current parser (you have not shown any code) is not reliable enough, you need to write a better parser. That simple it is. Probably you’re not doing much more than just tokenizing and then reading as it passes which just might not be enough.