I have a program, written in C#, that, when given a C++ or C# file, counts the lines in the file and counts how many of them are in comments and in designer-generated code blocks. I want to add the ability to count how many functions are in the file and how many lines are in those functions. I can’t quite figure out how to determine whether a line (or series of lines) is the start of a function (or method).
At the very least, a function declaration is a return type followed by an identifier and an argument list. Is there a way to determine in C# that a token is a valid return type? If not, is there any way to easily determine whether a line of code is the start of a function? Basically, I need to be able to reliably distinguish something like:
bool isThere()
{
...
}
from
bool isHere = isThere()
and from
isThere()
I also need to rule out any other function-declaration lookalikes.
Start by scanning scopes. You need to count open braces { and close braces } as you work your way through the file, so that you know which scope you are in. You also need to parse // and /* … */ comments as you scan, so you can tell when something is in a comment rather than being real code, and likewise skip string and character literals, which can contain braces too. There is also #if, which you can’t interpret correctly without knowing which symbols are defined when the code is compiled.
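A minimal sketch of this first pass, in C++ for brevity (the same logic carries over directly to a C# tool). Rather than skipping comments, it blanks them out, together with the contents of string and character literals, so that later passes can search the text freely and offsets and line numbers still match the original file. The function names are hypothetical; it assumes there are no raw or verbatim string literals (C++ `R"(...)"`, C# `@"..."`) and it ignores `#if` entirely.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Returns a copy of `src` with comments and the contents of string/char
// literals replaced by spaces. Newlines are kept, so offsets and line numbers
// in the result match the original file.
// Assumptions: no raw/verbatim strings (R"(...)", @"..."); #if is ignored.
std::string maskCommentsAndLiterals(const std::string& src) {
    enum State { Code, LineComment, BlockComment, StringLit, CharLit };
    State state = Code;
    std::string out = src;
    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        const char next = i + 1 < src.size() ? src[i + 1] : '\0';
        switch (state) {
        case Code:
            if (c == '/' && (next == '/' || next == '*')) {
                state = (next == '/') ? LineComment : BlockComment;
                out[i] = out[i + 1] = ' ';
                ++i;                               // consume both characters
            } else if (c == '"') {
                state = StringLit;                 // the quotes themselves are kept
            } else if (c == '\'') {
                state = CharLit;
            }
            break;
        case LineComment:
            if (c == '\n') state = Code; else out[i] = ' ';
            break;
        case BlockComment:
            if (c == '*' && next == '/') {
                state = Code;
                out[i] = out[i + 1] = ' ';
                ++i;
            } else if (c != '\n') {
                out[i] = ' ';
            }
            break;
        case StringLit:
        case CharLit:
            if (c == '\\' && next != '\0') {       // escape: blank it and what follows
                out[i] = ' ';
                if (next != '\n') out[i + 1] = ' ';
                ++i;
            } else if (c == (state == StringLit ? '"' : '\'')) {
                state = Code;                      // closing quote
            } else if (c != '\n') {
                out[i] = ' ';
            }
            break;
        }
    }
    return out;
}

// Brace depth at the start of each line (index 0 is line 1) of masked text.
std::vector<int> depthAtLineStart(const std::string& masked) {
    std::vector<int> depths{0};
    int depth = 0;
    for (char c : masked) {
        if (c == '{') ++depth;
        else if (c == '}') --depth;
        else if (c == '\n') depths.push_back(depth);
    }
    return depths;
}
```

With the masked text, counting scopes is just a matter of looking for { and }, and the depth at the start of each line tells you which scope that line belongs to.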
Then you need to parse the text immediately before some of the open braces to work out what kind of scope each one opens. Your functions may be in global scope, class scope, or namespace scope, so you have to be able to recognize namespaces and classes to identify the type of scope you are looking at. You can usually get away with fairly simple parsing, because most programmers use a similar style. For example, it’s uncommon for someone to put blank lines between ‘class Fred’ and its open brace, but they might write ‘class Fred {’. There is also the chance that they will put extra junk on the line, e.g. ‘template class __DECLSPEC MYWEIRDMACRO Fred {’. However, a pretty simple “does the line contain the word ‘class’ with whitespace on both sides?” heuristic will work in most cases.
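The “word with whitespace on both sides” test might look like this, again as a hypothetical C++ sketch. It treats the start and end of the line as whitespace, so an unindented ‘class Fred {’ still matches. Grouping `struct` and C#’s `interface` with `class`, and checking `enum` first so that C++ `enum class` is not mistaken for a class, are assumptions beyond the heuristic described above.

```cpp
#include <cctype>
#include <cstddef>
#include <initializer_list>
#include <string>

// True if `word` occurs in `line` with whitespace (or the start/end of the
// line) on both sides. Matches "class Fred {" but not "subclassCount" or "class_id".
bool hasWord(const std::string& line, const std::string& word) {
    for (std::size_t pos = line.find(word); pos != std::string::npos;
         pos = line.find(word, pos + 1)) {
        bool spaceBefore = pos == 0 ||
            std::isspace(static_cast<unsigned char>(line[pos - 1]));
        std::size_t end = pos + word.size();
        bool spaceAfter = end == line.size() ||
            std::isspace(static_cast<unsigned char>(line[end]));
        if (spaceBefore && spaceAfter) return true;
    }
    return false;
}

enum class ScopeKind { Namespace, Type, Other };

// Classifies the header text in front of an open brace.
ScopeKind classifyScope(const std::string& header) {
    if (hasWord(header, "enum")) return ScopeKind::Other;  // also catches C++ "enum class"
    if (hasWord(header, "namespace")) return ScopeKind::Namespace;
    for (const char* kw : {"class", "struct", "interface"})
        if (hasWord(header, kw)) return ScopeKind::Type;
    return ScopeKind::Other;
}
```

Braces classified as `Other` are the ones worth testing as possible methods.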
OK, so you now know that you are inside a namespace and inside a class, and you find a new open scope. Is it a method?
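The main identifying features of a method are that its open brace comes after a bracketed parameter list, that the parameter list is preceded by the method’s name (usually with a return type and modifiers in front of it), and that none of the text before the brace is a keyword that opens some other kind of scope.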
So you could search backwards for a blank line, or a line ending in ;, { or }, which indicates the end of the previous statement or scope. Then grab all the text between that point and the open brace of your scope, extract a list of tokens, and try to match the parameter-list brackets. Check that none of the tokens are reserved words (enum, struct, class, etc.).
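A hypothetical C++ sketch of that check: `headerBefore` walks backwards from an open brace to the previous ;, {, } or blank line (the brace’s own line is never treated as blank), and `looksLikeMethodHeader` tokenizes the result, requires a balanced parameter list whose opening bracket follows a name, and rejects reserved words. The exact reserved-word list, and rejecting an = outside the brackets (as in `bool isHere = isThere()`), are assumptions rather than part of the approach above. The input is expected to have comments and literals already blanked out.

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Splits header text into identifier-like tokens and single punctuation characters.
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    for (std::size_t i = 0; i < text.size();) {
        const unsigned char c = text[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalnum(c) || c == '_') {
            const std::size_t start = i;
            while (i < text.size() &&
                   (std::isalnum(static_cast<unsigned char>(text[i])) || text[i] == '_'))
                ++i;
            tokens.push_back(text.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, text[i]));
            ++i;
        }
    }
    return tokens;
}

// Text between the previous boundary (';', '{', '}' or a blank line) and the
// open brace at `bracePos` in already-masked source.
std::string headerBefore(const std::string& masked, std::size_t bracePos) {
    std::size_t start = bracePos;
    bool onBraceLine = true;   // the brace's own line never counts as blank
    bool lineHasCode = false;  // non-space seen on the line being walked
    while (start > 0) {
        const char c = masked[start - 1];
        if (c == ';' || c == '{' || c == '}') break;
        if (c == '\n') {
            if (!onBraceLine && !lineHasCode) break;  // blank line
            onBraceLine = false;
            lineHasCode = false;
        } else if (!std::isspace(static_cast<unsigned char>(c))) {
            lineHasCode = true;
        }
        --start;
    }
    return masked.substr(start, bracePos - start);
}

// Words that introduce a scope which is not a method body (assumed list).
const std::set<std::string> kNotMethod = {
    "class", "struct", "union", "enum", "namespace", "interface",
    "if", "else", "for", "foreach", "while", "do", "switch", "try",
    "catch", "finally", "using", "lock", "return", "new"};

bool looksLikeMethodHeader(const std::string& header) {
    const std::vector<std::string> t = tokenize(header);
    int parenDepth = 0;
    std::size_t firstParen = t.size();  // t.size() means "not found yet"
    for (std::size_t i = 0; i < t.size(); ++i) {
        if (kNotMethod.count(t[i]) != 0) return false;
        if (t[i] == "(") {
            if (firstParen == t.size()) firstParen = i;
            ++parenDepth;
        } else if (t[i] == ")") {
            if (--parenDepth < 0) return false;  // unbalanced brackets
        } else if (t[i] == "=" && parenDepth == 0) {
            return false;                        // assignment, not a declaration
        }
    }
    if (firstParen == t.size() || firstParen == 0 || parenDepth != 0) return false;
    const unsigned char first = t[firstParen - 1][0];  // the would-be method name
    return std::isalpha(first) || first == '_';
}
```

A call such as `isThere();` is never examined at all, because it is not followed by an open brace. A header with no return type, such as the constructor `Fred()`, is accepted on purpose.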
This will give you a “reasonable degree of confidence” that you have a method. You don’t need much parsing to get a pretty high degree of accuracy. You could spend a lot of time finding all the special cases that confuse your “parser”, but if you are working on a reasonably consistent code-base (i.e. just your own company’s code) then you’ll probably be able to identify all the methods in the code fairly easily.