I need to count the number of classes in a valid C# source file.
I wrote the following grammar:
grammar CSharpClassGrammar;
options
{
language=CSharp2;
}
@parser::namespace { CSharpClassGrammar.Generated }
@lexer::namespace { CSharpClassGrammar.Generated }
@header
{
using System;
using System.Collections.Generic;
}
@members
{
private List<string> _classCollector = new List<string>();
public List<string> ClassCollector { get { return _classCollector; } }
}
/*------------------------------------------------------------------
* PARSER RULES
*------------------------------------------------------------------*/
csfile : class_declaration* EOF
;
class_declaration
: (ACCESSLEVEL | MODIFIERS)* PARTIAL? 'class' CLASSNAME
class_body
';'?
{ _classCollector.Add($CLASSNAME.text); }
;
class_body
: '{' class_declaration* '}'
;
/*------------------------------------------------------------------
* LEXER RULES
*------------------------------------------------------------------*/
ACCESSLEVEL
: 'public' | 'internal' | 'protected' | 'private' | 'protected internal'
;
MODIFIERS
: 'static' | 'sealed' | 'abstract'
;
PARTIAL
: 'partial'
;
CLASSNAME
: ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
;
COMMENT
: '//' ~('\n'|'\r')* {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WHITESPACE
: ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = HIDDEN; }
;
This parser correctly counts classes (including nested classes) that have an empty class body:
internal class DeclarationClass1
{
    class DeclarationClass2
    {
        public class DeclarationClass3
        {
            abstract class DeclarationClass4
            {
            }
        }
    }
}
I also need to count classes with a non-empty body, such as:
class TestClass
{
int a = 42;
class Nested { }
}
I need to somehow ignore all the code that is “not a class declaration”.
In the example above, that means ignoring
int a = 42;
How can I do this? Maybe there is an example for another language?
Please help!
When you’re only interested in certain parts of a source file, you can set filter=true in your options { … } section. This lets you define only the tokens you’re interested in; anything you don’t define is ignored by the lexer.
Note that this only works with lexer grammars, not with combined (or parser) grammars.
A little demo:
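What follows is a minimal sketch of such a filtering lexer, assuming ANTLR 3 and the CSharp2 target used in the question. The grammar name ClassNameLexer, the ClassNames list and the rule names other than Identifier are illustrative assumptions, and the string-literal rules are only an approximation of C#:

```antlr
lexer grammar ClassNameLexer;

options
{
  language=CSharp2;
  filter=true;   // input that no rule below matches is silently skipped
}

@members
{
  // Hypothetical collector, mirroring ClassCollector from the question.
  public System.Collections.Generic.List<string> ClassNames =
      new System.Collections.Generic.List<string>();
}

// Comments and string literals are matched as whole tokens, so a word
// 'class' that appears inside them is never seen by the Class rule.
Comment
  : '//' ~('\n'|'\r')*
  | '/*' ( options {greedy=false;} : . )* '*/'
  ;

String
  : '"' ( '\\' . | ~('"'|'\\'|'\r'|'\n') )* '"'   // regular string literal
  | '@' '"' ( '"' '"' | ~('"') )* '"'             // verbatim string literal
  ;

// Class is listed before Identifier: in filter mode, earlier rules take
// priority, and both rules can match at the keyword 'class'.
Class
  : 'class' Space+ name=Identifier
    { ClassNames.Add($name.text); }
  ;

Identifier
  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*
  ;

fragment Space
  : ' ' | '\t' | '\r' | '\n' | '\u000C'
  ;
```

Once the lexer has been run to the end of the input, ClassNames.Count would give the number of class declarations found. The sketch does not cover every C# construct: for example, a generic constraint such as where T : class where U : new() would be misread as a class named where.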
It’s important that you leave the Identifier rule in there, because you don’t want Xclass Foo to be tokenized as ['X', 'class', 'Foo']. With the Identifier rule in there, Xclass becomes a single identifier.
The grammar can be tested with the following class:
which produces the following output:
Note that this is just a quick demo. I am not sure whether I handled C# string literals properly in the grammar (I am unfamiliar with C#), but it should give you a start.
Good luck!