I have written a simple lexical analyzer, and I understand the need to provide each recognized token with an attribute. Let’s see what I’ve got:
public sealed class Token
{
    public enum TokenClass
    {
        Identifier,
        StringLiteral,
        NumberLiteral,
        Operator,
        PunctuationSeparator,
        Bracket,
        Parenthesis
    }

    public TokenClass Class { get; internal set; }
    public String Value { get; internal set; }
}
In the lexer I enqueue tokens, setting their value and class. But what about attributes? How should I design this feature around my existing token class?
The first thought that came to mind was:
- Declare private abstract classes for the “ambiguous entities” (I mean that a Number could be an Integer, a Real and so on) inside the token class;
- Then declare inherited classes, e.g. public class Comma : PunctuationSeparator {};
- Add a property: Object Attribute { get; private set; };
- Then create a method like private void ApplyAttribute();
- Call ApplyAttribute() when the token is instantiated and its properties are set;
- Use something like this inside ApplyAttribute():
    switch (this.Class)
    {
        case TokenClass.NumberLiteral:
        {
            // Int32.TryParse needs an out parameter; only the success flag is used here.
            Int32 parsed;
            this.Attribute = Int32.TryParse(this.Value, out parsed) ? (Object)new Integer() : new Real();
            break;
        }
    }
In the parser it would then be easy to write something like if (CurToken.Attribute is Integer).
The one thing that stops me from doing it this way is the number of classes I would have to create. Is this solution acceptable?
The attributes I’d use for a token? Probably something along the lines of
I disagree, though, with the previous poster regarding conversion of the token’s text into a ‘value’. IMHO, that is the domain of the parser and the nodes of the parse tree. Until the parser has placed the tokens in context WRT the grammar, the token is just a piece of text with a label attached to it. The lexical analyzer doesn’t know (and shouldn’t care) what’s happening downstream; for all it knows, the tool is pretty-printing the source text (in which case, you want to leave the individual tokens alone).
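To make that split concrete, here is a rough sketch of what parser-side conversion could look like, assuming C# 7 or later. The lexer hands over only the raw text (what the question stores in Token.Value), and the parser turns it into a typed node once the grammar says a number belongs there. NumberNode, IntegerNode, RealNode and NumberParsing.FromTokenText are names made up for illustration; they are not from the question or from any library.

```csharp
using System;
using System.Globalization;

// Hypothetical parse-tree node types, invented for this sketch.
public abstract class NumberNode { }

public sealed class IntegerNode : NumberNode
{
    public int Value { get; }
    public IntegerNode(int value) { Value = value; }
}

public sealed class RealNode : NumberNode
{
    public double Value { get; }
    public RealNode(double value) { Value = value; }
}

public static class NumberParsing
{
    // Called by the parser, not the lexer: the token text is converted
    // only once the grammar has established that a number is expected.
    public static NumberNode FromTokenText(string text)
    {
        // NumberStyles.None accepts plain decimal digits only.
        if (int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out int i))
            return new IntegerNode(i);

        // Anything else that still parses as a floating-point literal becomes a real
        // (this includes digit strings too large for Int32).
        if (double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out double d))
            return new RealNode(d);

        throw new FormatException("Not a numeric literal: " + text);
    }
}
```

With something like this in place, the token itself stays a labelled piece of text, and a check such as node is IntegerNode happens on the parse tree rather than on the token. A pretty-printer could consume the same token stream without ever triggering a conversion.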
You might want to take a look at Terence Parr’s book(s):