I want to write a lexical analyzer (lexer) for plain text.
So I need to detect the following tokens:
1) Word
2) Number
3) dot and other punctuation
4) “…”, “!?”, “!!!” and so on
I think it is not trivial to write an “if/else” condition for each item.
So are there any finite state machine generators for C#?
I know about ANTLR and others, but in the time it would take me to learn those tools I could write my own “if/else” FSM.
I hope to find something like:
FiniteStateMachine.AddTokenDefinition(":)","smile");
FiniteStateMachine.AddTokenDefinition(".","dot");
FiniteStateMachine.ParseText(text);
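Below is a minimal sketch of that wished-for API. It assumes a hypothetical FiniteStateMachine class that is not taken from any library. The literal token definitions go into a trie, which is itself a deterministic finite automaton: each node is a state, and a node that carries a token name is an accepting state. ParseText runs the automaton from the current position and keeps the longest accepted match ("maximal munch"). Runs of letters and digits cannot be listed as literals, so the sketch falls back to built-in "word" and "number" rules. Those rule names, and the "unknown" token type, are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical class mirroring the API wished for in the question; not an existing library.
public sealed class FiniteStateMachine
{
    // Each trie node is one DFA state; a non-null TokenName marks an accepting state.
    private sealed class State
    {
        public readonly Dictionary<char, State> Next = new Dictionary<char, State>();
        public string TokenName;
    }

    private readonly State _start = new State();

    // Adds the path for 'literal' to the trie and marks its final state as accepting.
    public void AddTokenDefinition(string literal, string tokenName)
    {
        if (string.IsNullOrEmpty(literal))
            throw new ArgumentException("Token literal must be non-empty.", nameof(literal));
        State state = _start;
        foreach (char c in literal)
        {
            if (!state.Next.TryGetValue(c, out State next))
            {
                next = new State();
                state.Next[c] = next;
            }
            state = next;
        }
        state.TokenName = tokenName;
    }

    public List<(string Type, string Text)> ParseText(string text)
    {
        var tokens = new List<(string Type, string Text)>();
        int pos = 0;
        while (pos < text.Length)
        {
            char c = text[pos];
            if (char.IsWhiteSpace(c)) { pos++; continue; }

            // Run the automaton from pos, remembering the last accepting state seen (longest match).
            string bestType = null;
            int bestEnd = pos;
            State state = _start;
            for (int i = pos; i < text.Length; i++)
            {
                if (!state.Next.TryGetValue(text[i], out State next)) break;
                state = next;
                if (state.TokenName != null) { bestType = state.TokenName; bestEnd = i + 1; }
            }

            if (bestType != null)
            {
                tokens.Add((bestType, text.Substring(pos, bestEnd - pos)));
                pos = bestEnd;
            }
            else if (char.IsLetter(c))
            {
                // Assumed built-in rule: a run of letters is a "word".
                int start = pos;
                while (pos < text.Length && char.IsLetter(text[pos])) pos++;
                tokens.Add(("word", text.Substring(start, pos - start)));
            }
            else if (char.IsDigit(c))
            {
                // Assumed built-in rule: a run of digits is a "number".
                int start = pos;
                while (pos < text.Length && char.IsDigit(text[pos])) pos++;
                tokens.Add(("number", text.Substring(start, pos - start)));
            }
            else
            {
                // Anything not covered by a definition becomes a one-character "unknown" token.
                tokens.Add(("unknown", c.ToString()));
                pos++;
            }
        }
        return tokens;
    }
}
```

In this sketch, literal definitions are tried before the word and number fallbacks. Defining a literal that starts with a letter (for example "a") would therefore split words that begin with it. A real lexer generator lets you control that priority explicitly.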
I suggest using regular expressions. Something like @"[a-zA-Z\-]+" will pick up words (letters a-z and dashes), while @"[0-9]+(\.[0-9]+)?" will pick up numbers, including decimal numbers. Use + rather than * for the integer part; otherwise the pattern also matches an empty string. Dots and the like work the same way, for example @"[!\.\?]+", and you can add whatever punctuation you need inside the square brackets (escaping special regex characters with a backslash \).

The "Poor man's "lexer" for C#" thread is very close to what you are looking for, since it is about writing a simple lexer in C#. I recommend searching for regular expressions for words, numbers, or whatever else you need, to find out exactly which expressions you need.
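Here is a minimal sketch of the regex approach. It assumes the word, number and punctuation patterns are combined into one Regex with named groups; the class name RegexLexer and the group names are hypothetical. At each position the .NET regex engine tries the alternatives from left to right. That is why "number" is listed before "punct": it keeps 3.14 as a single token. A whitespace group and a one-character catch-all group ensure that no input is silently skipped.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper combining the answer's patterns into one ordered alternation.
public static class RegexLexer
{
    private static readonly Regex TokenRegex = new Regex(
        @"(?<number>[0-9]+(?:\.[0-9]+)?)" + // integers and decimals, never empty
        @"|(?<word>[a-zA-Z\-]+)" +          // letters and dashes
        @"|(?<punct>[!\.\?]+)" +            // runs such as "...", "!?", "!!!"
        @"|(?<ws>\s+)" +                    // whitespace, dropped below
        @"|(?<other>.)",                    // any other single character
        RegexOptions.Compiled);

    // Group names reported as token types; "ws" is deliberately absent.
    private static readonly string[] TokenTypes = { "number", "word", "punct", "other" };

    public static List<(string Type, string Text)> Tokenize(string text)
    {
        var tokens = new List<(string Type, string Text)>();
        foreach (Match m in TokenRegex.Matches(text))
        {
            foreach (string type in TokenTypes)
            {
                if (m.Groups[type].Success)
                {
                    tokens.Add((type, m.Value));
                    break;
                }
            }
            // Whitespace matches succeed only in "ws", so they add no token.
        }
        return tokens;
    }
}
```

For example, RegexLexer.Tokenize("Wait... 3.14 is pi!?") should yield word "Wait", punct "...", number "3.14", word "is", word "pi" and punct "!?".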
EDIT:
Or see Justin’s answer for the particular regexes.