I need to write a parser for a small command language. Five example commands are:
"a=10"
"b=foo"
"c=10,10"
"clear d"
"c push_back 2"
In the case of the first example, set is the command, a is the object and 10 is the value.
What do you think the parser should return for each line above?
Here is my idea:
"a=10" -> SET (COMMAND_ENUM), INT (VALUE_TYPE), "a", ("10")
"b=foo" -> SET (COMMAND_ENUM), STRING (VALUE_TYPE), "b", ("foo")
Is this a good approach? What is the standard approach for this problem? Should I dispatch instead?
I have a function which checks the type associated with an object. For example, a above is of type INT and must be assigned an INT value; otherwise the parser should return or throw an error of some sort. I also have a convert function for converting values from strings to the desired type; it throws if the conversion is not possible. If the parser itself converts the values from strings to the required type, then it is probably a good idea to return them via a boost::variant.
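To make that concrete, here is a rough sketch of the kind of return type I have in mind. It uses std::variant (C++17) as a stand-in for boost::variant, and every name in it (CommandKind, ParsedCommand, makeSet, and this toy version of convert) is made up for illustration rather than taken from my real code.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <variant>
#include <vector>

// Hypothetical enums mirroring COMMAND_ENUM and VALUE_TYPE from the question.
enum class CommandKind { Set, Clear, PushBack };
enum class ValueType { Int, String };

// std::variant used here in place of boost::variant (an assumption).
using Value = std::variant<int, std::string>;

struct ParsedCommand {
    CommandKind kind;
    std::string object;         // e.g. "a"
    std::vector<Value> values;  // already converted to the declared type
};

// Toy stand-in for the convert function: throws if the text does not fit the type.
Value convert(const std::string& text, ValueType type) {
    if (type == ValueType::String) return text;
    std::size_t pos = 0;
    int n = std::stoi(text, &pos);  // throws std::invalid_argument on non-numbers
    if (pos != text.size()) throw std::invalid_argument("not an INT: " + text);
    return n;
}

// Builds the result for "object=text", given the type the object was declared with.
ParsedCommand makeSet(const std::string& object, ValueType declared,
                      const std::string& text) {
    return ParsedCommand{CommandKind::Set, object, {convert(text, declared)}};
}
```

With this shape, makeSet("a", ValueType::Int, "10") would hold an int in values[0], while makeSet("a", ValueType::Int, "foo") would throw.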
You need to come up with at least a semi-formal grammar for the command language you want to recognize, since you've left a lot of things vaguely specified. For example, in "b=foo" you want b to be a variable name but foo to be a string literal. How do you distinguish them? Does a sequence of characters represent an identifier if it's on the left side of an assignment, but a literal if it's on the right side? Or does a single character represent an identifier, but multiple characters represent a literal? In "c=10,10", does "10,10" represent a list or a vector?
Writing a grammar will at least force you to think about such things, and it will also serve at least as a guide to how to write your parser (at most it will be something that can be automatically translated into your parser).
You're on the right track by thinking of how statements should be represented as Abstract Syntax Trees (ASTs), but you need to take a step backwards and look at what you want in terms of concrete syntax.
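As an illustration only, here is one way the five example commands could be pinned down, with the grammar written in the leading comment and a small hand-written parser that follows it. The grammar is an assumption about what you might want: it treats the word left of "=" as an identifier, every word right of it as a literal, and a comma-separated run as a sequence of values. The names Kind, Command, tokenize, isWord and parse are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// One possible grammar (an assumption, not the only reasonable choice):
//   command ::= "clear" word
//             | word "push_back" word
//             | word "=" word { "," word }
//   word    ::= a run of letters, digits and '_'
// A stricter grammar would probably require identifiers to start with a letter.

enum class Kind { Set, Clear, PushBack };

struct Command {
    Kind kind;
    std::string object;
    std::vector<std::string> values;  // still text; type checking is a later step
};

// Splits a line into words and the single-character tokens "=" and ",".
std::vector<std::string> tokenize(const std::string& line) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < line.size()) {
        unsigned char c = static_cast<unsigned char>(line[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (c == '=' || c == ',') { tokens.push_back(std::string(1, line[i])); ++i; continue; }
        if (!std::isalnum(c) && c != '_') throw std::runtime_error("unexpected character");
        std::size_t start = i;
        while (i < line.size() &&
               (std::isalnum(static_cast<unsigned char>(line[i])) || line[i] == '_')) ++i;
        tokens.push_back(line.substr(start, i - start));
    }
    return tokens;
}

bool isWord(const std::string& t) { return t != "=" && t != ","; }

// Recognizes exactly the three command forms of the grammar above.
Command parse(const std::string& line) {
    const std::vector<std::string> t = tokenize(line);
    if (t.size() == 2 && t[0] == "clear" && isWord(t[1]))
        return {Kind::Clear, t[1], {}};
    if (t.size() == 3 && isWord(t[0]) && t[1] == "push_back" && isWord(t[2]))
        return {Kind::PushBack, t[0], {t[2]}};
    if (t.size() >= 3 && isWord(t[0]) && t[1] == "=") {
        Command cmd{Kind::Set, t[0], {}};
        for (std::size_t i = 2; i < t.size(); ++i) {
            const bool wantWord = (i % 2 == 0);  // words and commas must alternate
            if (wantWord != isWord(t[i])) throw std::runtime_error("malformed value list");
            if (wantWord) cmd.values.push_back(t[i]);
        }
        if (t.size() % 2 == 0) throw std::runtime_error("trailing comma");
        return cmd;
    }
    throw std::runtime_error("unrecognized command");
}
```

Note that this stage deliberately keeps the values as text. Checking them against the declared type of the object and converting them can then be a separate pass over the Command, which keeps syntax errors and type errors apart.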