I have a custom-made grammar for an interpreted language, and I am looking for advice on a parser that will create a tree I can query. From that structure I would like to be able to generate code in the interpreted language; most grammar parsers I have seen only validate already existing code. The second part of my question: should the grammar be abstracted to the point that the Python code substitutes symbols in the tree for actual code terminology? Ideally, I would love to be able to query a root symbol and get back all the symbols that fall under that root, and so forth, all the way down to a terminal symbol.
Any advice on this process or my vocabulary regarding it would be very helpful. Thank you.
The vast majority of parser libraries will create an abstract syntax tree (AST) from whatever code you're parsing; you can use whichever you like, e.g. pyparsing. To go from the AST back to code, you might have to write the functions yourself, but that's pretty easy to do recursively. For example:
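Here is a minimal sketch of such a recursive generator. The function name to_code, the parenthesized infix output, and the nested-list tree layout are all assumptions made for illustration; they are not what any particular parser library produces, so adapt them to whatever your parser actually returns.

```python
def to_code(ast):
    """Render a nested-list AST as an infix expression string.

    Assumed (hypothetical) layout: a node is a non-empty list
    [tag, child1, child2, ...]; anything else is a leaf.
    """
    # Leaves (numbers, variable names) are rendered as-is.
    if not isinstance(ast, list):
        return str(ast)
    # The first element is the node tag (here an operator); the rest are subtrees.
    op, *args = ast
    rendered = [to_code(arg) for arg in args]
    # A single argument is treated as a prefix (unary) operator.
    if len(rendered) == 1:
        return f"({op}{rendered[0]})"
    # Parenthesize every compound node so operator precedence never matters.
    return "(" + f" {op} ".join(rendered) + ")"


print(to_code(['+', 4, ['*', 'x', 5]]))  # prints (4 + (x * 5))
```

For a real language you would probably dispatch on the tag (one small function per node type) instead of treating every tag as an infix operator.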
Assuming an AST structure that's just a list where the first element is a tag for the node name, followed by the trees for any arguments, an expression such as 4 + x * 5 would be represented as ['+', 4, ['*', 'x', 5]]. Of course, you should use whatever your parser library uses, unless you're writing the parser yourself.

I don't understand what you mean by Python code substituting symbols in the tree for actual code terminology.
You could write an easy function to iterate over all the symbols under a root node:
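One way to sketch that, again assuming the nested-list layout where a node is a non-empty list starting with its tag. The name symbols and the depth-first order are my own choices for illustration:

```python
def symbols(ast):
    """Yield every symbol under a root node, depth-first, tags included."""
    if isinstance(ast, list):
        # Assumed layout: [tag, child1, child2, ...], never an empty list.
        tag, *children = ast
        yield tag
        for child in children:
            # Recurse into each subtree and pass its symbols through.
            yield from symbols(child)
    else:
        # A non-list value is a terminal symbol.
        yield ast


print(list(symbols(['+', 4, ['*', 'x', 5]])))  # prints ['+', 4, '*', 'x', 5]
```

Because it is a generator, you can pass any subtree as the root and stop early, or filter the results (for example, keep only the terminals) without building the whole list first.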
On second thought, the variable name ast is maybe a poor choice because of the ast module.