I’m trying to build a random query generator for a query language we developed. The idea is to generate random queries by following the rules in a parse-table. So far, all of the LL parser generators I’ve tried only generate recursive-descent parsers. I could try and modify the generated parser but looking at the parser that ANTLR generated for a tiny expression language, a parser for our query language will be very large.
So, I think a table-driven parser would be smaller and easier to tweak. Are there any open-source tools that can generate a table-driven LL parser?
Alternatively, can a LR parse table be used “in reverse” to derive random queries?
Yes, an LR parser table can be used to generate random queries. You simply traverse the tables as if you were parsing, and anyplace a state table offers you a transition or a reduction on a token, you randomly propose that token, with preference for reductions once the size of the query is “large enough”. The token sequence is your query.
But you don’t need the LR table for this; you can traverse your grammar definition (a set of BNF rules) in a similar way. You’ll want to first normalize all your rules to the following form:
This gets rid of Kleene star and plus, and alternatives, by adding new generated nonterminals to represent complex subphrases in the original rules. This is straightforward to do.
If you have a non-terminal (initially the goal symbol), merely find rules with matching LHS, pick one at random, and expand that rule. You have to keep track of your position in the rule as a “dot” (same idea as “items” in LR parser generation). Given a rule and a dot, either the token selected is a terminal (propose it, being no choice!) or a nonterminal (recursively generate a token sequence for the nonterminal
using the method of this paragraph).