I have a corpus of token-index based documents which offers a query method. The user manually(!) enters a query string which needs to be parsed and evaluated. The corpus should then return a list of all documents matching the given query string. The query language features the simple boolean operators AND, NOT and OR which can also be prioritized by parenthesis.
After some research I already used ANTLR to parse a given query string into a syntax tree.
For example: The query
"Bill OR (John AND Jim) OR (NOT Simon AND Mike)"
is translated in the following syntax tree:
EDIT: Please see the correct graph in Bart Kiers post (copied here):

All nodes in the tree are simple strings and each node knows its parent and children but not its siblings.
As you can see, the ANTLR grammar already dictated the order in which the operations need to be executed: the ones at the bottom of the tree come first.
So what I probably need to do is recusively(?) evaluate all operands in the tree.
In general, I can do a simple search on my corpus using a method Get(string term) for each leaf in the tree (like “Bill” or “John”). Get() returns a list of documents containing the term in the leaf. I can also evaluate the parent of each leaf to recognize a possible NOT operator which would then lead to a result list of documents NOT containing the term in the leaf (using the method Not() instead of Get()).
The AND and OR operator should be transformed into method calls which need two parameters:
- AND should call a method Intersect(list1, list2) which returns a list of documents that are in list1 AND in list2.
- OR should call a method Union(list1, list2) which returns a list of documents that are either in list1 OR in list2.
The parameters list1 and list2 contain the documents I received before using Get() or Not().
My question is: How can I – semantically and syntactically in C# – evaluate all necessary search terms and use them to call the right operator methods in the correct order? Intuitively it sounds like recursion but somehow I can’t picture it – especially since not all methods that need to be called have the same amount of parameters. Or are there maybe entirely other ways to accomplish this?
In Pseudo Code
Does that help?
Or were you looking to optimize the order of evaluations, which probably has books written on it…