Let’s say I have an acyclic directed graph such as a family “tree” (not really a tree since a child has 2 parents). I want to place a representation of this graph in a relational database so that it’s fast to compute all ancestors of a node, and all descendants of a node. How would you represent this graph? How would you query for all descendants? How would you insert and remove nodes and relationships? What assumptions are you making about the data?
The best solution will have the best big O for the number of select/insert/delete statements you run to query ancestors and descendants, with ties broken by best big O for total runtime, with ties broken by space requirements.
My coworker posed this question to me. I have a solution but it’s exponential size in the worst case so I wanted to see how other people would solve it.
edit
Clarified relational database. This question is trivial (and boring) if you use graph databases with built in transitive closures.
If
selects>manipulations, and especially subtree selects (all ancestors, all descendants) I’d go for a Closure-table approach. Yes, an explosion of paths in your path-table, but it does deliver results fast (as opposed to the adjacency model), and keeps updates limited to relevant portions (as opposed to 50% update with nested sets).Bill Karwin has some nice presentation online about pros and cons of different models, see http://www.slideshare.net/billkarwin/models-for-hierarchical-data (slide 48 is an overview).