I have a client who understands that his data model is a directed acyclic graph. We’ve been working with collections of nodes and an intermediate table of edges, and the performance has been pretty good. We have fewer than 100,000 data nodes in the current implementation, although that may grow by one or two orders of magnitude. He’s recently become convinced that, since we have a graph, a graph database (like Neo4j or Titan) would be “better.”
What problems does a graph-oriented database actually solve that cannot be solved with SQL, or that require much more heavy lifting from the SQL client? From what I can see, path discovery appears to be it, but that can’t be the whole story.
In a relational database, nodes and edges will be related by some value they have in common. Searching for a node or edge will generally involve querying an index for this value.
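To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and sample data are made up for illustration and are not the schema from the question. A one-hop lookup is a single probe of the index on the shared value, and path discovery can be written as a recursive CTE that joins back through the edge table at each step.

```python
import sqlite3

# Hypothetical schema: names and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE nodes (id INTEGER PRIMARY KEY, label TEXT);
    CREATE TABLE edges (src INTEGER REFERENCES nodes(id),
                        dst INTEGER REFERENCES nodes(id));
    CREATE INDEX edges_src ON edges (src);
""")
conn.executemany("INSERT INTO nodes VALUES (?, ?)",
                 [(1, "a"), (2, "b"), (3, "c"), (4, "d")])
conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(1, 2), (1, 3), (2, 4), (3, 4)])


def successors(node_id):
    """One hop: find edges whose src matches the node's id (the shared value)."""
    rows = conn.execute(
        "SELECT dst FROM edges WHERE src = ? ORDER BY dst", (node_id,))
    return [dst for (dst,) in rows]


def reachable(node_id):
    """Many hops: a recursive CTE joins back to the edges table at every step."""
    rows = conn.execute("""
        WITH RECURSIVE reach(id) AS (
            SELECT dst FROM edges WHERE src = ?
            UNION
            SELECT e.dst FROM edges e JOIN reach r ON e.src = r.id
        )
        SELECT id FROM reach ORDER BY id
    """, (node_id,))
    return [node for (node,) in rows]
```

With the sample data, `successors(1)` gives the direct children of node 1 and `reachable(1)` gives every descendant. The `UNION` (rather than `UNION ALL`) removes duplicates, so a node reached by two different paths is listed once.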
In a graph database, nodes and edges are linked directly, using the same sort of internal structures that a relational database uses to maintain an index. Finding an edge from a node, or a node from an edge, is therefore like going one level deep in a relational index, regardless of how many nodes there are. In a relational database with millions of nodes, by contrast, the index would be several levels deep, and each lookup has to descend through all of those levels.
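The idea of following direct links instead of probing an index can be sketched in memory as below. This is only an illustration of the concept under assumed names (`Node`, `link`, `reachable` are hypothetical), not a description of how any particular graph database stores its data.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False keeps identity-based hashing, so nodes fit in a set
class Node:
    label: str
    # Direct references to neighbouring nodes; no lookup by shared value needed.
    out_edges: list = field(default_factory=list)


def link(src, dst):
    """Add a directed edge by storing a reference to dst on src."""
    src.out_edges.append(dst)


def reachable(start):
    """Collect every descendant by following references hop by hop."""
    seen = set()
    stack = list(start.out_edges)
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(node.out_edges)  # each hop is a direct pointer follow
    return sorted(n.label for n in seen)


# Illustrative data: a small DAG a -> b, a -> c, b -> d, c -> d.
a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
link(a, b)
link(a, c)
link(b, d)
link(c, d)
```

Here the cost of each hop depends only on the number of edges leaving the current node, not on the total number of nodes in the graph, which is the property the paragraph above describes.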