Specifically a Multigraph.
Some colleague suggested this and I’m completely baffled.
Any insights on this?
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
It’s pretty straightforward to store a graph in a database: you have a table for nodes, and a table for edges, which acts as a many-to-many relationship table between the nodes table and itself. Like this:
However, there are a couple of sticky points about storing a graph this way.
Firstly, the edges in this scheme are naturally directed – the start and end are distinct. If your edges are undirected, then you will either have to be careful in writing queries, or store two entries in the table for each edge, one in either direction (and then be careful writing queries!). If you store a single edge, i would suggest normalising the stored form – perhaps always consider the node with the lowest ID to be the start (and add a check constraint to the table to enforce this). You could have a genuinely unordered representation by not having the edges refer to the nodes, but rather having a join table between them, but that doesn’t seem like a great idea to me.
Secondly, the schema above has no way to represent a multigraph. You can extend it easily enough to do so; if edges between a given pair of nodes are indistinguishable, the simplest thing would be to add a count to each edge row, saying how many edges there are between the referred-to nodes. If they are distinguishable, then you will need to add something to the node table to allow them to be distinguished – an autogenerated edge ID might be the simplest thing.
However, even having sorted out the storage, you have the problem of working with the graph. If you want to do all of your processing on objects in memory, and the database is purely for storage, then no problem. But if you want to do queries on the graph in the database, then you’ll have to figure out how to do them in SQL, which doesn’t have any inbuilt support for graphs, and whose basic operations aren’t easily adapted to work with graphs. It can be done, especially if you have a database with recursive SQL support (PostgreSQL, Firebird, some of the proprietary databases), but it takes some thought. If you want to do this, my suggestion would be to post further questions about the specific queries.