I want to store undirected graph edges (for example, for friends). To store and retrieve all friends of node a, one can use either of two schemes:
Create two rows per edge (one per direction) and query a single column:
+----+-----------+---------+
| id | from_node | to_node |
+----+-----------+---------+
| 1  | a         | b       |
| 2  | b         | a       |
+----+-----------+---------+
SELECT * FROM `x` WHERE from_node = 'a'
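A minimal sketch of the two-rows-per-edge scheme, assuming SQLite through Python's built-in sqlite3 module as a stand-in for MySQL (only so the example is self-contained). The helper names make_schema_x, add_friendship_x and friends_of_x are hypothetical. Each friendship is written in both directions, so the lookup touches only from_node.

```python
import sqlite3

def make_schema_x(conn: sqlite3.Connection) -> None:
    # Scheme X (hypothetical helper): two rows per undirected edge.
    conn.execute(
        "CREATE TABLE x (id INTEGER PRIMARY KEY, "
        "from_node TEXT NOT NULL, to_node TEXT NOT NULL)"
    )
    # Indices on both columns, as described in the question.
    conn.execute("CREATE INDEX idx_x_from ON x (from_node)")
    conn.execute("CREATE INDEX idx_x_to ON x (to_node)")

def add_friendship_x(conn: sqlite3.Connection, a: str, b: str) -> None:
    # Store the edge in both directions so a lookup needs only from_node.
    conn.executemany(
        "INSERT INTO x (from_node, to_node) VALUES (?, ?)",
        [(a, b), (b, a)],
    )

def friends_of_x(conn: sqlite3.Connection, node: str) -> list:
    # One predicate on one indexed column; the node is a bound parameter.
    rows = conn.execute(
        "SELECT to_node FROM x WHERE from_node = ? ORDER BY to_node", (node,)
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
make_schema_x(conn)
add_friendship_x(conn, "a", "b")
add_friendship_x(conn, "c", "a")
print(friends_of_x(conn, "a"))  # ['b', 'c']
print(friends_of_x(conn, "b"))  # ['a']
```

The cost of this layout is that every friendship occupies two rows, and both must be inserted (and deleted) together.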
Create one row per edge and query both columns with OR:
+----+--------+--------+
| id | node_a | node_b |
+----+--------+--------+
| 1  | a      | b      |
+----+--------+--------+
SELECT * FROM `y` WHERE node_a = 'a' OR node_b = 'a'
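The one-row-per-edge scheme, sketched under the same assumption (SQLite via Python's sqlite3 standing in for MySQL; make_schema_y, add_friendship_y and friends_of_y are hypothetical names). Because the queried node may sit in either column, the query needs OR. The CASE expression, which returns whichever endpoint is not the queried node, is an illustrative addition and not part of the original query.

```python
import sqlite3

def make_schema_y(conn: sqlite3.Connection) -> None:
    # Scheme Y (hypothetical helper): one row per undirected edge.
    conn.execute(
        "CREATE TABLE y (id INTEGER PRIMARY KEY, "
        "node_a TEXT NOT NULL, node_b TEXT NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_y_a ON y (node_a)")
    conn.execute("CREATE INDEX idx_y_b ON y (node_b)")

def add_friendship_y(conn: sqlite3.Connection, a: str, b: str) -> None:
    # A single row per friendship; direction carries no meaning.
    conn.execute("INSERT INTO y (node_a, node_b) VALUES (?, ?)", (a, b))

def friends_of_y(conn: sqlite3.Connection, node: str) -> list:
    # The node can appear in either column, so both indices may be consulted.
    rows = conn.execute(
        """
        SELECT CASE WHEN node_a = :n THEN node_b ELSE node_a END AS friend
        FROM y
        WHERE node_a = :n OR node_b = :n
        ORDER BY friend
        """,
        {"n": node},
    )
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
make_schema_y(conn)
add_friendship_y(conn, "a", "b")
add_friendship_y(conn, "c", "a")
print(friends_of_y(conn, "a"))  # ['b', 'c']
print(friends_of_y(conn, "b"))  # ['a']
```

This halves the row count, but each lookup has to combine results from two indices.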
Which makes for more efficient lookups?
- Table x with 2n rows, indices on from_node and to_node, lookup on one column
- Table y with n rows, indices on node_a and node_b, lookup on both columns using OR
If you optimise everything, then X will be fastest, assuming that you read data from disk and are querying for the friends of a single person. That's because you can arrange the data on disk so that it is ordered to match one index, the one you are querying, so for a single person you need only one disk seek. Y requires lookups on two indices, so it may need several seeks to retrieve the friends, even for a single person (and disk access time usually dominates the cost of simple queries).
See clustered indices on Wikipedia (and in the MySQL manual).
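To illustrate the clustering idea: in MySQL's InnoDB engine the primary key is the clustered index, so making (from_node, to_node) the primary key of x keeps all of one person's rows physically together. The sketch below assumes SQLite's WITHOUT ROWID tables as a stand-in, since they likewise store rows in a B-tree ordered by the primary key. The table name x_clustered and the helpers add_friendship, friends_of and query_plan are hypothetical, and the printed plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID: rows live in a B-tree keyed by the primary key, so all
# edges sharing a from_node are stored next to each other.
conn.execute(
    """
    CREATE TABLE x_clustered (
        from_node TEXT NOT NULL,
        to_node   TEXT NOT NULL,
        PRIMARY KEY (from_node, to_node)
    ) WITHOUT ROWID
    """
)

def add_friendship(conn: sqlite3.Connection, a: str, b: str) -> None:
    # Two rows per edge, as in scheme X.
    conn.executemany(
        "INSERT INTO x_clustered (from_node, to_node) VALUES (?, ?)",
        [(a, b), (b, a)],
    )

def friends_of(conn: sqlite3.Connection, node: str) -> list:
    # Equality on the leading primary-key column reads one contiguous run.
    rows = conn.execute(
        "SELECT to_node FROM x_clustered WHERE from_node = ? ORDER BY to_node",
        (node,),
    )
    return [r[0] for r in rows]

def query_plan(conn: sqlite3.Connection, node: str) -> list:
    # The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT to_node FROM x_clustered WHERE from_node = ?",
        (node,),
    )
    return [r[-1] for r in rows]

add_friendship(conn, "a", "b")
add_friendship(conn, "a", "c")
print(friends_of(conn, "a"))  # ['b', 'c']
print(query_plan(conn, "a"))
```

Whether this saves real disk seeks depends on the storage engine and on the data not already being cached, which is the caveat below.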
If you are lucky enough to know that the data will always be in memory, then they will likely both be “fast enough” (and even if the data are on disk they may be fast enough; I am not saying X is the best design, only that it can be made the most efficient).