As a training project I am trying to build a Family tree application on Azure.
The first step is the database, and I plan to use table storage.
What would a table storage design look like for a Family tree application?
I have thought of a couple of solutions:
- One entry per person, with XML holding all relationships for that person. But that would mean updating several rows for a single change, and a lot of duplicated data.
- One table for each type of information: one for persons, one for relationships… But this just feels like a relational database.
I would build a partition per family with a row per person: for each person, the partition key would be the family and the row key the person's identifier. On each person, put an attribute for the parents (normally just two :)). This way you can quickly read the entire partition into memory and traverse the graph using an in-memory tree structure. A typical family should have fewer than a hundred nodes, so this would be lightning fast. Updates would always be within a single family, so transactions can be used, since each family lives in one partition.
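To make the layout concrete, here is a minimal sketch in Python. It does not call the Azure SDK: plain dictionaries stand in for table entities, and `load_family` only simulates a partition query. The `Name` and `Parents` properties, the comma-separated encoding of parent ids, and all helper names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical entities shaped like table storage rows:
# PartitionKey = family, RowKey = person id, Parents = comma-separated row keys.
rows = [
    {"PartitionKey": "smith", "RowKey": "p1", "Name": "Ann", "Parents": ""},
    {"PartitionKey": "smith", "RowKey": "p2", "Name": "Bob", "Parents": ""},
    {"PartitionKey": "smith", "RowKey": "p3", "Name": "Cat", "Parents": "p1,p2"},
    {"PartitionKey": "smith", "RowKey": "p4", "Name": "Dan", "Parents": "p3"},
    {"PartitionKey": "jones", "RowKey": "p1", "Name": "Eve", "Parents": ""},
]


def load_family(all_rows, family):
    """Simulate a partition query: return {row_key: entity} for one family."""
    return {r["RowKey"]: r for r in all_rows if r["PartitionKey"] == family}


def parents_of(people, person_id):
    """Decode the Parents attribute into a list of row keys."""
    return [p for p in people[person_id]["Parents"].split(",") if p]


def children_index(people):
    """Invert the parent links so children can be looked up by parent id."""
    index = defaultdict(list)
    for person_id in people:
        for parent_id in parents_of(people, person_id):
            index[parent_id].append(person_id)
    return index


def ancestors(people, person_id):
    """Walk the parent links in memory and collect every ancestor id."""
    seen = set()
    stack = parents_of(people, person_id)
    while stack:
        current = stack.pop()
        if current in seen or current not in people:
            continue  # already visited, or parent lives outside this partition
        seen.add(current)
        stack.extend(parents_of(people, current))
    return seen


smiths = load_family(rows, "smith")
```

With the sample data, `ancestors(smiths, "p4")` walks from Dan up through Cat to Ann and Bob without any further reads, because the whole family is already in memory.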
For a really difficult (related) exercise, implement a graph database (like your family tree) on top of a key-value store (table storage). Think of the problem that Twitter or Facebook have, where you need to see updates (tweets, news) across all relationships (the social graph). That is where you start getting into the interesting (hard) parts of NoSQL.
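As a starting point for that exercise, here is one possible sketch, using fan-out on write: each post is copied into the feed partition of every follower, so reading a feed is a single partition scan. This is only one approach, not a description of how Twitter or Facebook actually do it. `KeyValueStore` is a hypothetical in-memory stand-in for a table, and the `followers:` / `feed:` key naming is an assumption.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value table: (partition, row) -> value."""

    def __init__(self):
        self._data = {}

    def put(self, partition, row, value):
        self._data[(partition, row)] = value

    def scan(self, partition):
        """Return the (row, value) pairs of one partition, sorted by row key."""
        return sorted((r, v) for (p, r), v in self._data.items() if p == partition)


def follow(store, follower, followee):
    # Store the edge under the followee so that listing a user's
    # followers is a single partition scan.
    store.put(f"followers:{followee}", follower, "")


def post(store, author, seq, text):
    # Fan-out on write: copy the post into every follower's feed partition.
    # The zero-padded sequence number keeps row keys in posting order.
    for follower, _ in store.scan(f"followers:{author}"):
        store.put(f"feed:{follower}", f"{seq:010d}:{author}", text)


def feed(store, user):
    """Read a user's feed as (author, text) pairs, oldest first."""
    return [(row.split(":", 1)[1], text) for row, text in store.scan(f"feed:{user}")]


store = KeyValueStore()
follow(store, "bob", "alice")
follow(store, "carol", "alice")
post(store, "alice", 1, "hi")
post(store, "alice", 2, "again")
```

The hard part shows up quickly: a user with many followers makes every post expensive, and those writes span many partitions, so they cannot share one transaction.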