Let’s say I have a documents collection and an authors collection. I could design them in two ways:
1st way:
documents
{_id: 1, title: "document 1", author: "John", age: 34}
{_id: 2, title: "document 2", author: "Maria", age: 42}
{_id: 3, title: "document 3", author: "John", age: 34}
authors
{_id: 1, name: "John", age: 34}
{_id: 2, name: "Maria", age: 42}
2nd way:
documents
{_id: 1, title: "document 1", id_author: 1}
{_id: 2, title: "document 2", id_author: 2}
{_id: 3, title: "document 3", id_author: 1}
authors
{_id: 1, name: "John", age: 34}
{_id: 2, name: "Maria", age: 42}
The 1st way is good because I don’t have to simulate a join when I retrieve a document: all the data is already in the documents collection. On the other hand, if I have to change Maria’s age, I have to do it in both collections.
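A minimal sketch of the 1st way, assuming the two collections are modeled as in-memory Python lists of dicts (no MongoDB driver involved; the helper names are hypothetical). Reading a document needs only one lookup, but an age change has to be written to every copy:

```python
# Denormalized ("1st way") data: author name and age are copied into each document.
documents = [
    {"_id": 1, "title": "document 1", "author": "John", "age": 34},
    {"_id": 2, "title": "document 2", "author": "Maria", "age": 42},
    {"_id": 3, "title": "document 3", "author": "John", "age": 34},
]
authors = [
    {"_id": 1, "name": "John", "age": 34},
    {"_id": 2, "name": "Maria", "age": 42},
]


def get_document(doc_id):
    """One lookup returns the document together with its author's data."""
    return next(d for d in documents if d["_id"] == doc_id)


def update_author_age(name, new_age):
    """The age lives in both collections, so every copy must be updated.

    Returns the total number of records changed.
    """
    changed = 0
    for author in authors:
        if author["name"] == name:
            author["age"] = new_age
            changed += 1
    for doc in documents:
        if doc["author"] == name:
            doc["age"] = new_age
            changed += 1
    return changed
```

Changing Maria’s age touches two records (her author entry and one document); changing John’s touches three, and the count keeps growing with every document he writes.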
The 2nd way is the opposite: if I need a document and the age of its author, I have to query documents first and then authors. The good thing is that when Maria’s age changes, I only have to update it in the authors collection.
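A minimal sketch of the 2nd way under the same assumption (in-memory lists of dicts, hypothetical helper names). The read becomes an application-side join over two lookups, while the update touches a single record. In MongoDB itself, the `$lookup` aggregation stage (available since 3.2) can perform this join on the server instead.

```python
# Normalized ("2nd way") data: documents only store a reference to their author.
documents = [
    {"_id": 1, "title": "document 1", "id_author": 1},
    {"_id": 2, "title": "document 2", "id_author": 2},
    {"_id": 3, "title": "document 3", "id_author": 1},
]
authors = [
    {"_id": 1, "name": "John", "age": 34},
    {"_id": 2, "name": "Maria", "age": 42},
]


def get_document_with_author(doc_id):
    """Application-side join: fetch the document, then its author."""
    doc = next(d for d in documents if d["_id"] == doc_id)             # query 1
    author = next(a for a in authors if a["_id"] == doc["id_author"])  # query 2
    return {**doc, "author": author["name"], "age": author["age"]}


def update_author_age(author_id, new_age):
    """The age is stored once, so a single update is enough."""
    for author in authors:
        if author["_id"] == author_id:
            author["age"] = new_age
            return True
    return False  # no author with that id
```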
So, which solution is better? I guess that the more fields the authors collection has, the more likely I’d be to use the second way. But if I use the 1st way, is there a single query that updates Maria’s age in both collections?
Which solution is more commonly used?
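Updating more than one collection atomically requires a transaction. Older MongoDB versions did not support multi-document transactions; since MongoDB 4.0 they are supported on replica sets, and since 4.2 on sharded clusters. Either way, a single update command only targets one collection, so the 1st way always needs one update per collection.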
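On MongoDB 4.0 or later running as a replica set, the two updates the 1st way needs can be made atomic with a multi-document transaction. A minimal sketch using PyMongo’s `ClientSession.with_transaction`, assuming `client` is an already connected `pymongo.MongoClient`; the database name `mydb` is a hypothetical placeholder, and the collection and field names are the ones from the question:

```python
def update_author_age_atomically(client, name, new_age, db_name="mydb"):
    """Update an author's age in both collections inside one transaction.

    `client` is assumed to be a pymongo.MongoClient connected to a replica
    set (MongoDB 4.0+); `db_name` is a hypothetical database name.
    """
    db = client[db_name]

    def callback(session):
        # Both writes pass the same session, so they commit or abort together.
        db.authors.update_one(
            {"name": name}, {"$set": {"age": new_age}}, session=session
        )
        db.documents.update_many(
            {"author": name}, {"$set": {"age": new_age}}, session=session
        )

    with client.start_session() as session:
        # with_transaction starts the transaction, runs the callback, commits,
        # and retries on transient transaction errors.
        session.with_transaction(callback)
```

This is still two update commands; the transaction only guarantees that readers never see one collection updated without the other.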
Both ways have their own disadvantages.
The first way, which copies the author data into each document, may be more appropriate for logging-style data whose contents won’t change after they are written.
The second way is better when you expect the author’s details to change or grow over time, which is the case most of the time.
As already mentioned, embedding the documents in their respective author’s document would combine the benefits of both suggestions, but it may lead to problems in the long run: the author’s document grows with every new document, and a single MongoDB document cannot exceed 16 MB.
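For comparison, a minimal sketch of the embedded variant, again using in-memory Python dicts as a stand-in for MongoDB (the `documents` array field and the helper names are assumptions). The age is stored once and a document is read together with its author from a single record, but every new document makes the author record larger:

```python
# Embedded variant: each author record carries its own documents array.
authors = [
    {"_id": 1, "name": "John", "age": 34,
     "documents": [{"_id": 1, "title": "document 1"},
                   {"_id": 3, "title": "document 3"}]},
    {"_id": 2, "name": "Maria", "age": 42,
     "documents": [{"_id": 2, "title": "document 2"}]},
]


def find_document(doc_id):
    """Return (document, author name, author age) from a single record."""
    for author in authors:
        for doc in author["documents"]:
            if doc["_id"] == doc_id:
                return doc, author["name"], author["age"]
    return None


def add_document(author_id, doc):
    """Append a document to its author and return the new array length.

    In real MongoDB (not this in-memory stand-in) one document is capped
    at 16 MB, so an unbounded array would eventually hit that limit.
    """
    author = next(a for a in authors if a["_id"] == author_id)
    author["documents"].append(doc)
    return len(author["documents"])
```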