Disclaimer: by referential data, I do not mean referential integrity.
I am learning NoSQL and would like to understand how data should be modeled. In a typical relational database for a CMS application, for example, you might have two tables, article and author, where each article has a reference to its author.
In a NoSQL system, you might create the article documents this way, since documents are just disguised object graphs:
{
  title: "Learn nosql in 5 minutes",
  slug: "nosql_is_easy",
  author: {firstName: "Smarty", lastName: "Pants"}
}
{
  title: "Death to RDBMS",
  slug: "rdbms_sucks",
  author: {firstName: "Smarty", lastName: "Pants"}
}
and so on…
Say one day Mr. Smarty Pants decides to change his name to Regular Joe because NoSQL has become ubiquitous. In that case, every article would need to be scanned and the author's embedded name updated.
So my question is: how should the data be modeled in NoSQL to fit the basic use cases of a CMS, so that performance is on par with or faster than an RDBMS? MongoDB, for example, claims CMS as a use case…
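To make the cost concrete, here is a minimal sketch with plain JavaScript objects standing in for the article documents. The `articles` array and the `renameEmbeddedAuthor` helper are hypothetical, not part of any MongoDB or CouchDB API. Because each article embeds its own copy of the name, the rename has to visit every article.

```javascript
// Hypothetical in-memory stand-in for an article collection in which
// each article embeds its own copy of the author's name.
const articles = [
  { title: "Learn nosql in 5 minutes", slug: "nosql_is_easy",
    author: { firstName: "Smarty", lastName: "Pants" } },
  { title: "Death to RDBMS", slug: "rdbms_sucks",
    author: { firstName: "Smarty", lastName: "Pants" } },
];

// Rename an author everywhere the name is embedded. Every article has to be
// examined, so the work grows linearly with the number of articles.
function renameEmbeddedAuthor(docs, oldName, newName) {
  let updated = 0;
  for (const doc of docs) {
    const a = doc.author;
    if (a && a.firstName === oldName.firstName && a.lastName === oldName.lastName) {
      doc.author = { ...newName }; // replace this article's embedded copy
      updated += 1;
    }
  }
  return updated;
}

const count = renameEmbeddedAuthor(
  articles,
  { firstName: "Smarty", lastName: "Pants" },
  { firstName: "Regular", lastName: "Joe" },
);
console.log(count); // prints 2
```

In a real database the loop becomes a query plus one write per matching document, but the shape of the work is the same.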
Edit:
A few people have already suggested normalizing the data like this:
article
{
title: "Death to RDBMS",
slug: "rdbms_sucks",
author: {id: "10000001"}
}
author
{
name: "Big Brother",
id: "10000001"
}
However, since NoSQL databases by design lack joins, you would have to use MapReduce-like functions to bring the data together. If this is your suggestion, please comment on the performance of such an operation.
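One common alternative to a MapReduce pass is an application-side join: fetch the articles, collect their author ids, fetch those authors in a second query, and merge the two in memory. The sketch below is a swapped-in technique, not something proposed in the thread, and it uses plain arrays as hypothetical stand-ins for the two query results. The `joinAuthors` helper is likewise illustrative.

```javascript
// Hypothetical results of two separate queries against the normalized model.
const articles = [
  { title: "Death to RDBMS", slug: "rdbms_sucks", author: { id: "10000001" } },
];
const authors = [{ name: "Big Brother", id: "10000001" }];

// Application-side join: build an id -> author lookup once, then attach the
// matching author to each article. Using a Map means one pass over each list
// instead of a nested scan.
function joinAuthors(articleDocs, authorDocs) {
  const byId = new Map(authorDocs.map((a) => [a.id, a]));
  return articleDocs.map((article) => ({
    ...article,
    // Fall back to the bare reference if the author was not found.
    author: byId.get(article.author.id) ?? article.author,
  }));
}

const joined = joinAuthors(articles, authors);
console.log(joined[0].author.name); // prints Big Brother
```

The trade-off is an extra round trip per page of articles, in exchange for the author's name living in exactly one document.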
Edit 2:
If you think NoSQL is not a suitable solution for any kind of data that contains references, please also explain why. That would seem to make the use case for NoSQL rather limited, since any reasonable application contains relational data.
Edit 3:
I suppose CouchDB is a NoSQL database, if you say so.
But really, we have general-purpose programming languages, and domain-specific languages. Similarly, CouchDB is a domain-specific database.
I use CouchDB a lot, but I really don’t care whether it counts as SQL or NoSQL. CouchDB is valuable (to me) because the API is 100% HTTP, JSON, and JavaScript. You can build web applications in which the browser fetches HTML from CouchDB and then queries for data over AJAX. To say this is “not SQL” is an understatement!
Anyway, back to Smarty Pants and Regular Joe. Maybe he has 100,000 documents. What if we just updated them all, the hard way? Well, that is a pretty small amount of JavaScript.
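A minimal sketch of what that update could look like over CouchDB's HTTP API. It assumes a database named `cms` and a design document `_design/app` whose `by_user` view is keyed by the author's full name (all three names are hypothetical), plus article documents shaped like the ones in the question. It reads the matching documents with `include_docs=true` and writes them back in one `_bulk_docs` request. `fetchImpl` is injectable so the function can be exercised without a server.

```javascript
// Rename an author the "hard way": read every article by that author from a
// view, rewrite the embedded name, and save them all back in one request.
// Assumed (hypothetical): database "cms", design doc "_design/app",
// view "by_user" keyed by "firstName lastName".
async function renameAuthor(baseUrl, oldFullName, newAuthor, fetchImpl = fetch) {
  // View keys are JSON values, so the string key is JSON-encoded, then URL-encoded.
  const key = encodeURIComponent(JSON.stringify(oldFullName));
  // include_docs=true returns each full document alongside its view row.
  const res = await fetchImpl(
    `${baseUrl}/cms/_design/app/_view/by_user?key=${key}&include_docs=true`,
  );
  const { rows } = await res.json();

  // Keep _id and _rev from each doc so CouchDB treats the write as an update.
  const docs = rows.map((row) => ({
    ...row.doc,
    author: { firstName: newAuthor.firstName, lastName: newAuthor.lastName },
  }));

  // _bulk_docs is not atomic: each document succeeds or conflicts on its own.
  const bulk = await fetchImpl(`${baseUrl}/cms/_bulk_docs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ docs }),
  });
  return bulk.json(); // one result entry per document (ok/rev or error)
}
```

For 100,000 documents you would probably page through the view with `limit` and send several smaller `_bulk_docs` batches, retrying any document that reports a conflict.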
Yes, this technique would get you an F in computer science class. However, I like it. I would write this code in Firebug. In my browser. The rename is not atomic and it has no referential integrity. On the other hand, it would probably complete in a couple of seconds and nobody would care.
You might say CouchDB fails at the buzzwords and benchmarks but aces the school of hard knocks.
P.S. The by_user view is built from map-reduce. In CouchDB, map-reduce is incremental, which means it performs like most SQL indexes: all queries finish in a short, predictable (logarithmic) time.
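For illustration, a `by_user` map function might look like the sketch below. The key format (full name as a string) and the `_design/app` id are assumptions. In CouchDB the function is stored as a string in a design document and `emit()` is supplied by the query server; the `runMap` harness here is a hypothetical local stand-in that only mimics `emit()` to show which rows the view would hold.

```javascript
// Hypothetical map function for the "by_user" view: one row per article,
// keyed by the author's full name.
const byUserMap = function (doc) {
  if (doc.author && doc.author.firstName && doc.author.lastName) {
    emit(doc.author.firstName + " " + doc.author.lastName, null);
  }
};

// Design document that would hold the view (the "_design/app" id is assumed).
const designDoc = {
  _id: "_design/app",
  views: { by_user: { map: byUserMap.toString() } },
};

// Local harness, not CouchDB: supplies a temporary emit() and collects rows.
// Rows are sorted with plain string comparison; CouchDB itself uses
// Unicode collation for view keys.
function runMap(mapFn, docs) {
  const rows = [];
  for (const doc of docs) {
    globalThis.emit = (key, value) => rows.push({ id: doc._id, key, value });
    mapFn(doc);
  }
  delete globalThis.emit;
  return rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
}

console.log(runMap(byUserMap, [
  { _id: "a1", author: { firstName: "Smarty", lastName: "Pants" } },
  { _id: "a2", author: { firstName: "Regular", lastName: "Joe" } },
]));
```

Because CouchDB only re-runs the map function for documents that changed since the last query, a lookup such as `?key="Regular Joe"` reads from an already-built index rather than scanning every article.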