To follow up on my question on modeling relational data with nosql, I have read several articles on the subject:
Nosql doesn’t mean non-relational
They seem to suggest that nosql can handle normalized, relational data.
So let’s continue with the example I had before: a CMS with two types of data, articles and authors, where each article holds a reference (by ID) to its author.
Below are the operations the system needs to support:
- Fetch an article by ID along with its author
- Fetch all articles by a particular author
- Find the first 10 articles, with their authors, sorted by creation date
I would like to understand the performance of these operations compared to the same operations if the same data were stored in an RDBMS. In particular, please specify whether each operation uses MapReduce, requires multiple trips to the NoSQL store (links), or relies on a pre-join.
I would like to limit the discussion to document-based NoSQL solutions like MongoDB, CouchDB, and Riak.
Edit 1:
The Spring Data project is available for Riak and MongoDB.
For MongoDB, you wouldn’t use embedded documents for the author record. So the pre-join is out; it’s multiple trips to the DB. However, you can cache the author, so you only need to make that second trip once per author. The queries you indicated are pretty trivial in MongoDB.
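Here is a rough sketch of that two-trip pattern with an author cache, written in Python. It uses plain dicts as a stand-in for the two collections so it runs without a database; the field names (`author_id`, `title`, `name`) and the helper names are assumptions for illustration, not anything from your schema.

```python
# In-memory stand-ins for two collections; all names and fields are hypothetical.
AUTHORS = {
    "a1": {"_id": "a1", "name": "Ada"},
}
ARTICLES = {
    "p1": {"_id": "p1", "title": "Hello", "author_id": "a1"},
    "p2": {"_id": "p2", "title": "Again", "author_id": "a1"},
}

author_trips = 0      # counts simulated round trips to the authors collection
_author_cache = {}    # author_id -> author document


def find_author(author_id):
    """Simulated second trip, skipped when the author is already cached.

    With pymongo this would be roughly db.authors.find_one({"_id": author_id}).
    """
    global author_trips
    if author_id not in _author_cache:
        author_trips += 1
        _author_cache[author_id] = AUTHORS.get(author_id)
    return _author_cache[author_id]


def fetch_article_with_author(article_id):
    """Trip 1 fetches the article; trip 2 (cached) fetches its author."""
    article = ARTICLES.get(article_id)
    if article is None:
        return None
    return {**article, "author": find_author(article["author_id"])}


first = fetch_article_with_author("p1")
second = fetch_article_with_author("p2")
print(first["title"], "by", first["author"]["name"])
print("author lookups:", author_trips)
```

Both articles share one author, so the second call is served from the cache and only one author lookup is counted.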
If you are using an ORM/ODM to manage your entities within your application, this would be transparent. It would still be two trips to the DB, but the responses should be fast; two hits shouldn’t be noticeable at all.
Finding articles by a given author is just the reverse…
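As a minimal sketch of the reverse lookup, again in Python with in-memory lists standing in for the collections (the data, field names, and `articles_by_author` helper are all hypothetical):

```python
# In-memory stand-ins for two collections; all names and fields are hypothetical.
AUTHORS = [
    {"_id": "a1", "name": "Ada"},
    {"_id": "a2", "name": "Bob"},
]
ARTICLES = [
    {"_id": "p1", "title": "Hello", "author_id": "a1"},
    {"_id": "p2", "title": "World", "author_id": "a2"},
    {"_id": "p3", "title": "Again", "author_id": "a1"},
]


def articles_by_author(name):
    # Trip 1: look up the author.
    # With pymongo, roughly db.authors.find_one({"name": name}).
    author = next((a for a in AUTHORS if a["name"] == name), None)
    if author is None:
        return []
    # Trip 2: fetch every article that references that author's ID.
    # With pymongo, roughly db.articles.find({"author_id": author["_id"]}).
    return [p for p in ARTICLES if p["author_id"] == author["_id"]]


titles = [p["title"] for p in articles_by_author("Ada")]
print(titles)
```

In a real store you would want an index on the referencing field (`author_id` here) so the second trip stays cheap.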
So again, two queries but the single author fetch should be fast and can easily be cached.
Lastly, again, two queries but just a tiny bit more complex. You can run these in a mongo shell to see what the results might be like.
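To illustrate the shape of this one outside the shell, here is a hedged Python sketch over in-memory data. It assumes "first 10" means the ten newest articles, and that the authors are fetched in one batched lookup for the distinct IDs on the page; the `created_at` and `author_id` fields and the helper name are made up for the example.

```python
from datetime import datetime, timedelta

# In-memory stand-ins for two collections; all names and fields are hypothetical.
AUTHORS = {f"a{i}": {"_id": f"a{i}", "name": f"Author {i}"} for i in range(3)}
START = datetime(2024, 1, 1)
ARTICLES = [
    {"_id": f"p{i}", "author_id": f"a{i % 3}", "created_at": START + timedelta(days=i)}
    for i in range(25)
]


def latest_articles_with_authors(limit=10):
    # Trip 1: newest articles first, capped at `limit`.
    # With pymongo, roughly db.articles.find().sort("created_at", -1).limit(limit).
    page = sorted(ARTICLES, key=lambda p: p["created_at"], reverse=True)[:limit]

    # Trip 2: one batched lookup for the distinct authors on this page.
    # With pymongo, roughly db.authors.find({"_id": {"$in": list(wanted)}}).
    wanted = {p["author_id"] for p in page}
    authors = {aid: AUTHORS[aid] for aid in wanted if aid in AUTHORS}

    # Stitch the two result sets together in the application.
    return [{**p, "author": authors.get(p["author_id"])} for p in page]


page = latest_articles_with_authors()
for item in page:
    print(item["_id"], item["created_at"].date(), item["author"]["name"])
```

The point of batching is that the number of round trips stays at two no matter how many distinct authors appear on the page.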
I’m not sure this is worth writing a map reduce to complete. A couple quick round trips might have a little more latency but the mongo protocol is pretty fast. I wouldn’t be overly worried about it.
Lastly, the real performance implications of doing it this way: since ideally you’d only be querying on indexed fields in the document, it should be pretty quick. The only additional step is a second round trip to get the other documents; depending on how your application and DB are structured, this is likely not a big deal at all. You can tell mongo to only profile queries that take longer than a given threshold (100 ms by default when turned on), so you can keep an eye on what’s taking time for your program as data grows.
The one benefit you have here that an RDBMS does not offer is much easier breaking apart of data. What happens when you expand your application beyond a CMS to support other things that use the same authentication store? It just happens to be a completely separate DB now, shared across many applications. It’s much simpler to perform these queries across DBs; with RDBMS stores it’s a complex process.
I hope this helps you in your NoSQL discovery!