I would like to know your opinion about using Cassandra to implement an
RBAC-like authentication & authorization model. We have simplified the
central relationship of the general model
(http://en.wikipedia.org/wiki/Role-based_access_control) to:
user —n:m— role —n:m— resource
Users and resources are indexed by externally visible identifiers.
These identifiers need to be “re-ownable” (think: mail aliases), too.
The main reasons to consider Cassandra are availability, scalability
and (global) geo-redundancy. These are hard to achieve with an RDBMS.
On the other hand, RBAC has many m:n relations. While some
inconsistencies may be acceptable, resource ownership (i.e. role=owner)
must never be mixed up.
What do you think? Is such a relational model an antipattern for
Cassandra usage? Do you know of similar solutions based on Cassandra?
I’m going to go ahead and turn my comments into an answer so they’re in one place.
While you have a large-sounding dataset (100,000,000 accounts to manage, if I’m reading that correctly), you also have the constraint of needing to enforce some level of consistency so that a specific relationship never falls out of sync. You also have lots of many-to-many relationships (user -> role and role -> resource, the m:n relations from above) that you need to enforce. Additionally, it sounds like you will be reading from the dataset more than writing to it. Consequently, I think an RDBMS with a hot backup would solve your problems better than a custom Cassandra deployment.
The reasons behind this are:
Relationships: in an RDBMS, one-to-many and many-to-many relationships can be queried with a SQL statement that joins across tables, and you only have to store the data once. In Cassandra, depending on the setup, you’d have to store the same information in multiple places (typically one table per query pattern) to properly reflect the relationships. This would lead to a rather messy and redundant data model.
Consistency: Cassandra is eventually consistent by default, which is fine when dealing with most kinds of data, IMHO. However, when dealing with something like security, which necessitates consistency at all times, RDBMSes have a significant advantage in transactions, which ensure your data is always in sync. That is something I think is important from a security perspective.
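To make the first two points concrete, here is a minimal sketch of the user/role/resource model in a relational store. It uses Python's built-in sqlite3 module purely for illustration; the table names, the per-resource role name 'owner:doc1', and the helper functions are all assumptions of mine, not anything from the question. It shows one join answering "who can reach this resource" and one transaction moving ownership so that the role is never held by both users or by neither.

```python
import sqlite3

# Hypothetical schema: two link tables express the two m:n relations.
SCHEMA = """
CREATE TABLE users     (id TEXT PRIMARY KEY);
CREATE TABLE roles     (id TEXT PRIMARY KEY);
CREATE TABLE resources (id TEXT PRIMARY KEY);
CREATE TABLE user_roles (
    user_id TEXT NOT NULL REFERENCES users(id),
    role_id TEXT NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)
);
CREATE TABLE role_resources (
    role_id     TEXT NOT NULL REFERENCES roles(id),
    resource_id TEXT NOT NULL REFERENCES resources(id),
    PRIMARY KEY (role_id, resource_id)
);
"""

def open_db():
    # isolation_level=None: autocommit mode, so BEGIN/COMMIT are issued explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def users_for_resource(conn, resource_id):
    """Resolve resource -> roles -> users with a single join."""
    rows = conn.execute(
        """SELECT DISTINCT ur.user_id
           FROM role_resources rr
           JOIN user_roles ur ON ur.role_id = rr.role_id
           WHERE rr.resource_id = ?
           ORDER BY ur.user_id""",
        (resource_id,),
    )
    return [row[0] for row in rows]

def transfer_role(conn, role_id, old_user, new_user):
    """Move a role between users atomically: both changes apply, or neither."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        cur = conn.execute(
            "DELETE FROM user_roles WHERE user_id = ? AND role_id = ?",
            (old_user, role_id),
        )
        if cur.rowcount != 1:
            raise ValueError("old_user does not hold this role")
        conn.execute("INSERT INTO user_roles VALUES (?, ?)", (new_user, role_id))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the previous holder keeps the role
        raise

def demo():
    conn = open_db()
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    conn.execute("INSERT INTO roles VALUES ('owner:doc1')")
    conn.execute("INSERT INTO resources VALUES ('doc1')")
    conn.execute("INSERT INTO role_resources VALUES ('owner:doc1', 'doc1')")
    conn.execute("INSERT INTO user_roles VALUES ('alice', 'owner:doc1')")
    return conn
```

Calling transfer_role(conn, 'owner:doc1', 'alice', 'bob') on the demo database hands the role to bob; if the new user does not exist, the foreign key check fails and the rollback leaves alice as owner. In Cassandra you would typically keep a separate denormalized table for each lookup direction and would have to keep them in step yourself.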
Read Speed: using indexes in an RDBMS will significantly speed up reads from the DB, so I wouldn’t make this a driving decision factor until you can empirically determine that it will be a significant bottleneck. Cassandra’s quorum reading model could, in some ways, be slower, as you have to wait on N machines (where N >= 1) to return an answer and correct that answer if it’s out of sync.
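As a rough illustration of the "wait on N machines" point: with a replication factor RF, a quorum is floor(RF/2) + 1 replicas, and a read is guaranteed to touch at least one replica that saw the latest acknowledged write only when the read and write replica counts together exceed RF. The helper names below are my own, and this is plain arithmetic, not a Cassandra API call.

```python
def quorum(replication_factor):
    """Number of replicas that form a majority."""
    return replication_factor // 2 + 1

def read_sees_latest_write(replication_factor, read_replicas, write_replicas):
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor

rf = 3
q = quorum(rf)
print(q)                                  # replicas a quorum read waits for
print(read_sees_latest_write(rf, q, q))   # quorum reads + quorum writes
print(read_sees_latest_write(rf, 1, 1))   # single-replica reads and writes
```

So a consistent read at RF = 3 waits for two replicas rather than one, which is the extra latency mentioned above. Note that this overlap covers a single row or partition; it is not a substitute for a multi-table transaction.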
Redundancy: an RDBMS with a hot backup (master-master replication) would solve redundancy problems.
Cassandra’s a great tool and I enjoy using it; however, in this case, I think your model works better with an RDBMS than it does with Cassandra.
Best of luck!