I have been trying out Cassandra and need some help in understanding a few issues. I am new to cassandra and I am not sure of translating a MySQL DB to Cassandra would lead me to pitfalls which due to say inexperience or limited knowledge of cassandra. So I hope I can get the useful information from experienced cassandra users/developers.
Below are sample keyspaces I have created. I would like to know any sort of drawback in the design if someone from their experience can point out.
create keyspace Students with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use Students;
create column family StudentID with column_type = 'Super' and comparator = 'UTF8Type' and subcomparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and column_metadata =
[{column_name : 'First Name', validation_class : UTF8Type},
{column_name : 'Last Name', validation_class : UTF8Type},
{column_name : 'Subjects', validation_class : UTF8Type},
{column_name : 'Class', validation_class : UTF8Type}];
set StudentID[utf8('1968')]['00001']['First Name'] = 'Mark';
set StudentID[utf8('1968')]['00001']['Last Name'] = 'Myers';
set StudentID[utf8('1968')]['00001']['Subjects'] = 'Maths, Chemistry';
set StudentID[utf8('1968')]['00001']['Class'] = '10th grade';
create keyspace Teachers with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use Teachers;
create column family TeacherID with column_type = 'Super' and comparator = 'UTF8Type' and subcomparator = 'UTF8Type' and default_validation_class = 'UTF8Type' and column_metadata =
[{column_name : 'First Name', validation_class : UTF8Type},
{column_name : 'Last Name', validation_class : UTF8Type},
{column_name : 'Subjects', validation_class : UTF8Type},
{column_name : 'Class', validation_class : UTF8Type}];
set TeacherID[utf8('777')]['234-333']['First Name'] = 'Mark';
set TeacherID[utf8('777')]['234-333']['Last Name'] = 'Myers';
set TeacherID[utf8('777')]['234-333']['Subjects'] = 'Maths, Chemistry,physics';
set TeacherID[utf8('777')]['234-333']['Class'] = '10th grade, 11th grade, 9th grade';
create keyspace Subjects with placement_strategy = 'org.apache.cassandra.locator.SimpleStrategy' and strategy_options = {replication_factor:1};
use Subjects;
create column family SubjectNames with default_validation_class = 'UTF8Type' and comparator = 'UTF8Type' and column_metadata =
[{column_name : 'Names1', validation_class : UTF8Type}];
set SubjectNames[utf8('Current')]['Name1']= 'maths';
set SubjectNames[utf8('Current')]['Name2']= 'physics';
set SubjectNames[utf8('Current')]['Name3']= 'chemistry';
set SubjectNames[utf8('Current')]['Name4']= 'CS';
Three keyspaces – Students, Teachers and Subjects.
I would definitely need some relationship amongst these keyspaces and would also require querying data.
e.g.
- I would query for students with a certain subject and/or class
- A teacher with certain class
- List all subjects taken up by a certain student from a certain class.
From what I know, I would definitely need to create secondary indices to make the queries work. That is, retrieving data on certain clauses.
What I know if I am correct
- We donot have a ‘like’ clause in cassandra
- For each value for a column (the very last key-value pair), the value must be broken up. That is to individual words. Say, I want to get a list of subjects so each subject must reside in a distinct column associated to it. I cannot query column values that are like “subjectA,subjectB” instead I would break it up to SubjectA and SubjectB and put them in different columns.
Below are the keyspaces.



First off, is Cassandra the right tool for the job you’re trying to do? Cassandra does an excellent job dealing with distributed, loosely coupled, data sets that need high speed reading and writing capability but it starts getting clunky when you want to enforce a relational model on top of it, hence my question. If you have a highly relational dataset, like the example you showed here, where the emphasis rests on determining relationships between information, then MySQL would be a better tool than Cassandra.
I think you are confusing keyspaces as a 1-1 mapping with MySQL tables. A keyspace would more directly correspond to a database instead of a table in the database. Firstly, you might want to redesign your keyspace setup to put everything together, something like this:
Repeat for your other two column families–not sure if you need supers or not.
To do cross cutting retrievals, you would need to make a column family such as:
to build a relationship column family between a specific class and all the people who belong to it.
Breaking Up
Yes, you would need to break them up by subject and place them into their own column families. Caveat to this being you can use secondary indices (as of Cassandra .7) that allow you to perform more equality type queries such as:
Also refer to this document regarding the use of Secondary Indices. Relevant quote being,
If you haven’t already seen it, the DataStax website will answer lots of your Cassandra questions, I highly recommend browsing through it if you’re going to use Cassandra extensively.
In short, your two options are to decouple the items and create column families for each relationship you want to maintain OR to possibly use secondary indices depending on how you separate your data. I personally prefer the former method–despite the boilerplate–because I think it scales out better.