I’ve been using MySQL for an app for some time, and the more data I collect, the slower it gets. So I have been looking into NoSQL options. One of the things I have in MySQL is a View created from a bunch of joins. The app shows all the important info in a grid, and the user can select ranges, do searches, etc. on this data set. Standard query stuff.
Looking at Cassandra, everything is already sorted based on the parameters I provide in my storage-conf.xml. So I would have a certain string as my key in the SuperColumn, and keep a bunch of the data in Columns below that. But I can only sort by one Column, and I can’t do any real searching within the Columns without pulling all the SuperColumns and looping through the data, right?
I don’t want to duplicate data across different ColumnFamilies, so I want to make sure Cassandra is appropriate for me. Facebook, Digg, and Twitter all offer plenty of search functions, so maybe I am just not seeing the solution.
Is there a way with Cassandra for me to search for or filter specific data values in a SuperColumn, or its associated Column(s)? If not, is there another NoSQL option?
In the example below, it seems I can only query for phatduckk, friend1, John, etc. But what if I wanted to find anyone in the ColumnFamily who lived in city == “Beverley Hills”? Can it be done without returning all records? If so, could I do a search for city == “Beverley Hills” AND state == “CA”? It doesn’t seem like I can do either, but I want to make sure and see what my options are.
AddressBook = { // this is a ColumnFamily of type Super
    phatduckk: { // this is the key to this row inside the Super CF
        friend1: {street: "8th street", zip: "90210", city: "Beverley Hills", state: "CA"},
        John: {street: "Howard street", zip: "94404", city: "FC", state: "CA"},
        Kim: {street: "X street", zip: "87876", city: "Balls", state: "VA"},
        Tod: {street: "Jerry street", zip: "54556", city: "Cartoon", state: "CO"},
        Bob: {street: "Q Blvd", zip: "24252", city: "Nowhere", state: "MN"},
    }, // end row
    ieure: {
        joey: {street: "A ave", zip: "55485", city: "Hell", state: "NV"},
        William: {street: "Armpit Dr", zip: "93301", city: "Bakersfield", state: "CA"},
    },
}
You cannot perform that kind of operation in Cassandra. Certain kinds of selection predicates can be set on column keys, but nothing can be set on the values they hold. Look at the API and check the get_slice/get_superslice and get_range query types. Again, all of this concerns the keys in the ColumnFamily or SuperColumnFamily, not the values.
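To make the difference concrete, here is a rough sketch in plain Python. It is not Cassandra client code: the nested dict is just a stand-in for the AddressBook Super ColumnFamily from the question, and the helpers get_super_column and scan_by_value are hypothetical names made up for illustration. It shows that a lookup by row key and SuperColumn name is direct, while a filter on values has to walk every record.

```python
# In-memory stand-in for the AddressBook Super ColumnFamily from the question.
# Assumption: plain dicts only model the access pattern; this is not a Cassandra API.
address_book = {
    "phatduckk": {
        "friend1": {"street": "8th street", "zip": "90210", "city": "Beverley Hills", "state": "CA"},
        "John": {"street": "Howard street", "zip": "94404", "city": "FC", "state": "CA"},
        "Kim": {"street": "X street", "zip": "87876", "city": "Balls", "state": "VA"},
    },
    "ieure": {
        "joey": {"street": "A ave", "zip": "55485", "city": "Hell", "state": "NV"},
        "William": {"street": "Armpit Dr", "zip": "93301", "city": "Bakersfield", "state": "CA"},
    },
}


def get_super_column(cf, row_key, super_column):
    """Key-based lookup (hypothetical helper): addressed by row key and SuperColumn name."""
    return cf[row_key][super_column]


def scan_by_value(cf, **wanted):
    """Value-based filter (hypothetical helper): visits every row and every SuperColumn."""
    matches = []
    for row_key, super_columns in cf.items():
        for name, columns in super_columns.items():
            # Keep the record only if every requested column has the requested value.
            if all(columns.get(col) == val for col, val in wanted.items()):
                matches.append((row_key, name))
    return matches


# Direct lookup by keys.
john = get_super_column(address_book, "phatduckk", "John")

# Filtering on values means a full scan on the client side.
in_beverley_hills = scan_by_value(address_book, city="Beverley Hills", state="CA")
```

The scan works, but its cost grows with the total number of records, which is exactly what the question was hoping to avoid.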
If you want the kind of functionality you have described, your best bet is a SQL database. Build proper indexes on your tables, especially on the columns that are queried most often, and you should see a big difference in query performance. Hope this helps.
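As a minimal sketch of the indexing advice, the snippet below uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs anywhere. The table name address_book, the index name idx_city_state, and the helper find_by_city_state are assumptions chosen for illustration, and the rows are taken from the example in the question. The idea carries over to MySQL: a composite index on (city, state) lets the database answer the city AND state query without scanning every row.

```python
import sqlite3

# Assumption: SQLite in memory stands in for the real MySQL database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE address_book ("
    "owner TEXT, friend TEXT, street TEXT, zip TEXT, city TEXT, state TEXT)"
)

rows = [
    ("phatduckk", "friend1", "8th street", "90210", "Beverley Hills", "CA"),
    ("phatduckk", "John", "Howard street", "94404", "FC", "CA"),
    ("phatduckk", "Kim", "X street", "87876", "Balls", "VA"),
    ("ieure", "joey", "A ave", "55485", "Hell", "NV"),
    ("ieure", "William", "Armpit Dr", "93301", "Bakersfield", "CA"),
]
conn.executemany("INSERT INTO address_book VALUES (?, ?, ?, ?, ?, ?)", rows)

# Composite index on the columns used together in the WHERE clause.
conn.execute("CREATE INDEX idx_city_state ON address_book (city, state)")


def find_by_city_state(connection, city, state):
    """Return (owner, friend) pairs matching both city and state (hypothetical helper)."""
    cursor = connection.execute(
        "SELECT owner, friend FROM address_book "
        "WHERE city = ? AND state = ? ORDER BY owner, friend",
        (city, state),
    )
    return cursor.fetchall()


matches = find_by_city_state(conn, "Beverley Hills", "CA")

# Ask the database how it intends to run the query; useful for checking index use.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT owner, friend FROM address_book "
    "WHERE city = ? AND state = ?",
    ("Beverley Hills", "CA"),
).fetchall()
```

In MySQL the equivalent check is to run EXPLAIN on the slow queries behind the view and see whether the joined and filtered columns are covered by an index.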