Been playing around with Cassandra and I am trying to evaluate what would be the best data model for storing things like views or hits for unique page id’s? Would it best to have a single column family per pageid, or 1 Super-column (logs) with columns pageid? Each page has a unique id, then would like to store date and some other metrics on the view.
I am just not sure which solution handles better scalability, lots of column family OR 1 giant super-column?
page-92838 { date:sept 2, browser:IE }
page-22939 { date:sept 2, browser:IE5 }
OR
logs {
page-92838 {
date:sept 2,
browser:IE
}
page-22939 {
date:sept 2,
browser:IE5
}
}
And secondly, how to handle lots of different date: entries for page-92838?
With cassandra, it is best to start with what queries you need to do, and model your schema to support those queries.
Assuming you want to query hits on a page, and hits by browser, you can have a counter column for each page like,
If you need to do time based queries, look at how twitters rainbird denormalizes as it writes to cassandra.