What are the advantages and drawbacks to storing full serialized objects in Cassandra vs. storing only the more primitive types within the object as columns?
It seems to me that you lose flexibility but gain simplicity if you’re storing the entire object within one column. Wouldn’t it be impossible to use a native Cassandra secondary index on the column if a full object were stored and you wanted to index on one of its members? (Though I suppose here you would create your own index with an additional column family, using that member value as the row key.)
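To make the parenthetical concrete, here is a rough sketch of such a hand-maintained index. It uses plain Python dicts as stand-ins for two column families rather than a real Cassandra client, and every name in it (`users`, `users_by_email`, `save_user`, `find_by_email`) is hypothetical; JSON stands in for whatever serialization format is actually used.

```python
import json

# In-memory stand-ins for two column families (hypothetical names).
users = {}           # row key -> {column name -> value}; holds the serialized object
users_by_email = {}  # index CF: member value as row key -> {object row key -> ""}


def save_user(user_id, user):
    """Write the serialized object and keep the index row in step with it."""
    old = users.get(user_id)
    if old is not None:
        # Remove the stale index entry if the indexed member changed.
        old_email = json.loads(old["data"])["email"]
        users_by_email.get(old_email, {}).pop(user_id, None)
    users[user_id] = {"data": json.dumps(user)}
    users_by_email.setdefault(user["email"], {})[user_id] = ""


def find_by_email(email):
    """Look up the index row, then fetch and deserialize each referenced object."""
    row = users_by_email.get(email, {})
    return [json.loads(users[uid]["data"]) for uid in sorted(row)]
```

The point of the sketch is the extra bookkeeping: every write touches two rows, and the old index entry has to be cleaned up by the application.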
Thanks for any info you can provide. I’m still wrapping my brain around schema setup in this type of format.
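Both advantages and disadvantages of full object serialization seem pretty obvious. The advantages are compactness and simplicity: the whole object is written and read as a single column value.
And drawbacks: you can’t use a native secondary index on a member of the object, and changing any one field means rewriting the whole serialized value.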
So, for example, it’s a good idea to use full object serialization when storing pageview events: compactness saves a lot of disk space, and these events are never modified after they are written. Even if the schema changes (e.g., a new field is added), there’s no need to touch old data; just write new events in the new format and use ProtoBuf to read both old and new records correctly.
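The "read both old and new records" idea can be sketched without a ProtoBuf toolchain. The example below is only an illustration: it uses JSON as a stand-in for ProtoBuf, and the `PageView` class, its fields, and `decode` are made up for the example. The behaviour it mimics is that a field missing from an old record simply comes back as its default value.

```python
import json
from dataclasses import dataclass


@dataclass
class PageView:
    url: str
    user_id: str
    referrer: str = ""  # hypothetical field added in the newer format


def decode(blob: bytes) -> PageView:
    """Decode a stored event; fields absent from old records get a default."""
    raw = json.loads(blob)
    return PageView(
        url=raw["url"],
        user_id=raw["user_id"],
        referrer=raw.get("referrer", ""),
    )


# An event written before the field existed, and one written after.
old_blob = json.dumps({"url": "/home", "user_id": "u1"}).encode()
new_blob = json.dumps(
    {"url": "/home", "user_id": "u2", "referrer": "/search"}
).encode()
```

Both blobs decode with the same reader, so the old rows never need to be rewritten.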
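On the other hand, it’s a bad idea to use it when storing objects like ‘picture with caption and tags’: something that combines large binary data with small, changeable fields, because editing the caption would mean rewriting the whole serialized object, image included.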
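A toy comparison of the two layouts for the picture case, again with dicts standing in for rows and with hypothetical names throughout (`blob_row`, `column_row`, and the two update functions). It only counts how much data each approach has to write when the caption changes; it says nothing about real Cassandra write costs.

```python
import base64
import json

image = bytes(100_000)  # placeholder for a large binary payload

# Layout 1: the whole object serialized into a single column.
blob_row = {
    "data": json.dumps({
        "image": base64.b64encode(image).decode(),
        "caption": "old",
        "tags": ["cat"],
    })
}

# Layout 2: one column per field.
column_row = {"image": image, "caption": "old", "tags": "cat"}


def update_caption_serialized(row, caption):
    """Read, modify, and rewrite the whole blob; return the size written."""
    obj = json.loads(row["data"])
    obj["caption"] = caption
    row["data"] = json.dumps(obj)
    return len(row["data"])  # ASCII output, so characters == bytes


def update_caption_columns(row, caption):
    """Overwrite only the caption column; return the size written."""
    row["caption"] = caption
    return len(caption)
```

With the serialized layout the update also needs a read first, which the column layout avoids entirely.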