I have a process which frequently updates a single column. In the end, before compaction is executed, the value of my column is stored in multiple SSTables.
Memtables are sorted and are flushed to disk synchronously, so I would also assume that the SSTables on disk are sorted. Based on this, Cassandra only needs to look into a single SSTable (one with a positive bloom filter) to find the latest value of my column. Is that right?
I am a bit confused, because I’ve read somewhere that frequently changing the value of a single column leads to poorly performing reads, but my understanding is that only the compaction job will have more work to do and that reads should be unaffected.
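To make my assumption about compaction concrete, here is a toy Python model (not Cassandra code). An SSTable is assumed to be a plain dict of row key -> column name -> (value, timestamp); `compact` and `Cell` are hypothetical names. Merging keeps only the newest version of each column, so every older version is extra work for compaction but disappears from the result.

```python
from typing import Dict, List, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)
# Hypothetical model of one SSTable: row key -> column name -> cell.
SSTable = Dict[str, Dict[str, Cell]]


def compact(sstables: List[SSTable]) -> SSTable:
    """Merge SSTables into one, keeping only the newest cell per (row, column)."""
    merged: SSTable = {}
    for sst in sstables:
        for key, columns in sst.items():
            out = merged.setdefault(key, {})
            for name, cell in columns.items():
                current = out.get(name)
                # Last write wins: the higher timestamp replaces the older cell.
                if current is None or cell[1] > current[1]:
                    out[name] = cell
    # SSTables are written in key order, so emit rows (and columns) sorted.
    return {key: dict(sorted(cols.items())) for key, cols in sorted(merged.items())}


# Five updates of the same column, each flushed into its own SSTable.
tables = [{"row1": {"col": (f"v{i}", i)}} for i in range(1, 6)]
print(compact(tables))  # {'row1': {'col': ('v5', 5)}}
```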
https://issues.apache.org/jira/browse/CASSANDRA-2498
So the answer is: Cassandra versions before 1.0 would read every SSTable that may contain the row (i.e. every SSTable whose bloom filter is positive).
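A toy Python model of that pre-1.0 behaviour (not Cassandra's actual code): every SSTable whose bloom filter says "maybe" is read, and the cell with the highest timestamp wins. The `SSTable` class, `read_column_all`, and the use of an exact Python set as a stand-in for the bloom filter are all assumptions for illustration; the `reads` counter stands for the disk seeks.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


@dataclass
class SSTable:
    # Hypothetical in-memory model: row key -> column name -> cell.
    rows: Dict[str, Dict[str, Cell]]
    # Stand-in for the bloom filter: an exact set of row keys. A real bloom
    # filter can give false positives but never false negatives.
    bloom: set = field(init=False)

    def __post_init__(self) -> None:
        self.bloom = set(self.rows)


def read_column_all(sstables, key: str, column: str):
    """Consult every SSTable whose bloom filter is positive and keep the
    cell with the highest timestamp. Returns (cell, number_of_sstables_read)."""
    best: Optional[Cell] = None
    reads = 0
    for sst in sstables:
        if key not in sst.bloom:
            continue
        reads += 1  # each of these would cost a disk seek in Cassandra
        cell = sst.rows.get(key, {}).get(column)
        if cell is not None and (best is None or cell[1] > best[1]):
            best = cell
    return best, reads


# Five flushes of the same column, each producing its own SSTable.
tables = [SSTable({"row1": {"col": (f"v{i}", i)}}) for i in range(1, 6)]
print(read_column_all(tables, "row1", "col"))  # (('v5', 5), 5)
```

The number of SSTables read grows with the number of flushes that touched the row, which matches the "frequent updates lead to slow reads" claim for these older versions.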
Starting from 1.0, only data from the most recent SSTable will be read, since each SSTable, in addition to its bloom filter, also records the latest update time (maximum timestamp) of the data it contains.
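A toy Python sketch of the idea behind that change, as I understand it (not the code from CASSANDRA-2498): SSTables are visited newest first by their recorded maximum timestamp, and the read stops as soon as the cell already found is at least as new as anything the next SSTable could hold. Modelling the metadata as "max timestamp of any cell in the SSTable" is an assumption, and `SSTable`, `max_timestamp`, and `read_column_newest_first` are hypothetical names; ties between equal timestamps are simply ignored here.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Cell = Tuple[str, int]  # (value, write timestamp)


@dataclass
class SSTable:
    rows: Dict[str, Dict[str, Cell]]  # row key -> column name -> cell
    bloom: set = field(init=False)  # exact stand-in for a bloom filter
    max_timestamp: int = field(init=False)  # assumed per-SSTable metadata

    def __post_init__(self) -> None:
        self.bloom = set(self.rows)
        # Newest timestamp of any cell in this SSTable (-1 if empty).
        self.max_timestamp = max(
            (ts for cols in self.rows.values() for _, ts in cols.values()),
            default=-1,
        )


def read_column_newest_first(sstables, key: str, column: str):
    """Returns (cell, number_of_sstables_read)."""
    best: Optional[Cell] = None
    reads = 0
    # Visit SSTables from newest to oldest according to their metadata.
    for sst in sorted(sstables, key=lambda s: s.max_timestamp, reverse=True):
        # No remaining SSTable can hold anything newer than what we have.
        if best is not None and best[1] >= sst.max_timestamp:
            break
        if key not in sst.bloom:
            continue
        reads += 1
        cell = sst.rows.get(key, {}).get(column)
        if cell is not None and (best is None or cell[1] > best[1]):
            best = cell
    return best, reads


tables = [SSTable({"row1": {"col": (f"v{i}", i)}}) for i in range(1, 6)]
print(read_column_newest_first(tables, "row1", "col"))  # (('v5', 5), 1)
```

Note that the early exit only helps if the metadata can be checked without reading the SSTable data itself.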
This leads to the next question. The bloom filter is kept in RAM, but what about the “last update time” (SSTable metadata)? Is a disk seek required to access it? If so, Cassandra would still need a disk seek for each SSTable that contains a value for the column.