The HBase documentation says to declare column families at schema definition time. I don’t understand why.
I know that column families contain multiple columns (which can be added at runtime) and that they are mapped to the storage files. But why can’t column families be added at runtime?
Column families are part of the schema of the table. You can add them at runtime with an online schema change. But you wouldn’t add them dynamically the way that you can dynamically create new “columns” in an HBase table, if that’s what you had in mind.
The reason column families are part of the schema and require a schema change is that they profoundly affect the way the data is stored, both on disk and in memory. Each column family has its own set of HFiles and its own set of in-memory data structures on the RegionServer. It would be pretty expensive to dynamically create or start using new column families.
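To make the storage argument concrete, here is a minimal toy model in Java. It is only a sketch and not the HBase API: `ToyStore` and `ToyRegion` are hypothetical names, the memstore is a plain sorted map, and "HFiles" are just in-memory snapshots. The point it illustrates is that a new qualifier is merely a new key inside an existing store, while a new family means allocating a whole new store with its own buffer and files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: hypothetical classes, not part of the HBase client or server API.
final class ToyStore {
    // In-memory write buffer for one column family (stands in for a MemStore).
    final TreeMap<String, String> memstore = new TreeMap<>();
    // Flushed snapshots for this family only (stand in for its HFiles).
    final List<Map<String, String>> files = new ArrayList<>();

    // Move the buffered cells into a new "file" and empty the buffer.
    void flush() {
        if (!memstore.isEmpty()) {
            files.add(new TreeMap<>(memstore));
            memstore.clear();
        }
    }
}

final class ToyRegion {
    // One store per declared column family.
    private final Map<String, ToyStore> stores = new TreeMap<>();

    ToyRegion(String... families) {
        for (String family : families) {
            addFamily(family);
        }
    }

    // Schema change: allocates a separate buffer and file list.
    void addFamily(String family) {
        stores.putIfAbsent(family, new ToyStore());
    }

    // The qualifier is free-form; the family must already be declared.
    void put(String row, String family, String qualifier, String value) {
        ToyStore store = stores.get(family);
        if (store == null) {
            throw new IllegalArgumentException("unknown column family: " + family);
        }
        store.memstore.put(row + "/" + qualifier, value);
    }

    // Flushing the region writes one file per non-empty family.
    void flush() {
        for (ToyStore store : stores.values()) {
            store.flush();
        }
    }

    int storeCount() {
        return stores.size();
    }

    int fileCount() {
        int total = 0;
        for (ToyStore store : stores.values()) {
            total += store.files.size();
        }
        return total;
    }
}
```

In this model, `put("row1", "info", "any_new_qualifier", "x")` succeeds without any setup, whereas a put to an undeclared family is rejected until `addFamily` has run. The number of stores, and of files produced by each flush, grows with the number of families.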
Column families are only needed when you need to configure various parts of a table differently (for instance, you want some columns to have a TTL and others never to expire), or when you want to control the locality of accesses (things accessed together should be in the same column family if you want good performance, as the cost of operations grows linearly with the number of column families). So, again, for those specialized reasons, it doesn’t make sense to dynamically add new column families at runtime the way you would add regular “columns” within a family.
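The TTL case can be sketched the same way. The following is a hypothetical illustration in plain Java, not HBase code: `FamilySettings`, `TtlDemo`, and the family names `session` and `profile` are all invented for the example. It shows that expiry is decided by a setting attached to the family, so every column placed in that family inherits it.

```java
import java.util.Map;

// Hypothetical illustration; these classes are not part of the HBase API.
final class FamilySettings {
    // Sentinel meaning "cells in this family never expire".
    static final long FOREVER = Long.MAX_VALUE;

    final long ttlMillis;

    FamilySettings(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // A cell is expired once its age exceeds the family's TTL.
    boolean isExpired(long cellTimestampMillis, long nowMillis) {
        return ttlMillis != FOREVER && nowMillis - cellTimestampMillis > ttlMillis;
    }
}

final class TtlDemo {
    // The "schema": one settings object per family (family names are made up).
    static final Map<String, FamilySettings> SCHEMA = Map.of(
        "session", new FamilySettings(60_000L),               // expires after one minute
        "profile", new FamilySettings(FamilySettings.FOREVER) // kept indefinitely
    );

    // Expiry depends only on the family, never on the individual qualifier.
    static boolean isExpired(String family, long cellTimestampMillis, long nowMillis) {
        return SCHEMA.get(family).isExpired(cellTimestampMillis, nowMillis);
    }
}
```

Two cells written at the same time behave differently purely because of the family they live in, which is the kind of per-family configuration the answer describes.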