I am looking at HBase for a schema-less user action store ("user x viewed y", "user x viewed y from page z").
HBase seems like a great choice because it
- stores data in a schema-less format, and
- can support complex queries like an RDBMS
Yes, performance considerations come later.
Question 1: What features of an RDBMS will I miss if I use HBase?
If I used an RDBMS, I would use features like SUM, WHERE, GROUP BY, ORDER BY, BETWEEN, comparisons and (inner) joins, and normalization up to 2NF. Nothing more complex.
Question 2: Apart from the querying, what about:
- altering schema
- single step backup of the entire cluster
- master-slave replication and clustering (sorry, this may be more of a Hadoop question, but the HBase overview treats it separately)
that are straightforward on an RDBMS?
HBase is very far from an RDBMS. Hive lets you write Map/Reduce jobs with SQL-ish syntax, but these are still Map/Reduce jobs.
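To make "these are still Map/Reduce jobs" concrete, here is a rough sketch in plain Python of how something like SELECT user, COUNT(*) ... GROUP BY user breaks down into map, shuffle and reduce phases. It does not use Hive or Hadoop at all; the sample events and the function names are made up for illustration only.

```python
from collections import defaultdict

# Hypothetical sample events: (user, item) pairs meaning "user viewed item".
events = [("x", "y"), ("x", "z"), ("w", "y"), ("x", "y")]

def map_phase(records):
    # Emit (key, 1) per record; the key plays the role of the GROUP BY column.
    for user, _item in records:
        yield user, 1

def shuffle(pairs):
    # Group values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum each group's values: this is the COUNT(*) / SUM aggregate.
    return {key: sum(values) for key, values in grouped.items()}

view_counts = reduce_phase(shuffle(map_phase(events)))
print(view_counts)
```

Every aggregate you would write as one line of SQL becomes a full pass over the data like this, which is why it behaves like a batch job rather than an interactive query.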
From your expectations, it sounds like you should look at sharding solutions on top of a regular RDBMS, e.g. ScaleDB or ScaleBase.