Possible Duplicate:
What is the most network efficient method of fetching a set of rows in HBase?
Say that I have a set of row keys (as a Set). What is the most network-efficient method of fetching a particular column family for all rows except the ones in this set?
If the set is small compared to the total number of rows, just fetch everything and filter in the client code. The HBase scanner is efficient and has a configurable result-caching buffer to reduce the number of RPC calls.
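A minimal sketch of the client-side filtering step, written against plain Java so it runs without a cluster. The class and method names (`ClientSideExclude`, `keySet`, `keepAllExcept`) are made up for illustration. In a real client the row keys would come from `Result.getRow()` while iterating a `ResultScanner`, with the scan restricted via `Scan.addFamily(...)` and the caching buffer set via `Scan.setCaching(...)`. One thing to watch: row keys are `byte[]`, so an ordinary `HashSet<byte[]>` compares by identity and never matches. A content-based comparator is needed (HBase ships `Bytes.BYTES_COMPARATOR` for this; the sketch uses the JDK equivalent).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the HBase API. Requires Java 9+.
class ClientSideExclude {

    // Compare byte[] keys by content, unsigned and lexicographic,
    // which is the same ordering HBase uses for row keys.
    static final Comparator<byte[]> BY_CONTENT = Arrays::compareUnsigned;

    // Build an exclusion set whose contains() works on byte[] content.
    static Set<byte[]> keySet(List<String> keys) {
        Set<byte[]> set = new TreeSet<>(BY_CONTENT);
        for (String k : keys) {
            set.add(k.getBytes(StandardCharsets.UTF_8));
        }
        return set;
    }

    // Keep every scanned row key that is not in the exclusion set.
    static List<byte[]> keepAllExcept(Iterable<byte[]> scannedRowKeys, Set<byte[]> excluded) {
        List<byte[]> kept = new ArrayList<>();
        for (byte[] rowKey : scannedRowKeys) {
            if (!excluded.contains(rowKey)) {
                kept.add(rowKey);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-in for the row keys a full scan would return.
        List<byte[]> scanned = new ArrayList<>();
        for (String k : List.of("row1", "row2", "row3", "row4")) {
            scanned.add(k.getBytes(StandardCharsets.UTF_8));
        }
        Set<byte[]> excluded = keySet(List.of("row2", "row4"));
        for (byte[] rowKey : keepAllExcept(scanned, excluded)) {
            System.out.println(new String(rowKey, StandardCharsets.UTF_8));
        }
    }
}
```

With the sample data above, only `row1` and `row3` survive the filter. The lookup is O(log n) per row in the size of the exclusion set, which is negligible next to the cost of the scan itself.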
Alternatively, you can filter before the results are returned to the client; however, the set of keys will then be sent to all nodes, so the network traffic for this data will be multiplied by the number of nodes potentially holding the data.
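To make that trade-off concrete, here is a rough back-of-envelope comparison. It is a simplified model, not a measurement: it assumes the key set is shipped once per node, and every number in `main` (key size, row size, node count) is invented for illustration. The class and method names are hypothetical.

```java
// Rough cost model for excluding a set of rows from a scan.
// All figures are illustrative assumptions, not measured values.
class ExclusionCost {

    // Bytes wasted when the client fetches everything and discards excluded rows.
    static long clientSideWaste(long excludedRows, long avgRowBytes) {
        return excludedRows * avgRowBytes;
    }

    // Bytes spent shipping the exclusion set to every node that may hold data.
    static long serverSideOverhead(long excludedRows, long avgKeyBytes, long nodes) {
        return excludedRows * avgKeyBytes * nodes;
    }

    public static void main(String[] args) {
        long excludedRows = 10_000;
        long avgKeyBytes = 16;
        long avgRowBytes = 200; // size of the requested column family per row
        long nodes = 20;

        long client = clientSideWaste(excludedRows, avgRowBytes);
        long server = serverSideOverhead(excludedRows, avgKeyBytes, nodes);

        System.out.println("client-side waste:    " + client + " bytes");
        System.out.println("server-side overhead: " + server + " bytes");
        System.out.println(client <= server ? "filter in the client" : "filter on the servers");
    }
}
```

Under this model, server-side filtering only pays off when the average key size times the number of nodes is smaller than the average amount of row data that would otherwise be transferred and thrown away.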
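You can add a filter to the scan for each key, for example one RowFilter with a not-equal comparison per excluded key, combined in a FilterList that requires all of them to pass.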