I have been experiencing some problems with the fully distributed version. First of all, here is my configuration:
I have 4 servers (server_{1,2,3,4}) with 6 GB RAM and 2 cores. I installed Hadoop on all of them, with this layout:
- server_1 is namenode, datanode and secondary namenode
- server_2, server_3, server_4: data nodes
The storage is around 500GB
I have also installed HBase, with this layout:
- server_1: master and regionserver
- server_2: zookeeper and regionserver
- server_3 and server_4: regionserver
The hbase-site.xml on each server looks like this:
<property>
<name>hbase.zookeeper.quorum</name>
<value>server_2</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/hdfs/zookeeper</value>
</property>
<property>
<name>dfs.support.append</name>
<value>true</value>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://server_1:54310/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
So I have some problems you may be able to help me with:
- Insertion is slow. I have an alphanumeric row key with two column families. It takes around 9 minutes to insert 200000 rows, but this is more or less acceptable.
- I have a MapReduce job where I create a configuration:
Configuration config = HBaseConfiguration.create();
I then call config.get("hbase.cluster.distributed") and it returns “false”. What do you think?
For the first question, it is hard to give a good answer as to why the inserts are slow (or whether they are slow at all). We don’t know how powerful the machines are, what kind of disk or network hardware you have, how big the individual cell values are, how big the column or row keys are, etc. There are too many variables to decide whether this is slow or fast.
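Regarding the distributed setting, you need to make sure that the machine that launches the MapReduce job also has the same hbase-site.xml. You also need to make sure that the MR Configuration class actually loads that hbase-site.xml, which means the file has to be on the job's classpath. HBaseConfiguration.create() reads hbase-default.xml and hbase-site.xml from the classpath; if hbase-site.xml is not found there, only the defaults apply, and the default for hbase.cluster.distributed is false.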
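A quick way to check whether this is the cause is to ask the JVM that launches the job where it sees hbase-site.xml. The following is only a minimal sketch that uses the plain JDK (no HBase or Hadoop classes); the class name HBaseSiteCheck and its two helper methods are made up for illustration, and it assumes the file has the usual <configuration> root element around the <property> entries.

```java
import java.io.InputStream;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical diagnostic helper, not part of HBase.
public class HBaseSiteCheck {

    // Returns the classpath location of hbase-site.xml, or null if it is not visible.
    static URL locateSiteFile(ClassLoader loader) {
        return loader.getResource("hbase-site.xml");
    }

    // Reads one property from a Hadoop-style XML config stream; returns null if absent.
    static String readProperty(InputStream xml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            NodeList names = p.getElementsByTagName("name");
            NodeList values = p.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0
                    && key.equals(names.item(0).getTextContent().trim())) {
                return values.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        URL site = locateSiteFile(Thread.currentThread().getContextClassLoader());
        if (site == null) {
            // In this case HBase would fall back to its built-in defaults.
            System.out.println("hbase-site.xml is NOT on the classpath");
            return;
        }
        System.out.println("Found " + site);
        try (InputStream in = site.openStream()) {
            System.out.println("hbase.cluster.distributed = "
                    + readProperty(in, "hbase.cluster.distributed"));
        }
    }
}
```

Run it with the same classpath you use to submit the job. If it reports that the file is not on the classpath, adding the HBase conf directory to the classpath of the launching JVM (for example through HADOOP_CLASSPATH) should help. Another option is to load the file explicitly with config.addResource(new Path(...)), where the path points at your local copy of hbase-site.xml.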