I am trying to set up all the projects from the Apache Hadoop stack in one cluster. In what order should the Apache Hadoop ecosystem frameworks be set up?
E.g.: Hadoop, HBase, …
And if you have tested a specific sequence of steps, can you tell me what kinds of problems to expect during deployment? The main frameworks to deploy are Hadoop, HBase, Pig, Hive, HCatalog, Mahout, Giraph, ZooKeeper, Oozie, Avro, Sqoop, MRUnit, and Crunch (please add any I have missed).
There is more than one valid order, since not all of the listed products depend on one another.
In a nutshell:
1. Hadoop (HDFS, MapReduce)
2. Pig, Hive, Sqoop, Oozie
3. ZooKeeper (needed for HBase)
4. HBase
I am not 100% sure about the Mahout and MRUnit dependencies, but I think they depend only on Hadoop, if at all.
Avro is not directly dependent on Hadoop; it is a serialization library.
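If it helps to see the ordering as a dependency graph, here is a minimal Python sketch that derives one valid install order with a topological sort. The `DEPENDS_ON` table and the `install_order` helper are my own illustration, not part of any Hadoop tooling. The edges just encode the list above, plus the assumption that HBase needs Hadoop (HDFS) as well as ZooKeeper. Mahout and MRUnit are left out because I am unsure of their dependencies.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency table: component -> components to install first.
# Assumption: HBase needs both Hadoop (HDFS) and ZooKeeper; the tools in
# step 2 need only Hadoop; Avro and ZooKeeper stand alone.
DEPENDS_ON = {
    "Hadoop": set(),
    "ZooKeeper": set(),
    "Avro": set(),
    "Pig": {"Hadoop"},
    "Hive": {"Hadoop"},
    "Sqoop": {"Hadoop"},
    "Oozie": {"Hadoop"},
    "HBase": {"Hadoop", "ZooKeeper"},
}


def install_order(depends_on):
    """Return one valid install order: every component follows its dependencies."""
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Prints one of several valid orders; independent components may swap places.
    print(" -> ".join(install_order(DEPENDS_ON)))
```

Any order it prints is acceptable as long as Hadoop comes before the tools that run on it and ZooKeeper comes before HBase. That is why several different sequences all work.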