I am working on a Java program that interfaces with an already-running Hadoop cluster. The program receives HADOOP_HOME as an environment variable.
Based on this value, I need to load all of the necessary configuration resources before I start interacting with HDFS/MapReduce. I chose the files below based on the Apache documentation. My current solution looks like this:
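import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Read the installation directory from the environment.
final String HADOOP_HOME = System.getenv("HADOOP_HOME");
Configuration conf = new Configuration();
// Defaults first, then the site files that override them.
conf.addResource(new Path(HADOOP_HOME, "src/core/core-default.xml"));
conf.addResource(new Path(HADOOP_HOME, "src/hdfs/hdfs-default.xml"));
conf.addResource(new Path(HADOOP_HOME, "src/mapred/mapred-default.xml"));
conf.addResource(new Path(HADOOP_HOME, "conf/core-site.xml"));
conf.addResource(new Path(HADOOP_HOME, "conf/hdfs-site.xml"));
conf.addResource(new Path(HADOOP_HOME, "conf/mapred-site.xml"));
// FileSystem is abstract, so obtain an instance through the factory method.
FileSystem hdfs = FileSystem.get(conf);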
Is there a cleaner way to do this, ideally one that does not involve adding each resource explicitly?
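Run your jar with the hadoop launcher script:
hadoop jar <your-jar>
The script puts the Hadoop libraries and the configuration directory on the classpath, so a plain new Configuration() finds the cluster's settings and the explicit addResource calls are not needed.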
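If the program has to be started with plain java rather than hadoop jar, one possible way to avoid naming each resource is to discover the site files under HADOOP_HOME. The following is only a sketch under assumptions: it uses nothing but the JDK, the class name SiteFileFinder and the method findSiteFiles are hypothetical, and it assumes the site files live directly in HADOOP_HOME/conf as in the question. Each returned path would then be handed to conf.addResource.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of Hadoop.
public class SiteFileFinder {

    /** Returns every *-site.xml file directly under hadoopHome/conf, sorted by path. */
    public static List<Path> findSiteFiles(Path hadoopHome) throws IOException {
        Path confDir = hadoopHome.resolve("conf"); // assumed layout, taken from the question
        List<Path> siteFiles = new ArrayList<>();
        if (!Files.isDirectory(confDir)) {
            return siteFiles; // no conf directory, nothing to load
        }
        // The glob is matched against each entry's file name.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(confDir, "*-site.xml")) {
            for (Path p : stream) {
                siteFiles.add(p);
            }
        }
        Collections.sort(siteFiles); // directory order is unspecified, so sort for stability
        return siteFiles;
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("HADOOP_HOME");
        if (home == null) {
            System.err.println("HADOOP_HOME is not set");
            return;
        }
        for (Path p : findSiteFiles(Paths.get(home))) {
            // In the real program, with a Hadoop Configuration named conf:
            // conf.addResource(new org.apache.hadoop.fs.Path(p.toString()));
            System.out.println(p);
        }
    }
}
```

The *-default.xml files are deliberately left out here, since they ship inside the Hadoop jars and are normally read from the classpath.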