DistributedFileSystem.initialize takes two arguments, a URI and a Configuration. I assume the JobConf object that I set on the client will work as the Configuration, but what about the URI?
Here is the code I have for the driver:
JobClient client = new JobClient();
JobConf conf = new JobConf(ClickViewSessions.class);
conf.setJobName("ClickViewSessions");
conf.setOutputKeyClass(LongWritable.class);
conf.setOutputValueClass(MinMaxWritable.class);
FileInputFormat.addInputPath(conf, new Path("input"));
FileOutputFormat.setOutputPath(conf, new Path("output"));
conf.setMapperClass(ClickViewSessionsMapper.class);
conf.setReducerClass(ClickViewSessionsReducer.class);
client.setConf(conf);
DistributedFileSystem dfs = new DistributedFileSystem();
try {
    dfs.initialize(new URI("blah") /* what goes here??? */, conf);
} catch (Exception e) {
    throw new RuntimeException(e.toString());
}
How do I get the URI to supply to the call to initialize above?
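The URI is the address of the HDFS instance you are running, that is, of its NameNode, in the form hdfs://host:port. The default filesystem name is set in conf/core-site.xml: the value of 'fs.default.name' is the URI to connect to. (Newer Hadoop releases name this property 'fs.defaultFS' and keep 'fs.default.name' only as a deprecated alias.)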
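To make the shape of that value concrete, here is a minimal sketch that pulls 'fs.default.name' out of a core-site.xml-style document and turns it into a java.net.URI, using only the JDK. The class name DefaultFsUri, the embedded sample XML, and the address hdfs://localhost:9000 are assumptions for illustration, not values taken from your cluster. In the driver itself, with Hadoop on the classpath, conf.get("fs.default.name") should give you the same string, and FileSystem.get(conf) will usually hand back an initialized filesystem without a manual call to initialize.

```java
import java.io.ByteArrayInputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DefaultFsUri {

    // Hypothetical stand-in for conf/core-site.xml; host and port are assumptions.
    static final String SAMPLE_CORE_SITE =
        "<configuration>"
      + "<property>"
      + "<name>fs.default.name</name>"
      + "<value>hdfs://localhost:9000</value>"
      + "</property>"
      + "</configuration>";

    // Returns the URI stored under fs.default.name, or null if the property is absent.
    static URI defaultFsUri(String coreSiteXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(
                    coreSiteXml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = prop.getElementsByTagName("name")
                    .item(0).getTextContent().trim();
                if ("fs.default.name".equals(name)) {
                    String value = prop.getElementsByTagName("value")
                        .item(0).getTextContent().trim();
                    return URI.create(value);
                }
            }
            return null;
        } catch (Exception e) {
            // Keep the original exception as the cause instead of flattening it to a string.
            throw new IllegalArgumentException("Could not read core-site XML", e);
        }
    }

    public static void main(String[] args) {
        URI uri = defaultFsUri(SAMPLE_CORE_SITE);
        // This URI is what would be passed as the first argument to dfs.initialize(uri, conf).
        System.out.println(uri.getScheme() + " " + uri.getHost() + " " + uri.getPort());
    }
}
```

The scheme selects the filesystem implementation (hdfs here), while the host and port identify the NameNode to contact.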
If you haven’t yet looked at the tutorial on how to set up a simple single-node system, I would highly recommend it:
http://hadoop.apache.org/common/docs/current/single_node_setup.html