My map function has to read a file for every input. That file doesn't change at all; it is read-only. The distributed cache might help me a lot, I think, but I can't find a way to use it. The public void configure(JobConf conf) function that I need to override is, I think, deprecated. Well, JobConf is deprecated for sure. All the DistributedCache tutorials use the deprecated way too. What can I do? Is there another configure function that I can override?
These are the very first lines of my map function:
Configuration conf = new Configuration(); //load the MFile
FileSystem fs = FileSystem.get(conf);
Path inFile = new Path("planet/MFile");
FSDataInputStream in = fs.open(inFile);
DecisionTree dtree=new DecisionTree().loadTree(in);
I want to cache that MFile so that my map function doesn't need to load it over and over again.
JobConf was deprecated in 0.20.x, but in 1.0.0 it is not! 🙂 (as of writing this)
To your question: there are two ways to write MapReduce jobs in Java. One is by extending classes in the org.apache.hadoop.mapreduce package, and the other is by implementing interfaces in the org.apache.hadoop.mapred package.
Not sure which one you are using. If you don't have a configure method to override, you will get a setup method to override. It is similar to configure and should help you.
You get a setup method to override when you extend the Mapper class in the org.apache.hadoop.mapreduce package.
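To make the idea concrete, here is a minimal sketch of the "load once in setup, reuse in every map call" pattern. It is plain Java with no Hadoop dependency, so it can be run anywhere: SketchMapper and TreeMapper are hypothetical stand-ins, not Hadoop classes, and the string "tree" stands in for the DecisionTree loaded from MFile. In a real job you would extend org.apache.hadoop.mapreduce.Mapper, override setup(Context context), and move the file-loading lines from the question into it, since the framework calls setup once per task before the map calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SetupOnceSketch {

    /** Hypothetical stand-in for the framework's Mapper base class. */
    static abstract class SketchMapper {
        /** Called once per task, before any record is mapped. */
        protected void setup() {}

        /** Called once per input record. */
        protected abstract String map(String record);

        /** Mimics the framework loop: setup once, then map every record. */
        final List<String> run(List<String> records) {
            setup();
            List<String> out = new ArrayList<>();
            for (String record : records) {
                out.add(map(record));
            }
            return out;
        }
    }

    /** Hypothetical mapper that loads its read-only model in setup(). */
    static class TreeMapper extends SketchMapper {
        int loads = 0;          // counts how often the model was loaded
        private String model;   // placeholder for the loaded DecisionTree

        @Override
        protected void setup() {
            // In the real job, this is where MFile would be opened and parsed.
            model = "tree";
            loads++;
        }

        @Override
        protected String map(String record) {
            // Reuses the model loaded in setup(); no file access per record.
            return record + ":" + model;
        }
    }

    public static void main(String[] args) {
        TreeMapper mapper = new TreeMapper();
        List<String> out = mapper.run(Arrays.asList("a", "b", "c"));
        System.out.println(out + " loads=" + mapper.loads);
    }
}
```

The point of the sketch is only the lifecycle: the expensive load sits in setup and the result is kept in a field, so map never touches the file again.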