    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
I am trying to understand this basic map function from a MapReduce program. What are the input parameters? I could not find the definition of map.
If anyone can tell me what this function is doing, that would be great.
I’m assuming this is used in a MapReduce job with a TextInputFormat. I’m also assuming that “one” is a field of the mapper class, an IntWritable holding the number one, and that “word” is a reusable Text field.
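In that case, the map function is called once for every line in the file. The key is a number giving the byte offset of the start of the line in the file. The value is the text of the line. The other two parameters are supplied by the framework: output collects the key/value pairs the mapper emits, and reporter lets the task report progress and status.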
The map function here is splitting each line on whitespace using a StringTokenizer, and emitting each word and the number one as its output.
Let’s say that your input file looks like this:
Lorem ipsum dolor sit amet
consectetur adipisicing elit
sed do eiusmod tempor incididunt
The mapper will emit the following keys and values:
Lorem, 1
ipsum, 1
dolor, 1
sit, 1
amet, 1
consectetur, 1
adipisicing, 1
elit, 1
sed, 1
do, 1
eiusmod, 1
tempor, 1
incididunt, 1
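To try this without a Hadoop cluster, here is a minimal sketch in plain Java that mimics the tokenize-and-emit loop. The class name MapSketch and its map helper are made up for illustration; a List<String> stands in for the OutputCollector, and the offset arithmetic assumes UTF-8 text with a single newline character ending each line.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical stand-in for the mapper; not part of Hadoop.
class MapSketch {
    // Mirrors the original map(): the key (offset) is accepted but never used,
    // and every whitespace-separated token is emitted with the count 1.
    static List<String> map(long offset, String line) {
        List<String> emitted = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            // Stand-in for output.collect(word, one)
            emitted.add(tokenizer.nextToken() + ", 1");
        }
        return emitted;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Lorem ipsum dolor sit amet",
            "consectetur adipisicing elit",
            "sed do eiusmod tempor incididunt"
        };
        long offset = 0; // byte offset of the current line, as the key would be
        for (String line : lines) {
            for (String pair : map(offset, line)) {
                System.out.println(pair);
            }
            // Assumption: each line ends with one '\n' byte
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
    }
}
```

Running main prints one "word, 1" pair per line, in the same order as the list above. Note that the key plays no part in the result, which matches the original map function: it never reads key.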
This is probably part of a word count MapReduce job, where a reducer would then add up the ones emitted for each word to get its total count.
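If it is word count, the reduce side only has to add up the ones for each word. The sketch below is plain Java rather than the Hadoop API: the class name WordCountSketch and the count method are hypothetical, and a TreeMap stands in for the framework's grouping of values by key, so the map and reduce steps are collapsed into one loop.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical single-process imitation of a word count job; not part of Hadoop.
class WordCountSketch {
    static Map<String, Integer> count(String... lines) {
        // TreeMap keeps keys sorted, loosely like the sorted keys a reducer sees
        Map<String, Integer> totals = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // "Map" emits (word, 1); "reduce" sums the ones per word
                totals.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {do=3, eiusmod=1, sed=2}
        System.out.println(count("sed do eiusmod", "do sed do"));
    }
}
```

In a real job the summing would live in a separate reduce function that receives each word together with all of its emitted values.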