Where is the output of a Map task saved before it is used by a Reduce task?

Editorial Team

Asked: 2026-05-27T21:49:59+00:00

I am trying to find out where the output of a Map task is saved to disk before it can be used by a Reduce task.

Note: the version used is Hadoop 0.20.204 with the new API (org.apache.hadoop.mapreduce).

For example, when overriding the map method in the Map class:

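import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

public class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            context.write(word, one); // emits (word, 1) for each token
        }

        // code that starts a new Job.
    }
}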
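The map method emits one (word, 1) pair per whitespace-separated token, and those pairs are what context.write() hands to the framework. A minimal stdlib-only sketch of that emission step, assuming plain String and Integer stand in for Text and IntWritable; the class WordCountMapSketch is hypothetical and not part of Hadoop:

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

// Stdlib-only model of what the word-count map() emits for one input line.
// Assumption: String/Integer replace Hadoop's Text/IntWritable, and a List replaces Context.
class WordCountMapSketch {
    // Returns the (word, 1) pairs that map() would pass to context.write().
    static List<Map.Entry<String, Integer>> emit(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line); // splits on whitespace, like the original
        while (tokenizer.hasMoreTokens()) {
            out.add(new SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(emit("the quick the")); // prints [the=1, quick=1, the=1]
    }
}
```

Duplicate words are not combined at this stage; grouping by key happens later, after sorting.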

I want to find out where context.write() ends up writing the data. So far I have come across:

FileOutputFormat.getWorkOutputPath(context);

This gives me the following location on HDFS:

hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0

When I try to use it as input for another job, I get the following error:

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0

Note: the job is started from inside the Mapper, so the temporary folder where the Mapper task writes its output should already exist when the new job begins. Even so, it still says that the input path does not exist.

Any ideas where the temporary output is written? Or, failing that, where can I find the output of a Map task during a job that has both a Map and a Reduce stage?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T21:50:00+00:00

So, I’ve figured out what is really going on.

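The output of the mapper is collected in an in-memory buffer (sized by io.sort.mb, 100 MB by default in Hadoop 0.20). When the buffer is about 80% full (io.sort.spill.percent, default 0.80), a background thread starts sorting its contents and spilling them to the task's local disk, not to HDFS, while the mapper keeps writing new records into the buffer.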
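The buffer-and-spill behavior can be modeled in a few lines. This is a simplified, single-threaded sketch, not Hadoop's actual map output buffer: it assumes capacity is counted in records rather than bytes, spills synchronously instead of on a background thread, and skips partitioning by reducer. The class SpillBufferSketch and its methods are hypothetical; each inner list stands in for one spill file on the task's local disk.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model of a map-side sort-and-spill buffer.
// Assumptions: capacity counts records (Hadoop counts bytes), spills run synchronously,
// and there is no partitioning by reducer.
class SpillBufferSketch {
    private final int threshold;                                  // records that trigger a spill
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> spills = new ArrayList<>(); // each entry stands in for one spill file

    SpillBufferSketch(int capacity, double spillPercent) {
        // e.g. capacity 5 with spillPercent 0.80 spills every 4 records
        this.threshold = Math.max(1, (int) (capacity * spillPercent));
    }

    // Called for every record the mapper writes (think context.write()).
    void collect(String key) {
        buffer.add(key);
        if (buffer.size() >= threshold) {
            spill();
        }
    }

    // Called once the map task has finished: whatever is still buffered becomes the last spill.
    void flush() {
        if (!buffer.isEmpty()) {
            spill();
        }
    }

    // Sorts the buffered keys and moves them out of the buffer as one spill run.
    private void spill() {
        List<String> run = new ArrayList<>(buffer);
        run.sort(Comparator.naturalOrder());
        spills.add(run);
        buffer.clear();
    }

    List<List<String>> spills() {
        return spills;
    }

    public static void main(String[] args) {
        SpillBufferSketch sketch = new SpillBufferSketch(5, 0.80);
        for (String k : new String[] {"d", "a", "c", "b", "f", "e"}) {
            sketch.collect(k);
        }
        System.out.println(sketch.spills()); // prints [[a, b, c, d]]
        sketch.flush();
        System.out.println(sketch.spills()); // prints [[a, b, c, d], [e, f]]
    }
}
```

Note that before flush() runs, the last records exist only in the buffer, which mirrors why intermediate output is not visible anywhere while the mapper is still running.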

I wanted to take the intermediate output of the mapper and use it as input for another job while the mapper was still running. It turns out that this is not possible without heavily modifying the Hadoop 0.20.204 deployment. The way the system works, even after everything in the mapper's run() method has executed:

public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    // map() is called once per input record
    while (context.nextKeyValue()) {
        map(context.getCurrentKey(), context.getCurrentValue(), context);
    }
    cleanup(context);
}

and cleanup() has been called, nothing has been written to the temporary folder.

After the whole Map computation finishes, the spills are merged into a single partitioned, sorted file on the local disk, which becomes the input for the shuffle and sort stages that precede the Reducer.
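The final merge can be pictured as a standard k-way merge of the sorted spill runs. A minimal sketch, assuming each spill is an in-memory sorted list of keys; the real merge works per partition over files on disk and also writes an index, both omitted here. SpillMergeSketch is a hypothetical name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Simplified model of the final map-side merge: several sorted spill runs
// are combined into one sorted run (partitions and on-disk files are omitted).
class SpillMergeSketch {
    // k-way merge: the heap holds one cursor per run, ordered by that run's current key.
    static List<String> merge(List<List<String>> runs) {
        // Each heap entry is {runIndex, positionInRun}.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (x, y) -> runs.get(x[0]).get(x[1]).compareTo(runs.get(y[0]).get(y[1])));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {r, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            List<String> run = runs.get(top[0]);
            merged.add(run.get(top[1]));
            // Advance this run's cursor if it still has keys left.
            if (top[1] + 1 < run.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> spills = List.of(List.of("a", "c", "e"), List.of("b", "c", "f"));
        System.out.println(merge(spills)); // prints [a, b, c, c, e, f]
    }
}
```

A heap keeps each step at O(log k) for k spill runs, so the merge reads every spill once instead of re-sorting all the records.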

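So far, from everything I have read and looked at, the temporary folder where the output should eventually end up is the one I was guessing beforehand:

FileOutputFormat.getWorkOutputPath(context)

That work path, however, only receives files written through the job's OutputFormat, such as the output of a map-only job. In a job with a Reduce stage, the intermediate map output described above stays on the mapper node's local disk and is never written to that HDFS folder, which would also explain why the second job reported that the input path does not exist.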
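The work path from the question is the job's output directory plus a per-attempt subdirectory under _temporary. A small sketch of that naming, assuming only the layout visible in the question's path (hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0); the helpers workPath and describe are hypothetical, since Hadoop's output committer builds the real path:

```java
// Hypothetical helpers reproducing the work path layout seen in the question:
// <output dir>/_temporary/_<task attempt ID>
class WorkPathSketch {
    static String workPath(String outputDir, String attemptId) {
        return outputDir + "/_temporary/_" + attemptId;
    }

    // Splits an attempt ID such as attempt_201112221334_0001_m_000000_0 into its parts.
    static String describe(String attemptId) {
        String[] p = attemptId.split("_");
        // p[0] = "attempt", p[1] = jobtracker identifier, p[2] = job number,
        // p[3] = task type ("m" = map, "r" = reduce), p[4] = task number, p[5] = attempt number
        String type = p[3].equals("m") ? "map" : p[3].equals("r") ? "reduce" : "other";
        return "job " + p[1] + "_" + p[2] + ", " + type + " task "
            + Integer.parseInt(p[4]) + ", attempt " + p[5];
    }

    public static void main(String[] args) {
        String id = "attempt_201112221334_0001_m_000000_0";
        System.out.println(workPath("hdfs://localhost:9000/tmp/outputs/1", id));
        System.out.println(describe(id)); // prints job 201112221334_0001, map task 0, attempt 0
    }
}
```

Because the directory name includes the attempt number, a retried attempt of the same task writes to a different folder, so a path copied from one attempt is not a stable input location for another job.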

I managed to do what I wanted in a different way. If you have any questions about this, let me know.
