The Archive Base


Editorial Team
Asked: May 27, 2026 (2026-05-27T21:49:59+00:00)

I am trying to find out where does the output of a Map task


I am trying to find out where the output of a Map task is saved to disk before it can be used by a Reduce task.

Note: the version used is Hadoop 0.20.204 with the new API.

For example, when overriding the map method in the Map class:

@Override
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    // word and one are assumed to be fields of the Map class (a Text and an IntWritable).
    String line = value.toString();
    StringTokenizer tokenizer = new StringTokenizer(line);
    while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        context.write(word, one); // emit (token, 1)
    }

    // code that starts a new Job.

}

I am interested in finding out where context.write() ends up writing the data. So far I've run into:

FileOutputFormat.getWorkOutputPath(context);

This gives me the following location on HDFS:

hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0

When I try to use it as input for another job, I get the following error:

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://localhost:9000/tmp/outputs/1/_temporary/_attempt_201112221334_0001_m_000000_0

Note: the job is started in the Mapper, so technically the temporary folder where the Mapper task writes its output should exist when the new job begins. Even so, the error says that the input path does not exist.

Any ideas as to where the temporary output is written? Or, put differently, where can I find the output of a Map task during a job that has both a Map and a Reduce stage?


1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 9:50 pm

    So, I’ve figured out what is really going on.

    The output of the mapper is buffered in memory until the buffer is about 80% full. At that point it begins to spill the buffered records to its local disk while continuing to accept new records into the buffer.
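The buffering behaviour described above can be modelled with a small toy class. This is only a sketch in plain Java, not Hadoop's actual collector: the class name `SpillBufferSketch`, the record-count capacity, and the in-memory "spill runs" are all assumptions made for illustration, and the spill here is synchronous, whereas the answer describes the mapper continuing to fill the buffer while the spill is in progress.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of a map-side output buffer that spills once it is 80% full. */
class SpillBufferSketch {
    private final int spillAt;  // number of buffered records that triggers a spill
    private final List<String> buffer = new ArrayList<>();
    // Each inner list stands in for one sorted spill file on local disk (hypothetical).
    private final List<List<String>> spills = new ArrayList<>();

    SpillBufferSketch(int capacityInRecords, int spillPercent) {
        // Integer arithmetic avoids floating-point rounding in the threshold.
        this.spillAt = capacityInRecords * spillPercent / 100;
    }

    /** Collects one record and spills the buffer when the threshold is reached. */
    void collect(String record) {
        buffer.add(record);
        if (buffer.size() >= spillAt) {
            spill();
        }
    }

    /** Sorts the buffered records and moves them into a new spill run. */
    void spill() {
        if (buffer.isEmpty()) {
            return;
        }
        List<String> run = new ArrayList<>(buffer);
        Collections.sort(run);
        spills.add(run);
        buffer.clear();
    }

    List<List<String>> spills() { return spills; }

    int buffered() { return buffer.size(); }

    public static void main(String[] args) {
        SpillBufferSketch sketch = new SpillBufferSketch(10, 80);
        for (String w : "the quick brown fox jumps over the lazy dog again".split(" ")) {
            sketch.collect(w);
        }
        System.out.println("spill runs: " + sketch.spills().size());
        System.out.println("still buffered: " + sketch.buffered());
    }
}
```

With a capacity of 10 records and an 80% threshold, the eighth record triggers one spill and the last two records stay in the buffer until something flushes them.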

    I wanted to get the intermediate output of the mapper and use it as input for another job while the mapper was still running. It turns out that this is not possible without heavily modifying the Hadoop 0.20.204 deployment. Even after everything in the mapper's lifecycle has run:

    run(context) {
      setup(context)
      // map(key, value, context) is called once per input record
      .
      .
      cleanup(context)
    }

    that is, even after cleanup has been called, nothing has been dumped to the temporary folder yet.
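The ordering described above can be traced with a toy model. This is a sketch under assumptions, not Hadoop code: `ToyCollector`, `runMapper`, and `runTask` are hypothetical names, and the model ignores threshold spills so that it only shows the point at issue, namely that the final flush happens after the mapper's cleanup step has returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MapTaskLifecycleSketch {
    /** Hypothetical stand-in for the framework's collector: records stay in memory until flush(). */
    static final class ToyCollector {
        final List<String> inMemory = new ArrayList<>();
        final List<String> onDisk = new ArrayList<>();

        void write(String record) { inMemory.add(record); }

        void flush() {
            onDisk.addAll(inMemory);
            inMemory.clear();
        }
    }

    /** Mirrors the setup / map-per-record / cleanup order of a mapper. */
    static void runMapper(List<String> input, ToyCollector out, List<String> trace) {
        trace.add("setup");
        for (String line : input) {
            for (String token : line.split("\\s+")) {
                out.write(token); // plays the role of context.write(word, one)
            }
        }
        // At cleanup time nothing has reached "disk" in this model.
        trace.add("cleanup: onDisk=" + out.onDisk.size());
    }

    /** Toy task: the final flush runs only after the mapper has returned. */
    static List<String> runTask(List<String> input) {
        ToyCollector out = new ToyCollector();
        List<String> trace = new ArrayList<>();
        runMapper(input, out, trace);
        out.flush();
        trace.add("flush: onDisk=" + out.onDisk.size());
        return trace;
    }

    public static void main(String[] args) {
        // prints [setup, cleanup: onDisk=0, flush: onDisk=3]
        System.out.println(runTask(Arrays.asList("a b", "c")));
    }
}
```

Code started from inside the mapper, such as a job launched in map or cleanup, runs before the flush step in this model, which matches the "input path does not exist" symptom in the question.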

    After the whole Map computation has finished, everything eventually gets merged and dumped to disk, and it becomes the input for the shuffle and sort stages that precede the Reducer.
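The merge step mentioned above can be illustrated as a k-way merge of already sorted runs. This is a minimal sketch assuming each spill is a sorted list of keys; `SpillMergeSketch` and its `Cursor` helper are hypothetical names, and real map output also carries values and partitions, which are left out here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class SpillMergeSketch {
    /** Read position within one sorted run. */
    private static final class Cursor {
        final List<String> run;
        int pos = 0;

        Cursor(List<String> run) { this.run = run; }

        String head() { return run.get(pos); }
    }

    /** Merges several sorted runs into one sorted list using a min-heap of cursors. */
    static List<String> merge(List<List<String>> runs) {
        PriorityQueue<Cursor> heap =
                new PriorityQueue<>((a, b) -> a.head().compareTo(b.head()));
        for (List<String> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run));
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor c = heap.poll();   // cursor with the smallest current key
            merged.add(c.head());
            c.pos++;                  // advance only while the cursor is outside the heap
            if (c.pos < c.run.size()) {
                heap.add(c);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = Arrays.asList(
                Arrays.asList("brown", "fox", "the"),
                Arrays.asList("dog", "lazy", "quick"),
                Arrays.asList("again"));
        // prints [again, brown, dog, fox, lazy, quick, the]
        System.out.println(merge(runs));
    }
}
```

Because every run is already sorted, the merge only ever compares the current head of each run, so the combined output stays sorted without re-sorting all records.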

    From everything I've read and looked at so far, the temporary folder where the output should eventually end up is the one I had guessed beforehand:

    FileOutputFormat.getWorkOutputPath(context)
    

    I managed to do what I wanted in a different way. If there are any questions about this, let me know.
