Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7593743
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 30, 20262026-05-30T21:14:44+00:00 2026-05-30T21:14:44+00:00

i have a SequenceFile with 1264 records. each key is unique for each record.

  • 0

i have a SequenceFile with 1264 records. each key is unique for each record. my problem is that my mapper seems to be reading this file twice or it is being read twice. for sanity checking, i have written a little utility class to read the SequenceFile and indeed, there are only 1264 records (i.e. SequenceFile.Reader).

in my reducer, i should only get 1 record per Iterable, however, when i iterate over the iterable (Iterator), i get 2 records per Key (always 2 per key, and not 1 or 3 or something else per Key).

the logging output of my Job is below. i am not sure why, but why is it that the “Total input paths to process” is 2? when i run my Job, i tried -Dmapred.input.dir=/data and also -Dmapred.input.dir=/data/part-r-00000, but still, the total paths to process is 2.

any ideas is appreciated.

12/03/01 05:28:30 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/01 05:28:30 INFO input.FileInputFormat: Total input paths to process : 2
12/03/01 05:28:31 INFO mapred.JobClient: Running job: job_local_0001
12/03/01 05:28:31 INFO input.FileInputFormat: Total input paths to process : 2
12/03/01 05:28:31 INFO mapred.MapTask: io.sort.mb = 100
12/03/01 05:28:31 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/01 05:28:31 INFO mapred.MapTask: record buffer = 262144/327680
12/03/01 05:28:31 INFO mapred.MapTask: Starting flush of map output
12/03/01 05:28:31 INFO mapred.MapTask: Finished spill 0
12/03/01 05:28:31 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000000_0' done.
12/03/01 05:28:31 INFO mapred.MapTask: io.sort.mb = 100
12/03/01 05:28:31 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/01 05:28:31 INFO mapred.MapTask: record buffer = 262144/327680
12/03/01 05:28:31 INFO mapred.MapTask: Starting flush of map output
12/03/01 05:28:31 INFO mapred.MapTask: Finished spill 0
12/03/01 05:28:31 INFO mapred.TaskRunner: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:31 INFO mapred.TaskRunner: Task 'attempt_local_0001_m_000001_0' done.
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:31 INFO mapred.Merger: Merging 2 sorted segments
12/03/01 05:28:31 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 307310 bytes
12/03/01 05:28:31 INFO mapred.LocalJobRunner:
12/03/01 05:28:32 INFO mapred.TaskRunner: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
12/03/01 05:28:32 INFO mapred.LocalJobRunner:
12/03/01 05:28:32 INFO mapred.TaskRunner: Task attempt_local_0001_r_000000_0 is allowed to commit now
12/03/01 05:28:32 INFO mapred.JobClient:  map 100% reduce 0%
12/03/01 05:28:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local_0001_r_000000_0' to results
12/03/01 05:28:32 INFO mapred.LocalJobRunner: reduce > reduce
12/03/01 05:28:32 INFO mapred.TaskRunner: Task 'attempt_local_0001_r_000000_0' done.
12/03/01 05:28:33 INFO mapred.JobClient:  map 100% reduce 100%
12/03/01 05:28:33 INFO mapred.JobClient: Job complete: job_local_0001
12/03/01 05:28:33 INFO mapred.JobClient: Counters: 12
12/03/01 05:28:33 INFO mapred.JobClient:   FileSystemCounters
12/03/01 05:28:33 INFO mapred.JobClient:     FILE_BYTES_READ=1320214
12/03/01 05:28:33 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=1275041
12/03/01 05:28:33 INFO mapred.JobClient:   Map-Reduce Framework
12/03/01 05:28:33 INFO mapred.JobClient:     Reduce input groups=1264
12/03/01 05:28:33 INFO mapred.JobClient:     Combine output records=0
12/03/01 05:28:33 INFO mapred.JobClient:     Map input records=2528
12/03/01 05:28:33 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/03/01 05:28:33 INFO mapred.JobClient:     Reduce output records=2528
12/03/01 05:28:33 INFO mapred.JobClient:     Spilled Records=5056
12/03/01 05:28:33 INFO mapred.JobClient:     Map output bytes=301472
12/03/01 05:28:33 INFO mapred.JobClient:     Combine input records=0
12/03/01 05:28:33 INFO mapred.JobClient:     Map output records=2528
12/03/01 05:28:33 INFO mapred.JobClient:     Reduce input records=2528

My mapper class is very simple. It reads in a text file. To each line, it appends “m” to the line.

public class MyMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

 private static final Log _log = LogFactory.getLog(MyMapper.class);

 @Override
 public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
  String s = (new StringBuilder()).append(value.toString()).append("m").toString();
  context.write(key, new Text(s));
  _log.debug(key.toString() + " => " + s);
 }
}

My reducer class is also very simple. It simply appends “r” to the line.

public class MyReducer extends Reducer<LongWritable, Text, LongWritable, Text> {

private static final Log _log = LogFactory.getLog(MyReducer.class);

@Override
public void reduce(LongWritable key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
 for(Iterator<Text> it = values.iterator(); it.hasNext();) {
  Text txt = it.next();
  String s = (new StringBuilder()).append(txt.toString()).append("r").toString();
  context.write(key, new Text(s));
  _log.debug(key.toString() + " => " + s);
  }
 }
}

my Job class is as follows.

public class MyJob extends Configured implements Tool {

public static void main(String[] args) throws Exception {
 ToolRunner.run(new Configuration(), new MyJob(), args);
}

@Override
public int run(String[] args) throws Exception {
 Configuration conf = getConf();
 Path input = new Path(conf.get("mapred.input.dir"));
 Path output = new Path(conf.get("mapred.output.dir"));

 System.out.println("input = " + input);
 System.out.println("output = " + output);

 Job job = new Job(conf, "dummy job");
 job.setMapOutputKeyClass(LongWritable.class);
 job.setMapOutputValueClass(Text.class);
 job.setOutputKeyClass(LongWritable.class);
 job.setOutputValueClass(Text.class);

 job.setMapperClass(MyMapper.class);
 job.setReducerClass(MyReducer.class);

 FileInputFormat.addInputPath(job, input);
 FileOutputFormat.setOutputPath(job, output);

 job.setJarByClass(MyJob.class);

 return job.waitForCompletion(true) ? 0 : 1;
 }
}

my input data looks like the following.

T, T
T, T
T, T
F, F
F, F
F, F
F, F
T, F
F, T

after running my Job, i get an output like the following.

0   T, Tmr
0   T, Tmr
6   T, Tmr
6   T, Tmr
12  T, Tmr
12  T, Tmr
18  F, Fmr
18  F, Fmr
24  F, Fmr
24  F, Fmr
30  F, Fmr
30  F, Fmr
36  F, Fmr
36  F, Fmr
42  T, Fmr
42  T, Fmr
48  F, Tmr
48  F, Tmr

did i do something wrong with setting up my Job? i tried the following way to run my Job, and in this approach, the file only gets read once. why is this? the System.out.println(inpath) and System.out.println(outpath) values are identical! help?

public class MyJob2 {

public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: MyJob2 <in> <out>");
      System.exit(2);
    }

    String sInput = args[0];
    String sOutput = args[1];

    Path input = new Path(sInput);
    Path output = new Path(sOutput);

    System.out.println("input = " + input);
    System.out.println("output = " + output);

    Job job = new Job(conf, "dummy job");
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(Text.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);

    job.setMapperClass(MyMapper.class);
    job.setReducerClass(MyReducer.class);

    FileInputFormat.addInputPath(job, input);
    FileOutputFormat.setOutputPath(job, output);

    job.setJarByClass(MyJob2.class);

    int result = job.waitForCompletion(true) ? 0 : 1;
    System.exit(result);
 }
}
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-30T21:14:45+00:00Added an answer on May 30, 2026 at 9:14 pm

    i got help from the hadoop mailing list. my problem was with the line below.

    FileInputFormat.addInputPath(job, input);
    

    this line simply appends input back to config. after commenting this line out, the input file is read only once now. in fact, i also commented out the other line,

    FileOutputFormat.setOutputPath(job, output);
    

    and everything still works.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

Have a problem that seems easy on paper but i'm having a big problem
hadoop writes in a SequenceFile in in key-value pair(record) format. Consider we have a
Have a look at one of my websites: moskah.com The problem is that it
I have a fairly large text file that I would like to convert into
I have a file containing vectors of data, where each row contains a comma-separated
Have some dates in my local Oracle 11g database that are in this format:
Have youI know that the problem has been many times discussed in the web.
Have you managed to get Aptana Studio debugging to work? I tried following this,
Have following listener for keyboard ArrowDown event(it's key code is 40 ): window.onload =
Have converted devise new session from erb to Haml but doens't work, this is

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.