Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7698093
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 31, 20262026-05-31T22:09:43+00:00 2026-05-31T22:09:43+00:00

I tried implementing a sorting program in mapreduce such that I have just the

  • 0

I tried implementing a sorting program in mapreduce such that I have just the sorted output after the map phase where the sorting is done by the hadoop framework internally. For it, I tried to set the number of reduce tasks to zero as there wasnt any reduction required. Now when I tried executing the program, I kept on getting checksum
error.. I am not able to figure out what’s to be done next. Surely it’s possible to run the program on my netbook as the sorting does work fine when I have set the reduce tasks to one.. Please help!!


For your reference, here’s the entire code that I have written to perform the sorting:

    /*
 * To change this template, choose Tools | Templates
 * and open the template in the editor.
 */

/**
 *
 * @author root
 */
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.io.*;
import java.util.*;
import java.io.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.*;
import org.apache.hadoop.conf.*;


public class word extends Configured implements Tool
{
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
    {
        private static IntWritable one=new IntWritable(1);
        private Text word=new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter report) throws IOException
        {
            String line=value.toString();
            StringTokenizer token=new StringTokenizer(line," .,?!");
            String wordToken=null;

            while(token.hasMoreTokens())
            {
                wordToken=token.nextToken();
                output.collect(new Text(wordToken), one);

            }
        }

    }

    public int run(String args[])throws Exception
    {
        //Configuration conf=getConf();
        JobConf job=new JobConf(word.class);
        job.setInputFormat(TextInputFormat.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormat(TextOutputFormat.class);
        job.setMapperClass(Map.class);
        job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        JobClient.runJob(job);

        return 0;
    }

    public static void main(String args[])throws Exception
    {
        int exitCode=ToolRunner.run(new word(), args);
        System.exit(exitCode);

    }
}

Here is the checksum error I got on executing this program:

12/03/25 10:26:42 WARN conf.Configuration: DEPRECATED: hadoop-site.xml found in the classpath. Usage of hadoop-site.xml is deprecated. Instead use core-site.xml, mapred-site.xml and hdfs-site.xml to override properties of core-default.xml, mapred-default.xml and hdfs-default.xml respectively
12/03/25 10:26:43 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/25 10:26:43 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/25 10:26:44 INFO mapred.FileInputFormat: Total input paths to process : 1
12/03/25 10:26:45 INFO mapred.JobClient: Running job: job_local_0001
12/03/25 10:26:45 INFO mapred.FileInputFormat: Total input paths to process : 1
12/03/25 10:26:45 INFO mapred.MapTask: numReduceTasks: 0
12/03/25 10:26:45 INFO fs.FSInputChecker: Found checksum error: b[0, 26]=610a630a620a640a650a740a790a780a730a670a7a0a680a730a
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/NetBeansProjects/projectAll/output/regionMulti/individual/part-00000 at 0
        at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/25 10:26:45 WARN mapred.LocalJobRunner: job_local_0001
org.apache.hadoop.fs.ChecksumException: Checksum error: file:/root/NetBeansProjects/projectAll/output/regionMulti/individual/part-00000 at 0
        at org.apache.hadoop.fs.FSInputChecker.verifySum(FSInputChecker.java:277)
        at org.apache.hadoop.fs.FSInputChecker.readChecksumChunk(FSInputChecker.java:241)
        at org.apache.hadoop.fs.FSInputChecker.read1(FSInputChecker.java:189)
        at org.apache.hadoop.fs.FSInputChecker.read(FSInputChecker.java:158)
        at java.io.DataInputStream.read(DataInputStream.java:100)
        at org.apache.hadoop.util.LineReader.readLine(LineReader.java:134)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:136)
        at org.apache.hadoop.mapred.LineRecordReader.next(LineRecordReader.java:40)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:192)
        at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:176)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/25 10:26:46 INFO mapred.JobClient:  map 0% reduce 0%
12/03/25 10:26:46 INFO mapred.JobClient: Job complete: job_local_0001
12/03/25 10:26:46 INFO mapred.JobClient: Counters: 0
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1252)
        at sortLog.run(sortLog.java:59)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at sortLog.main(sortLog.java:66)
Java Result: 1
BUILD SUCCESSFUL (total time: 4 seconds)
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-31T22:09:45+00:00Added an answer on May 31, 2026 at 10:09 pm

    So have a look at the org.apache.hadoop.mapred.MapTask arround line 600 in 0.20.2.

      // get an output object
      if (job.getNumReduceTasks() == 0) {
         output =
           new NewDirectOutputCollector(taskContext, job, umbilical, reporter);
      } else {
        output = new NewOutputCollector(taskContext, job, umbilical, reporter);
      }
    

    If you set the number of reduce tasks to zero it will be directly written to the output. The NewOutputCollector will use the so called MapOutputBuffer which does the spilling, sorting, combining and partitioning.

    So when you set no reducer, no sort takes places, even if Tom White states this in the definitive guide.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have just tried implementing a class where numerous length/count properties, etc. are uint
I have found some solutions to this error and tried implementing them but none
How to set up auto mapping to map System.Collections.Generics.ISet<T> correctly? I tried implementing IHasManyConvention
I have a problem with Gridview sorting that is similar to others but I'm
Just starting my way with Roboguice for android. Tried implementing this simple context injection
EDIT (2011-07-23) Have gotten some very helpful answers, both of which I've tried implementing.
I tried implementing the children function in ExtJS using select(~ *), it just didnt
I just took a look at jqgrid . Has anyone tried implementing this plugin,
I'm new to WPF....I have tried implementing controltemplate and datatriggers both for textbox..I want
OK, so I tried implementing this, http://ipaddressextensions.codeplex.com/ . It is displaying the output as:-

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.