Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8010481
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 4, 20262026-06-04T18:44:51+00:00 2026-06-04T18:44:51+00:00

as explained in the title, when i execute my Hadoop Program (and debug it

  • 0

as explained in the title, when i execute my Hadoop Program (and debug it in local mode) the following happens:

1. All 10 csv-lines in my test data are handled correctly in the Mapper, the Partitioner and the RawComperator(OutputKeyComparatorClass) that is called after the map-step. But the OutputValueGroupingComparatorClass’s and the ReduceClass’s functions do NOT get executed afterwards.

2. My application looks like the following. (due to space constraints i omit the implementation of the classes i used as configuration parameters, til somebody has an idea, that involves them):

public class RetweetApplication {

    public static int DEBUG = 1;
    static String INPUT = "/home/ema/INPUT-H";
    static String OUTPUT = "/home/ema/OUTPUT-H "+ (new Date()).toString();

    public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(RetweetApplication.class);


    if(DEBUG > 0){
        conf.set("mapred.job.tracker", "local");
        conf.set("fs.default.name", "file:///");
        conf.set("dfs.replication", "1");
    }


    FileInputFormat.setInputPaths(conf, new Path(INPUT));   
    FileOutputFormat.setOutputPath(conf, new Path(OUTPUT));


    //conf.setOutputKeyClass(Text.class);
    //conf.setOutputValueClass(Text.class);
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);

    conf.setMapperClass(RetweetMapper.class);
    conf.setPartitionerClass(TweetPartitioner.class);
    conf.setOutputKeyComparatorClass(TwitterValueGroupingComparator.class);
    conf.setOutputValueGroupingComparator(TwitterKeyGroupingComparator.class);
    conf.setReducerClass(RetweetReducer.class);

    conf.setOutputFormat(TextOutputFormat.class);

    client.setConf(conf);
    try {
        JobClient.runJob(conf);
    } catch (Exception e) {
        e.printStackTrace();
    }
    }
}

3. I get the following console output(sorry for the format, but somehow this log doesnt get formatted correctly):

12/05/22 03:51:05 INFO mapred.MapTask: io.sort.mb = 100 12/05/22
03:51:05 INFO mapred.MapTask: data buffer = 79691776/99614720

12/05/22 03:51:05 INFO mapred.MapTask: record buffer = 262144/327680

12/05/22 03:51:06 INFO mapred.JobClient: map 0% reduce 0%

12/05/22 03:51:11 INFO mapred.LocalJobRunner:
file:/home/ema/INPUT-H/tweets:0+967 12/05/22 03:51:12 INFO
mapred.JobClient: map 39% reduce 0%

12/05/22 03:51:14 INFO mapred.LocalJobRunner:
file:/home/ema/INPUT-H/tweets:0+967 12/05/22 03:51:15 INFO
mapred.MapTask: Starting flush of map output

12/05/22 03:51:15 INFO mapred.MapTask: Finished spill 0

12/05/22 03:51:15 INFO mapred.Task: Task:attempt_local_0001_m_000000_0
is done. And is in the process of commiting

12/05/22 03:51:15 INFO mapred.JobClient: map 79% reduce 0%

12/05/22 03:51:17 INFO mapred.LocalJobRunner:
file:/home/ema/INPUT-H/tweets:0+967

12/05/22 03:51:17 INFO mapred.LocalJobRunner:
file:/home/ema/INPUT-H/tweets:0+967

12/05/22 03:51:17 INFO mapred.Task: Task
‘attempt_local_0001_m_000000_0’ done.

12/05/22 03:51:17 INFO mapred.Task: Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@35eed0

12/05/22 03:51:17 INFO mapred.ReduceTask: ShuffleRamManager:
MemoryLimit=709551680, MaxSingleShuffleLimit=177387920

12/05/22 03:51:17 INFO mapred.ReduceTask:
attempt_local_0001_r_000000_0 Thread started: Thread for merging
on-disk files

12/05/22 03:51:17 INFO mapred.ReduceTask:
attempt_local_0001_r_000000_0 Thread waiting: Thread for merging
on-disk files

12/05/22 03:51:17 INFO mapred.ReduceTask:
attempt_local_0001_r_000000_0 Thread started: Thread for merging in
memory files

12/05/22 03:51:17 INFO mapred.ReduceTask: attempt_local_0001_r_000000_0 Need another 1 map output(s) where 0 is
already in progress 12/05/22 03:51:17 INFO mapred.ReduceTask:
attempt_local_0001_r_000000_0 Scheduled 0 outputs (0 slow hosts and0
dup hosts)

12/05/22 03:51:17 INFO mapred.ReduceTask:
attempt_local_0001_r_000000_0 Thread started: Thread for polling Map
Completion Events

12/05/22 03:51:18 INFO mapred.JobClient: map 100% reduce 0% 12/05/22 03:51:23 INFO mapred.LocalJobRunner: reduce > copy >

The bold marked lines repeat endlessly from this point.

4. Alot of open processes are active after the mapper saw every tuple:

RetweetApplication (1) [Remote Java Application]    
    OpenJDK Client VM[localhost:5002]   
        Thread [main] (Running) 
        Thread [Thread-2] (Running) 
        Daemon Thread [communication thread] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.0] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.1] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.2] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.4] (Running)  
        Thread [MapOutputCopier attempt_local_0001_r_000000_0.3] (Running)  
        Daemon Thread [Thread for merging on-disk files] (Running)  
        Daemon Thread [Thread for merging in memory files] (Running)    
        Daemon Thread [Thread for polling Map Completion Events] (Running)  

Is there any reason, why Hadoop expects more output from the mapper (see the bold marked lines in the log) than i put into the input directory? As already mentioned, i debugged that ALL inputs are properly processed in the mapper/partitioner/etc.

UPDATE
With the help of Chris (see comments) i found out, that my program was NOT started in localMode as i expected it: the isLocal variable in the ReduceTask class is set to false, though it should be true.

To me it is absolutely unclear why this happens, since the 3 options that have to be set to enable the standalone mode were set the right way. Surprisingly: tho the local setting was ignored, the “read from normal disc” setting wasnt, which is very strange imho, because i thought local mode and the file:/// protocol are coupled.

During debugging ReduceTask i set the isLocal variable to true by evaluating isLocal=true in my debug view and then tried to execute the rest of the program. It did not work out and this is the stacktrace:

12/05/22 14:28:28 INFO mapred.LocalJobRunner: 
12/05/22 14:28:28 INFO mapred.Merger: Merging 1 sorted segments
12/05/22 14:28:28 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 1956 bytes
12/05/22 14:28:28 INFO mapred.LocalJobRunner: 
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:30 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 0 time(s).
12/05/22 14:28:31 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 1 time(s).
12/05/22 14:28:32 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 2 time(s).
12/05/22 14:28:33 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 3 time(s).
12/05/22 14:28:34 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 4 time(s).
12/05/22 14:28:35 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 5 time(s).
12/05/22 14:28:36 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 6 time(s).
12/05/22 14:28:37 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 7 time(s).
12/05/22 14:28:38 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 8 time(s).
12/05/22 14:28:39 INFO ipc.Client: Retrying connect to server: master/127.0.0.1:9001. Already tried 9 time(s).
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:39 WARN mapred.LocalJobRunner: job_local_0001
java.net.ConnectException: Call to master/127.0.0.1:9001 failed on connection exception: java.net.ConnectException: Connection refused
    at org.apache.hadoop.ipc.Client.wrapException(Client.java:1095)
    at org.apache.hadoop.ipc.Client.call(Client.java:1071)
    at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
    at $Proxy1.getProtocolVersion(Unknown Source)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
    at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
    at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
    at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.<init>(ReduceTask.java:446)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:490)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:592)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:489)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:434)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:184)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1202)
    at org.apache.hadoop.ipc.Client.call(Client.java:1046)
    ... 17 more
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
12/05/22 14:28:39 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
12/05/22 14:28:39 INFO mapred.JobClient: Job complete: job_local_0001
12/05/22 14:28:39 INFO mapred.JobClient: Counters: 20
12/05/22 14:28:39 INFO mapred.JobClient:   File Input Format Counters 
12/05/22 14:28:39 INFO mapred.JobClient:     Bytes Read=967
12/05/22 14:28:39 INFO mapred.JobClient:   FileSystemCounters
12/05/22 14:28:39 INFO mapred.JobClient:     FILE_BYTES_READ=14093
12/05/22 14:28:39 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=47859
12/05/22 14:28:39 INFO mapred.JobClient:   Map-Reduce Framework
12/05/22 14:28:39 INFO mapred.JobClient:     Map output materialized bytes=1960
12/05/22 14:28:39 INFO mapred.JobClient:     Map input records=10
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce shuffle bytes=0
12/05/22 14:28:39 INFO mapred.JobClient:     Spilled Records=10
12/05/22 14:28:39 INFO mapred.JobClient:     Map output bytes=1934
12/05/22 14:28:39 INFO mapred.JobClient:     Total committed heap usage (bytes)=115937280
12/05/22 14:28:39 INFO mapred.JobClient:     CPU time spent (ms)=0
12/05/22 14:28:39 INFO mapred.JobClient:     Map input bytes=967
12/05/22 14:28:39 INFO mapred.JobClient:     SPLIT_RAW_BYTES=82
12/05/22 14:28:39 INFO mapred.JobClient:     Combine input records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce input records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce input groups=0
12/05/22 14:28:39 INFO mapred.JobClient:     Combine output records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient:     Reduce output records=0
12/05/22 14:28:39 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
12/05/22 14:28:39 INFO mapred.JobClient:     Map output records=10
12/05/22 14:28:39 INFO mapred.JobClient: Job Failed: NA
java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1265)
    at uni.kassel.macek.rtprep.RetweetApplication.main(RetweetApplication.java:50)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Since this stacktrace now shows me, that the port 9001 is used during execution, i guess that somehow the xml-configuration file overwrites the local-java-made setting (which i use for testing), which is strange since i read over and over on the internet, that java overwrites xml configuration. If nobody knows how to correct this, ill try to simply erase all configuration-xmls. Perhaps this solves the problem…

NEW UPDATE

Renaming Hadoops conf folder solved the problem of the waiting copier and the program is executed til the end. Sadly the execution doesnt wait anymore for my debugger although HADOOP_OPTS is set correctly.

RESUME:Its only a configuration issue: XML may (for some configuration parameters) overwrite JAVA. If somebody knew how i can get debugging to run again, it would be perfect, but for now im just glad i dont see this stacktrace anymore! 😉

Thank you Chris for your time and effords!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-04T18:44:54+00:00Added an answer on June 4, 2026 at 6:44 pm

    Sorry i didn’t see this before, but you appear to have two important configuration properties set to final in your conf xml files, as denoted by the following log statements:

    12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: fs.default.name;  Ignoring.
    12/05/22 14:28:29 WARN conf.Configuration: file:/tmp/hadoop-ema/mapred/local/localRunner/job_local_0001.xml:a attempt to override final parameter: mapred.job.tracker;  Ignoring.
    

    This means that your job is unable to actually run in local mode, it starts in local mode, but the reducer reads the serialized job configuration and determines it is not in local mode, and tried to fetch map outputs via the task tracker ports.

    You said your fix was to rename the conf folder – this will default hadoop back to the default configuration, where these two properties are not marked as ‘final’

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

As explained in the title, I am having a problem with getting the URL
I don't know if I explained this in my title, this is what i
Sorry for misunderstanding title, but I couldn't explained my issue in a small title.
is there a Java framework that will do the dirty work explained in title
I'm not sure I explained it correctly in the question title, so here's the
As explained in the title, I am trying to use the libcurl C API
I have something similar to this: <!doctype html> <html> <head> <meta charset=UTF-8> <title>JQuery Test</title>
Long title but hopefully it explained most of what I need to say. Basically
Awkward title I know, this is best explained in code. Given a set of
As basicly explained in the title I have a form with a date field.

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.