I am trying to run the sort example on a single-node Hadoop cluster. First of all, I start the daemons:
hadoop@ubuntu:/home/user/hadoop$ bin/start-all.sh
Then I run the randomwriter example to generate sequence files to use as input.
hadoop@ubuntu:/home/user/hadoop$ bin/hadoop jar hadoop-*-examples.jar randomwriter rand
Running 0 maps.
Job started: Thu Mar 31 18:21:51 EEST 2011
11/03/31 18:21:52 INFO mapred.JobClient: Running job: job_201103311816_0001
11/03/31 18:21:53 INFO mapred.JobClient: map 0% reduce 0%
11/03/31 18:22:01 INFO mapred.JobClient: Job complete: job_201103311816_0001
11/03/31 18:22:01 INFO mapred.JobClient: Counters: 0
Job ended: Thu Mar 31 18:22:01 EEST 2011
The job took 9 seconds.
hadoop@ubuntu:/home/user/hadoop$ bin/hadoop jar hadoop-*-examples.jar sort rand rand-sort
Running on 1 nodes to sort from hdfs://localhost:54310/user/hadoop/rand into
hdfs://localhost:54310/user/hadoop/rand-sort with 1 reduces.
Job started: Thu Mar 31 18:25:19 EEST 2011
11/03/31 18:25:20 INFO mapred.FileInputFormat: Total input paths to process : 0
11/03/31 18:25:20 INFO mapred.JobClient: Running job: job_201103311816_0002
11/03/31 18:25:21 INFO mapred.JobClient: map 0% reduce 0%
11/03/31 18:25:32 INFO mapred.JobClient: map 0% reduce 100%
11/03/31 18:25:34 INFO mapred.JobClient: Job complete: job_201103311816_0002
11/03/31 18:25:34 INFO mapred.JobClient: Counters: 9
11/03/31 18:25:34 INFO mapred.JobClient: Job Counters
11/03/31 18:25:34 INFO mapred.JobClient: Launched reduce tasks=1
11/03/31 18:25:34 INFO mapred.JobClient: FileSystemCounters
11/03/31 18:25:34 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=96
11/03/31 18:25:34 INFO mapred.JobClient: Map-Reduce Framework
11/03/31 18:25:34 INFO mapred.JobClient: Reduce input groups=0
11/03/31 18:25:34 INFO mapred.JobClient: Combine output records=0
11/03/31 18:25:34 INFO mapred.JobClient: Reduce shuffle bytes=0
11/03/31 18:25:34 INFO mapred.JobClient: Reduce output records=0
11/03/31 18:25:34 INFO mapred.JobClient: Spilled Records=0
11/03/31 18:25:34 INFO mapred.JobClient: Combine input records=0
11/03/31 18:25:34 INFO mapred.JobClient: Reduce input records=0
Job ended: Thu Mar 31 18:25:34 EEST 2011
The job took 14 seconds.
hadoop@ubuntu:/home/user/hadoop$ bin/hadoop dfs -cat rand-sort/part-00000
SEQ#”org.apache.hadoop.io.BytesWritable”org.apache.hadoop.io.BytesWritablej”��mY�&�٩�#
I’m new to Hadoop. Am I doing this correctly, or is something wrong? Also, how can I check that the data generated by randomwriter and the output of the sort example are correct? Where can I look at them?
The problem is that your tasktracker hasn’t started by the time you run the job; it doesn’t come up instantly. Run bin/hadoop job -list-active-trackers to check whether the tasktracker is up, and give it a moment to finish starting if it isn’t. No tasktracker means no nodes to assign the writer’s maps to, which is why randomwriter reports "Running 0 maps." and the sort then finds 0 input paths.
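If you would rather not check by hand, the wait can be scripted. Below is a rough sketch, not something Hadoop ships: wait_for_trackers is a hypothetical helper name, and it assumes the command passed to it prints one line per active tasktracker on stdout and nothing when none are up. Check that against your own version's output before relying on it.

```shell
# wait_for_trackers (hypothetical helper): poll a command until it prints
# at least one non-empty line, or give up after a number of attempts.
# Usage: wait_for_trackers MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for_trackers() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Assumption: each non-empty stdout line names one active tasktracker.
    count=$("$@" 2>/dev/null | grep -c . || true)
    if [ "$count" -gt 0 ]; then
      echo "$count tasktracker(s) active"
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "no tasktracker after $max_attempts attempts" >&2
  return 1
}
```

After bin/start-all.sh you could call it as wait_for_trackers 30 2 bin/hadoop job -list-active-trackers and start randomwriter only if it succeeds. Once the jobs really have produced data, bin/hadoop fs -text rand-sort/part-00000 should decode the sequence file into readable text, whereas -cat dumps the raw bytes as in your output.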