I have a Hadoop cluster with 3 nodes: 1 master and 2 slaves. Each node has 24 GB of RAM.
When I execute
hadoop fs -put
to transfer data from the local file system to HDFS, some of the data gets transferred and then I get the following exception:
12/11/06 19:01:39 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception for block blk_-2646313249080465541_1002
java.net.SocketTimeoutException: 603000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.30.30.210:51735 remote=/172.30.30.211:50010]
at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
at java.io.DataInputStream.readFully(DataInputStream.java:178)
at java.io.DataInputStream.readLong(DataInputStream.java:399)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$PipelineAck.readFields(DataTransferProtocol.java:125)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$ResponseProcessor.run(DFSClient.java:3284)
12/11/06 19:01:39 WARN hdfs.DFSClient: Error Recovery for block blk_-2646313249080465541_1002 bad datanode[0] 172.30.30.211:50010
put: All datanodes 172.30.30.211:50010 are bad. Aborting...
12/11/06 19:01:39 ERROR hdfs.DFSClient: Exception closing file /user/root/input/wiki.xml-p000185003p000189874 : java.io.IOException: All datanodes 172.30.30.211:50010 are bad. Aborting...
java.io.IOException: All datanodes 172.30.30.211:50010 are bad. Aborting...
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3414)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2906)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3110)
I tried to transfer 30 GB of data. Only 22 GB was transferred before I got this exception, and both datanodes rebooted.
Is there a problem with a buffer? I mean, the datanode receives data from the namenode through a socket, and maybe the datanode's buffer is not large enough to accommodate this much data, which causes this exception.
These are the log files created by HDFS:
2012-11-06 18:54:10,074 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-11-06 18:54:10,239 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2012-11-06 18:54:10,349 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-11-06 18:54:10,350 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-11-06 18:54:10,350 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system started
2012-11-06 18:54:10,644 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-11-06 18:54:11,387 WARN org.apache.hadoop.hdfs.server.common.Storage: Ignoring storage directory /data/hadoop/data due to exception: java.io.FileNotFoundException: /data/hadoop/data/in_use.lock (Permission denied)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.tryLock(Storage.java:703)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:684)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:542)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:112)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:408)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:306)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1623)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1562)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1580)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1707)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1724)
2012-11-06 18:54:11,551 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: All specified directories are not accessible or do not exist.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:143)
at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:408)
at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:306)
at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1623)
at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1562)
at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1580)
at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:1707)
at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1724)
2012-11-06 18:54:11,552 INFO org.apache.hadoop.hdfs.server.datanode.DataNode:
And these are the ones created by MapReduce:
2012-11-06 18:54:29,395 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2012-11-06 18:54:29,416 INFO org.apache.hadoop.metrics2.impl.MetricsSinkAdapter: Sink ganglia started
2012-11-06 18:54:29,449 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source MetricsSystem,sub=Stats registered.
2012-11-06 18:54:29,450 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
2012-11-06 18:54:29,450 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: TaskTracker metrics system started
2012-11-06 18:54:29,792 INFO org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi registered.
2012-11-06 18:54:30,002 INFO org.mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2012-11-06 18:54:30,056 INFO org.apache.hadoop.http.HttpServer: Added global filtersafety (class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)
2012-11-06 18:54:30,103 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2012-11-06 18:54:30,107 INFO org.apache.hadoop.mapred.TaskTracker: Starting tasktracker with owner as mapred
2012-11-06 18:54:30,108 INFO org.apache.hadoop.mapred.TaskTracker: Good mapred local directories are: /data/hadoop/mapred
2012-11-06 18:54:30,145 ERROR org.apache.hadoop.mapred.TaskTracker: Can not start task tracker because java.io.IOException: Cannot rename /data/hadoop/mapred/ttprivate to /data/hadoop/mapred/toBeDeleted/2012-11-06_18-54-30.117_0
at org.apache.hadoop.util.MRAsyncDiskService.moveAndDeleteRelativePath(MRAsyncDiskService.java:260)
at org.apache.hadoop.util.MRAsyncDiskService.cleanupAllVolumes(MRAsyncDiskService.java:315)
at org.apache.hadoop.mapred.TaskTracker.initialize(TaskTracker.java:736)
at org.apache.hadoop.mapred.TaskTracker.<init>(TaskTracker.java:1515)
at org.apache.hadoop.mapred.TaskTracker.main(TaskTracker.java:3814)
Your problem is related to transferring a huge amount of data within the cluster. Apache Hadoop has a tool for this purpose called distcp (distributed copy). It runs the copy as a MapReduce job, so the nodes in the cluster share the work of transferring the data to HDFS.
Please read more at http://hadoop.apache.org/docs/r0.20.2/distcp.html
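As a rough sketch of what a distcp call could look like: the source path, the `master:9000` namenode address and the `RUN_DISTCP` switch below are hypothetical and need to be adapted to your setup. Note that a `file://` source has to be readable at the same path on every node that runs a copy task (for example a shared mount), since the copy is done by map tasks and not by the machine where you type the command.

```shell
#!/bin/sh
# Sketch only: SRC and DST are hypothetical placeholders, not values from the question.
# A file:// source must exist at the same path on every node that runs a copy task.
SRC="${SRC:-file:///mnt/shared/wiki}"
DST="${DST:-hdfs://master:9000/user/root/input}"

# Print the distcp command line for a given source and destination.
distcp_cmd() {
    printf 'hadoop distcp %s %s\n' "$1" "$2"
}

if command -v hadoop >/dev/null 2>&1 && [ "${RUN_DISTCP:-0}" = "1" ]; then
    # Run the copy for real only when explicitly requested via RUN_DISTCP=1.
    hadoop distcp "$SRC" "$DST"
else
    # Otherwise only show the command that would be run.
    distcp_cmd "$SRC" "$DST"
fi
```

Without `RUN_DISTCP=1` the script only prints the command, so you can check the paths before starting a long copy.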