I am reading bytes from a file using:
    FileSystem fs = config.getHDFS();
    try {
        Path path = new Path(dirName + '/' + fileName);
        byte[] bytes = new byte[(int) fs.getFileStatus(path).getLen()];
        in = fs.open(path);
        in.read(bytes);
        result = new DataInputStream(new ByteArrayInputStream(bytes));
    } catch (Exception e) {
        e.printStackTrace();
        if (in != null) {
            try {
                in.close();
            } catch (IOException e1) {
                e1.printStackTrace();
            }
        }
    }
There are about 15,000 files in the directory I am reading from. After a certain point I get this exception on the line in.read(bytes):
2012-05-31 14:11:45,477 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:298)
at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Status.read(DataTransferProtocol.java:115)
at org.apache.hadoop.hdfs.BlockReader.newBlockReader(BlockReader.java:427)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:725)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514)
at java.io.DataInputStream.read(DataInputStream.java:83)
Another exception thrown is:
2012-05-31 15:09:14,849 [INFO:main] (DFSInputStream.java:414) - Failed to connect to /165.36.80.28:50010, add to deadNodes and continue
java.net.SocketException: No buffer space available (maximum connections reached?): connect
at sun.nio.ch.Net.connect(Native Method)
at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:192)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:373)
at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:719)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:390)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:514)
at java.io.DataInputStream.read(DataInputStream.java:83)
Please advise what the issue could be.
You’re ignoring the return value from in.read, and assuming you can read the whole file in one go. Don’t do that. Loop round until read returns -1 or you’ve read as much data as you want to. It’s not clear to me whether you should really be trusting getLen() like this: what happens if the file grows (or shrinks) between the two calls?

I would suggest creating a ByteArrayOutputStream to write to and a smallish (16K?) buffer as temporary storage, then looping round: read into the buffer, write that many bytes into your output stream, lather, rinse, repeat until read returns -1 to indicate the end of the stream. Then you can get the data out of your ByteArrayOutputStream and put it into the ByteArrayInputStream as before.

EDIT: Quick code, untested; there’s similar (better) code in Guava, btw.
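A minimal sketch of the read loop described above might look like this. The class name StreamUtil and the method name readFully are illustrative choices, and the code has not been run against HDFS; it only assumes a plain java.io.InputStream.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; the name is not from any library.
final class StreamUtil {
    private StreamUtil() {}

    /** Reads from the stream until read() reports end of stream and returns all bytes seen. */
    static byte[] readFully(InputStream input) throws IOException {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        byte[] buffer = new byte[16 * 1024]; // smallish temporary buffer
        int bytesRead;
        // read() may return fewer bytes than the buffer holds, so use its return value
        while ((bytesRead = input.read(buffer)) != -1) {
            output.write(buffer, 0, bytesRead);
        }
        return output.toByteArray();
    }
}
```

Because the loop stops on -1 rather than on a precomputed length, it does not depend on getLen() still being accurate. In Guava, ByteStreams.toByteArray(InputStream) covers the same ground.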
Then just use:
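As a rough sketch of the call site, assuming Java 9 or later, InputStream.readAllBytes() performs the same read-until-end loop; on older JDKs a hand-written loop helper would take its place. The class and method names here are hypothetical, and the in parameter stands in for the stream returned by fs.open(path) in the question.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper; names are illustrative only.
final class InMemoryStreams {
    private InMemoryStreams() {}

    /** Buffers the whole stream in memory and exposes it as a DataInputStream. */
    static DataInputStream bufferFully(InputStream in) throws IOException {
        // Assumption: Java 9+. readAllBytes() keeps reading until end of stream.
        byte[] data = in.readAllBytes();
        return new DataInputStream(new ByteArrayInputStream(data));
    }
}
```

The caller still owns the original stream and is responsible for closing it.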
Also note that you should close your stream in a finally block, not just on exception. I’d also advise against catching Exception itself.
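The closing advice could be sketched as follows. StreamOpener is a hypothetical stand-in for whatever opens the stream (fs.open(path) in the question), and countBytes is only a placeholder for the real work. The method declares IOException rather than catching Exception.

```java
import java.io.IOException;
import java.io.InputStream;

final class CloseInFinally {
    private CloseInFinally() {}

    /** Hypothetical stand-in for the code that opens the stream, e.g. fs.open(path). */
    interface StreamOpener {
        InputStream open() throws IOException;
    }

    /** Counts the bytes in the stream, closing it whether or not reading succeeds. */
    static long countBytes(StreamOpener opener) throws IOException {
        InputStream in = null;
        try {
            in = opener.open();
            byte[] buffer = new byte[16 * 1024];
            long total = 0;
            int bytesRead;
            while ((bytesRead = in.read(buffer)) != -1) {
                total += bytesRead;
            }
            return total;
        } finally {
            // Runs on success and on failure, so the stream is never leaked
            if (in != null) {
                in.close();
            }
        }
    }
}
```

On Java 7 and later, a try-with-resources statement gives the same guarantee with less code. Whether streams left open across roughly 15,000 files explain the "maximum connections reached?" trace is only a guess, but closing on every path rules it out.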