Reading a file in parallel from FTP using MapReduce

Asked: June 14, 2026 (2026-06-14T22:25:58+00:00)

I am trying to read a file in parallel from FTP using MapReduce. I have working code that reads a file and performs a word count on it. However, it fails when the input is large (over 2 MB, to be specific).
It stops with a "Spill 0 completed" message, then "Map 100% Reduce 0%", and then "connection closed by server".
I don't quite get it. What does "Spill 0" mean? Why does the code fail for large inputs? How can I split the input and provide it to the mapper? Will that help?
Can I extend the FileInputFormat class to work this out?
Thanks 🙂

1 Answer

Editorial Team · Answer 1 · 2026-06-14T22:25:59+00:00

Yes, you can implement your own InputFormat. Apart from FileInputFormat there are several others in Hadoop, such as TextInputFormat, KeyValueTextInputFormat, etc. You can also define how records are read from a split; for that you need to implement your own RecordReader.

http://developer.yahoo.com/hadoop/tutorial/module4.html

For instance, the default InputFormat is the TextInputFormat that reads a file and uses a LineRecordReader to get records line by line. If you are reading structured data from a file you could implement your own RecordReader so each record is a structure of data from that file.
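To make the split/record idea concrete, here is a minimal sketch in plain Python (not Hadoop code) of how a line-oriented reader can turn byte-range splits into records without losing or duplicating lines. The names compute_splits and read_records are hypothetical, and the logic is a simplification of what a line-based record reader does: every split except the first skips up to the first newline, and every split reads one line past its own end.

```python
def compute_splits(file_size, split_size):
    """Cut a file of file_size bytes into (start, length) byte ranges.

    Simplified: real input formats also consider block size and locality.
    """
    return [(start, min(split_size, file_size - start))
            for start in range(0, file_size, split_size)]


def read_records(data, start, length):
    """Yield (offset, line) pairs for the lines owned by one split.

    A split owns every line that begins at or before its end offset,
    except the line it skips at its start, which the previous split
    reads by running past its own end.
    """
    end = start + length
    pos = start
    if start != 0:
        # Skip to the byte after the first newline at or after start.
        newline = data.find(b"\n", pos)
        pos = len(data) if newline == -1 else newline + 1
    while pos <= end and pos < len(data):
        newline = data.find(b"\n", pos)
        line_end = len(data) if newline == -1 else newline
        yield pos, data[pos:line_end]
        pos = line_end + 1


if __name__ == "__main__":
    data = b"hello world\nfoo bar baz\nlast line"
    for start, length in compute_splits(len(data), 10):
        lines = [line for _, line in read_records(data, start, length)]
        print((start, length), lines)
```

With the 33-byte sample and 10-byte splits, each of the first three splits yields exactly one line and the last split yields none, so every line reaches exactly one mapper even though no split boundary falls on a newline.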

In any case, running a MapReduce job to read a file from FTP is really strange. Hadoop works because the data is stored on the Hadoop Distributed File System (HDFS), a distributed filesystem where each file is divided into blocks that are spread across the nodes of the cluster. The way you should approach this, IMHO, is to download the file to HDFS first and then execute your MapReduce job on it.
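A rough sketch of that staging step, assuming Python with the standard ftplib module and an hdfs command on the PATH. The function names, host, and paths are placeholders rather than anything from the question. It streams the FTP download straight into `hdfs dfs -put -`, which reads the file body from standard input, so the file never has to fit on the local disk.

```python
import subprocess
from ftplib import FTP


def hdfs_put_command(hdfs_path):
    """Build the argv for `hdfs dfs -put - <path>` (source '-' means stdin)."""
    return ["hdfs", "dfs", "-put", "-", hdfs_path]


def stage_ftp_file(host, remote_path, hdfs_path, user="anonymous", password=""):
    """Copy one FTP file into HDFS by piping it through `hdfs dfs -put`.

    All arguments are placeholders for your own server and cluster paths.
    """
    proc = subprocess.Popen(hdfs_put_command(hdfs_path), stdin=subprocess.PIPE)
    try:
        with FTP(host) as ftp:
            ftp.login(user, password)
            # retrbinary calls the callback once per downloaded chunk.
            ftp.retrbinary(f"RETR {remote_path}", proc.stdin.write)
    finally:
        proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError("hdfs dfs -put exited with a non-zero status")
```

If the FTP transfer fails halfway, a partial file may be left in HDFS, so it is worth checking for that and deleting it before retrying. Once the file is in HDFS, the word-count job can take the HDFS path as its input and the framework takes care of splitting it across mappers.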
