I have a question about using hadoop to process a small file. My file only has about a 1,000 or so records but i want the records to roughly be evenly distributed among the nodes. Is there a way to do this? I’m new to hadoop and so far it seems that all the execution is happening on one node instead a multiple simultaneously. Let me know if my question makes sense or if I need to clarify anything. Like I said, i’m very new to Hadoop but am hoping to get some clarification. Thanks.
Share
Use the NLineInputFormat and specify the number of records to be processed by each mapper. This way the records in a single block will be processed by multiple mappers.