Would it make a difference to the number of map tasks spawned by a

Question

Asked: June 18, 2026 (2026-06-18T12:10:25+00:00)

Would it make a difference to the number of map tasks spawned by a job if I have a lot of small files (each roughly the HDFS block size) vs. a few large files?

1 Answer

Editorial Team · Answer 1 · 2026-06-18T12:10:26+00:00

It depends on which InputFormat you use, because the InputFormat computes the input splits, and the framework launches one map task per split.

If you use the default TextInputFormat, each file gets at least one split, and therefore at least one mapper, even if the file is only a few kB. Each mapper then does very little work, while the per-task startup and scheduling cost adds a lot of overhead for the Map/Reduce framework. That said, if you have a guarantee that these “small” files will be close to the block size, it probably doesn’t matter too much.
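To make the per-file behaviour concrete, here is a rough, self-contained model of how a file-based InputFormat counts splits. It is a sketch, not Hadoop code: the class and method names are made up for illustration, the split size is assumed to equal a 128 MB block size, and the 1.1 slop factor is meant to mirror the one used by Hadoop's FileInputFormat.

```java
import java.util.Arrays;

// Illustrative model only: names are hypothetical, not part of the Hadoop API.
class PerFileSplits {
    // The last chunk of a file may be up to 10% larger than the split size.
    static final double SPLIT_SLOP = 1.1;

    // Number of splits produced for one file when files are split independently.
    static long splitsForFile(long fileLength, long splitSize) {
        if (fileLength == 0) {
            return 1; // assumption: an empty file still yields one (empty) split
        }
        long splits = 0;
        long remaining = fileLength;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits++;
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits++; // the tail of the file becomes the final split
        }
        return splits;
    }

    // Splits never span files, so the total is simply the per-file sum.
    static long totalSplits(long[] fileLengths, long splitSize) {
        return Arrays.stream(fileLengths).map(len -> splitsForFile(len, splitSize)).sum();
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = 128 * mb; // assumed block size

        long[] tinyFiles = new long[1000];
        Arrays.fill(tinyFiles, 10 * 1024L);   // 1000 files of 10 kB
        long[] nearBlockFiles = new long[8];
        Arrays.fill(nearBlockFiles, 120 * mb); // 8 files of 120 MB
        long[] oneLargeFile = {960 * mb};      // same 960 MB in a single file

        System.out.println(totalSplits(tinyFiles, splitSize));      // 1000
        System.out.println(totalSplits(nearBlockFiles, splitSize)); // 8
        System.out.println(totalSplits(oneLargeFile, splitSize));   // 8
    }
}
```

Under these assumptions, eight files near the block size and one large file of the same total size both give eight map tasks, while a thousand tiny files give a thousand.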

If you have no control over your files and they might get really small, I would advise using a different InputFormat, CombineFileInputFormat, which packs several input files into the same split. In that case the number of map tasks depends mainly on the overall amount of data (and the configured maximum split size), regardless of the number of files. An implementation can be found here.
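As a back-of-the-envelope illustration of the combined case, the sketch below estimates the split count as total bytes divided by a maximum split size, rounded up. This is a deliberate simplification: the real CombineFileInputFormat also groups blocks by node and rack, so actual counts can differ. The class and method names are hypothetical, and a 128 MB maximum split size is assumed.

```java
// Illustrative model only: names are hypothetical, not part of the Hadoop API.
class CombinedSplits {
    // Estimate: files are packed together, so only the total size matters.
    static long combinedSplits(long[] fileLengths, long maxSplitSize) {
        long total = 0;
        for (long len : fileLengths) {
            total += len;
        }
        // Ceiling division: a partially filled last split still needs a mapper.
        return (total + maxSplitSize - 1) / maxSplitSize;
    }

    public static void main(String[] args) {
        long maxSplitSize = 128L * 1024L * 1024L; // assumed 128 MB cap per split

        long[] thousandTiny = new long[1000];
        java.util.Arrays.fill(thousandTiny, 10 * 1024L);  // 1000 files of 10 kB
        long[] manyTiny = new long[100_000];
        java.util.Arrays.fill(manyTiny, 10 * 1024L);      // 100,000 files of 10 kB

        System.out.println(combinedSplits(thousandTiny, maxSplitSize)); // 1
        System.out.println(combinedSplits(manyTiny, maxSplitSize));     // 8
    }
}
```

With per-file splitting the same inputs would need 1,000 and 100,000 map tasks respectively, which is the overhead the combined format avoids.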
