In HDFS processing after each job empty files are created with names like part-m-0000*.

Question

Asked: May 27, 2026

After each job in my HDFS processing, empty files are created with names like part-m-0000*. Each of these files is empty, but each seems to consume 64 MB of disk space because that is the default block size.

I need to change the code so that these files are not created. How do I do this?

Note: I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs<KEYOUT,VALUEOUT> to write output records, not Context, so my output records end up in files like “successful-m-00000” anyway.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T00:26:47+00:00

According to Hadoop: The Definitive Guide, an empty file does not take up a full HDFS block on the underlying file system:

Unlike a filesystem for a single disk, a file in HDFS that is smaller than a single block does not occupy a full block’s worth of underlying storage.

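To suppress the default output files when they are empty, use LazyOutputFormat#setOutputFormatClass in place of Job#setOutputFormatClass. With it, a part-m-* file is created only when the first record is actually written to it. See the Apache documentation for LazyOutputFormat for details.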
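As a rough illustration of the LazyOutputFormat suggestion, here is a minimal map-only driver sketch. The class names LazyOutputDriver and SplitMapper, the job name, and the use of TextOutputFormat with NullWritable/Text types are assumptions made for the example; only the "successful" named output comes from the question. Adapt the types and paths to your own job.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver: names and types are illustrative, not from the original job.
public class LazyOutputDriver {

    // Hypothetical mapper that writes only through MultipleOutputs.
    public static class SplitMapper
            extends Mapper<LongWritable, Text, NullWritable, Text> {

        private MultipleOutputs<NullWritable, Text> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Nothing is written via context.write(), so the default
            // part-m-* output never receives a record.
            mos.write("successful", NullWritable.get(), value);
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Closing flushes the named outputs.
            mos.close();
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);
        job.setMapperClass(SplitMapper.class);
        job.setNumReduceTasks(0); // map-only, hence the "-m-" file names
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Instead of job.setOutputFormatClass(TextOutputFormat.class):
        // the wrapped format creates a file only on the first write.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        // Named output used by the mapper above.
        MultipleOutputs.addNamedOutput(job, "successful",
                TextOutputFormat.class, NullWritable.class, Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The only change relative to a typical driver is the LazyOutputFormat.setOutputFormatClass call. The files written through MultipleOutputs should be unaffected, since they are created when records are written to them.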
