After each job in my HDFS processing, empty files are created with names like part-m-0000*. Each of these files is empty, but as far as I understand each one still consumes 64 MB of disk space because that is the default block size.
It is necessary to make code changes to skip creation of these files. How do I do this?
Note: I am using org.apache.hadoop.mapreduce.lib.output.MultipleOutputs<KEYOUT,VALUEOUT> to write output records rather than Context, so my output records end up in files like “successful-m-00000” anyway.
According to Hadoop: The Definitive Guide, a file in HDFS that is smaller than a single block does not occupy a full block's worth of underlying storage, so an empty file will not take up an HDFS block on disk.
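To suppress output files that would otherwise be empty, set the job's output format through LazyOutputFormat#setOutputFormatClass instead of calling Job#setOutputFormatClass directly. LazyOutputFormat wraps the real output format and creates a part file only when the first record is written to it. See the Apache documentation for LazyOutputFormat for details.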
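As a rough illustration, a driver for a map-only job might be configured along the following lines. This is only a sketch under assumptions: the class name LazyOutputDriver, the Text key/value types, TextOutputFormat as the wrapped format, and the input/output paths taken from args are all hypothetical and not from the question. Only the named output "successful" mirrors the file name mentioned above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

// Hypothetical driver class; adapt names and types to your own job.
public class LazyOutputDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lazy-output-example");
        job.setJarByClass(LazyOutputDriver.class);

        // Set your own mapper here (assumed to write only via MultipleOutputs):
        // job.setMapperClass(MyMapper.class);
        job.setNumReduceTasks(0); // map-only job, hence the part-m-* file names

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Instead of job.setOutputFormatClass(TextOutputFormat.class):
        // the default part-m-* file is created only if a record is written to it.
        LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

        // Named output used by MultipleOutputs in the mapper (types are assumptions).
        MultipleOutputs.addNamedOutput(job, "successful",
                TextOutputFormat.class, Text.class, Text.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this setup, a mapper that never calls context.write() should leave no part-m-* files behind, while the files written through MultipleOutputs are unaffected.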