In the Hadoop job counters, what is the difference between “Map output materialized bytes” and “Map output bytes”? I don’t see the former when I disable map output compression, so I guess it is the real (compressed) output size in bytes, while the latter is the uncompressed size?
I think you are right.
From http://hadoop.apache.org/docs/r1.0.4/releasenotes.html:
MAPREDUCE-2365. New counters for FileInputFormat (BYTES_READ) and FileOutputFormat (BYTES_WRITTEN). New counter MAP_OUTPUT_MATERIALIZED_BYTES for compressed MapOutputSize. (Siddharth Seth)
(Changes Since Hadoop 0.20.2)
…………………………………………………………………………………………………………………………………
Here is a quote from Tom White’s “Hadoop: The Definitive Guide”, 3rd edition (table 8-2, page 261):
“Map output materialized bytes” – The number of bytes of map output actually written to disk. If map output compression is enabled, this is reflected in the counter value.
“Map output bytes” – The number of bytes of uncompressed output produced by all the maps in the job. Incremented every time the collect() method is called on the map’s OutputCollector.
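
To make the distinction concrete, here is a toy Python model of the two counters. It is only a sketch and not Hadoop code: the function name `simulate_map_output` is made up for illustration, zlib stands in for whatever map output codec is configured, and the extra framing that Hadoop's real on-disk format adds is ignored. The first number grows with every emitted record (like “Map output bytes”), and the second is the size of what would actually land on disk (like “Map output materialized bytes”).

```python
import zlib


def simulate_map_output(records, compress):
    """Toy model, NOT Hadoop's implementation.

    Returns (map_output_bytes, materialized_bytes) for a list of
    (key, value) string pairs.
    """
    map_output_bytes = 0
    buffer = bytearray()
    for key, value in records:
        k = key.encode("utf-8")
        v = value.encode("utf-8")
        # Counted once per emitted record, before any compression.
        map_output_bytes += len(k) + len(v)
        buffer += k + v
    # What ends up "on disk": compressed only if compression is enabled.
    on_disk = zlib.compress(bytes(buffer)) if compress else bytes(buffer)
    return map_output_bytes, len(on_disk)


# Hypothetical word-count style output: the same pair emitted many times.
records = [("word", "1")] * 10_000

raw, materialized_plain = simulate_map_output(records, compress=False)
_, materialized_compressed = simulate_map_output(records, compress=True)

print("map output bytes:              ", raw)
print("materialized (no compression): ", materialized_plain)
print("materialized (compression on): ", materialized_compressed)
```

In this simplified model the two values are identical when compression is off, and only the materialized value shrinks when compression is on, which matches the description quoted above.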