We have a Hadoop+HBase cluster on Amazon EMR with the default configuration, so that

Question


Asked: June 12, 2026


We have a Hadoop+HBase cluster on Amazon EMR with the default configuration, so both mapred.child.tmp and hbase.tmp.dir point to /tmp. The cluster has been running for a while, and /tmp has grown to 500 GB, compared with 70 GB of actual /hbase data.

A difference this large seems excessive. Are we supposed to periodically delete some of the /tmp data?


1 Answer

Editorial Team · Answer 1 · 2026-06-12T01:51:48+00:00


After some investigation, I found that most of our /tmp data was created by failed MapReduce tasks during Amazon’s automatic backup of HBase to S3. Our successful MapReduce tasks don’t leave much data in /tmp.
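For anyone repeating this kind of investigation, here is a minimal Python sketch that ranks the entries under /tmp by size. It assumes /tmp lives on HDFS, that the hadoop command is on the PATH, and that each data line printed by "hadoop fs -du" starts with a byte count and ends with the path (the exact column layout differs between Hadoop versions, and paths containing spaces are not handled). The function names are made up for illustration.

```python
import subprocess


def parse_du_output(text):
    """Turn `hadoop fs -du` output into (size_bytes, path) pairs, largest first.

    Assumption: each data line starts with an integer byte count and ends
    with the path. Any other line (e.g. a "Found N items" header) is skipped.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        entries.append((int(fields[0]), fields[-1]))
    return sorted(entries, reverse=True)


def largest_entries(path="/tmp", top=10):
    """Run `hadoop fs -du <path>` and return the `top` largest entries."""
    result = subprocess.run(
        ["hadoop", "fs", "-du", path],
        capture_output=True, text=True, check=True,
    )
    return parse_du_output(result.stdout)[:top]
```

Calling largest_entries("/tmp") on a cluster node should show which subdirectories dominate; in our case those were leftovers from failed backup tasks.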

We decided to disable Amazon’s automatic backup and implement our own backup script using HBase’s tools for exporting and importing tables.
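A backup script along those lines could look roughly like the sketch below. This is not our actual script: it assumes the hbase launcher is on the PATH and uses the stock org.apache.hadoop.hbase.mapreduce.Export and Import MapReduce jobs. The table names, the backup root directory, and the function names are hypothetical.

```python
import subprocess
from datetime import date

EXPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Export"
IMPORT_CLASS = "org.apache.hadoop.hbase.mapreduce.Import"


def export_command(table, backup_root, day):
    """Build the command that exports one table into a dated directory."""
    out_dir = f"{backup_root}/{table}/{day.isoformat()}"
    return ["hbase", EXPORT_CLASS, table, out_dir]


def import_command(table, input_dir):
    """Build the command that loads a previous export back into a table."""
    return ["hbase", IMPORT_CLASS, table, input_dir]


def run_backup(tables, backup_root):
    """Export every table in turn; stop at the first failing job."""
    for table in tables:
        cmd = export_command(table, backup_root, date.today())
        subprocess.run(cmd, check=True)
```

For example, run_backup(["users", "events"], "/backups/hbase") would launch one Export job per table. Note that the Export job fails if its output directory already exists, and that Import expects the target table to be created beforehand.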
