My data does not need to be loaded in real time, so I don’t have to use HBase, but I was wondering whether there are any performance benefits to using HBase in MR jobs. Shouldn’t the joins be faster due to the indexed data?
Does anybody have any benchmarks?
Generally speaking, Hive on plain HDFS will be significantly faster than HBase for MR jobs. HBase sits on top of HDFS, so it adds another layer that every read has to go through. HBase would be faster if you were looking up individual records by key, but you wouldn’t use an MR job for that.
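To make the scan-versus-lookup distinction concrete, here is a toy sketch in plain Python. It is not a benchmark and uses no Hadoop or HBase APIs; the function names `scan_join` and `lookup_join` and the sample data are made up for illustration. It only counts how many records each strategy touches: a batch join reads both inputs once, sequentially, while an index-style join issues one keyed lookup per record on the left side.

```python
from collections import defaultdict


def scan_join(left, right):
    """Batch-style join: read both inputs once, front to back."""
    reads = 0
    by_key = defaultdict(list)
    for key, value in right:          # one sequential pass over the right side
        reads += 1
        by_key[key].append(value)
    joined = []
    for key, value in left:           # one sequential pass over the left side
        reads += 1
        for other in by_key.get(key, []):
            joined.append((key, value, other))
    return joined, reads


def lookup_join(left, right_index):
    """Index-style join: one keyed lookup per left-side record."""
    lookups = 0
    joined = []
    for key, value in left:
        lookups += 1                  # stands in for one random read against a store
        other = right_index.get(key)
        if other is not None:
            joined.append((key, value, other))
    return joined, lookups


# Hypothetical sample data: (key, value) pairs with unique keys on the right.
left = [(1, "a"), (2, "b"), (3, "c")]
right = [(1, "x"), (2, "y"), (4, "z")]

scan_result, reads = scan_join(left, right)
lookup_result, lookups = lookup_join(left, dict(right))
print(scan_result == lookup_result, reads, lookups)
```

Both strategies produce the same rows. The difference is the access pattern: when the job joins the whole dataset, the lookup version needs one random read per row, so the index does not let it skip any work, whereas the scan version streams through the data. The counts here say nothing about real latency; they only show why an index helps point lookups rather than full-table joins.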