As a simple example,
select * from tablename;
DOES NOT trigger a MapReduce job, while
select count(*) from tablename;
DOES. What general principle does Hive use to decide when to run a MapReduce job?
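In general, any sort of aggregation, such as min, max, or count, is going to require a MapReduce job, because Hive has to read and combine every row to produce the result. A plain select * has nothing to compute, so Hive can read the table's files and return the rows directly. This rule of thumb probably won't explain every case.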
Hive, in the style of many RDBMSs, has an EXPLAIN keyword that outlines how your Hive query gets translated into MapReduce jobs. Try running EXPLAIN on both of your example queries to see what Hive is doing behind the scenes.
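As a minimal sketch of that suggestion, the statements below prefix each query from the question with EXPLAIN. Here tablename is the placeholder from the question, not a real table. The plan text differs between Hive versions and configurations, so no output is shown. On a setup using the MapReduce execution engine, you would typically expect the first plan to list only a fetch stage and the second to include a map reduce stage.

```sql
-- Plan for the plain scan; tablename is the placeholder table from the question.
EXPLAIN SELECT * FROM tablename;

-- Plan for the aggregation; compare its stages with the plan above.
EXPLAIN SELECT count(*) FROM tablename;
```

Comparing the stage lists of the two plans is usually the quickest way to see whether a given query will launch a job before you actually run it.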