I am trying to understand how Hive and Hadoop interact. From the tutorials I have read, it appears that before running Hive queries you run a map/reduce job to get the input data. This seems counterproductive to me: if I have already run the map/reduce job and gotten the data into an easily parsable format, why would I not put the data into a traditional database?
Thanks for your help,
Nathan
Hive operates on files that are stored on HDFS. For anything other than the simplest queries, Hive generates and runs MapReduce jobs. For very simple queries (SELECT * FROM MyTable) it will just stream the files off of disk.
The input data doesn't need to come from MapReduce; it can be a simple text file uploaded to HDFS. See http://developer.yahoo.com/hadoop/tutorial/module2.html#commandref
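To make that concrete, here is a rough HiveQL sketch of pointing Hive at a plain tab-delimited text file that has already been uploaded to HDFS. The table name, column names, and HDFS path are all made up for illustration; adjust them to your own data.

```sql
-- Hypothetical example: page_views, its columns, and the HDFS path are
-- illustrative assumptions, not taken from the question.
-- The files under LOCATION are assumed to be tab-delimited text already on HDFS.
CREATE EXTERNAL TABLE page_views (
  user_id   STRING,
  url       STRING,
  view_time STRING
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
LOCATION '/user/nathan/page_views';

-- Very simple query: Hive can just stream the files off of disk.
SELECT * FROM page_views;

-- Anything involving aggregation is compiled into MapReduce job(s).
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url;
```

No map/reduce job is needed up front here: the CREATE EXTERNAL TABLE statement only records the schema and location, and Hive reads the files as they are when a query runs.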