I’m currently creating an external table like this:
CREATE EXTERNAL TABLE site_datatype (
....
yada yada
....
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
LOCATION '/user/accounting/summary/2011-12-15/site_datatype.result'
But when I run the INSERT OVERWRITE TABLE ... SELECT, instead of creating a file called “site_datatype.result” with the contents in it, Hive creates a directory “site_datatype.result” containing a file called “000000_0” (the contents are correct, though).
Is this supposed to work this way? And if so, how can I work around it (inside Hive) to get the result I need?
Thanks,
Mario
Yes, this is expected. Hive operates at the directory level, so that multiple reducers can write their results to HDFS in parallel; the LOCATION of a table therefore names a directory, not a file. If Hive operated at the file level, it would have to send everything through a single reducer to consolidate the output into one file, adding an unnecessary bottleneck.
If you absolutely need the data from a Hive table in a single file, you can set the number of reducers to 1, then query your data and write it to a new table or directory (via INSERT OVERWRITE).
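A minimal sketch of the single-reducer approach, written as a shell script that generates the HiveQL and only runs it if a hive client is on the PATH. The table name site_datatype_src, the column site_id, the output directory, and the QUERY_FILE variable are hypothetical placeholders. mapred.reduce.tasks is the classic MapReduce property name (newer Hadoop releases call it mapreduce.job.reduces). The setting only matters for queries that have a reduce phase, which is why the sketch adds an ORDER BY.

```shell
#!/bin/sh
# Sketch only: table, column, and directory names below are hypothetical.
QUERY_FILE="${QUERY_FILE:-single_file_export.hql}"

# Write the HiveQL to a file so it can be inspected before running.
cat > "$QUERY_FILE" <<'EOF'
-- Ask for a single reducer so the reduce phase writes one output file.
SET mapred.reduce.tasks=1;

-- ORDER BY forces a reduce phase; a map-only query would still
-- write one file per mapper regardless of the setting above.
INSERT OVERWRITE DIRECTORY '/tmp/site_datatype_single'
SELECT * FROM site_datatype_src ORDER BY site_id;
EOF

# Run the query only when a Hive client is actually installed.
if command -v hive >/dev/null 2>&1; then
    hive -f "$QUERY_FILE"
else
    echo "hive client not found; query written to $QUERY_FILE"
fi
```

The result is still a directory, but it should contain a single file (typically 000000_0) that you can then rename or copy out.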
Another option is to copy the table's files out of HDFS (hadoop fs -get hive/warehouse/sampletable/ .) and then ‘cat’ all of the files back together into one.
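The fetch-and-concatenate step might look like the sketch below. The helper name merge_parts and the output file name sampletable.tsv are hypothetical; the hadoop commands are shown only as comments because they assume a configured Hadoop client. hadoop fs -getmerge is a built-in alternative that downloads and concatenates in one step.

```shell
#!/bin/sh
# merge_parts DIR OUT: concatenate every file in DIR into the single file OUT.
# (Helper name is hypothetical; OUT should live outside DIR.)
merge_parts() {
    dir=$1
    out=$2
    # The shell expands the glob in sorted order, so 000000_0 comes
    # before 000001_0, and so on.
    cat "$dir"/* > "$out"
}

# Example usage, assuming a Hadoop client is installed:
#   hadoop fs -get hive/warehouse/sampletable/ .
#   merge_parts sampletable sampletable.tsv
#
# Or let Hadoop do both steps at once:
#   hadoop fs -getmerge hive/warehouse/sampletable sampletable.tsv
```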