Currently, when I STORE into HDFS, it creates many part files.
Is there any way to store out to a single CSV file?
You can do this in a few ways:
To set the number of reducers for all Pig operations, you can use the default_parallel property, but this means every single step will use a single reducer, decreasing throughput:

set default_parallel 1;

Alternatively, if one of the operations executed prior to calling STORE is COGROUP, CROSS, DISTINCT, GROUP, JOIN (inner), JOIN (outer), or ORDER BY, you can use the PARALLEL 1 keyword to make that one command run with a single reducer:

GROUP a BY grp PARALLEL 1;

See the Parallel Features section of the Pig Cookbook for more information.
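
As a rough end-to-end sketch of the second approach, the script below applies PARALLEL 1 to the last reduce-side operation before STORE. The paths, relation names, and schema are hypothetical and only there for illustration. It assumes the final operations before STORE all run in the same reduce phase.

```pig
-- Hypothetical input path and schema, for illustration only.
raw = LOAD '/data/input' USING PigStorage(',') AS (grp:chararray, val:int);

-- GROUP requires a reduce phase; PARALLEL 1 runs that phase with one reducer.
grouped = GROUP raw BY grp PARALLEL 1;

-- This FOREACH runs inside the same reduce phase as the GROUP above.
totals = FOREACH grouped GENERATE group AS grp, SUM(raw.val) AS total;

-- With a single reducer, the output directory should hold a single part file.
-- PigStorage(',') writes comma-delimited fields.
STORE totals INTO '/data/output_csv' USING PigStorage(',');
```

Note that the STORE target is still a directory. The single CSV is the one part file inside it. PARALLEL has no effect on map-only jobs, so a script with no reduce-side operation can still produce several part files.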