So here is the situation: I have a Hadoop cluster that is not configured with Kerberos security, and a workstation. The cluster runs the Cloudera CDH3 distribution. All data on the cluster is stored under the ‘hdfs’ user.
The workstation is a Linux or macOS machine running a complicated piece of software that embeds a Pig client. The Pig client connects to the cluster to run analytics jobs.
Here is the problem. The user accounts on the cluster and on the workstation are different: all data in the Hadoop cluster is stored under the ‘hdfs’ user's home directory, while the workstation has a completely different set of user accounts. Is it possible to tell Pig to execute the job under a different user account? Currently Pig attempts to execute the job under the account of the user who is currently logged into the workstation. The job actually runs, but it cannot access the data because the scripts use paths relative to the ‘hdfs’ user's home directory in HDFS.
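To make the path issue concrete, here is a minimal sketch of how I understand the relative paths to be resolved, assuming the usual HDFS convention that a user's home directory is `/user/<username>`. The class `RelativePathDemo`, the method `resolve`, the workstation account `alice`, and the path `data/input.txt` are all made up for illustration; this only mimics the resolution rule and does not call Hadoop or Pig.

```java
// Illustration only: mimics how a relative HDFS path is resolved against
// the calling user's home directory, conventionally /user/<username>.
// The class and method names are hypothetical, not part of Hadoop or Pig.
class RelativePathDemo {
    static String resolve(String user, String path) {
        if (path.startsWith("/")) {
            return path; // absolute paths do not depend on the user
        }
        return "/user/" + user + "/" + path;
    }

    public static void main(String[] args) {
        String script = "data/input.txt"; // hypothetical relative path in a Pig script
        System.out.println(resolve("hdfs", script));  // where the data actually lives
        System.out.println(resolve("alice", script)); // where the job looks when run as alice
    }
}
```

The same relative path points to two different locations depending on the submitting user, which would explain why the job starts fine but finds no data.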
I understand that when security is not configured for the cluster, the user name is simply passed along in the job configuration XML, but for some reason I can’t figure out how to force the user name I need into that XML document.
You can’t pass the user via properties. The security subsystem is more complicated than simply passing a user name. You have four possible solutions:
I think the preferable way in your case is 1. But if that is not possible, 4 would be my next choice.