I started learning Hadoop and Pig two days ago for an upcoming project.
To experiment, I installed Hadoop in pseudo-distributed mode (HDFS on the default localhost:9000) and Pig in map-reduce mode.
When I started Pig with the ./bin/pig command, it launched the Grunt command line and printed a message that Pig had connected to HDFS (localhost:9000). After that I was able to access HDFS through Pig.
I was expecting to perform some manual configuration for PIG to access HDFS (as per various internet articles).
My question is: where does Pig get the default HDFS configuration (localhost:9000) from? I checked pig.properties but didn't find anything there. I need to know this because I might change the default HDFS configuration in the future.
BTW, I have HADOOP_HOME and PIG_HOME defined in my OS PATH variable.
When installing Pig (I assume v0.10.0) you have to tell it how to connect to HDFS. I don't know how you did this, but generally it is done by adding the Hadoop conf dir path to the PIG_CLASSPATH environment variable. You can also set HADOOP_CONF_DIR.
When you start the Grunt shell, Pig locates the directory containing the Hadoop configuration XMLs and takes the values of fs.default.name (core-site.xml) and mapred.job.tracker (mapred-site.xml), i.e. the locations of the NameNode and the JobTracker.
For reference, you may have a look at the Pig shell script to see how the environment variables are collected and evaluated.
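As a rough sketch of the setup described above, the environment could be prepared like this before starting Pig. It assumes a Hadoop 1.x style layout where the configuration XMLs live in $HADOOP_HOME/conf; the fallback path /usr/local/hadoop is only a placeholder and not something from your installation.

```shell
# Assumption: Hadoop 1.x layout, with core-site.xml and mapred-site.xml in $HADOOP_HOME/conf.
# /usr/local/hadoop is a hypothetical fallback; replace it with your real install path.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"

# Tell Hadoop-aware tools where the configuration XMLs are.
export HADOOP_CONF_DIR="$HADOOP_HOME/conf"

# Put the conf dir at the front of Pig's classpath, keeping any existing entries.
export PIG_CLASSPATH="$HADOOP_CONF_DIR${PIG_CLASSPATH:+:$PIG_CLASSPATH}"

echo "Pig will look for Hadoop config in: $HADOOP_CONF_DIR"
```

After that, ./bin/pig should pick up whatever fs.default.name is set to in the core-site.xml found in that directory, so changing the HDFS address later would mean editing that property there rather than anything in pig.properties.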