I have a single-node Hadoop cluster on CentOS Linux, configured to run on an IP address instead of localhost. The example MapReduce job ran correctly, so the Hadoop setup appears to be fine.
I have also added a couple of data files to HDFS under the “/data” folder, and they are visible through the “dfs” command:
bin/hadoop dfs -ls /data
I am trying to connect to this HDFS from PDI/Kettle. In the HDFS file browser, if I enter the connection parameters incorrectly (e.g. a wrong port), it says it cannot connect to the HDFS server. If I enter all parameters correctly (server, port, user, password) and click ‘connect’, it gives no error, which suggests it is able to connect. But the file list shows only “/”.
It doesn’t show the data folder. What could be going wrong?
I’ve already tried this:
- tried chmod 777 on the data files using “bin/hadoop dfs -chmod -R 777 /data”
- tried using root and also the hdfs Linux user in the PDI file browser
- tried adding the data files in some other location
- re-formatting HDFS several times and adding the data files again
- copying the hadoop-core jar file from the Hadoop installable to PDI extlib
None of this helped: the PDI browser still does not list the files, and I cannot see anything in the PDI log either. Need quick help, thanks!
-abhay
I got past this issue. On Windows, PDI was not writing anything to the log file. I tried the same thing on Linux, where the log showed that it was missing an Apache library, commons-configuration. I downloaded the latest version of it, put it under the extlib/pentaho folder, and boom! It worked!
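
For anyone hitting the same thing, here is a rough sketch of how one might check for the jar before opening the file browser. The has_jar helper, the PDI_HOME variable, and its default path are my own assumptions for illustration; only the extlib/pentaho folder and the commons-configuration name come from the fix above, so adjust the paths to your install.

```shell
#!/bin/sh
# Hypothetical helper (not part of PDI): succeeds if a jar whose file name
# starts with the given prefix exists in the given directory.
has_jar() {
    dir="$1"
    prefix="$2"
    # ls exits non-zero when the glob matches no file
    ls "$dir"/"$prefix"*.jar >/dev/null 2>&1
}

# PDI_HOME and this default location are assumptions; point it at your install.
PDI_HOME="${PDI_HOME:-/opt/pentaho/data-integration}"

if has_jar "$PDI_HOME/extlib/pentaho" "commons-configuration"; then
    echo "commons-configuration jar found"
else
    echo "commons-configuration jar missing, copy it into $PDI_HOME/extlib/pentaho"
fi
```

This only checks that a file with the right name is there. It does not confirm that the version is compatible with your Hadoop jars, so the PDI log on Linux is still the place to look if the listing stays empty.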