I am trying to run a few Hadoop commands from a Python program.
For example, from the command line you can run:
bin/hadoop dfs -ls /hdfs/query/path
It returns all the files in the HDFS query path,
so it is very similar to Unix ls.
Now I am trying to do the same thing from Python and then do some manipulation on the result.
import os
exec_str = "path/to/hadoop/bin/hadoop dfs -ls " + query_path
os.system(exec_str)
Now I am trying to capture this output so that I can manipulate it,
for example to count the number of files.
I looked into the subprocess module, but these are not native shell commands, so I am not sure whether the same concepts apply.
How can I solve this?
You can probably use the subprocess module: http://docs.python.org/2/library/subprocess.html
check_output is what you want if it's stdout you want to capture.
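subprocess is not limited to shell built-ins; it can launch any executable, so the hadoop script should work the same way. Here is a minimal sketch of how that might look. The function names list_hdfs_path and count_entries are made up for illustration, the hadoop path is the placeholder from the question, and skipping a "Found N items" header line is an assumption about what the listing looks like, so check it against your actual output.

```python
import subprocess


def list_hdfs_path(query_path, hadoop_bin="path/to/hadoop/bin/hadoop"):
    """Run `hadoop dfs -ls <query_path>` and return its stdout as text.

    hadoop_bin is the placeholder path from the question; adjust it.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    # Passing a list avoids going through the shell and avoids quoting issues.
    output = subprocess.check_output([hadoop_bin, "dfs", "-ls", query_path])
    return output.decode("utf-8")


def count_entries(ls_output):
    """Count listing lines, ignoring blank lines.

    Assumption: a summary line starting with "Found " (for example
    "Found 3 items") is a header and not a file entry.
    """
    count = 0
    for line in ls_output.splitlines():
        if not line.strip():
            continue
        if line.startswith("Found "):
            continue
        count += 1
    return count
```

You would then call something like count_entries(list_hdfs_path("/hdfs/query/path")). Unlike os.system, which only gives you the exit status, check_output hands the command's stdout back to your program.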