I am trying to do the equivalent of this shell command in Python:
cat foo | ssh me@xxxx hadoop fs -put - bar/foo
I originally tried subprocess.check_call:
foo = 'foo'
subprocess.check_call(['cat', foo, '|','ssh',os.environ['USER']+'@'+hadoopGateway,'hadoop','fs','-put', '-', inputArgs.targetDir+'/'+foo])
which produces the error:
cat: invalid option -- 'p'
I have looked at the Python pipes module documentation and experimented with it in the interactive interpreter, but I do not understand how to start the pipeline without an output file, unlike the example in the documentation.
>>> t = pipes.Template()
>>> t.prepend('cat foo', '.-')
>>> t.append('hadoop fs -put - bar/foo', '-.') # what next
Clearly I am missing something.
You don't need cat or a pipeline for this; all you need is to provide the file as standard input to the ssh command. In shell, that would be
ssh me@xxxx hadoop fs -put - bar/foo < foo
and with the Python subprocess module it's only a tiny bit more involved.
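The original code for this step did not survive, so what follows is only a minimal sketch of the idea: open the local file and pass it as stdin to subprocess.check_call, so that ssh reads the file directly. The helper names build_put_command and run_with_file_as_stdin are hypothetical and chosen for illustration; hadoopGateway and inputArgs.targetDir are taken from the question.

```python
import shlex
import subprocess


def build_put_command(user, host, remote_path):
    """Build the ssh argument list (hypothetical helper, not from the original answer).

    ssh joins the trailing arguments with spaces and hands them to the remote
    shell, so the remote path is quoted in case it contains special characters.
    """
    return ['ssh', user + '@' + host,
            'hadoop', 'fs', '-put', '-', shlex.quote(remote_path)]


def run_with_file_as_stdin(cmd, local_path):
    """Run cmd with local_path connected to its standard input.

    This is the subprocess equivalent of the shell's `cmd < local_path`.
    check_call raises CalledProcessError if the command exits non-zero.
    """
    with open(local_path, 'rb') as f:
        return subprocess.check_call(cmd, stdin=f)


# Example call, assuming the names from the question are defined:
# foo = 'foo'
# cmd = build_put_command(os.environ['USER'], hadoopGateway,
#                         inputArgs.targetDir + '/' + foo)
# run_with_file_as_stdin(cmd, foo)
```

No shell is involved on the local side, which is why the '|' in the original check_call was passed to cat as a literal argument along with everything after it, and cat then tried to parse -put as options.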