I am trying to figure out how to write directly from an EMR map task to an S3 bucket. I would like to run a Python streaming job that fetches some data from the internet and saves it to S3 without passing it on to a reduce job. Can anyone help me with that?
Why don’t you just set the output of your MR job to be an S3 directory and tell it that there is no reducer:
That should do what you want it to.
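As a rough sketch of what such a map-only step could look like, the helper below builds the option list for a Hadoop streaming job. It assumes the standard streaming option `-numReduceTasks 0`, which skips the reduce phase so that mapper output is written straight to the `-output` location. The function name `build_streaming_args` and the bucket paths are hypothetical and only there for illustration.

```python
def build_streaming_args(input_uri, output_uri, mapper_uri):
    """Return Hadoop streaming options for a map-only job (illustrative helper)."""
    # The mapper is shipped with -files and then invoked by its file name.
    mapper_name = mapper_uri.rsplit("/", 1)[-1]
    return [
        "-files", mapper_uri,      # generic options go before streaming options
        "-mapper", mapper_name,
        "-numReduceTasks", "0",    # no reducer: mapper output is the job output
        "-input", input_uri,
        "-output", output_uri,     # e.g. an S3 directory
    ]


# Hypothetical bucket and paths.
args = build_streaming_args(
    "s3://example-bucket/urls/",
    "s3://example-bucket/output/",
    "s3://example-bucket/code/mapper.py",
)
```

How these options are handed to the cluster (console, CLI, or SDK) depends on your EMR setup; the part that matters here is the zero reduce tasks together with the S3 output path.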
Then your script can do something like this (sorry, Ruby):
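Since the question asks about a Python streaming job, here is a minimal Python sketch of the same idea. It assumes the job input is a list of URLs, one per line, and that each mapper simply writes its results to standard output; with no reducer, Hadoop streaming stores that output in the job's output directory. The names `fetch_url` and `run_mapper` and the tab-separated record layout are my own choices, not anything EMR requires.

```python
import urllib.request


def fetch_url(url):
    """Download a URL and return its body as text (assumes network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def run_mapper(lines, out, fetch=fetch_url):
    """Fetch each input URL and write one tab-separated record per URL."""
    for line in lines:
        url = line.strip()
        if not url:
            continue  # skip blank input lines
        body = fetch(url)
        # Streaming records are line-based, so collapse all whitespace
        # (including newlines and tabs) into single spaces.
        flat = " ".join(body.split())
        out.write(f"{url}\t{flat}\n")
```

In the actual mapper script you would finish with `import sys` and `run_mapper(sys.stdin, sys.stdout)`. Passing `fetch` as a parameter is only there so the loop can be tried locally with a stub instead of real network calls.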