I am using Python and have to work on following scenario using Hadoop Streaming:

Question

0

Asked: May 28, 20262026-05-28T05:06:13+00:00 2026-05-28T05:06:13+00:00

I am using Python and have to work on following scenario using Hadoop Streaming:

0

I am using Python and have to work on following scenario using Hadoop Streaming:
a) Map1->Reduce1->Map2->Reduce2
b) I dont want to store intermediate files
c) I dont want to install packages like Cascading, Yelp, Oozie. I have kept them as last option.

I already went through the same kind of discussion on SO and elsewhere but could not find an answer wrt Python. Can you please suggest.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-28T05:06:14+00:00

b) I dont want to store intermediate files

c) I dont want to install packages like Cascading, Yelp, Oozie.

Any reason why? Based on the response, a better solution could be provided.

Intermediates files cannot be avoided, because the o/p of the previous Hadoop job cannot be streamed as i/p to the next job. Create a script like this

run streaming job1
if job1 is not success then exit
run streaming job2
if job2 is success them remove o/p of job1 else exit
run streaming job3
if job3 is succcess them remove o/p of job2 else exit

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am using Python and have to work on following scenario using Hadoop Streaming:

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply