I want to write MapReduce (may be multiple rounds!) to 1. Sample N records

Question

Asked: June 14, 2026

I want to write a MapReduce job (possibly with multiple rounds!) to

1. Sample N records from a large data set for each of, say, X random trees
2. Train each tree (X trees in total)
3. And then test records on all of these trees

Sequentially, this would be:

for X = 0 to 199:
    - sample N records from Large data
    - Train this tree
    - test for all test records

This is a homework problem, so I just need an idea.

I’m not sure about two things:

1. In the mapper, can I sample exactly N records and generate 200 small
training data files?
2. How do I test each record on all 200 trees? The first option
I thought of is that each reducer runs a small test (part of the test file)
against ALL trees. The second option, which I’m not sure how to implement,
is to run the 200 trees independently, with the test file in the Distributed
Cache, and predict for each test record.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T14:33:54+00:00

It depends on how formally you approach it.

A formal mapper cannot sample exactly N records, because it cannot keep a counter and it doesn’t know the total data size. A practical mapper in Hadoop certainly could keep a counter, but it probably won’t know how many records it is going to receive in total.

But since you said this is homework, I don’t think you need to ensure the sample is exactly N.
In particular, since you are sampling anyway, what is the benefit of having exactly N records?

Try the following:

1. Determine the total number of records, D.
2. Have each mapper emit each record with probability N/D to each tree (tree ID = key for the reducer).
3. Let each reducer train its tree on all the records it received.
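To make the idea concrete, here is a minimal single-process sketch in Python that only simulates the map, shuffle and reduce phases; it is not Hadoop code. The names `map_record`, `shuffle`, `reduce_tree` and `run` are made up for illustration, D is taken to be the total number of records (assumed to be known from an earlier counting pass), and the reducer just counts its records where a real one would train a tree.

```python
import random
from collections import defaultdict

NUM_TREES = 200  # X in the question (assumed value)

def map_record(record, n_target, total_records, rng):
    """Emit (tree_id, record) pairs: each tree gets the record with probability N/D."""
    p = n_target / total_records
    for tree_id in range(NUM_TREES):
        if rng.random() < p:
            yield tree_id, record

def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_tree(tree_id, records):
    """Placeholder reducer: a real one would train a tree on `records`."""
    return tree_id, len(records)

def run(data, n_target, seed=0):
    """Simulate one job and return {tree_id: sample size}."""
    rng = random.Random(seed)
    total = len(data)  # D, assumed to come from a prior counting pass
    pairs = (kv for rec in data for kv in map_record(rec, n_target, total, rng))
    groups = shuffle(pairs)
    return dict(reduce_tree(t, recs) for t, recs in groups.items())

if __name__ == "__main__":
    sizes = run(list(range(2000)), n_target=200)
    print(min(sizes.values()), max(sizes.values()))
```

Each tree's sample size is then binomially distributed with mean N and variance N(1 - N/D), so the samples are close to N but not exactly N. Because every decision is an independent coin flip, the mapper needs no counter and no knowledge of what other mappers are doing.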
