I need to prototype some Hadoop MR code in Hadoop local mode on my Mac, and I would like to hear about any gotchas there might be.
One particular question is about the number of mappers and reducers. Will it basically be one of each? Would specifying more than one work at all? I am going to use a small sample.
In local mode you cannot meaningfully set the number of mappers and reducers: the job runs inside a single JVM and, by default, executes its tasks one at a time. At the same time, if you want to profile the performance of an individual mapper or reducer, the numbers will be quite realistic.
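For what it's worth, here is a minimal sketch of the settings that select local mode, rendered as `-D` options. It assumes Hadoop 2.x or later property names (`mapreduce.framework.name`, `fs.defaultFS`); older releases used different keys. `LocalModeFlags` is a hypothetical helper written for illustration, not part of Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper (not a Hadoop class): collects the properties that
// select local mode and renders them as -D command-line options.
class LocalModeFlags {

    // Assumed Hadoop 2.x+ property names; check them against your release.
    static Map<String, String> localSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("mapreduce.framework.name", "local"); // run the job in-process
        settings.put("fs.defaultFS", "file:///");          // read and write the local filesystem
        return settings;
    }

    // Joins the settings as "-D key=value" pairs, preserving insertion order.
    static String render(Map<String, String> settings) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Map.Entry<String, String> entry : settings.entrySet()) {
            joiner.add("-D").add(entry.getKey() + "=" + entry.getValue());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // prints: -D mapreduce.framework.name=local -D fs.defaultFS=file:///
        System.out.println(render(localSettings()));
    }
}
```

If your driver goes through `ToolRunner`, options in this `-D key=value` form are picked up from the command line, so the same jar can be pointed at a real cluster later without code changes.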
The nearest mode that can run many mappers and reducers is pseudo-distributed mode, where all the daemons run on a single machine.
Neither of these modes accounts for possible problems with data locality or shuffle performance. I also do not expect your dev machine to have the same disk subsystem as production.
In a nutshell: if a single mapper or reducer performs poorly in local mode, you can start fixing it there. If it works well, try it on real hardware before planning your cluster.