I have a cluster of 50 nodes, and each node has 8 cores for computation.
If I have a job to which I'm planning to assign 200 reducers, what would be a good resource allocation strategy for better performance?
That is, is it better to allocate 50 nodes with 4 cores on each, or 25 nodes with 8 cores on each? Which one is better in which case?
To answer your question, it depends on a few things. In general, though, I think the 50-node option is going to be better.
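For context, both allocations give you the same number of concurrent reducer slots, so raw core count alone does not decide it. A minimal sketch of that arithmetic, assuming each allocated core runs exactly one reducer at a time (a simplification; in practice your scheduler's slot or container settings decide this). The function name is hypothetical:

```python
import math

def reducer_waves(nodes: int, cores_per_node: int, reducers: int) -> int:
    """Number of scheduling 'waves' needed, assuming one reducer per core at a time."""
    slots = nodes * cores_per_node
    return math.ceil(reducers / slots)

# Compare the two allocations from the question for a 200-reducer job.
for nodes, cores in [(50, 4), (25, 8)]:
    slots = nodes * cores
    print(f"{nodes} nodes x {cores} cores = {slots} slots, "
          f"{reducer_waves(nodes, cores, 200)} wave(s) for 200 reducers")
```

Both configurations print 200 slots and a single wave, which is why the deciding factors are elsewhere (network and disks).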
However, if your main concern is the network, spreading the job across 50 nodes does come with a few downsides.
Even with these network concerns, I think you'll find that 50 nodes is better, simply because the value of a node is not just its number of cores. What you mostly have to consider is how many disks you have.
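To make the disk point concrete: with identical nodes, the 50-node allocation has twice as many disks serving the same 200 reducers, so each disk is shared by half as many concurrent tasks. A rough sketch, assuming a hypothetical 4 disks per node (the question doesn't say) and reducers spread evenly over disks; real I/O throughput also depends on disk speed and the shuffle pattern:

```python
def reducers_per_disk(nodes: int, disks_per_node: int, reducers: int) -> float:
    """Average number of concurrent reducers sharing each physical disk."""
    return reducers / (nodes * disks_per_node)

DISKS_PER_NODE = 4  # hypothetical value; not given in the question

for nodes in (50, 25):
    total_disks = nodes * DISKS_PER_NODE
    print(f"{nodes} nodes: {total_disks} disks, "
          f"{reducers_per_disk(nodes, DISKS_PER_NODE, 200):.1f} reducers per disk")
```

With these assumptions, 50 nodes gives 1.0 reducer per disk and 25 nodes gives 2.0, so the 50-node layout has roughly double the aggregate disk bandwidth available to the job.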