Editorial Team
Asked: May 26, 2026

I have a cluster with 50 nodes, and each node has 8 cores for computation.
If I have a job for which I'm planning to use 200 reducers, what would be a good resource allocation strategy for better performance?

I mean, is it better to allocate 50 nodes with 4 cores on each of them, or 25 nodes with 8 cores on each? Which one is better in which case?
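As a quick sanity check on the arithmetic, assuming one reducer runs per allocated core (one reduce slot per core), both options give exactly 200 concurrent reduce slots, so all 200 reducers fit in a single wave either way. The sketch below just makes that explicit; `reduce_waves` is a hypothetical helper, not a Hadoop API:

```python
import math


def reduce_waves(num_reducers: int, nodes: int, reduce_slots_per_node: int) -> int:
    """Number of sequential waves needed to run all reducers, assuming
    each reduce slot runs one reducer at a time (a simplification)."""
    total_slots = nodes * reduce_slots_per_node
    return math.ceil(num_reducers / total_slots)


for nodes, slots in [(50, 4), (25, 8)]:
    print(f"{nodes} nodes x {slots} slots = {nodes * slots} slots, "
          f"{reduce_waves(200, nodes, slots)} wave(s) for 200 reducers")
```

Since the slot count is identical, the choice comes down to per-node resources (disks, memory bandwidth, network), which is what the answer below discusses.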

1 Answer

    Editorial Team
    Added an answer on May 26, 2026 at 4:10 am

    It depends on a few things, but in my opinion the 50-node option will generally be better:

    • If you are reading a lot of data off disk, 50 nodes will be better, because the reads are spread over twice as many machines (and their disks), so you parallelize loading from disk roughly 2x.
    • If you are computing over a lot of data, 50 nodes will be better, because throughput does not scale 1:1 with the number of cores in a node (2x as many cores is not quite 2x as fast, since the cores on a node share its memory bandwidth, disks and network interface), whereas adding more machines scales close to 1:1.
    • Hadoop has to run daemons such as the TaskTracker and DataNode on every node, in addition to the operating system's own processes. These also take up CPU, so they compete with your tasks when all 8 cores of a node are busy.
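The compute argument can be illustrated with a toy model. This sketch uses Amdahl's law as a stand-in for the sublinear per-node core scaling described above, and reserves a fixed slice of each node for the Hadoop daemons and the OS. Both `sigma = 0.05` (the fraction of work that serializes on shared node resources) and `daemon_cores = 0.5` are made-up illustrative values, not measurements:

```python
def node_throughput(busy_cores: float, sigma: float) -> float:
    """Amdahl-style per-node speedup. sigma is a hypothetical fraction of the
    work that serializes on shared node resources (memory bus, disks, NIC);
    sigma = 0 means perfect 1:1 scaling with cores."""
    return busy_cores / (1 + sigma * (busy_cores - 1))


def cluster_throughput(nodes: int, task_cores: int, cores_per_node: int = 8,
                       sigma: float = 0.05, daemon_cores: float = 0.5) -> float:
    # Daemons (TaskTracker, DataNode, OS) are assumed to permanently occupy
    # `daemon_cores` per node, so tasks can use at most what is left over.
    usable = min(task_cores, cores_per_node - daemon_cores)
    # Nodes share nothing in this model, so throughput is linear in node count.
    return nodes * node_throughput(usable, sigma)


for nodes, cores in [(50, 4), (25, 8)]:
    print(f"{nodes} nodes x {cores} cores: "
          f"{cluster_throughput(nodes, cores):.1f} core-equivalents")
```

With `sigma = 0` and `daemon_cores = 0`, both layouts come out at exactly 200 core-equivalents; any contention or daemon overhead in this model favors spreading the same 200 tasks over more nodes. Real clusters should be benchmarked rather than modeled this way.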

    However, if your main concern is the network, here are a few downsides of having 50 nodes:

    • 50 nodes will likely span two racks. Are they on a flat network, or do you have to deal with inter-rack communication? In the latter case, you'll have to configure Hadoop's rack awareness accordingly;
    • A network switch supporting 50 nodes is going to be more expensive than one that supports 25;
    • The shuffle between the map and reduce phases will make the switch work somewhat harder on the 50-node cluster, but roughly the same total amount of data will cross the network.
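For the multi-rack case, Hadoop learns the cluster topology from an administrator-supplied script named by the `net.topology.script.file.name` property (Hadoop 2 and later; Hadoop 1.x calls it `topology.script.file.name`). Hadoop runs the script with one or more host names or IP addresses as arguments and reads back one rack path per host from standard output. A minimal sketch, assuming a hypothetical layout where two subnets correspond to two racks of 25 nodes each:

```python
#!/usr/bin/env python3
"""Minimal Hadoop rack topology script (hypothetical host-to-rack layout)."""
import sys

# Hypothetical mapping: 10.0.1.x hosts sit on rack 1, 10.0.2.x hosts on rack 2.
# Replace with your real inventory.
RACK_BY_HOST = {f"10.0.1.{i}": "/rack1" for i in range(1, 26)}
RACK_BY_HOST.update({f"10.0.2.{i}": "/rack2" for i in range(1, 26)})

DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Return one rack path per host, in the same order Hadoop passed them."""
    return [RACK_BY_HOST.get(h, DEFAULT_RACK) for h in hosts]


if __name__ == "__main__":
    # Hadoop passes hosts as arguments and reads whitespace-separated racks back.
    print(" ".join(resolve(sys.argv[1:])))
```

Make the script executable and point the property at it in `core-site.xml`. Hosts missing from the mapping fall back to `/default-rack`, which is also where Hadoop places every node when no script is configured.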

    Even with these network concerns, I think you'll find that 50 nodes is the better choice, because a node's value is not just its number of cores: above all, you have to consider how many disks you have.
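To put the disk argument in numbers, here is a sketch assuming a hypothetical 4 disks per node at 100 MB/s each (substitute your own hardware figures). With the same 200 concurrent tasks, 50 nodes provide twice as many disks and, in this idealized model, twice the aggregate sequential read bandwidth, and each task shares its node's disks with fewer neighbors:

```python
from dataclasses import dataclass

# Hypothetical hardware figures; replace with your cluster's actual values.
DISKS_PER_NODE = 4
MB_PER_S_PER_DISK = 100


@dataclass
class Allocation:
    nodes: int
    task_cores_per_node: int


def total_disks(a: Allocation) -> int:
    return a.nodes * DISKS_PER_NODE


def disks_per_task(a: Allocation) -> float:
    # Concurrent tasks on the same node compete for that node's disks.
    return DISKS_PER_NODE / a.task_cores_per_node


def aggregate_read_mb_s(a: Allocation) -> int:
    # Idealized: assumes every disk streams at full speed in parallel.
    return total_disks(a) * MB_PER_S_PER_DISK


for a in (Allocation(50, 4), Allocation(25, 8)):
    print(f"{a.nodes} nodes x {a.task_cores_per_node} tasks: "
          f"{total_disks(a)} disks, {disks_per_task(a):.2f} disks/task, "
          f"~{aggregate_read_mb_s(a)} MB/s aggregate")
```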


Related Questions

We have a weblogic server 10.0 instance which has a cluster with one managed
Assuming I have a cluster of n Erlang nodes, some of which may be
I have a WebLogic 9.2 Cluster which runs 2 managed server nodes. I have
I'm running a 4-node Cassandra cluster. Some of our nodes have some very large
I have to implement MPI system in a cluster. If anyone here has any
I've written some MPI code which works flawlessly on large clusters. Each node in
I have a cluster in weblogic 9.2 with 2 nodes(172.20.1.68:7101, 172.20.1.23:7102), 1 adminserver (172.20.1.23:7001)
I have a 2 node cassandra cluster with a replication factor of 2 and
We currently have a failover sql cluster with two nodes. For a new large
I am using Povray to render images over a cluster. Each worker node is

© 2021 The Archive Base. All Rights Reserved