The Archive Base

Editorial Team
Asked: May 18, 2026

As the subject says: is it important to run a Hadoop cluster on dedicated hardware rather than VMs? If so, what network latency is acceptable, and is Gigabit Ethernet required? I would like to use Hadoop to speed up an ETL process. To try it, I set up a few VMs (512 MB to 1 GB RAM each, one core per VM of a dual-core 2.2 GHz CPU) that are about 500 miles apart, with a network latency of 10-25 ms over 100 Mbps Ethernet. With 3-4 VMs as nodes, I cannot match the performance of a single machine for my ETL process, so I am asking here for more insight.


1 Answer

  1. Editorial Team
    Added an answer on May 18, 2026 at 10:34 am

    It depends heavily on your tasks, but in general everything matters, including network latency, bandwidth, and CPU load and availability.

    I can picture a few scenarios where network bandwidth would matter very little. For example, suppose you have already loaded your data into HDFS, so that it is evenly distributed across all the nodes, and you are going to run a complex computation on it in mappers, with no reducers at all or with only a small fraction of the data going to reducers. If you are counting the number of lines in text files, mappers read multi-gigabyte files and push only one simple number each to the reducers: the number of lines. Reducers sum these numbers and write a single answer to the output. Virtually nothing is transferred across the network, so the network has no effect on performance.
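
    The line-count case can be sketched in plain Python. This is not Hadoop API code, only an in-process illustration: the `splits` list is a hypothetical stand-in for HDFS blocks stored on three nodes, and the 8-byte size per count is an assumption made just to compare volumes.

```python
# In-process sketch of a line-count MapReduce job (not Hadoop API code).
# Each entry in `splits` is a hypothetical stand-in for an HDFS block on one node.

def map_count_lines(split: str) -> int:
    """Mapper: read a whole local split and emit a single integer."""
    return len(split.splitlines())

def reduce_sum(partial_counts: list[int]) -> int:
    """Reducer: add up the one number received from each mapper."""
    return sum(partial_counts)

splits = ["record\n" * n for n in (1000, 2500, 4000)]

# Mappers run next to their data; only their results need to travel.
partial_counts = [map_count_lines(s) for s in splits]
total_lines = reduce_sum(partial_counts)

input_bytes = sum(len(s.encode()) for s in splits)
# Assumption: each partial count crosses the network as an 8-byte integer.
shuffle_bytes = 8 * len(partial_counts)

print(f"lines={total_lines} input={input_bytes} B shuffled={shuffle_bytes} B")
```

    The input grows with the data, while the shuffled volume grows only with the number of mappers, which is why a slow link hardly matters for this kind of job.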

    However, in real life you will encounter such tasks rather rarely. Usually there is some group-by going on between mappers and reducers, so most of the per-group calculation is performed by reducers. That means reducers have to fetch all the data from mappers, usually using the network heavily.
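
    A group-by job looks different, as the rough sketch below shows. Again this is plain Python rather than Hadoop code; the sample records and the 10 GB shuffle size are hypothetical, and `transfer_seconds` is an illustrative helper that gives only a lower bound, ignoring latency, protocol overhead, and parallel links.

```python
from collections import defaultdict

def map_records(split):
    """Mapper: emit every record as a (key, value) pair; nothing is dropped."""
    for line in split:
        key, value = line.split(",")
        yield key, int(value)

def shuffle(mapper_outputs):
    """Shuffle: bring all values for a key together (network traffic in a real cluster)."""
    groups = defaultdict(list)
    for pairs in mapper_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_sum(groups):
    """Reducer: compute one aggregate per key."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical "key,value" records held on two nodes.
splits = [
    ["a,1", "b,2", "a,3"],
    ["b,4", "c,5"],
]
totals = reduce_sum(shuffle(map_records(s) for s in splits))

def transfer_seconds(shuffle_gigabytes, link_megabits_per_second):
    """Lower bound on the time to push the shuffle data through one link."""
    bits = shuffle_gigabytes * 8 * 1000**3
    return bits / (link_megabits_per_second * 1000**2)

# Hypothetical 10 GB shuffle over 100 Mbps versus Gigabit Ethernet.
slow = transfer_seconds(10, 100)
fast = transfer_seconds(10, 1000)
print(totals, slow, fast)
```

    Here every input record passes through the shuffle, so the link speed directly bounds the job time. Under these assumptions the same shuffle takes ten times longer on the 100 Mbps link from the question than on Gigabit Ethernet.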

    If you tell me more about your tasks, I can give more detailed estimates of what hardware you would want and where the weak points of your current setup are.

