The Archive Base

Editorial Team
Asked: June 11, 2026


I wonder if it is possible to install a “background” Hadoop cluster. After all, Hadoop is meant to cope with nodes being unavailable or slow at times.

Assume a university has a computer lab: say, 100 boxes, all with upscale desktop hardware, gigabit Ethernet, and probably even identical software installations. Linux is really popular here, too.

However, these 100 boxes are of course meant to be desktop systems for students. There are times when the lab will be full, but also times when it will be empty. User data is mostly stored on central storage (say, NFS), so the local disks are not used much.

It sounds like a good idea to me to use these systems as a Hadoop cluster in their idle time. The simplest setup would of course be to have a cron job start the cluster at night and shut it down in the morning. However, many computers will also be unused during the day.

However, how would Hadoop react to, e.g., nodes being shut down whenever a user logs in? Is it possible to easily “pause” (preempt!) a node in Hadoop and move it to swap when needed? Ideally, we would give Hadoop a chance to move the computation away before suspending the task (also to free up memory). How would one do such a setup? Is there a way to signal Hadoop that a node is about to be suspended?

As far as I can tell, datanodes should not be stopped, and maybe replication needs to be increased to have more than 3 copies. With YARN there might also be the problem that, because the task tracker is moved to an arbitrary node, it may end up on one that gets suspended at some point. But maybe this can be controlled so that there is a small set of nodes that is always on and that runs the task trackers.
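
To get a feel for how much a higher replication factor would help if datanodes do become unavailable (for example, when a box is powered off), one can estimate the chance that every replica of a block sits on an unavailable machine. The sketch below assumes replicas are placed on distinct nodes chosen uniformly at random, which is a simplification (HDFS placement is rack-aware), and the node counts are made-up illustration values.

```python
from math import comb


def p_block_unavailable(nodes: int, offline: int, replicas: int) -> float:
    """Probability that all replicas of one block are on offline nodes.

    Assumes replicas are on distinct nodes chosen uniformly at random.
    """
    if replicas > nodes:
        raise ValueError("cannot place more replicas than there are nodes")
    # comb(offline, replicas) is 0 when offline < replicas.
    return comb(offline, replicas) / comb(nodes, replicas)


if __name__ == "__main__":
    # Hypothetical lab: 100 boxes, 40 of them currently unavailable.
    for r in (3, 4, 5):
        print(r, p_block_unavailable(100, 40, r))
```

Under these assumptions the probability drops with each extra replica, at the price of more local disk usage per block.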

Is it appropriate to just stop the tasktracker, or to send a SIGSTOP (and then resume with SIGCONT)? The first would probably give Hadoop the chance to react; the second would resume faster if the user logs out soon (as the job can then simply continue). How about YARN?
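
The SIGSTOP/SIGCONT option can be sketched with plain POSIX signals. This is a minimal illustration, not a Hadoop API: it pauses and resumes an arbitrary process by PID and uses a `sleep` child as a stand-in for the tasktracker JVM (an assumption for the demo). It reads `/proc`, so it is Linux-only. Note that a stopped process keeps its memory and cannot send heartbeats while it is frozen.

```python
import os
import signal
import subprocess
import time


def pause(pid: int) -> None:
    """Freeze the process; it keeps its memory but gets no CPU time."""
    os.kill(pid, signal.SIGSTOP)


def resume(pid: int) -> None:
    """Let a previously stopped process continue where it left off."""
    os.kill(pid, signal.SIGCONT)


def process_state(pid: int) -> str:
    """Return the one-letter Linux process state from /proc (e.g. 'S', 'T')."""
    with open(f"/proc/{pid}/stat") as f:
        # Format is "pid (comm) state ..."; comm may contain spaces,
        # so split after the last ')'.
        return f.read().rsplit(")", 1)[1].split()[0]


def demo() -> tuple:
    """Pause and resume a stand-in child; return its state after each step."""
    child = subprocess.Popen(["sleep", "30"])  # stand-in for a worker daemon
    try:
        pause(child.pid)
        time.sleep(0.2)  # give the kernel a moment to apply the signal
        paused = process_state(child.pid)
        resume(child.pid)
        time.sleep(0.2)
        resumed = process_state(child.pid)
        return paused, resumed
    finally:
        child.kill()
        child.wait()
```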

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 9:47 pm

    First of all, Hadoop doesn’t support preemption in the way you describe.
    Hadoop simply restarts a task if it detects that its tasktracker is dead.
    So in your case, when a user logs in to a host, some script simply kills
    the tasktracker, and the jobtracker marks all mappers and reducers that were
    running on the killed tasktracker as FAILED. After that, these tasks are
    rescheduled on different nodes.
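
A minimal sketch of such a script, under stated assumptions: it treats any entry in the output of `who` as an interactive login, and it assumes a Hadoop 1.x layout where `hadoop-daemon.sh stop tasktracker` and `hadoop-daemon.sh start tasktracker` control the daemon. The install path `HADOOP_HOME` and the `pgrep` pattern are hypothetical. The decision logic is kept in a pure function so it can be checked without a cluster.

```python
import subprocess

HADOOP_HOME = "/opt/hadoop"  # hypothetical install path; adjust to your setup


def logged_in_users(who_output: str) -> set:
    """Extract user names from `who` output (first column of each line)."""
    return {line.split()[0] for line in who_output.splitlines() if line.strip()}


def decide(users: set, tracker_running: bool) -> str:
    """Return 'stop', 'start', or 'none' for the tasktracker on this host."""
    if users and tracker_running:
        return "stop"   # someone sat down: give the machine back
    if not users and not tracker_running:
        return "start"  # the seat is idle again: rejoin the cluster
    return "none"


def run_action(action: str) -> None:
    """Run the Hadoop 1.x daemon script for the chosen action."""
    if action in ("stop", "start"):
        subprocess.run(
            [f"{HADOOP_HOME}/bin/hadoop-daemon.sh", action, "tasktracker"],
            check=True,
        )


def is_tracker_running() -> bool:
    """Assumption: the tasktracker JVM has 'TaskTracker' on its command line."""
    result = subprocess.run(["pgrep", "-f", "TaskTracker"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


if __name__ == "__main__":
    who = subprocess.run(["who"], capture_output=True, text=True, check=True).stdout
    run_action(decide(logged_in_users(who), is_tracker_running()))
```

It could be run from cron every minute or from a login hook. Stopping the daemon this way kills its running tasks, which matches the behaviour described above.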

    Of course, such a scenario is not free. By design, mappers and reducers
    keep all intermediate data on the local host. Moreover, reducers fetch map
    output directly from the tasktrackers where the mappers ran. So when a
    tasktracker is killed, all that data is lost. For mappers this is not a big
    problem, since a mapper usually works on a relatively small amount of data
    (gigabytes?), but reducers suffer more. A reducer runs the shuffle, which is
    costly in terms of network bandwidth and CPU. If the tasktracker was running
    a reducer, restarting that reducer means that all its input must be
    downloaded again onto the new host. Also, as far as I recall, the jobtracker
    doesn’t notice immediately that a tasktracker is dead, so killed tasks won’t
    be restarted immediately.
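
A rough back-of-the-envelope estimate of that cost, as a sketch: the extra delay is the time until the jobtracker notices the dead tasktracker plus the time for a restarted reducer to fetch its input again. All numbers are illustrative assumptions, not measurements. The 600-second detection delay mirrors the 10-minute default commonly cited for `mapred.tasktracker.expiry.interval` in Hadoop 1.x, which should be verified against your version.

```python
def rework_delay_seconds(
    shuffle_bytes: float,
    bandwidth_bytes_per_s: float,
    detection_delay_s: float = 600.0,  # assumed tracker expiry interval
) -> float:
    """Estimate extra wall-clock time when a reducer's host is killed.

    Result = detection delay + time to re-download the reducer's shuffle input.
    Ignores redoing the reduce work itself and rerunning lost map outputs.
    """
    if bandwidth_bytes_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return detection_delay_s + shuffle_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    GIB = 1024 ** 3
    MIB = 1024 ** 2
    # Hypothetical: 20 GiB of shuffle input at about 100 MiB/s effective throughput.
    print(rework_delay_seconds(20 * GIB, 100 * MIB))
```

With defaults like these, the detection delay can dominate the transfer time, so lowering it (if your version allows) matters more than raw bandwidth for small jobs.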

    If your workload is light, the datanodes can stay up permanently; don’t take
    them offline when a user logs in. A datanode needs only a small amount of
    memory (256 MB should be enough for a small amount of data) and, under a light
    workload, doesn’t use much CPU or disk I/O.

    In conclusion, you can set up such a configuration, but don’t rely on
    good, predictable job execution under moderate workloads.


© 2021 The Archive Base. All Rights Reserved