Can we use both Fair scheduler and Capacity Scheduler in the same hadoop cluster.

Question

0

Asked: June 15, 20262026-06-15T20:40:21+00:00 2026-06-15T20:40:21+00:00

Can we use both Fair scheduler and Capacity Scheduler in the same hadoop cluster.

0

Can we use both Fair scheduler and Capacity Scheduler in the same hadoop cluster. Which scheduler is good and effective. Can anyone help me ?

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-15T20:40:22+00:00

I do not think both can be used at the same time. It doesn’t make sense too. Why would you want to use both type of scheduling in the same cluster? Both scheduling algos have come up due to specific use-cases.

Fair scheduling is a method of assigning resources to jobs such that
all jobs get, on average, an equal share of resources over time. When
there is a single job running, that job uses the entire cluster. When
other jobs are submitted, tasks slots that free up are assigned to the
new jobs, so that each job gets roughly the same amount of CPU time.
Unlike the default Hadoop scheduler, which forms a queue of jobs, this
lets short jobs finish in reasonable time while not starving long
jobs. It is also a reasonable way to share a cluster between a number
of users. Finally, fair sharing can also work with job priorities –
the priorities are used as weights to determine the fraction of total
compute time that each job should get.

The Fair Scheduler arose out of Facebook’s need to share its data warehouse between multiple users. Facebook started using Hadoop to manage the large amounts of content and log data it accumulated every day. Initially, there were only a few jobs that needed to run on the data each day to build reports. However, as other groups within Facebook started to use Hadoop, the number of production jobs increased. In addition, analysts started using the data warehouse for ad-hoc queries through Hive (Facebook’s SQL-like query language for Hadoop), and more large batch jobs were submitted as developers experimented with the data set. Facebook’s data team considered building a separate cluster for the production jobs, but saw that this would be extremely expensive, as data would have to be replicated and the utilization on both clusters would be low. Instead, Facebook built the Fair Scheduler, which allocates resources evenly between multiple jobs and also supports capacity guarantees for production jobs. The Fair Scheduler is based on three concepts:

Jobs are placed into named “pools” based on a configurable attribute
such as user name, Unix group, or specifically tagging a job as being
in a particular pool through its jobconf.
Each pool can have a “guaranteed capacity” that is specified through
a config file, which gives a minimum number of map slots and reduce
slots to allocate to the pool. When there are pending jobs in the
pool, it gets at least this many slots, but if it has no jobs, the
slots can be used by other pools.
Excess capacity that is not going toward a pool’s minimum is
allocated between jobs using fair sharing. Fair sharing ensures that
over time, each job receives roughly the same amount of resources.
This means that shorter jobs will finish quickly, while longer jobs
are guaranteed not to get starved.

The scheduler also includes a number of features for ease of administration, including the ability to reload the config file at runtime to change pool settings without restarting the cluster, limits on running jobs per user and per pool, and use of priorities to weigh the shares of different jobs.

The CapacityScheduler is designed to allow sharing a large cluster
while giving each organization a minimum capacity guarantee. The
central idea is that the available resources in the Hadoop Map-Reduce
cluster are partitioned among multiple organizations who collectively
fund the cluster based on computing needs. There is an added benefit
that an organization can access any excess capacity no being used by
others. This provides elasticity for the organizations in a
cost-effective manner.

The Capacity Scheduler from Yahoo offers similar functionality to the Fair Scheduler but takes a somewhat different philosophy. In the Capacity Scheduler, you define a number of named queues. Each queue has a configurable number of map and reduce slots. The scheduler gives each queue its capacity when it contains jobs, and shares any unused capacity between the queues. However, within each queue, FIFO scheduling with priorities is used, except for one aspect – you can place a limit on percent of running tasks per user, so that users share a cluster equally. In other words, the capacity scheduler tries to simulate a separate FIFO/priority cluster for each user and each organization, rather than performing fair sharing between all jobs. The Capacity Scheduler also supports configuring a wait time on each queue after which it is allowed to preempt other queues’ tasks if it is below its fair share.

Hence it would boil down to what is your need and setup in order to decide on which scheduler you should go with.

Apache hadoop has now support for both these types of scheduling. More detailed info can be found at the following links:

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Can we use both Fair scheduler and Capacity Scheduler in the same hadoop cluster.

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply