Suppose that one has set up a cassandra cluster. You’ve got a 10[TB] database

Asked: May 21, 2026

Suppose that one has set up a Cassandra cluster. You’ve got a 10 TB database that is distributed evenly between 10 nodes, and everything runs smoothly.

Suppose that you have 100 machines at your disposal, each trying to read (different) data from the Cassandra cluster. In addition, you have many jobs that constantly need to be run, each job at a different time (and obviously, each job needs to be run on a different machine).

How do you manage all these tasks/jobs? How do you distribute the tasks between the machines? How do you keep track of the jobs and machines in the process?

Are there any open-source tools (preferably with a Python client) that help with this in a Linux environment?

1 Answer

Editorial Team · Answer 1 · 2026-05-21T11:59:38+00:00

What you need is a grid/HPC framework to manage your distributed infrastructure and run jobs.

On Unix/Linux there are two systems that might be of good use to you: Portable Batch System (PBS) or Condor.

How do you manage all these tasks/jobs?

Both Condor and PBS have a master node that acts as the receiver of every job/task. With each job/task you can associate a priority level and discriminators. The administrator of the cluster sets up rules based on those discriminators to schedule the jobs.
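To make the idea concrete, here is a toy model of what the master does with priorities and discriminators. It is only a sketch in plain Python, not how Condor or PBS are implemented; the class `ToyScheduler`, the job names, and the `ssd` attribute are all made up for illustration.

```python
import heapq
from itertools import count


class ToyScheduler:
    """Toy stand-in for a master node: queue jobs, hand out the best match first."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker: equal priorities keep submission order

    def submit(self, name, priority=0, requirements=None):
        # heapq is a min-heap, so the priority is negated to pop the highest first.
        entry = (-priority, next(self._order), name, requirements or {})
        heapq.heappush(self._heap, entry)

    def next_job_for(self, machine):
        """Return the highest-priority job this machine satisfies, or None."""
        skipped = []
        chosen = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            _, _, name, requirements = entry
            if all(machine.get(key) == value for key, value in requirements.items()):
                chosen = name
                break
            skipped.append(entry)
        # Jobs that did not match stay queued for another machine.
        for entry in skipped:
            heapq.heappush(self._heap, entry)
        return chosen


scheduler = ToyScheduler()
scheduler.submit("nightly-report", priority=1)
scheduler.submit("urgent-reindex", priority=10, requirements={"ssd": True})
scheduler.submit("backfill", priority=5)

first = scheduler.next_job_for({"ssd": False})   # cannot take the SSD-only job
second = scheduler.next_job_for({"ssd": True})
print(first, second)
```

A real scheduler layers fair-share policies, preemption, and resource accounting on top of this, which is the main reason to use an existing system rather than write your own.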

How do you distribute the tasks between the machines?

Condor or PBS will do it for you; you only need to submit the job to the master node and specify its priority, inputs, outputs, etc.

You can periodically check whether a job has finished, subscribe to notifications via different mechanisms, or do a sort of job.wait() to block until it's finished.
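As a rough illustration of what "specify priority, inputs and outputs" looks like with Condor (now called HTCondor), the sketch below builds the text of a submit description in Python. The helper `render_submit_description` and every path and file name are assumptions for the example; the keys themselves (`executable`, `arguments`, `priority`, `input`, `output`, `error`, `log`, `queue`) are standard submit commands.

```python
def render_submit_description(executable, arguments, priority,
                              stdin, stdout, stderr, log):
    """Return the text of an HTCondor submit description for a single job."""
    settings = [
        ("executable", executable),
        ("arguments", arguments),
        ("priority", priority),   # orders this job relative to your other jobs
        ("input", stdin),         # file fed to the job's standard input
        ("output", stdout),       # file receiving standard output
        ("error", stderr),        # file receiving standard error
        ("log", log),             # HTCondor's own event log for the job
    ]
    lines = [f"{key} = {value}" for key, value in settings]
    lines.append("queue")  # enqueue one instance of the job
    return "\n".join(lines) + "\n"


# All names below are hypothetical placeholders.
description = render_submit_description(
    executable="/usr/bin/python3",
    arguments="read_partition.py 7",
    priority=5,
    stdin="partition_7.in",
    stdout="partition_7.out",
    stderr="partition_7.err",
    log="partition_7.log",
)
print(description)
```

Saving this text to a file and passing it to `condor_submit` on a machine that is part of the pool hands the job to the scheduler. PBS uses a different format (a shell script with `#PBS` directives submitted through `qsub`), but the idea is the same.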
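The three ways of waiting can be tried out locally with Python's standard `concurrent.futures` module, which has the same shape on a single machine. This is only an analogy under that assumption, not the Condor or PBS client API, and `read_partition` is a hypothetical job body.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def read_partition(partition_id):
    """Hypothetical job; a real one would read from the Cassandra cluster."""
    time.sleep(0.1)
    return f"partition {partition_id} done"


notifications = []

with ThreadPoolExecutor(max_workers=2) as pool:
    job = pool.submit(read_partition, 7)

    # 1. Notification: register a callback that fires when the job completes.
    job.add_done_callback(lambda finished: notifications.append(finished.result()))

    # 2. Polling: periodically ask whether the job has finished.
    polls = 0
    while not job.done():
        polls += 1
        time.sleep(0.02)

    # 3. Blocking: wait for the result (returns at once here, since the job is done).
    result = job.result()

print(result, notifications)
```

With a cluster scheduler the polling step would query the queue and the notification step would typically be an e-mail or a hook, but the control flow in your own code looks much the same.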

How do you keep track of the jobs / machines in the process?

Both PBS and Condor have top-like commands that list the jobs that are queued, running, or cancelled. They also have utilities to cancel a job, or to stop it if the process allows snapshots.

For a large cluster, my advice is to try Condor. It has been around for ages, solving problems exactly like the one you have. Here are some examples for Condor + Python.
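The listing commands (`condor_q` for Condor, `qstat` for PBS) essentially print a table of jobs and their states. If you want the same overview inside your own Python tooling, a minimal sketch could look like the following; the job records are invented sample data, not real scheduler output, and the field names are assumptions.

```python
from collections import Counter

# Hypothetical snapshot of the queue, as your own tooling might collect it.
jobs = [
    {"id": "101.0", "state": "running", "machine": "node-03"},
    {"id": "101.1", "state": "running", "machine": "node-17"},
    {"id": "102.0", "state": "idle", "machine": None},
    {"id": "103.0", "state": "held", "machine": None},
]


def summarize(jobs):
    """Count jobs per state, similar to the totals line of a queue listing."""
    return Counter(job["state"] for job in jobs)


def busy_machines(jobs):
    """Map each machine that is running something to the job it runs."""
    return {job["machine"]: job["id"] for job in jobs if job["state"] == "running"}


totals = summarize(jobs)
print(", ".join(f"{n} {state}" for state, n in sorted(totals.items())))
print(busy_machines(jobs))
```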

Other more recent solutions to consider are:

Celery, a distributed task queue for Python.
DiscoProject, a distributed computing framework based on the MapReduce paradigm.
