(I shall try to keep this question as short as possible while clearly describing the situation. Please comment if anything is missing.)
The situation
- I’m running a cluster with three servers in the same datacenter
- To ease deployment, each server runs exactly the same application code
The objective
- To run a single task (call it Task X) once every minute, on exactly one server.
Under these conditions
- The cluster remains distributed and highly available
- Each server keeps running the same application code. In other words, there is no such thing as “deploy code A to a master server and deploy code B to all secondary servers.”
The reason I do not wish to distinguish between kinds of servers is to maintain high availability (avoid problems when a so-called master goes down), to keep redundancy (distribute load), and to avoid a complex deployment procedure in which I need to deploy different applications to different kinds of servers.
Why is this so difficult? If I were to add code that executes this task every minute, then every server would execute it, because each server runs the same application code. The servers therefore need to coordinate which one of them runs the task on each tick.
I am able to use distributed messaging mechanisms such as Apache Kafka or Redis. If I used such a mechanism to coordinate this task, how would the “algorithm” work?
I posed this question to someone else, and the reply was to use a task queue. However, this does not seem to solve the issue, because the question remains: which server is going to add the task to the task queue? If all servers added the task to the queue, it would result in duplicate entries. Moreover, which server is going to execute the next task in the queue? All of this needs to be decided through coordination within the cluster, without distinguishing between different kinds of servers.
It sounds like you are looking for a distributed lock. Redis does this wonderfully with setnx. If you combine it with expire then you can create global locks that are released every N seconds. setnx will only write the value and return true if the key does not already exist. Redis operations are atomic, so only the first server to call setnx after the key expires will get the go-ahead to run the task.
Here is an example in Ruby:
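A minimal sketch of this pattern, not a definitive implementation. It assumes the redis-rb gem, whose set(key, value, nx: true, ex: seconds) call returns true only when the key was written. Using SET with NX and EX rather than setnx followed by expire keeps the write and the expiry in one atomic command, so a server that dies between the two calls cannot leave a lock that never expires. The names FakeRedis, run_if_elected and task_x:lock are hypothetical; FakeRedis is only an in-memory stand-in so the sketch runs without a server.

```ruby
# Hypothetical in-memory stand-in for a Redis client, so the sketch runs
# without a server. It mimics only the one call used below; against a real
# server you would pass a redis-rb client (Redis.new) instead.
class FakeRedis
  def initialize
    @data = {}          # key => [value, expires_at or nil]
    @mutex = Mutex.new  # Redis itself executes commands one at a time
  end

  # Mirrors redis-rb's set(key, value, nx: true, ex: seconds):
  # returns true if the key was written, false if it already existed.
  def set(key, value, nx: false, ex: nil)
    @mutex.synchronize do
      now = Time.now
      entry = @data[key]
      live = entry && (entry[1].nil? || entry[1] > now)
      return false if nx && live

      @data[key] = [value, ex ? now + ex : nil]
      true
    end
  end
end

# Every server calls this same method on every tick. Only the caller whose
# SET ... NX succeeds runs the task; the others skip this tick.
# ttl_seconds is an assumed value, slightly shorter than the 60 s interval
# so that the lock has expired by the time the next tick arrives.
def run_if_elected(redis, server_id, ttl_seconds: 55)
  acquired = redis.set("task_x:lock", server_id, nx: true, ex: ttl_seconds)
  return false unless acquired

  yield
  true
end

# Simulate one tick on three servers sharing the same store.
redis = FakeRedis.new
runs = []
results = %w[server-a server-b server-c].map do |id|
  run_if_elected(redis, id) { runs << id }
end

puts "ran on: #{runs.inspect}"
```

In this simulated tick only the first caller acquires the lock, so Task X runs once. The lock is never released explicitly; it simply expires, which is why the TTL has to be tuned against the scheduling interval and against how far the servers' timers can drift apart.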
With this you are still reliant on a single server, the Redis master, unless you take advantage of redis-sentinel. Take a look at the redis-sentinel docs for information on how to configure automatic failover.
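As a rough illustration, a Sentinel configuration for a three-node setup might look like the fragment below. The master name mymaster, the address and the timing values are assumptions chosen for the example, not recommendations; the directives themselves are standard sentinel.conf settings.

```
# sentinel.conf (illustrative values only)
# Watch the master at 10.0.0.1:6379; 2 Sentinels must agree that it is down.
sentinel monitor mymaster 10.0.0.1 6379 2
# Consider the master down after 5 seconds without a valid reply.
sentinel down-after-milliseconds mymaster 5000
# Upper bound, in milliseconds, on how long a failover attempt may take.
sentinel failover-timeout mymaster 60000
```

Because Redis replication is asynchronous, a lock written just before a failover may not have reached the new master, so Task X could occasionally run twice around a failover.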