I am attempting to build a job queue using two Redis master servers in two EC2 availability zones. All LPUSH operations are done in the application layer to both master machines in both AZs. Ideally I would be using GitHub’s Resque, but Resque does not seem to have any notion of multiple masters in multiple AZs.
I need to ensure only one worker is working on a given job. Some workers will be in AZ 1A talking to the Redis machine in 1A, and some will be in AZ 1B talking to the machine in 1B. I need to avoid the scenario where a worker in 1A and a worker in 1B both dequeue the same job from different Redis masters and try to work on it simultaneously.
Does this worker pseudocode have any race conditions that I may have missed?
# master1 is the Redis master in this worker's own AZ, master2 is the one in the other AZ
job_id = master1.BRPOPLPUSH "queue", "working"
# SETNX takes a key and a value; the value 1 is only a placeholder
m1lock = master1.SETNX "lock.#{job_id}", 1
m2lock = master2.SETNX "lock.#{job_id}", 1
completed = master1.ZSCORE "completed", job_id
if completed
  # must have been completed just now on other server, no-op
  master1.LREM "working", 0, job_id
  master1.DEL "lock.#{job_id}"
  master2.DEL "lock.#{job_id}"
elsif not m1lock or not m2lock
  # other server is working on it? We will put back at the end of our queue
  master1.LPUSH "queue", job_id
  master1.LREM "working", 0, job_id
  # release only the locks this worker actually acquired
  master1.DEL "lock.#{job_id}" if m1lock
  master2.DEL "lock.#{job_id}" if m2lock
else
  # have a lock, it's not complete, so do work
  do_work(job_id)
  now = Time.now.to_i
  master1.ZADD "completed", now, job_id
  master2.ZADD "completed", now, job_id
  master1.DEL "lock.#{job_id}"
  master2.DEL "lock.#{job_id}"
  master1.LREM "working", 0, job_id
  master2.LREM "queue", 0, job_id # not strictly necessary b/c of "completed"
end
What you are trying to do is, in essence, master-master replication. Whether it is for a queue or anything else, Redis doesn't support it, and your pseudocode has race conditions.
Just doing:
means another worker can take the job while you are doing this, and two workers will work on it at once.
I don't think Redis is ideal for your pattern. I don't know of any queue server that works that way, but then again, I don't know many such servers, so I'm sure one exists.
If you load-balance your work so that each job goes to only one master, it is possible, but then you have, in essence, two queues, not one.
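To make the load-balancing idea concrete, here is a minimal sketch of routing each job to exactly one master by hashing its id. The names `JobRouter`, `FakeMaster`, and `index_for` are hypothetical and not part of your setup. `FakeMaster` is an in-memory stand-in so the sketch runs without a server; the assumption is that real clients (for example redis-rb connections) would take its place, since the router only needs objects that respond to `lpush(key, value)`.

```ruby
require "zlib"

# Hypothetical router: sends each job to exactly one master, chosen by a
# stable hash of the job id. `masters` can be any objects that respond to
# `lpush(key, value)` (assumed here; e.g. Redis client connections).
class JobRouter
  def initialize(masters)
    raise ArgumentError, "need at least one master" if masters.empty?
    @masters = masters
  end

  # The same job id always maps to the same master index.
  def index_for(job_id)
    Zlib.crc32(job_id.to_s) % @masters.size
  end

  # Push the job to its single owning master and return that master.
  def enqueue(job_id)
    master = @masters[index_for(job_id)]
    master.lpush("queue", job_id)
    master
  end
end

# In-memory stand-in for a Redis client, for illustration only.
class FakeMaster
  attr_reader :lists

  def initialize
    @lists = Hash.new { |hash, key| hash[key] = [] }
  end

  # Mimics LPUSH: prepend the value and return the new list length.
  def lpush(key, value)
    @lists[key].unshift(value)
    @lists[key].size
  end
end

router = JobRouter.new([FakeMaster.new, FakeMaster.new])
router.enqueue("job-42")
```

With this layout a worker only pops from its own master and never sees a job that the other master owns, so no cross-master lock is needed. The cost is the one described above: these are two independent queues, and jobs routed to a master that goes down stay unavailable until it comes back.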