I’m using delayed_jobs with MongoMapper, but fetching jobs from the delayed_jobs collection (around 500k records) is slow.
I created an index { locked_by: -1, priority: 1, run_at: 1 }, but it doesn’t help.
I don’t know which indexes would speed up the query. Each fetch takes around 2 seconds.
Here is the mongodb log:
Tue Dec 13 09:52:38 [conn497] query api_production.$cmd ntoreturn:1 command: {
  findandmodify: "delayed_jobs",
  query: { run_at: { $lte: new Date(1323769957289) }, failed_at: null,
    $or: [ { locked_by: "host:ip-10-128-145-246 pid:26157" }, { locked_at: null }, { locked_at: { $lt: new Date(1323769057289) } } ] },
  sort: { locked_by: -1, priority: -1, run_at: 1 },
  update: { $set: { locked_at: new Date(1323769957289), locked_by: "host:ip-10-128-145-246 pid:26157" } }
} reslen:699 1486ms
Your indexes don’t match the query. Your query first eliminates candidates based on run_at, so run_at should be the first field in your index, but it’s not. Then comes a rather inelegant $or clause, which makes it hard to choose an appropriate index, because two of its criteria are on locked_at while one is on locked_by. To make matters worse, there are three sort criteria, but they come in exactly the reverse order of the query constraints. Your index doesn’t even match the sort: it has priority: 1 while the sort uses priority: -1. Also, you’re sorting on a rather lengthy string (locked_by).
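One way to see the sort mismatch concretely is a rough check of whether a compound index can serve a given sort. The sketch below is a simplified model of MongoDB’s rule (sort keys must match a prefix of the index keys, with directions either all identical or all inverted, since an index can be walked backwards). It deliberately ignores the exception where equality predicates on leading index fields allow a non-prefix sort, so treat it as an illustration rather than a query planner. The helper name index_supports_sort? is hypothetical.

```ruby
# Simplified model: a compound index can satisfy a sort if the sort keys
# match a prefix of the index keys and the directions are either all the
# same or all inverted (the index can be traversed in reverse).
# Ignores the equality-prefix exception, so this is only a rough check.
def index_supports_sort?(index, sort)
  index_keys = index.to_a
  sort_keys = sort.to_a
  return false if sort_keys.length > index_keys.length

  prefix = index_keys.first(sort_keys.length)
  # Field names must line up position by position.
  return false unless prefix.map(&:first) == sort_keys.map(&:first)

  pairs = prefix.zip(sort_keys)
  same = pairs.all? { |(_, i), (_, s)| i == s }
  inverted = pairs.all? { |(_, i), (_, s)| i == -s }
  same || inverted
end

user_index = { locked_by: -1, priority: 1, run_at: 1 }  # index from the question
query_sort = { locked_by: -1, priority: -1, run_at: 1 } # sort from the log

puts index_supports_sort?(user_index, query_sort) # false: priority direction differs
puts index_supports_sort?(query_sort, query_sort) # true
```

Neither the index nor its reverse ({ locked_by: 1, priority: -1, run_at: -1 }) lines up with the sort, so MongoDB would have to sort the matching documents in memory.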
Basically, I think the query is not very well designed; it tries to accomplish too much in a single query. I don’t know if delayed_jobs is some kind of module, but it would be much easier if the rules were simpler. Why does a worker lock so many jobs, for instance? In fact, I think it’s best to lock only the job you’re currently working on, and to scale by having different workers fetch different job types. The workers could also identify themselves with UUIDs instead of their IP address and PID (the current string also carries a constant "host:ip-" prefix that adds neither entropy nor selectivity).
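To illustrate the simpler rules, here is an in-memory sketch, not the delayed_job or MongoMapper API. It assumes a worker identified by a random UUID and a claim step that locks exactly one runnable job: run_at in the past, not failed, not locked, earliest run_at first. In a real deployment this would be a single findAndModify with that simpler filter; the names WORKER_ID and claim_next! are hypothetical, and the stale-lock branch of the original $or (reclaiming jobs from crashed workers) is intentionally left out.

```ruby
require "securerandom"

# Hypothetical worker id: a random UUID instead of "host:... pid:...".
WORKER_ID = SecureRandom.uuid

# In-memory stand-in for the delayed_jobs collection (illustration only).
# Picks ONE runnable, unlocked, non-failed job with the earliest run_at,
# marks it as locked by this worker, and returns it (or nil if none).
def claim_next!(jobs, worker_id, now)
  job = jobs
        .select { |j| j[:run_at] <= now && j[:failed_at].nil? && j[:locked_at].nil? }
        .min_by { |j| j[:run_at] }
  return nil unless job

  job[:locked_at] = now
  job[:locked_by] = worker_id
  job
end

now = Time.now
jobs = [
  { id: 1, run_at: now - 60, failed_at: nil, locked_at: nil, locked_by: nil },
  { id: 2, run_at: now - 120, failed_at: nil, locked_at: nil, locked_by: nil },
  { id: 3, run_at: now + 60, failed_at: nil, locked_at: nil, locked_by: nil } # not due yet
]

job = claim_next!(jobs, WORKER_ID, now)
puts job[:id] # 2: the earliest runnable job
```

Because each call locks a single job, a worker never holds more work than it is processing, and the filter touches only run_at, failed_at and locked_at.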