Simplified example:
I have a to-do. It can be future, current, or late based on what time it is.
Time      State
8:00 am   Future
9:00 am   Current
10:00 am  Late
So, in this example, the to-do is “current” from 9 am to 10 am.
Originally, I thought about adding fields for “current_at” and “late_at” and then using an instance method to return the state. I can query for all “current” todos with now > current_at and now < late_at.
In short, I’d calculate the state each time or use SQL to pull the set of states I need.
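A minimal sketch of the computed-state approach described above, in Python. The `Todo` class and its `state` method are hypothetical names, and treating a to-do as “current” at exactly `current_at` (and “late” at exactly `late_at`) is an assumption about the boundaries:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Todo:
    current_at: datetime  # when the to-do becomes "current"
    late_at: datetime     # when the to-do becomes "late"

    def state(self, now: datetime) -> str:
        """Derive the state from the two timestamps; nothing is stored."""
        if now >= self.late_at:
            return "late"
        if now >= self.current_at:
            return "current"
        return "future"


# The to-do from the table above: current at 9 am, late at 10 am.
todo = Todo(current_at=datetime(2024, 1, 1, 9, 0), late_at=datetime(2024, 1, 1, 10, 0))
```

Calling `todo.state(datetime(2024, 1, 1, 9, 30))` yields "current". Nothing needs to be triggered at 9 am or 10 am, which is also why this approach cannot send a notification at the moment of the change on its own.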
If I wanted to use a state machine, I’d have a set of states and would store that state name on the to-do. But, how would I trigger the transition between states at a specific time for each to-do?
- Run a cron job every minute to pull anything in a state but past the transition time and update it
- Use background processing to queue transition jobs at the appropriate times in the future, so in the above example I would have two jobs: “transition to current at 9 am” and “transition to late at 10 am” that would presumably have logic to guard against deleted todos and “don’t mark late if done” and such.
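A rough sketch of what the body of one such queued job might look like, in Python. Everything here is hypothetical (the `Todo` fields, the `run_transition` name, the in-memory dict standing in for the database), and the third guard, which skips a job whose time was edited after it was queued, is an assumption beyond the two guards mentioned above:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


@dataclass
class Todo:
    id: int
    state: str            # "future", "current", or "late"
    current_at: datetime
    late_at: datetime
    done: bool = False


def run_transition(todos: Dict[int, Todo], todo_id: int, target: str,
                   now: datetime) -> Optional[str]:
    """Apply one scheduled transition ("current" or "late"), or skip it.

    Returns the new state, or None when a guard rejected the transition.
    """
    todo = todos.get(todo_id)
    if todo is None:   # guard: the to-do was deleted after the job was queued
        return None
    if todo.done:      # guard: never move a finished to-do to current/late
        return None
    due = todo.current_at if target == "current" else todo.late_at
    if now < due:      # guard (assumed): the time was edited after queuing
        return None
    todo.state = target
    return target
```

Because every job re-reads the to-do before acting, a stale job becomes a harmless no-op instead of something that has to be found and cancelled in the queue.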
Does anyone have experience with managing either of these options when trying to handle a lot of state transitions at specific times?
It feels like a state machine, I’m just not sure of the best way to manage all of these transitions.
Update after responses:
- Yes, I need to query for “current” or “future” todos
- Yes, I need to trigger notifications on state change (“your todo wasn’t to-done”)
Hence my desire to move toward a more state-machine-like design, so that I can encapsulate the transitions.
One simple solution for moderately large datasets is to use a SQL database. Each todo record should have “state_id”, “current_at”, and “late_at” fields. You can probably omit “future_at” unless you really have four states.
This allows three states: Future (before current_at), Current (from current_at until late_at), and Late (from late_at onward).
Storing the state as state_id (optionally a foreign key to a lookup table named “states”, where 1: Future, 2: Current, 3: Late) is basically storing denormalized data, which lets you avoid recalculating the state, as it rarely changes.
If you aren’t actually querying todo records according to state (e.g. ... WHERE state_id = 1) or triggering some side effect (e.g. sending an email) when the state changes, perhaps you don’t need to manage state. If you’re just showing the user a todo list and indicating which ones are late, the cheapest implementation might even be to calculate it client side. For the purpose of answering, I’ll assume you need to manage the state.
You have a few options for updating state_id. I’ll assume you are enforcing the constraint current_at < late_at.
The simplest is to update every record:
    UPDATE todos SET state_id = CASE WHEN late_at <= NOW() THEN 3 WHEN current_at <= NOW() THEN 2 ELSE 1 END;
You will probably get better performance with something like this (in one transaction):
    UPDATE todos SET state_id = 3 WHERE state_id <> 3 AND late_at <= NOW();
    UPDATE todos SET state_id = 2 WHERE state_id <> 2 AND NOW() < late_at AND current_at <= NOW();
    UPDATE todos SET state_id = 1 WHERE state_id <> 1 AND NOW() < current_at;
This avoids touching rows that don’t need to be updated, but you’ll want indices on “late_at” and “current_at” (you can try indexing “state_id”; see the note below). You can run these three updates as frequently as you need.
A slight variation of the above is to get the IDs of the records first, so you can do something with the todos that have changed state. This looks something like:
    SELECT id FROM todos WHERE state_id <> 3 AND late_at <= NOW() FOR UPDATE;
You should then do the update like:
    UPDATE todos SET state_id = 3 WHERE id IN (:ids);
Now you’ve still got the IDs to do something with later (e.g. email a notification “20 tasks have become overdue”).
Scheduling or queuing update jobs for each todo (e.g. update this one to “current” at 10 AM and “late” at 11 PM) will result in a lot of scheduled jobs, at least two times the number of todos, and poor performance, since each scheduled job updates only a single record.
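The “get the IDs first” variation can be sketched as a small periodic job. This is an illustration only, using Python's built-in sqlite3 module so it runs anywhere. SQLite has no FOR UPDATE clause, so the row lock is omitted here; on a server database such as PostgreSQL you would keep it. The `mark_late` function name, the text timestamps, and the sample rows are all assumptions:

```python
import sqlite3

LATE = 3  # state_id values follow the answer: 1 Future, 2 Current, 3 Late


def mark_late(conn: sqlite3.Connection, now: str) -> list:
    """Move overdue todos to Late and return the ids that changed."""
    with conn:  # commits on success, rolls back if an exception is raised
        rows = conn.execute(
            "SELECT id FROM todos WHERE state_id <> ? AND late_at <= ? ORDER BY id",
            (LATE, now),
        ).fetchall()
        ids = [row[0] for row in rows]
        conn.executemany(
            "UPDATE todos SET state_id = ? WHERE id = ?",
            [(LATE, todo_id) for todo_id in ids],
        )
    return ids


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE todos (id INTEGER PRIMARY KEY, state_id INTEGER, current_at TEXT, late_at TEXT)"
)
conn.executemany(
    "INSERT INTO todos VALUES (?, ?, ?, ?)",
    [
        (1, 2, "2024-01-01 09:00", "2024-01-01 10:00"),  # overdue, still Current
        (2, 3, "2024-01-01 07:00", "2024-01-01 08:00"),  # already Late
        (3, 1, "2024-01-01 12:00", "2024-01-01 13:00"),  # still Future
    ],
)
conn.commit()

changed = mark_late(conn, "2024-01-01 10:30")  # ids to notify about
```

The returned list is what you would hand to the notification step. Running the job a second time with the same timestamp returns an empty list, so running it every minute from cron does not produce duplicate notifications.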
You could schedule batch updates like:
    UPDATE todos SET state_id = 2 WHERE id IN (1,2,3,4,5,...);
where you’ve pre-calculated the list of todo IDs that will become current near some specific time. This probably won’t work out so nicely in practice, for several reasons. One is that some todos’ current_at and late_at fields might change after you’ve scheduled the updates.
Note: you might not gain much by indexing “state_id”, as it only divides your dataset into three sets. This is probably not selective enough for a query planner to consider using it in a query like SELECT * FROM todos WHERE state_id = 1.
The key to this problem that you didn’t discuss is: what happens to completed todos? If you leave them in this todos table, the table will grow indefinitely and your performance will degrade over time. The solution is partitioning the data into two separate tables (like “completed_todos” and “pending_todos”). You can then use UNION to concatenate both tables when you actually need to.
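A small sketch of the two-table split, again using Python's sqlite3 module purely for illustration. The column layout, the `complete` and `all_todos` helper names, and the sample rows are assumptions. It uses UNION ALL rather than plain UNION, since a todo lives in only one of the two tables and the duplicate check is unnecessary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE pending_todos   (id INTEGER PRIMARY KEY, title TEXT, state_id INTEGER);
    CREATE TABLE completed_todos (id INTEGER PRIMARY KEY, title TEXT, state_id INTEGER);
    INSERT INTO pending_todos VALUES (1, 'write report', 2), (2, 'call bank', 3);
    """
)


def complete(conn: sqlite3.Connection, todo_id: int) -> None:
    """Move one todo from pending_todos to completed_todos."""
    with conn:  # commit both statements together, or roll both back
        conn.execute(
            "INSERT INTO completed_todos SELECT * FROM pending_todos WHERE id = ?",
            (todo_id,),
        )
        conn.execute("DELETE FROM pending_todos WHERE id = ?", (todo_id,))


def all_todos(conn: sqlite3.Connection) -> list:
    """Read both tables as one result set, tagged by where each row lives."""
    return conn.execute(
        "SELECT id, title, 'pending' AS status FROM pending_todos "
        "UNION ALL "
        "SELECT id, title, 'completed' AS status FROM completed_todos "
        "ORDER BY id"
    ).fetchall()


complete(conn, 1)
```

With this layout the periodic state updates only ever scan pending_todos, which stays roughly as large as the number of open todos.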