The Background
Our clients use a service for which they set a daily budget. This is a prepaid service, and every day we allocate a particular amount from the user’s budget.
Tables:
- budgets – how much we are allowed to spend per day
- money – the client’s real balance
- money_allocated – amount deducted from money that can be spent today (based on budgets)
There is a cron job that runs every few minutes and checks:
- if the user has money_allocated for a given day
- if money_allocated >= budgets (the user may increase the budget during the day)
If there is no allocation for the day yet, we allocate the full amount of the daily budget. If there is one but it is smaller than the budget, we allocate the difference between the budget and the amount already allocated for that day (in this case we create an additional record in money_allocated for the same day).
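Both checks boil down to one calculation: how much is still missing for today. A minimal sketch, assuming amounts are handled as Decimal values (the function name and signature are made up for illustration, not taken from our code):

```python
from decimal import Decimal


def amount_to_allocate(daily_budget: Decimal, allocated_today: Decimal) -> Decimal:
    """Return how much still has to be allocated for today.

    allocated_today is the sum of all money_allocated rows for the day,
    or 0 if there are none yet.
    """
    shortfall = daily_budget - allocated_today
    # Nothing to do if the budget is already covered (or was lowered).
    return shortfall if shortfall > 0 else Decimal("0")
```

With no allocation yet this returns the full budget; after a budget increase it returns only the difference.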
Allocation has two stages. In the first round we add a row with status “pending” (allocation requested). Another cron job then checks all “pending” allocations and moves money from money to money_allocated if the user has enough money, which changes the status to “completed”.
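To make the second stage concrete, here is a rough sketch using SQLite from the Python standard library. The column names, the integer-cents amounts and the function name are assumptions for illustration; our real schema differs. Each pending row is claimed and the balance debited in one transaction, so a row is either fully completed or left pending.

```python
import sqlite3

# Hypothetical, simplified schema (amounts in integer cents).
TWO_STAGE_SCHEMA = """
CREATE TABLE money (client_id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
CREATE TABLE money_allocated (
    id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,
    day TEXT NOT NULL,
    amount INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
"""


class InsufficientFunds(Exception):
    """Raised inside a transaction to roll it back."""


def complete_pending(conn: sqlite3.Connection) -> int:
    """Complete every pending allocation the client can afford; return the count."""
    completed = 0
    pending = conn.execute(
        "SELECT id, client_id, amount FROM money_allocated WHERE status = 'pending'"
    ).fetchall()
    for alloc_id, client_id, amount in pending:
        try:
            with conn:  # commits on success, rolls back on exception
                # Claim the row; skip it if someone else completed it already.
                claimed = conn.execute(
                    "UPDATE money_allocated SET status = 'completed' "
                    "WHERE id = ? AND status = 'pending'",
                    (alloc_id,),
                )
                if claimed.rowcount != 1:
                    continue
                # Debit only if the balance covers the amount.
                debited = conn.execute(
                    "UPDATE money SET balance = balance - ? "
                    "WHERE client_id = ? AND balance >= ?",
                    (amount, client_id, amount),
                )
                if debited.rowcount != 1:
                    raise InsufficientFunds
                completed += 1
        except InsufficientFunds:
            pass  # rolled back: the row stays pending for a later run
    return completed
```

The status guard in the first UPDATE is what keeps a row from being completed twice, although it does not stop two servers from creating duplicate pending rows in the first place.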
The Problem
We have a cluster of application servers (under NLB), and the above cron job runs on each of them. This means that money can accidentally be allocated multiple times (or not allocated at all if we implement the wrong “already allocated” triggers).
Our options include:
- Run the cron job on one server only – no redundancy, and client complaints and lost money on failure
- Add a unique index on money_allocated such as (client_id, date, amount) – but then we won’t allocate more money for a given day if the client doubles the budget or increases it multiple times by the same amount during the day
There is an option to record each movement in budgets and link all allocations to either “first allocation of the day” or “change of budget (id xxx)” (add this to the unique index as well). This does not look sexy enough, however.
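For what it is worth, this last idea could look roughly like the sketch below, again with SQLite from the Python standard library. The reason column, the index name and the function are hypothetical. The point is that every server can attempt the same insert, and the unique index lets only the first one through.

```python
import sqlite3

# Hypothetical schema: "reason" is either 'first' (first allocation of the
# day) or 'budget_change:<id>' pointing at a recorded budget change.
SCHEMA = """
CREATE TABLE money_allocated (
    id INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,
    day TEXT NOT NULL,
    reason TEXT NOT NULL,
    amount INTEGER NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending'
);
CREATE UNIQUE INDEX one_allocation_per_reason
    ON money_allocated (client_id, day, reason);
"""


def request_allocation(conn, client_id, day, reason, amount):
    """Insert a pending allocation; return False if it already exists."""
    with conn:
        cur = conn.execute(
            "INSERT OR IGNORE INTO money_allocated (client_id, day, reason, amount) "
            "VALUES (?, ?, ?, ?)",
            (client_id, day, reason, amount),
        )
    # rowcount is 0 when the unique index made SQLite ignore the insert.
    return cur.rowcount == 1
```

Two budget increases by the same amount get different reasons, so both are allocated, while a second server repeating the same request is ignored.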
Any other options? Any advice would be highly appreciated!
Ok, so I ended up running this on one of the cluster’s instances. If you use AWS and are in a similar situation, below is one of the options.
On each machine, at the beginning of your cron job’s code, do the following:
- Call describe_load_balancers (AWS API) and parse the response to get a list/array of all instances
- Request http://169.254.169.254/latest/meta-data/instance-id – this returns the instance ID of the machine that is sending the request
- Continue with the job only if this machine’s instance ID matches a designated entry in that list (e.g. instance #1)
Also, be sure to automatically replace unhealthy instances under this load balancer within a short time, as describe_load_balancers returns a list of both healthy and unhealthy instances. You may end up with a job not being done for a while if instance #1 goes down.
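Here is a rough Python sketch of that check. It assumes boto3 and a classic ELB, where describe_load_balancers returns the registered instances, and a metadata endpoint that answers plain GET requests. The function names are made up, and sorting the IDs before picking the first one is my own choice so that every machine agrees on the same instance.

```python
import urllib.request

METADATA_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def is_designated_runner(my_instance_id, registered_instance_ids):
    """True if this machine is the one that should run the cron job."""
    # Sort so every machine picks the same ID regardless of API ordering.
    ids = sorted(registered_instance_ids)
    return bool(ids) and ids[0] == my_instance_id


def fetch_my_instance_id(timeout=2.0):
    """Ask the instance metadata endpoint for this machine's instance ID."""
    with urllib.request.urlopen(METADATA_URL, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def fetch_registered_instance_ids(load_balancer_name):
    """List instance IDs registered with a classic ELB (healthy or not)."""
    import boto3  # third-party; imported here so the pure check runs without it

    elb = boto3.client("elb")
    resp = elb.describe_load_balancers(LoadBalancerNames=[load_balancer_name])
    description = resp["LoadBalancerDescriptions"][0]
    return [inst["InstanceId"] for inst in description["Instances"]]


# At the top of the cron job (the load balancer name is a placeholder):
#
#     if not is_designated_runner(fetch_my_instance_id(),
#                                 fetch_registered_instance_ids("my-load-balancer")):
#         raise SystemExit(0)
```

Since the list includes unhealthy instances, the designated machine may be a dead one until it is replaced, which is the gap mentioned above.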