I’m writing some code which processes a queue of items. The way it works is this:
- Get the next item flagged as needing to be processed from the MySQL database.
- Request some info from a Google API using cURL, and wait until the info is returned.
- Do the remainder of the processing based on the info returned.
- Flag the item as processed in the db, and move on to the next item.
The problem is that in step 2, Google sometimes takes 10 to 15 seconds to return the requested info, and during this time my script has to sit idle and wait.
I’m wondering if I could change the code to do the following instead:
- Get the next 5 items to be processed, as usual.
- Request info for items 1-5 from Google, one after the other, without waiting for each response.
- When the info for an item is returned, a ‘callback’ should call a function (or some other code) that does the remainder of the processing on that item, and likewise for each of items 1-5.
- Then the script starts over, until all pending items in the db are marked as processed.
How can something like this be achieved?
You can split this into two types of process:
Worker process (many of them run at once): knows which database row it is processing, makes the Google API call and waits for the response, then does the rest of the job and saves the results to the database.
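A worker along these lines might look like the minimal sketch below. It assumes the scheduler passes the database path and item id on the command line, and it uses Python's built-in `sqlite3` module as a stand-in for MySQL; the `items` table with `processed` and `result` columns, `fetch_google_info`, and the `upper()` "processing" step are all hypothetical. Because each worker handles only one row, the blocking API call holds up only that one process.

```python
import sqlite3
import sys
import time


def fetch_google_info(item_id):
    """Hypothetical stand-in for the blocking cURL request to the Google API."""
    time.sleep(0.1)  # simulate a slow response
    return f"info for {item_id}"


def run_worker(db_path, item_id):
    """Process exactly one row: call the API, finish the job, save the result."""
    info = fetch_google_info(item_id)  # blocking is fine: only this worker waits
    result = info.upper()  # hypothetical "remainder of the processing"
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE items SET result = ?, processed = 1 WHERE id = ?",
                (result, item_id),
            )
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Assumed invocation: python worker.py <db_path> <item_id>
    run_worker(sys.argv[1], int(sys.argv[2]))
```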
Scheduler (exactly one): periodically (say, every few seconds) checks whether there is work to do and makes sure that N workers (5, or whatever turns out to be optimal) are running. If fewer than N workers are running, it starts more (with exec) to keep the count at N, until all the work is done.
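A scheduler loop in the same spirit is sketched below, again in Python. It assumes the list of pending item ids is known up front (a real scheduler would query the database for unprocessed rows each round) and that each worker is started with its item id as an argument; `run_scheduler` and `worker_cmd` are hypothetical names. `subprocess.Popen` plays the role of exec here: it starts the worker and returns immediately, and `poll()` reports whether that worker has exited.

```python
import subprocess
import time


def run_scheduler(pending_ids, worker_cmd, n_workers=5, poll=2.0):
    """Keep up to n_workers worker processes running until all items are handled.

    pending_ids: ids of items still to process (assumed known up front here).
    worker_cmd:  function mapping an item id to the argv list of one worker.
    Returns (number of workers launched, peak number running at once).
    """
    queue = list(pending_ids)
    running = []
    launched = peak = 0
    while queue or running:
        # Forget workers that have exited; poll() returns None while still running.
        running = [p for p in running if p.poll() is None]
        # Start new workers until N are running or there is no work left.
        while queue and len(running) < n_workers:
            item_id = queue.pop(0)
            running.append(subprocess.Popen(worker_cmd(item_id)))
            launched += 1
        peak = max(peak, len(running))
        time.sleep(poll)  # check again after a short pause
    return launched, peak
```

For the PHP setup in the question, `worker_cmd` could be something like `lambda i: ["php", "worker.php", str(i)]`, where `worker.php` is a hypothetical worker script. The scheduler itself never talks to Google, so it never blocks on a slow response.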