Say you were writing a daemon that services a job queue. Various other software writes jobs for the daemon into the queue. The daemon polls the queue every few seconds for pending jobs. Assume the queue is implemented as a table in a MySQL database and that the daemon is a simple loop:
- get all due jobs from the queue
- do the jobs
- sleep for N seconds
- goto 1
The daemon must survive interrupted service from the MySQL DB server and disruption to DB connections.
Would you design the daemon to connect to the DB server once per cycle? i.e. connect before 1. and disconnect between 2 an 3?
Or would you have daemon keep a connection open? In which case it needs also to a) detect when the server or connection is not working, b) disconnect and reconnect, and c) do so without accumulating DB connections, dud connection descriptors or other dead resources.
If you have a preference, why?
Pros and cons?
Factors that enter into the design?
Any other approaches?
The answer here: mysql connection from daemon written in php doesn’t say why it’s better to keep the connection open. I read elsewhere that the per-connection overhead in MySQL is very light. So it isn’t obvious why permanently consuming one server connection is better than connecting/disconnecting every few seconds.
In my case the daemon is written in PHP.
I’m actually work on something very close to what you described, but in my case the daemon doesn’t poll for event it get’s them asynchronously via XMPP (but that’s besides the point).
Cut out the Middle Man
I think instead of storing the events in the database and polling them w/ MySQL you could probably use Gearman to send them from the client asynchronously (example).
Garbage Collection
PHP isn’t really designed to run as a daemon, and it wasn’t until PHP 5.3 when it got circular reference garbage collection that it became a viable option. It’s very important that you use PHP 5.3 if you want any chance at long term running without memory leaks.
Another thing to keep in mind about GC is that memory is only free if it’s not longer referenced (anywhere). So if you assign a variable to the global scope, it’ll be there until the daemon exits. It’s important that any code you create or use doesn’t built up variables in places (ie, static log, not removing old data, etc).
Stat Cache
Another thing is that it’s important to run
clearstatcacheevery so often. Since your PHP process isn’t restarted it’s important to do this call manually to prevent getting old stat data (which may or may not effect you). According to the documentation these functions are cached.Resource management
If your going to be using thing like MySQL during the lifetime of your process, I’d suggest making one connection at startup and keeping it alive. Even though it might use more ram on the MySQL side, you’ll cut out some latency and CPU overhead by not having to connect every 1 second.
No URL request
This may seem obvious, but with CLI PHP there is no URL request info. Some libraries aren’t written with this in mind, and this can cause some problems.
LooPHP
I’m going to pop a shameless plug in here for a framework I wrote to help with the management of PHP daemons. LooPHP is a run loop framework that lets you schedule event to happen or create listens for abstract sources (socket, stream, etc). In my case, I have the daemon doing more than 1 thing, so it’s very helpful to have system keep track of all the timers for me so that I can effectively poll
stream_selectfor the XMPP connection.