Newbie question about Django app design:
I'm building a reporting engine for my website. I have a large (and steadily growing) amount of data and an algorithm that must be applied to it. The calculations will be heavy on resources, so it would be a bad idea to run them inside user requests. Instead, I'm thinking of moving them into a background process that runs continuously and returns results from time to time, which the Django views can then use to produce HTML output on demand.
My question is: what is the proper design approach for building such a system? Any thoughts?
Celery is one of your best choices. We are using it successfully. It has a powerful scheduling mechanism: you can either schedule tasks as timed jobs or trigger them in the background when, for example, a user requests it.
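As a rough illustration of the timed-job side, here is a minimal settings sketch. It assumes Celery is configured to read Django settings with the `CELERY_` prefix (via `app.config_from_object("django.conf:settings", namespace="CELERY")`), and the task name `reports.tasks.rebuild_reports` is hypothetical, standing in for whatever task runs your report algorithm.

```python
# settings.py (fragment). Assumes the CELERY_ settings namespace described above.

# "reports.tasks.rebuild_reports" is a hypothetical task name used for illustration.
CELERY_BEAT_SCHEDULE = {
    "rebuild-reports-hourly": {
        "task": "reports.tasks.rebuild_reports",
        "schedule": 60 * 60,  # interval between runs, in seconds
    },
}
```

The schedule is picked up by a separate `celery beat` process. For the on-demand side, a view can queue the same task by calling `.delay()` on it instead of running the calculation inside the request.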
It also provides ways to query the status of such background tasks and has a number of flow-control features. It makes distributing the work very easy, i.e. your Celery background tasks can run on a separate machine (this is very useful, for example, with Heroku's web/worker split, where the web process is limited to a maximum of 30 seconds per request). It supports various queue backends: it can use a database, RabbitMQ, or a number of other queuing mechanisms. In the simplest setup it can use the same database that your Django site already uses, which makes it easy to set up.
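A possible configuration for this kind of split might look like the sketch below. It is only one option among several: it assumes the same `CELERY_` settings namespace, uses RabbitMQ as the broker with a placeholder URL, stores task results in the Django database (which needs the `django-celery-results` package installed), and routes a hypothetical `reports.tasks` module to its own queue named `reports`.

```python
# settings.py (fragment). All values are placeholders for illustration.

# Broker: where queued tasks wait until a worker picks them up.
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"

# Result backend: where task state and return values are stored.
# "django-db" requires the django-celery-results package.
CELERY_RESULT_BACKEND = "django-db"

# Send every task in the hypothetical reports.tasks module to a dedicated
# queue, so that a worker on another machine can consume only that queue.
CELERY_TASK_ROUTES = {
    "reports.tasks.*": {"queue": "reports"},
}
```

A worker started with `-Q reports` on a separate machine would then handle only the reporting tasks. With a result backend configured, the result object returned by `.delay()` exposes the task's `id` and `state`, which a view can use to check progress.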
And if you are using automated tests, it also has a feature that helps with testing: it can be set to “eager” mode, where background tasks are executed synchronously in the calling process instead of in the background, which makes the logic predictable to test.
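Eager mode is switched on through settings. A minimal sketch, again assuming the `CELERY_` settings namespace (older Celery versions used the names `CELERY_ALWAYS_EAGER` and `CELERY_EAGER_PROPAGATES_EXCEPTIONS` instead); the idea of a separate test-only settings module is an assumption about project layout:

```python
# Test-only settings (fragment), e.g. in a hypothetical settings_test.py.

# Run tasks synchronously in the calling process instead of sending them to a worker.
CELERY_TASK_ALWAYS_EAGER = True

# Re-raise exceptions from eagerly executed tasks so that failing tasks fail the test.
CELERY_TASK_EAGER_PROPAGATES = True
```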
More info here: http://docs.celeryproject.org:8000/en/latest/django/