My application runs on Google App Engine, and most requests are constantly flagged yellow because of high CPU usage. Using a profiler, I tracked the issue down to the creation of the jinja2.Environment instance.
I’m creating the instance at module level:
from jinja2 import Environment, FileSystemLoader
jinja_env = Environment(loader=FileSystemLoader(TEMPLATE_DIRS))
Because of the way Google App Engine operates (CGI), this code can run on each and every request (its module import cache seems to keep modules for seconds rather than minutes).
I was thinking about storing the environment instance in memcache, but it does not seem to be picklable. The FileSystemLoader instance does seem to be picklable and can be cached, but I did not observe any substantial improvement in CPU usage with this approach.
Can anybody suggest a way to decrease the overhead of creating the jinja2.Environment instance?
Edit: below is the relevant part of the profiler output.
222172 function calls (215262 primitive calls) in 8.695 CPU seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       33    1.073    0.033    1.083    0.033 {google3.apphosting.runtime._apphosting_runtime___python__apiproxy.Wait}
  438/111    0.944    0.002    2.009    0.018 /base/python_dist/lib/python2.5/sre_parse.py:385(_parse)
     4218    0.655    0.000    1.002    0.000 /base/python_dist/lib/python2.5/pickle.py:1166(load_long_binput)
        1    0.611    0.611    0.679    0.679 /base/data/home/apps/with-the-flow/1.331879498764931274/jinja2/environment.py:10()
One call, but as far as I can see (and this is consistent across all my GAE-based apps), it is the most expensive one in the whole request processing cycle.
Armin suggested pre-compiling Jinja2 templates to Python code and using the compiled templates in production. So I’ve made a compiler/loader for that, and it now renders some complex templates 13 times faster, throwing away all the parsing overhead. The related discussion, with a link to the repository, is here.
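For anyone who wants to try the pre-compilation idea without a custom loader, here is a minimal sketch. It is not the compiler/loader mentioned above. It assumes a later Jinja2 release that ships Environment.compile_templates and ModuleLoader. The template name hello.html, its contents and the temporary output directory are made up for illustration; in a real app the source environment would use FileSystemLoader(TEMPLATE_DIRS) and the compile step would run at build or deploy time.

```python
import os
import tempfile

from jinja2 import DictLoader, Environment, ModuleLoader

# Hypothetical in-memory template standing in for the files under TEMPLATE_DIRS.
source_env = Environment(loader=DictLoader({"hello.html": "Hello, {{ name }}!"}))

# Build step: compile every template the loader can list into Python modules.
# zip=None writes plain .py files into a directory instead of a zip archive.
compiled_dir = os.path.join(tempfile.mkdtemp(), "compiled_templates")
source_env.compile_templates(compiled_dir, zip=None)

# Production: load the generated modules, so no template source is lexed or parsed.
prod_env = Environment(loader=ModuleLoader(compiled_dir))
rendered = prod_env.get_template("hello.html").render(name="World")
print(rendered)
```

This only skips template parsing and compilation. It would not, as far as I can tell, remove the cost of importing jinja2 itself, so it is worth profiling again after the change.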