I’m attempting to make a Flask web application where you have to request the entirety of a non-local website and I was wondering if it was possible to cache it for the purposes of speeding things up, because the website does not change that often but I still want it to update the cache once a day or so.
Anyway, I looked it up and found Flask-Cache, which seemed to do what I wanted so I made appropriate changes to it, and came up with adding this:
from flask.ext.cache import Cache
[...]
cache = Cache()
[...]
cache.init_app(app)
[...]
@cache.cached(timeout=86400, key_prefix='content')
def get_content():
return lxml.html.fromstring(urllib2.urlopen('http://WEBSITE.com').read())
and then I make a call from the functions that need the content to proceed like so:
content = get_content()
Now I’d expect it to reuse the cached lxml.html object everytime a call is made, but that’s not what I’m seeing. The id of the object changes every time a call is made and there’s no speed-up at all. So have I misunderstood what Flask-Cache does, or am I doing something wrong here? I’ve tried using the memoize decorator instead, I’ve tried decreasing the timeout or removing it all together but nothing seems to be making anything difference.
Thanks.
The default
CACHE_TYPEisnullwhich gives you aNullCache– so you get no caching at all which is what you observe. The documentation does not make this explicit, but this line in the source ofCache.init_appdoes:To actually employ some caching, initialise your
Cacheinstance to use a proper cache.Aside: Note that
SimpleCacheis great for development and testing, and this example, but you shouldn’t use it in production. Something likeMemCachedorRedisCachewould be much betterNow, with an actual cache in place, you will run into the next problem. On the second call, the cached
lxml.htmlobject will be retrieved from theCache, but it is broken because these objects are not cacheable. Stacktrace looks like this:So instead of caching the
lxml.htmlobject, you should just cache the simple string – the content of the website that you downloaded, and then reparse that to get a freshlxml.htmlobject every time. Your cache still helps as you don’t hit the other website every time. Here is a full program to demonstrate that solution which works:When I run the program, and make two requests to
http://127.0.0.1:5000/, I get this output. Note thatget_contentis not called the second time, because the content is served from cache.