I need to iterate over a large collection (3 * 10^6 elements) in Django to do some analysis that can't be done with a single SQL statement.
- Is it possible to turn off collection caching in Django? (Caching all the data is not acceptable; the data is around 0.5 GB.)
- Is it possible to make Django fetch the collection in chunks? It seems to prefetch the whole collection into memory and only then iterate over it. I infer this from the execution speed:
iter(Coll.objects.all()).next() – this takes forever
iter(Coll.objects.all()[:10000]).next() – this takes less than a second
Use QuerySet.iterator() to walk over the results instead of loading them all first.
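As a minimal sketch of that approach: the helper below streams rows through iterator() so the QuerySet's result cache is never filled. The names count_matching and predicate are hypothetical and only serve the illustration, and the chunk_size argument assumes Django 2.0 or later (older versions accept iterator() without it).

```python
def count_matching(queryset, predicate, chunk_size=2000):
    """Count rows for which predicate(row) is true, without caching rows.

    `queryset` is assumed to be a Django QuerySet (or anything exposing a
    compatible iterator() method); `predicate` is a hypothetical callback
    standing in for the per-row analysis.
    """
    matches = 0
    # iterator() bypasses the QuerySet result cache, so rows that have
    # already been processed can be garbage-collected.
    for row in queryset.iterator(chunk_size=chunk_size):
        if predicate(row):
            matches += 1
    return matches
```

With the model from the question this could be called as count_matching(Coll.objects.all(), some_check), where some_check is whatever test the analysis needs. Whether the rows are actually streamed from the server also depends on the database backend and driver, so memory use is worth measuring on the real data.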