We have an app with highly interrelated data, i.e. there are many cases where two objects might refer to the same object via a relationship. As far as I can tell, Django does not make any attempt to return a reference to an already-fetched object if you attempt to fetch it via a different, previously unevaluated relationship.
For example:
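
    from django.db.models import CASCADE, CharField, ForeignKey, Model

    class Customer(Model):
        firstName = CharField(max_length=64)
        lastName = CharField(max_length=64)

    class Order(Model):
        # on_delete is required from Django 2.0 onwards
        customer = ForeignKey(Customer, related_name="orders", on_delete=CASCADE)
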
Then assume we have a single customer who has two orders in the DB:

    order1, order2 = Order.objects.all()
    print(order1.customer)  # (1) One DB fetch here
    print(order2.customer)  # (2) Another DB fetch here
    print(order1.customer == order2.customer)  # (3) True, because PKs match
    print(id(order1.customer) == id(order2.customer))  # (4) False, not the same object

With highly interrelated data, accessing the relationships of your objects results in more and more repeated DB queries for the same rows, and that becomes a problem.
We also program for iOS and one of the nice things about CoreData is that it maintains context, so that in a given context there is only ever one instance of a given model. In the example given above, CoreData would not have done the second fetch at (2), because it would have resolved the relationship using the customer already in memory.
Even if line (2) were replaced with a contrived example designed to force another DB fetch (like print Order.objects.exclude( pk = order1.pk ).get( customer = order1.customer )), CoreData would realize that the result of that second fetch resolved to a model already in memory and return the existing model instead of a new one (i.e. (4) would print True in CoreData because they would actually be the same object).
To hedge against this behaviour of Django, we are writing some fairly horrible code that caches models in memory by their (type, pk) and then checks the _id attribute of each relationship, so that we can pull the related object from the cache before blindly hitting the DB with another fetch. This is cutting down on DB traffic, but it feels really brittle and likely to cause problems if normal relationship lookups via properties accidentally happen in some contrib framework or middleware that we don’t control.
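A minimal sketch of the kind of (type, pk) cache described above, for illustration only. `IdentityCache`, `resolve`, and `load_customer` are hypothetical names, and `Customer` and `Order` here are plain Python stand-ins so the sketch runs without Django. In a real project the loader would presumably be something like `Customer.objects.get(pk=pk)`, and `customer_id` is the attribute Django generates for the foreign key.

```python
class IdentityCache(object):
    """Hypothetical in-memory cache keyed by (model class, primary key)."""

    def __init__(self):
        self._instances = {}

    def resolve(self, model_cls, pk, loader):
        """Return the cached instance, calling loader(pk) only on a miss."""
        key = (model_cls, pk)
        if key not in self._instances:
            self._instances[key] = loader(pk)
        return self._instances[key]


# Stand-ins for Django models so the sketch is self-contained.
class Customer(object):
    def __init__(self, pk):
        self.pk = pk


class Order(object):
    def __init__(self, pk, customer_id):
        self.pk = pk
        self.customer_id = customer_id  # mirrors Django's "<field>_id" attribute


fetches = []  # records every simulated DB hit


def load_customer(pk):
    # Assumption: in Django this would be Customer.objects.get(pk=pk).
    fetches.append(pk)
    return Customer(pk)


cache = IdentityCache()
order1, order2 = Order(1, customer_id=7), Order(2, customer_id=7)

# Resolve via the raw foreign key value instead of the relationship property.
customer_a = cache.resolve(Customer, order1.customer_id, load_customer)
customer_b = cache.resolve(Customer, order2.customer_id, load_customer)

print(customer_a is customer_b)  # True: one shared instance
print(len(fetches))              # 1: only the first lookup hit the loader
```

The brittleness mentioned above shows up here too: anything that reads `order.customer` directly goes around the cache and creates a second instance.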
Are there any best practices or frameworks out there for Django to help avoid this problem? Has anyone attempted to install some kind of thread-local context into Django’s ORM to avoid repeat lookups and having multiple in-memory instances mapping to the same DB model?
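To make the thread-local idea concrete, here is a rough sketch using only the standard library. `identity_context` and `current_identity_map` are made-up names, and nothing here hooks into Django's ORM. It only shows how a per-thread map of (type, pk) to instance could be scoped so that one thread never sees another thread's instances.

```python
import threading
from contextlib import contextmanager

_state = threading.local()  # each thread sees its own attributes


def current_identity_map():
    """Return the active map for this thread, or None outside a context."""
    return getattr(_state, "identity_map", None)


@contextmanager
def identity_context():
    """Install a fresh {(type, pk): instance} dict for the current thread."""
    previous = current_identity_map()
    _state.identity_map = {}
    try:
        yield _state.identity_map
    finally:
        # Restore whatever was active before, so contexts can nest.
        _state.identity_map = previous


seen_in_worker = []


def worker():
    # A different thread has no map unless it opens its own context.
    seen_in_worker.append(current_identity_map())


with identity_context() as identity_map:
    identity_map[("Customer", 7)] = object()
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    inside = current_identity_map() is identity_map

outside = current_identity_map()
```

In a web app, one plausible place to open such a context would be a middleware wrapping each request, so the map is discarded when the request ends. That is an assumption about usage, not something Django provides.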
I know that query-caching stuff like JohnnyCache is out there (and helps cut down on DB traffic); however, even with those measures in place there is still the issue of multiple instances mapping to the same underlying model.
David Cramer’s django-id-mapper is one attempt to do this.