I’ve read that it’s best to avoid large IN clauses because they are slow (especially with PostgreSQL).
Say I have a class called Fridge, and two classes called Vegetable and Condiment.
Each of these has a ManyToMany relationship with Fridge.
So something like:
```python
from django.db import models

class Fridge(models.Model):
    condiments = models.ManyToManyField(Condiment)
    vegetables = models.ManyToManyField(Vegetable)
```
And here is a QuerySet that represents our white fridges:
```python
qs = Fridge.objects.filter(color='white')
```
First query:
“Given a list of condiment IDs, get me all the fridges that have ANY of those condiments in them (modifying the original QuerySet).”
Second query:
“Given a list of vegetable IDs, get me all the fridges that have ALL of those vegetables in them (modifying the original QuerySet).”
How on earth would I do that without building a list of fridge IDs and adding an IN clause to my queryset?
Here are solutions that do it with IN clauses (renamed versions of my existing solutions):
First query:
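```python
condiment_ids = [...]  # list of condiment IDs
condiments = Condiment.objects.filter(id__in=condiment_ids)
condiment_fridges = None
for condiment in condiments:
    # Use a separate name so the outer `qs` (white fridges) is not overwritten
    fridges = condiment.fridge_set.all()
    if condiment_fridges is None:  # `is None` avoids evaluating the queryset
        condiment_fridges = fridges
    else:
        # OR the querysets together: fridges with ANY of the condiments
        condiment_fridges = condiment_fridges | fridges
qs = qs.filter(id__in=[f.id for f in condiment_fridges])
```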
Second query:
```python
vegetable_ids = [...]  # list of vegetable IDs
vegetables = Vegetable.objects.filter(id__in=vegetable_ids)
vegetable_fridges = None
for vegetable in vegetables:
    # Use a separate name so the outer `qs` (white fridges) is not overwritten
    fridges = vegetable.fridge_set.all()
    if vegetable_fridges is None:
        vegetable_fridges = fridges
    else:
        # AND the querysets together: fridges with ALL of the vegetables
        vegetable_fridges = vegetable_fridges & fridges
qs = qs.filter(id__in=[f.id for f in vegetable_fridges])
```
These solutions seem horrible and hackish, and I was wondering whether there is a better way to do this with Django’s ORM.
Unless I’m misunderstanding the question, all you need is:
There might be a more efficient way to find out whether a Fridge has all the vegetables. Not tested, but something like:
That could well be slower, though, depending on your database backend and the number of rows in each table.