I’m trying to optimize database queries for a Django app. Here’s a simplified example:
class Label(models.Model):
name = models.CharField(max_length=200)
# ... many other fields ...
class Thing(models.Model):
name = models.CharField(max_length=200)
labels = models.ManyToManyField(Label)
I have a function that fetches all Labels and Things and puts them into a JSON data structure, in which Things refer to Labels using their ids (primary keys). Something like this:
{
'labels': [
{ 'id': 123, 'name': 'label foo' },
...
],
'things': [
{ 'id': 45, 'name': 'thing bar', 'labels': [ 123, ... ] },
...
]
}
What is the most efficient way of obtaining such a data structure using Django? Suppose I have L Labels and T Things, and the average Thing has x Labels.
Method 1:
data = {}
data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
data['things'] = [model_to_dict(thing) for thing in Thing.objects.all()]
This makes (1 + 1 + T) database queries, since model_to_dict(thing) needs to fetch the Labels for each Thing individually.
Method 2:
data = {}
data['labels'] = [model_to_dict(label) for label in Label.objects.all()]
data['things'] = [model_to_dict(thing) for thing in
Thing.objects.prefetch_related('labels').all()]
This makes (1 + 1 + 1) database queries only, since the Things fetched now have their Labels prefetched in a single additional query.
This is still not satisfactory. prefetch_related('labels') will fetch many copies of the same Label, whereas I only need their ids. Is there any way to prefetch the ids of the Labels only? I tried prefetch_related('labels__id') but that didn’t work. I am also concerned that because T is large (hundreds), prefetch_related('labels') results in a SQL query with a large IN clause. L is much smaller (< 10), so I could do this instead:
Method 3:
data = {}
data['labels'] = [model_to_dict(label) for label in
Label.objects.prefetch_related('thing_set').all()]
things = list(Thing.objects.all())
# plug in label ids by hand, and also fetch things that have zero labels
# somehow
This results in a smaller IN clause, but is still not satisfactory because prefetch_related('thing_set') fetches duplicate Things, if a Thing has multiple Labels.
Summary:
Label and Thing are connected by a ManyToManyField. I am fetching all Labels and Things anyway. So how do I also fetch their many-to-many relationship efficiently?
I got it. Thanks to ilvar, whose comment to the question pointed me to
throughtables.Long story short:
This makes (1 + 1 + 1) queries, and does not fetch anything repeatedly. I can also change the first for loop to:
in case I have more
Labels thanThings, which will result in a query with a smallerINclause.Django-debug-toolbar‘s
debugsqlshellmanagement command is superb for actually seeing the queries that a piece of code is making.