I’m using a small tree/graph package (django_dag) that has that give my model a Many-to-many children field which refers to itself. The basic structure can be shown as the following models
#models
class Foo(FooBase):
class Meta:
abstract = True
children = models.ManyToManyField('self', symmetrical = False,
through = Bar)
class Bar():
parent = models.ForeignKey(Foo)
child = models.ForeignKey(Foo)
All is fine with the models and all the functionality of the package.
FooBase adds a variety of functions to the model, including a way of recursively finding all children of a Foo and the children’s children and so forth.
My concern is with the following function within FooBase:
def descendants_tree(self):
tree = {}
for f in self.children.all():
tree[f] = f.descendants_tree()
return tree
It outputs something like {Foo1:{}, Foo2: {Child_of_Foo2: {Child_of_Child_of_Foo2:{}}}} where the progeny are in a nested dictionary.
The alert reader may notice that this method calls a new query for each child.
While these db hits are pretty quick, they can add up quickly when there might be 50+ children. And eventually, there will be tens of thousands of db entries. Right now, each query averages 0.6 msec with a row count of almost 2000.
Is there a more efficient way of doing this nested query?
In my mind, doing a select_related().all() beforehand would get it down to one query but that smells like trouble in the future. At what point is one large query better or worse than many small ones?
—Edit—
Here’s what I’m trying to test the select_related().all() option with, but it’s still hitting every iteration:
all_foo = Foo.objects.select_related('children').all()
def loop(baz):
tree = {}
for f in all_foo.get(id = baz).children.all()
tree[f] = loop(f)
return tree
I assume the children.all() is causing the hit. Is there another way to get all of Many-to-Many children without using the callable attribute?
You’ll have to test under your own environment with your own circumstances.
select_relatedis generally always recommended, but in cases where there will be many recursive levels, that one large query is generally slower than the multiple queries.The amount of children doesn’t really matter, the levels of recursion is what matters most. If you’re doing 3 or so,
select_related()might be better, but much more than that would likely result in a slow down. The plugin author likely did it this way to allow for many, many levels of recursion, because it only really hurts when there’s just a few, and that’s only a few extra queries.