I’m using PostgreSQL’s ltree module for storing hierarchical data. I’m looking to retrieve the full hierarchy sorted by a particular column.
Consider the following table:
 votes | path  | ...
-------+-------+-----
     1 | 1     | ...
     2 | 1.1   | ...
     4 | 1.2   | ...
     1 | 1.2.1 | ...
     3 | 2     | ...
     1 | 2.1   | ...
     2 | 2.1.1 | ...
     4 | 2.1.2 | ...
   ... | ...   | ...
In my current implementation, I’d query the database with SELECT * FROM comments ORDER BY path, which would return the whole tree:
Node 1
-- Node 1.1
-- Node 1.2
---- Node 1.2.1
Node 2
-- Node 2.1
---- Node 2.1.1
---- Node 2.1.2
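ORDER BY path produces this shape because ltree compares paths label by label and sorts a path before the paths it is a prefix of. The following Python sketch models that comparison on the dotted paths from the sample table. Comparing tuples of label strings only approximates ltree's own comparison.

```python
# Sample rows from the question: (votes, path)
ROWS = [
    (1, "1"), (2, "1.1"), (4, "1.2"), (1, "1.2.1"),
    (3, "2"), (1, "2.1"), (2, "2.1.1"), (4, "2.1.2"),
]


def order_by_path(rows):
    """Approximate ORDER BY path on dotted paths.

    Labels are compared one by one as strings; a tuple that is a prefix of
    another sorts first, so every parent precedes its descendants.
    """
    ordered = sorted(rows, key=lambda row: tuple(row[1].split(".")))
    return [path for _, path in ordered]


print(order_by_path(ROWS))
# ['1', '1.1', '1.2', '1.2.1', '2', '2.1', '2.1.1', '2.1.2']
```

The votes column plays no part in this order, which is exactly the problem described next.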
However, I want to sort by votes, highest first (not by id, which is what sorting by path amounts to). Each depth level needs to be sorted independently, with the tree structure kept intact. Something that would return the following:
Node 2
-- Node 2.1
---- Node 2.1.2
---- Node 2.1.1
Node 1
-- Node 1.2
---- Node 1.2.1
-- Node 1.1
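The requested order is a depth-first walk of the tree in which the children of every node are visited by votes, highest first. Below is a plain Python reference version of that definition, which can be used to check a SQL solution. It assumes that every parent path exists in the table. It also breaks ties between equal votes by path, a rule the question does not specify.

```python
from collections import defaultdict

# Sample rows from the question: (votes, path)
ROWS = [
    (1, "1"), (2, "1.1"), (4, "1.2"), (1, "1.2.1"),
    (3, "2"), (1, "2.1"), (2, "2.1.1"), (4, "2.1.2"),
]


def order_by_votes(rows):
    """Depth-first order with the children of each node sorted by votes, highest first."""
    votes = {path: v for v, path in rows}
    children = defaultdict(list)
    for path in votes:
        parent = path.rpartition(".")[0]  # "" for top-level nodes
        children[parent].append(path)

    result = []

    def visit(parent):
        # Highest votes first; the path breaks ties so the result is deterministic.
        for child in sorted(children[parent], key=lambda p: (-votes[p], p)):
            result.append(child)
            visit(child)

    visit("")
    return result


print(order_by_votes(ROWS))
# ['2', '2.1', '2.1.2', '2.1.1', '1', '1.2', '1.2.1', '1.1']
```

For the sample rows this returns the paths in the order of the desired listing above.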
Postgres’ WITH RECURSIVE might be appropriate, but I’m not sure. Any ideas?
You were on the right track with WITH RECURSIVE.

Solution with recursive CTE
Major points
The crucial part is to replace every level of the path with the value of votes. That way we assemble a single column we can ORDER BY at the end. This is necessary because the path has an unknown depth, and we cannot order by an unknown number of expressions in static SQL.

To get a consistent sort, I convert votes to a fixed-width string with leading zeroes using to_char(), so that comparing the strings gives the same result as comparing the numbers. I use seven digits in the demo, which works for vote values below 10,000,000. Adjust according to your maximum vote count.

In the final SELECT I exclude all intermediate states to eliminate duplicates: only the last step, the one with max(sort), remains.

This works in plain SQL with a recursive CTE, but is not very efficient for large trees. A PL/pgSQL function that recursively updates the sort path in a temporary table without creating temporary duplicates might perform better.
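The approach can be tried without PostgreSQL. The sketch below runs a recursive CTE of the same shape through Python's built-in sqlite3 module. It is not the PostgreSQL query. SQLite has no ltree, so path is a plain dotted string. The ancestor at a given level is found with LIKE instead of subltree(), depth comes from counting dots instead of nlevel(), and printf('%07d', ...) stands in for to_char().

The sketch also makes two assumptions of its own:
- Each level is encoded as 9999999 - votes, so an ascending sort puts the highest votes first, as in the desired output.
- A hypothetical integer id is appended to each level, so siblings with equal votes cannot interleave their subtrees.

```python
import sqlite3

# Sample rows from the question: (votes, path). SQLite has no ltree, so
# paths are stored as plain dotted strings.
ROWS = [
    (1, "1"), (2, "1.1"), (4, "1.2"), (1, "1.2.1"),
    (3, "2"), (1, "2.1"), (2, "2.1.1"), (4, "2.1.2"),
]


def depth(col):
    """SQL expression counting the labels of a dotted path (stand-in for nlevel())."""
    return f"(length({col}) - length(replace({col}, '.', '')) + 1)"


# One fixed-width chunk per level: inverted votes (highest first, assumes
# 0 <= votes <= 9999999), then the ancestor's id as a tie-breaker.
KEY = "printf('%07d%07d', 9999999 - a.votes, a.id)"

QUERY = f"""
WITH RECURSIVE t(id, votes, path, lvl, sort) AS (
    -- Anchor: each row starts with the key of its top-level ancestor.
    SELECT c.id, c.votes, c.path, 1, {KEY}
    FROM comments c
    JOIN comments a
      ON (c.path = a.path OR c.path LIKE a.path || '.%')
     AND {depth('a.path')} = 1
    UNION ALL
    -- Step: append the key of the ancestor on the next level down.
    SELECT t.id, t.votes, t.path, t.lvl + 1, t.sort || {KEY}
    FROM t
    JOIN comments a
      ON (t.path = a.path OR t.path LIKE a.path || '.%')
     AND {depth('a.path')} = t.lvl + 1
    WHERE {depth('t.path')} > t.lvl
)
-- Every intermediate key of a row is a prefix of its last one, so
-- max(sort) picks the complete key and drops the intermediate states.
SELECT path, votes
FROM t
GROUP BY id, path, votes
ORDER BY max(sort)
"""


def sorted_paths(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE comments (id INTEGER PRIMARY KEY, votes INTEGER, path TEXT)"
        )
        con.executemany("INSERT INTO comments (votes, path) VALUES (?, ?)", rows)
        return [path for path, _ in con.execute(QUERY)]
    finally:
        con.close()


print(sorted_paths(ROWS))
# ['2', '2.1', '2.1.2', '2.1.1', '1', '1.2', '1.2.1', '1.1']
```

In this sketch every node produces one intermediate row per level of its path before GROUP BY collapses them. That duplication is the cost a row-by-row, level-by-level update avoids.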
The PostgreSQL query only works with the additional module ltree installed (CREATE EXTENSION ltree;), which provides the functions subltree() and nlevel() as well as the ltree data type.

My test setup, for review convenience:
PL/pgSQL table function doing the same
Should be faster with huge trees.
Call:
Read the manual on setting temp_buffers: it can only be changed before the first use of temporary tables in a session.

I would be interested in which one performs faster with your real-life data.
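The following Python sketch illustrates the strategy behind the PL/pgSQL variant, but it is not that function. The strategy keeps exactly one working row per node and fills in the sort key one tree level per pass, appending to the parent's finished key, so no intermediate rows accumulate.

Two assumptions apply:
- A dict stands in for the temporary table.
- Each level's key is the inverted, zero-padded votes followed by a hypothetical numeric id (here simply the row's position), so that an ascending sort gives highest votes first, as in the question.

```python
# Sample rows from the question: (votes, path)
ROWS = [
    (1, "1"), (2, "1.1"), (4, "1.2"), (1, "1.2.1"),
    (3, "2"), (1, "2.1"), (2, "2.1.1"), (4, "2.1.2"),
]

WIDTH = 7  # digits per level; assumes 0 <= votes < 10**WIDTH and fewer than 10**WIDTH rows


def level_key(votes, node_id):
    """Fixed-width chunk for one level: inverted votes (highest first), then id."""
    return f"{10**WIDTH - 1 - votes:0{WIDTH}d}{node_id:0{WIDTH}d}"


def sort_paths_levelwise(rows):
    # Working "table": exactly one entry per node, updated in place.
    nodes = {
        path: {"id": i, "votes": v, "sort": ""}
        for i, (v, path) in enumerate(rows, start=1)
    }
    max_level = max(path.count(".") + 1 for path in nodes)

    # Pass n fills in the keys of all nodes on level n by appending to the
    # parent's key, which pass n - 1 already completed.
    for level in range(1, max_level + 1):
        for path, node in nodes.items():
            if path.count(".") + 1 != level:
                continue
            parent = path.rpartition(".")[0]  # "" for top-level nodes
            prefix = nodes[parent]["sort"] if parent else ""
            node["sort"] = prefix + level_key(node["votes"], node["id"])

    return sorted(nodes, key=lambda p: nodes[p]["sort"])


print(sort_paths_levelwise(ROWS))
# ['2', '2.1', '2.1.2', '2.1.1', '1', '1.2', '1.2.1', '1.1']
```

For simplicity, each pass here scans all nodes. Restricting a pass to the rows of the current level is the obvious refinement.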