Say there is a table which stores a hierarchical structure like this:
item_id | hierarchical_id
--------+-----------------
1 | ;1;
2 | ;1;2;
3 | ;1;2;3;
4 | ;1;2;4;
5 | ;1;2;4;5;
The hierarchy stored here has 1 as the root; 2 is a child of 1, 3 and 4 are children of 2, and 5 is a child of 4.
The query
SELECT
    -- substr strips the leading and trailing semicolons
    regexp_split_to_table(
        substr(hierarchical_id, 2, length(hierarchical_id) - 2),
        E';'
    ) AS parent_id,
    item_id,
    hierarchical_id
FROM
    table
returns
parent_id | item_id | hierarchical_id
----------+---------+-----------------
1 | 1 | ;1;
1 | 2 | ;1;2;
2 | 2 | ;1;2;
1 | 3 | ;1;2;3;
3 | 3 | ;1;2;3;
1 | 4 | ;1;2;3;
2 | 4 | ;1;2;4;
4 | 4 | ;1;2;4;
1 | 5 | ;1;2;4;5;
2 | 5 | ;1;2;4;5;
4 | 5 | ;1;2;4;5;
5 | 5 | ;1;2;4;5;
How can I modify the query to get a 4th column like this:
parent_id | item_id | hierarchical_id | distance
----------+---------+-----------------+---------
1 | 1 | ;1; | 0
1 | 2 | ;1;2; | 1
2 | 2 | ;1;2; | 0
1 | 3 | ;1;2;3; | 2
2 | 3 | ;1;2;3; | 1
3 | 3 | ;1;2;3; | 0
1 | 4 | ;1;2;4; | 2
2 | 4 | ;1;2;4; | 1
4 | 4 | ;1;2;4; | 0
1 | 5 | ;1;2;4;5; | 3
2 | 5 | ;1;2;4;5; | 2
4 | 5 | ;1;2;4;5; | 1
5 | 5 | ;1;2;4;5; | 0
Here, distance means the number of levels between the item_id and the parent_id on the current row. E.g., the distance between a node and itself is 0, the distance between a node and its parent is 1, the distance between a node and its parent's parent is 2, and so on. It does not have to start at 0.
row_number would work fine if I could get it to restart for each group of equal item_ids, since the ids in hierarchical_id are ordered.
Any suggestions?
Window functions give you lots of control; see section 4.2.8, Window Function Calls, in the PostgreSQL manual.
The key thing you need is:
Given data:
the query:
produces:
which looks roughly right, but since your expected output doesn’t seem to match the output of the query you provided when I run it (PostgreSQL 9.1), it’s hard to know for sure.
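As a minimal sketch of the partitioned-window idea (not necessarily the exact query referred to above), the following numbers the split ids within each item_id and counts back from the end of the path. It rests on several assumptions: the sample rows are supplied inline through a hypothetical CTE named `items`, item_id is unique, and the server is PostgreSQL 9.4 or later, because `WITH ORDINALITY` and `LATERAL` are not available in 9.1.

```sql
-- Hypothetical inline copy of the sample data from the question.
WITH items(item_id, hierarchical_id) AS (
    VALUES (1, ';1;'),
           (2, ';1;2;'),
           (3, ';1;2;3;'),
           (4, ';1;2;4;'),
           (5, ';1;2;4;5;')
)
SELECT
    t.parent_id,
    i.item_id,
    i.hierarchical_id,
    -- Path length for this item minus the position of the current ancestor:
    -- the item itself gets 0, its parent 1, and so on.
    count(*) OVER (PARTITION BY i.item_id) - t.ord AS distance
FROM items AS i
-- btrim strips the leading and trailing semicolons;
-- WITH ORDINALITY exposes each element's position in the path as ord.
CROSS JOIN LATERAL regexp_split_to_table(btrim(i.hierarchical_id, ';'), ';')
    WITH ORDINALITY AS t(parent_id, ord)
ORDER BY i.item_id, t.ord;
```

The `PARTITION BY i.item_id` clause is what makes the count restart for each item. Using the ordinality column avoids relying on the order in which the set-returning function happens to emit rows, which a bare `row_number()` without an `ORDER BY` would depend on.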