I have a possibly large rooted tree structure that I want to transform into a X * Y matrix with X being the amount of leaves in the tree and Y being the amount of nodes in the tree with a degree larger than 1, i.e. the root node and internal nodes. The matrix should be filled as such:
Mi,j = { 0 if leaf i has ancestor j, 1 otherwise
For example, this tree:
--A
/
1 B
/ \ /
/ 3
/ \
0 C
\
\ --D
\ /
2
\--E
would translate into this matrix:
0 1 2 3
A T T F F
B T T F T
C T T F T
D T F T F
E T F T F
Since the trees might become pretty big (possibly ~100,000 leaves), I was wondering if there is a smarter/faster way of doing this than traversing up the tree for every one of the leaf nodes. It feels like some kind of algorithm is in this problem somewhere, but I haven’t figured it out yet. Maybe someone can help?
In my application, the tree represents large phylogenetic hierarchies, so it is not balanced and there might be nodes with more than two children.
I’d go with post-order traversal.
Maintain lists of leaves while traversing the tree, and in each level – the list will contain all the leaves up to this level.
decalrations for functions we will use:
Pseudo code:
Time is
O(n*m)– for setting the matrix (though the algorithm itself isO(nlogn)time for a balanced tree).If you want to prevent the
O(n*m)initialization you can initialize the matrix inO(1), and then run the algorithm above inO(nlogn). Though it will give better asymptotic complexity, I doubt it will actually be faster.