I’ve known the algorithm to find the diameter of a tree mentioned here for quite some time:
- Select a random node A
- Run BFS on this node to find furthermost node from A. name this node as S.
- Now run BFS starting from S, find the furthermost node from S, name it D.
- Path between S and D is diameter of the tree.
But why does it work?
I would accept both Ivan‘s and coproc‘s answer if I can. These are 2 very different approaches that both answer my question.
Suppose you’ve completed steps 1 and 2 and have found S, and that there is no diameter in the tree that includes S. Pick a diameter PQ of the tree. You basically have to check the possible cases and in all of them, find that either PS or SQ is at least as long as PQ – which would be a contradiction.
In order to systematically check all cases, you can assume that the tree is rooted at A. Then the shortest path between any two vertices U and V is calculated in the following way – let W be the lowest common ancestor of U and V. Then the length of UV is equal to the sum of the distances between U and W and between V and W – and, in a rooted tree, these distances are just differences in the levels of the nodes (and S has a maximum level in this tree).
Then analyze all possible positions S could take with respect to the subtree rooted at W (lowest common ancestor of P and Q) and the vertices P and Q. For example, the first case is simple – S is not in the subtree rooted at W. Then, we can trivially improve the path by selecting the one of P and Q that is more distant to the root, and connecting it to the S. The rest of the cases are similar.