Given a tree structure as follows:
public class Node implements Comparable<Node> {
    private List<Node> nodes = new ArrayList<Node>();
    private String name = "";
    private List<String> leaves = new ArrayList<String>();
    private Node parent = null;

    public List<Node> getNodes() {
        return nodes;
    }

    public void setNodes(List<Node> nodes) {
        this.nodes = nodes;
    }

    public List<String> getLeaves() {
        return leaves;
    }

    public void setLeaves(List<String> leaves) {
        this.leaves = leaves;
    }

    @Override
    public int compareTo(Node o) {
        return this.getName().compareTo(o.getName());
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Node getParent() {
        return parent;
    }

    public void setParent(Node parent) {
        this.parent = parent;
    }

    public int getDepth() {
        int depth = 0;
        Node parent = this.getParent();
        while (parent != null) {
            depth++;
            parent = parent.getParent();
        }
        return depth;
    }
}
From a given node, I want a method that returns all of its distinct leaves, direct and indirect, in sorted order. (In the class above, the strings in leaves are the leaves.)
The above is a heavily stripped-down data structure to ease testing and demonstration. I have tried the following three approaches.
Approach A
Very slow when the depth is large (~20), since the deepest leaves are copied several times, once for each of their ancestors, so the same path is traversed multiple times.
public List<String> getLeavesDeep1() {
    Set<String> leaves = new TreeSet<String>();
    leaves.addAll(getLeaves());
    for (Node node : getNodes()) {
        leaves.addAll(node.getLeavesDeep1());
    }
    return new ArrayList<String>(leaves);
}
Avg: 12694 ms / Without sort/distinct> Avg: 471 ms
Approach B
A little faster than A. Since there are far fewer nodes than leaves, this applies approach A to the nodes instead, and then collects only the direct leaves of each node.
private List<Node> getNodesDeep2() {
    Set<Node> nodes = new TreeSet<Node>();
    nodes.addAll(getNodes());
    for (Node node : getNodes()) {
        nodes.addAll(node.getNodesDeep2());
    }
    return new ArrayList<Node>(nodes);
}

public List<String> getLeavesDeep2() {
    Set<String> leaves = new TreeSet<String>();
    leaves.addAll(getLeaves());
    for (Node node : getNodesDeep2()) {
        leaves.addAll(node.getLeaves());
    }
    return new ArrayList<String>(leaves);
}
Avg: 4355 ms / Without sort/distinct> Avg: 2406 ms
Approach C
Avoids TreeSet: uses ArrayLists and sorts and filters (not the best way to sort/distinct, though) just before returning.
private List<Node> getNodesDeep3() {
    List<Node> nodes = new ArrayList<Node>();
    nodes.addAll(getNodes());
    for (Node node : getNodes()) {
        nodes.addAll(node.getNodesDeep3());
    }
    return new ArrayList<Node>(new TreeSet<Node>(nodes));
}

public List<String> getLeavesDeep3() {
    List<String> leaves = new ArrayList<String>();
    leaves.addAll(getLeaves());
    for (Node node : getNodesDeep3()) {
        leaves.addAll(node.getLeaves());
    }
    return new ArrayList<String>(new TreeSet<String>(leaves));
}
Avg: 4400 ms
I am looking for something faster. I know there are certain tree traversals that can be used, but I would prefer something simpler if it exists. P.S. There is no use case for searching at the moment. In my real class the times are much higher, approximately 3x those above, as the structure is much more complex and the leaves are not simple strings but POJOs.
The following is the test I used to get the timings:
private static final int NODES = 5;
private static final int LEAVES = 25;
private static final int DEPTH = 8;

public void addChildren(Node parent) {
    List<Node> nodes = new ArrayList<Node>();
    List<String> leaves = new ArrayList<String>();
    for (int i = 0; i < LEAVES; i++) {
        leaves.add(String.format("%s_leaf_%s", parent.getName(), i));
    }
    for (int i = 0; i < NODES; i++) {
        Node child = new Node();
        child.setParent(parent);
        child.setName(String.format("%s_%s", parent.getName(), i));
        nodes.add(child);
        if (child.getDepth() < DEPTH) {
            addChildren(child);
        }
    }
    parent.setNodes(nodes);
    parent.setLeaves(leaves);
}

@Test
public void testCase() {
    long start, tot = 0;
    long t = 0;
    List<String> leaves;
    Node target = new Node();
    target.setName("Root");
    addChildren(target);
    for (int i = 0; i < 10; i++) {
        start = System.currentTimeMillis();
        leaves = target.getLeavesDeep5();
        t = System.currentTimeMillis() - start;
        tot += t;
        System.out.println(leaves.size() + " " + t);
    }
    System.out.println("Avg: " + (tot / 10));
}
Answers in any language are acceptable, including pseudocode, as long as the solution isn't tightly tied to that language. (Exception: pure Java code is exempt from the second clause.)
I ran your test and it gave me the following results (I used your version 3, a slightly modified version 3, and a new version).
I first changed
to
See "Is it faster to add to a collection then sort it, or add to a sorted collection?"
This gave an almost 50% reduction in execution time.
Note: the TreeSet removes duplicates; sorting does not.
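The exact before and after code is not reproduced here. As a minimal sketch of the kind of change being described, assuming it swaps the TreeSet copy for a single sort, the two variants could look like this. The class and method names are hypothetical, and the adjacent-duplicate pass is an addition to restore the distinctness that sorting alone does not give.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration; not the code from the original answer.
class SortVsTreeSet {

    // Variant used in approach C: the TreeSet sorts and de-duplicates on insert.
    static List<String> viaTreeSet(List<String> items) {
        return new ArrayList<String>(new TreeSet<String>(items));
    }

    // Alternative: sort once, then drop adjacent duplicates in a single pass.
    // Assumes the list contains no null elements.
    static List<String> viaSort(List<String> items) {
        List<String> sorted = new ArrayList<String>(items);
        Collections.sort(sorted);
        List<String> result = new ArrayList<String>(sorted.size());
        String previous = null;
        for (String s : sorted) {
            if (previous == null || !s.equals(previous)) {
                result.add(s);
            }
            previous = s;
        }
        return result;
    }
}
```

Both methods return the same list for the same input; which one is faster depends on the data and should be measured with the test above.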
I then wrote a new iterative method combining your two methods into one and eliminating recursion altogether. I also got rid of the ArrayLists to avoid the resizing and copying, which we don't need because we only iterate and never access by index.
Edit: Using an ArrayList to store the leaves increases the time from 800 ms to about 1400 ms.
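The method itself is not shown here, so the following is only a sketch of one way to do a non-recursive traversal along these lines. It assumes an ArrayDeque as the explicit stack and a LinkedList for the collected leaves; SimpleNode and getLeavesDeepIterative are hypothetical names standing in for the question's Node class.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for the question's Node, reduced to what the traversal needs.
class SimpleNode {
    final List<SimpleNode> nodes = new ArrayList<SimpleNode>();
    final List<String> leaves = new ArrayList<String>();

    // Visits every node exactly once using an explicit stack instead of recursion.
    List<String> getLeavesDeepIterative() {
        List<String> result = new LinkedList<String>();
        Deque<SimpleNode> pending = new ArrayDeque<SimpleNode>();
        pending.push(this);
        while (!pending.isEmpty()) {
            SimpleNode current = pending.pop();
            result.addAll(current.leaves);
            for (SimpleNode child : current.nodes) {
                pending.push(child);
            }
        }
        Collections.sort(result); // sorts, but keeps duplicates
        return result;
    }
}
```

Each leaf is copied once, no matter how deep it sits, and the call stack does not grow with the tree depth. If duplicates can occur and must be removed, the result can still be passed through a TreeSet or an adjacent-duplicate filter at the end.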
I put all results into different lists and compared those at the end.
Output:
So, at least on my system, it's about a tenfold increase in speed.
Edit 2: Skipping the sorting in case 3 brings it down to 140 ms, so 600 ms are spent comparing and sorting. Any further major improvement needs to be made there.
Edit 3: Eliminating recursion also has the benefit that the depth of the tree has less impact on performance. Changing the test tree to 2/2/20 (N/L/D) yields about the same number of leaves (2m) but performs much worse with recursion (>70k), while the non-recursive version is not much slower (2500, up from 1200).