I’ve written an algorithm which calculates and stores all paths of a DAG, and it works nicely on small graphs – but now i’m looking to improve it’s efficiency to run over larger graphs. The core logic of the algorithm is in createSF() and makePathList() below, the other methods are helpers – I can see that append is a bottleneck. However, I guess the biggest help would be to devise a data structure that can store paths in a dictionary, since many of the paths are composed of other paths, this is the crux of my question.
private Multiset<String> paths = new Multiset<String>();
public Multiset<String> createSF(DAGNode n) {
for (DAGNode succ : n.getSuccessors())
createSF(succ);
if (!n.isVisited())
for (String s : makePathList(n))
paths.put(s);
n.setVisited(true);
return paths;
}
private List<String> makePathList(DAGNode n) {
List<String> list = new ArrayList<String>();
list.add(n.getLabel());
for (DAGNode node : n.getSuccessors())
list.addAll(append(n.getLabel(), makePathList(node)));
return list;
}
private List<String> append(String s, List<String> src) {
List<String> ls = new ArrayList<String>();
for (String str : src)
ls.add(s + "/" + str);
return ls;
}
EDIT:
I’m now using a path object to represent paths and have pin-pointed the bottle neck to these two methods:
public List<Path> createPathList(Tree n) {
List<Path> list = new ArrayList<Path>();
list.add(new Path(n.getNodeName()));
for (Tree node : n.getSuccessors()) {
list.addAll(append(n.getNodeName(), createPathList(node)));
}
return list;
}
public List<Path> append(String s, List<Path> src) {
List<Path> ls = new ArrayList<Path>();
for (Path path : src) {
ls.add(new Path(path, s));
}
return ls;
}
Trouble is when a graph is size M these methods will be called M times, this means there is a lot of lists being created here… is there a more efficient way to build up the return for createPathList()?
In order to answer this question, it is necessary to understand why do you need the list of paths. The list of paths does not give you any additional information over what you have in the DAG representation.
If you want to calculate things for each path separately, or calculate something like sum/min/max over all paths, it too could be done using the DAG itself.
If you do insist on saving separate paths, one option would be to convert your DAG into a variant of a Trie. Another option could be to use some variant of the Lempel-Ziv representation. It depends on your DAG types, and what you expect to do with the paths information.