Does the following problem have a name, and is there an algorithm to solve it?
Given a graph, either directed or undirected, find all the paths that satisfy a specification made up of
- a list of exact nodes, or
- ‘*?’, which denotes ‘any node or no node at all’, or
- ‘*{n}’, which denotes ‘any n consecutively connected nodes’
e.g.
A -> B -> *? -> D which results in ABXD and ABYD and ABD etc.
or
A -> *{1} -> D -> *? -> E which results in ABXDZE and ABYDZE and ABDZE etc.
thanks
P.S.
Does anyone know of a graph library that does this in R, Perl or C?
What I did in the end was:
For example, after reading in all the data as an ‘edgelist’, the resulting data structure holding the input data looks like this in Perl notation:
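As a rough illustration of such a structure, here is a minimal sketch that assumes the edge list is loaded into a hash of hashes. The names `@edgelist` and `%graph` and the sample edges are made up for the example and are not taken from the original data.

```perl
use strict;
use warnings;

# Hypothetical edge list: each entry is a [from, to] pair.
my @edgelist = (
    [ 'A', 'B' ], [ 'B', 'X' ], [ 'B', 'Y' ],
    [ 'B', 'D' ], [ 'X', 'D' ], [ 'Y', 'D' ],
);

# Adjacency stored as a hash of hashes: $graph{$from}{$to} = 1 per edge.
my %graph;
for my $edge (@edgelist) {
    my ( $from, $to ) = @$edge;
    $graph{$from}{$to} = 1;
    # For an undirected graph, the reverse direction would be stored too:
    # $graph{$to}{$from} = 1;
}

# Print each node followed by its direct successors.
for my $from ( sort keys %graph ) {
    print "$from => ", join( ', ', sort keys %{ $graph{$from} } ), "\n";
}
```

A hash of hashes keeps the "is there an edge from X to Y" test at a single hash lookup, which matters with millions of edges.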
Finding whether a pair of nodes is directly connected can be done roughly as follows (Perl):
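A minimal sketch of such a lookup, assuming the hash-of-hashes layout described above. The function name `connected` and the sample graph are hypothetical, chosen only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample graph: $graph{$from}{$to} = 1 for each edge.
my %graph = (
    A => { B => 1 },
    B => { X => 1, Y => 1, D => 1 },
    X => { D => 1 },
    Y => { D => 1 },
);

# Returns an array ref of nodes directly reachable from $from.
# With $to eq '*', all direct successors are returned; otherwise the
# result holds $to if the edge $from -> $to exists and is empty if not.
sub connected {
    my ( $graph, $from, $to ) = @_;
    return [] unless exists $graph->{$from};
    return [ sort keys %{ $graph->{$from} } ] if $to eq '*';
    return exists $graph->{$from}{$to} ? [$to] : [];
}

print "A -> B: ", ( @{ connected( \%graph, 'A', 'B' ) } ? "yes" : "no" ), "\n";
print "B -> *: ", join( ' ', @{ connected( \%graph, 'B', '*' ) } ), "\n";
```

Checking `exists $graph->{$from}` first avoids autovivifying an entry for a node that has no outgoing edges.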
The above function can also return all the nodes a ‘from’ node is connected to, by setting $to to ‘*’. The return value is an array ref of the nodes connected directly to the $from parameter.
Searching for the path between two nodes requires using the above function recursively.
e.g.
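One way the recursive search could look is sketched below. It assumes the hash-of-hashes graph again, and the name `find_paths` and its argument order are hypothetical. It does no cycle check, so on a graph with cycles a node may appear more than once in a returned path.

```perl
use strict;
use warnings;

# Hypothetical sample graph: $graph{$from}{$to} = 1 for each edge.
my %graph = (
    A => { B => 1 },
    B => { X => 1, Y => 1, D => 1 },
    X => { D => 1 },
    Y => { D => 1 },
);

# Returns an array ref of paths (each an array ref of node names) that
# lead from $from to $to using exactly $hops edges.
sub find_paths {
    my ( $graph, $from, $to, $hops ) = @_;
    if ( $hops == 0 ) {
        # No edges left to follow: succeed only if we are at the target.
        return $from eq $to ? [ [$from] ] : [];
    }
    my @paths;
    for my $next ( sort keys %{ $graph->{$from} || {} } ) {
        # Recurse from each direct successor with one hop fewer.
        for my $tail ( @{ find_paths( $graph, $next, $to, $hops - 1 ) } ) {
            push @paths, [ $from, @$tail ];
        }
    }
    return \@paths;
}

# A -> * -> * -> D means two wildcard nodes, i.e. three edges.
my $paths = find_paths( \%graph, 'A', 'D', 3 );
print join( '', @$_ ), "\n" for @$paths;
```

The recursion depth equals `$hops`, so the work grows with the number of successors at each level rather than with the size of the whole graph.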
It’s OK to use recursion as long as the depth is small (i.e. $hops < 6 ?); otherwise there is a risk of stack overflow [sic].
The trickiest part is reading through the results and extracting the nodes for each path. After a lot of deliberation I decided to use a Tree::Nary (n-ary tree) to store the results. At the end we have the following tree:
In order to extract all the paths, do:
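The extraction step can be sketched as a depth-first walk that collects every root-to-leaf path. This sketch does not use the Tree::Nary API: as a stand-in it assumes the result tree is held as nested hashes, and the names `%tree` and `tree_paths` are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical result tree as nested hashes: each key is a node, each
# value is the subtree of nodes that follow it (empty hash = leaf).
my %tree = (
    A => {
        B => {
            X => { D => {} },
            Y => { D => {} },
        },
    },
);

# Depth-first walk returning a list of array refs, one per
# root-to-leaf path.
sub tree_paths {
    my ( $subtree, @prefix ) = @_;
    return ( [@prefix] ) unless %$subtree;    # leaf reached: path complete
    my @paths;
    for my $node ( sort keys %$subtree ) {
        push @paths, tree_paths( $subtree->{$node}, @prefix, $node );
    }
    return @paths;
}

print join( ' -> ', @$_ ), "\n" for tree_paths( \%tree );
```

Each leaf yields exactly one path, so the number of results equals the number of leaves in the tree.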
The above was implemented in Perl, but I have also done it in C++ using boost::unordered_map for the hashtable. I haven’t yet added a tree structure to the C++ code.
Results: for 3281415 edges and 18601 unique nodes, Perl takes 3 minutes to find A->’*’->’*’->B. I will give an update on the C++ code when ready.