I have a large directed acyclic graph (DAG) from which I would like to efficiently draw a sample node according to the following criteria:
- I specify a fixed node A that must never be sampled
- Nodes that directly or indirectly refer to A are never sampled
- All other nodes are sampled with equal probability
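Stated formally, using notation introduced here rather than in the question: let V be the set of nodes reachable from the root and Anc(A) the set of nodes with a directed path to A. The desired distribution is then uniform over the set S of remaining nodes:

```latex
S = V \setminus \bigl(\{A\} \cup \operatorname{Anc}(A)\bigr),
\qquad
\Pr[\text{sampled node} = v] =
\begin{cases}
  1/|S| & \text{if } v \in S,\\
  0 & \text{otherwise.}
\end{cases}
```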
Nodes are stored as objects with pointers to the other nodes that they refer to; the entire graph can be reached from a single root node that refers to everything else directly or indirectly.
Is there a good algorithm to do this? Ideally without requiring large amounts of additional memory, since the DAG is large!
The only solution I can come up with is to:
1. Put the nodes in a hash set by traversing them from the root (using, say, a breadth-first traversal): O(|E|+|V|).
2. Starting from node A, remove A and all of its predecessors by traversing the edges backwards (again O(|E|+|V|)).
3. Select a random node from the remaining nodes.
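The three steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `Node`, its `children` list, and `sample_node` are hypothetical names, and because nodes only point forward, step 1 also records each node's predecessors so that step 2 has something to walk backwards along. Python objects without a custom `__eq__` hash by identity, so the set holds references rather than copies.

```python
import random
from collections import deque
from typing import Dict, List, Optional, Set


class Node:
    """Hypothetical node type: holds references to the nodes it refers to."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []


def sample_node(root: Node, a: Node, rng: Optional[random.Random] = None) -> Optional[Node]:
    """Uniformly sample a node reachable from root that is neither A nor reaches A."""
    rng = rng or random.Random()

    # Step 1: BFS from the root, storing node references in a set.
    # Predecessor lists are recorded here because nodes only point forward.
    nodes: Set[Node] = {root}
    preds: Dict[Node, List[Node]] = {}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in node.children:
            preds.setdefault(child, []).append(node)
            if child not in nodes:
                nodes.add(child)
                queue.append(child)

    # Step 2: remove A and every node that reaches it by walking edges backwards.
    removed: Set[Node] = {a}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        for parent in preds.get(node, []):
            if parent not in removed:
                removed.add(parent)
                queue.append(parent)
    nodes -= removed

    # Step 3: pick uniformly among the remaining nodes (None if nothing is left).
    if not nodes:
        return None
    return rng.choice(list(nodes))
```

For example, with edges root→x, root→y, x→A, y→z and A→w, the excluded nodes are root, x and A, so every call returns one of y, z or w.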
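This would result in an O(|E|+|V|) algorithm. The hash set needs O(|V|) memory, but since nodes only store pointers to the nodes they refer to, traversing the edges backwards in step 2 also requires each node's predecessors, which can be recorded during the traversal in step 1 at a cost of O(|E|) additional memory.
Note that you wouldn’t have to copy the nodes in step 1, only save references to them.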
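Walking the edges backwards needs predecessor information that the nodes do not store. One alternative to the steps above (a swapped-in technique, not part of the original answer) is to classify each node with a post-order depth-first traversal: a node is excluded if it is A or if any node it refers to is excluded. Combined with reservoir sampling of size one, this picks uniformly among the eligible nodes in a single pass, using O(|V|) extra memory for the per-node flags and the DFS stack and no predecessor lists. The `Node` class and `sample_excluding_ancestors` function are hypothetical names for illustration, and the sketch assumes each node exposes its references as a `children` list.

```python
import random
from typing import Dict, List, Optional


class Node:
    """Hypothetical node type: holds references to the nodes it refers to."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []


def sample_excluding_ancestors(
    root: Node, a: Node, rng: Optional[random.Random] = None
) -> Optional[Node]:
    """Return a uniformly random node reachable from root that is neither A
    nor able to reach A, or None if no such node exists. Assumes a DAG."""
    rng = rng or random.Random()
    # Per-node flag, set once the node is finished: True if it is A or reaches A.
    excluded: Dict[Node, bool] = {}
    chosen: Optional[Node] = None
    eligible = 0
    # Iterative post-order DFS (avoids Python's recursion limit on deep graphs).
    # Each frame is [node, iterator over its children, excluded-so-far flag].
    stack = [[root, iter(root.children), root is a]]
    while stack:
        frame = stack[-1]
        child = next(frame[1], None)
        if child is not None:
            if child in excluded:
                # Already classified via another path: just propagate its flag.
                frame[2] = frame[2] or excluded[child]
            else:
                stack.append([child, iter(child.children), child is a])
            continue
        # All children are classified, so this node's flag is final.
        stack.pop()
        node, flag = frame[0], frame[2]
        excluded[node] = flag
        if stack:
            stack[-1][2] = stack[-1][2] or flag
        if not flag:
            # Reservoir sampling with a reservoir of size one: the k-th eligible
            # node replaces the current choice with probability 1/k.
            eligible += 1
            if rng.randrange(eligible) == 0:
                chosen = node
    return chosen
```

Each node is finished exactly once, so the traversal is O(|V|+|E|). If k nodes are eligible, the reservoir step leaves each of them as the final choice with probability 1/k, which matches the uniform requirement without materializing the candidate set.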