I have the following data set, which represents the edges of a directed graph.
CREATE TABLE nodes (NODE_FROM VARCHAR2(10),
NODE_TO VARCHAR2(10));
INSERT INTO nodes VALUES('GT','TG');
INSERT INTO nodes VALUES('GG','GC');
INSERT INTO nodes VALUES('AT','TG');
INSERT INTO nodes VALUES('TG','GC');
INSERT INTO nodes VALUES('GC','CG');
INSERT INTO nodes VALUES('TG','GG');
INSERT INTO nodes VALUES('GC','CA');
INSERT INTO nodes VALUES('CG','GT');
Visual representation:
http://esser.hopto.org/temp/image1.JPG
Using this data set, I want a user to enter a level (e.g. 2) and get back all edges up to that many “hops” away from a specific node (AT in this example):
NODE_FROM NODE_TO
TG GC
TG GG
AT TG
GT TG
http://esser.hopto.org/temp/image2.JPG
My current attempt looks like this:
SELECT node_from, node_to
FROM nodes
WHERE level <= 2 -- Display nodes two "hops" from 'AT'
START WITH node_from = 'AT'
CONNECT BY NOCYCLE PRIOR node_to = node_from
OR node_to = PRIOR node_from
GROUP BY node_from, node_to;
http://esser.hopto.org/temp/image3.JPG
As you can see, the relationship: GT -> TG is missing.
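If you want to reproduce this without an Oracle instance, here is a minimal sketch that mimics the query above in SQLite through Python's standard `sqlite3` module. SQLite has no CONNECT BY, so this swaps in a recursive CTE: the first SELECT plays the role of START WITH, the JOIN uses the same OR condition as the CONNECT BY, and `lvl` stands in for LEVEL. Treat it as an approximation of Oracle's semantics under those assumptions, not as Oracle itself; `EDGES`, `QUERY` and `hops` are hypothetical names for the example.

```python
import sqlite3

# The question's data set: each row is one directed edge NODE_FROM -> NODE_TO.
EDGES = [
    ("GT", "TG"), ("GG", "GC"), ("AT", "TG"), ("TG", "GC"),
    ("GC", "CG"), ("TG", "GG"), ("GC", "CA"), ("CG", "GT"),
]

# Recursive CTE standing in for Oracle's CONNECT BY (SQLite has none).
# - The first SELECT acts as START WITH node_from = :start (level 1).
# - The JOIN uses the question's condition
#     PRIOR node_to = node_from OR node_to = PRIOR node_from
#   where w is the PRIOR (parent) row and n is the candidate child row.
# - "w.lvl < :max_level" bounds the depth; that bound, not a cycle check,
#   is what makes the recursion terminate on this cyclic graph.
# - UNION (not UNION ALL) drops repeated (edge, level) rows from the queue.
QUERY = """
WITH RECURSIVE walk(node_from, node_to, lvl) AS (
    SELECT node_from, node_to, 1 FROM nodes WHERE node_from = :start
    UNION
    SELECT n.node_from, n.node_to, w.lvl + 1
    FROM nodes AS n
    JOIN walk AS w
      ON n.node_from = w.node_to OR n.node_to = w.node_from
    WHERE w.lvl < :max_level
)
SELECT DISTINCT node_from, node_to FROM walk
ORDER BY node_from, node_to
"""


def hops(start, max_level):
    """Distinct edges reached from `start` within `max_level` levels."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_from TEXT, node_to TEXT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", EDGES)
    rows = conn.execute(QUERY, {"start": start, "max_level": max_level}).fetchall()
    conn.close()
    return rows


print(hops("AT", 2))  # [('AT', 'TG'), ('TG', 'GC'), ('TG', 'GG')]
```

It returns only AT->TG, TG->GC and TG->GG, so GT->TG is missing here too. SELECT DISTINCT plays the role of the GROUP BY in the original query.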
So your graph looks like this:
You can use Oracle’s START WITH/CONNECT BY feature to do what you want. If we start at node GA, we can reach all nodes in the graph, as shown below.
Output:
NOTE: Since your graph has cycles, it’s important to use the NOCYCLE keyword on the CONNECT BY; otherwise Oracle raises ORA-01436 (CONNECT BY loop in user data).
EDITED ANSWER BASED ON LATEST EDITS BY OP
First of all, I assume that by “2 hops” you mean “at most 2 hops”, because your current query uses level <= 2. If you want exactly 2 hops, it should be level = 2.
In your updated graph (image2.JPG), there is no path from AT to GT that takes 2 hops, so the query is returning what I would expect. From AT to GT, we can go AT->TG->GC->CG->GT, but that’s 4 hops, which is more than 2, so that’s why you aren’t getting that result back.
If you expect to reach GT from AT in 2 hops, then you need to add an edge from TG to GT, like this:
INSERT INTO nodes VALUES('TG','GT');
Now when you run your query, you’ll get this data back:
NODE_FROM NODE_TO
AT TG
TG GC
TG GG
TG GT
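The effect of the new edge, and the difference between level <= 2 and level = 2, can be checked with a small SQLite sketch via Python's `sqlite3`. As before, a recursive CTE stands in for CONNECT BY, which SQLite lacks, so this is an approximation of Oracle's behavior; `walk`, `hops` and `first_level_reaching` are hypothetical helper names.

```python
import sqlite3

# The question's eight edges, and the same edges plus TG -> GT.
ORIGINAL = [
    ("GT", "TG"), ("GG", "GC"), ("AT", "TG"), ("TG", "GC"),
    ("GC", "CG"), ("TG", "GG"), ("GC", "CA"), ("CG", "GT"),
]
WITH_TG_GT = ORIGINAL + [("TG", "GT")]

# Recursive CTE standing in for the question's CONNECT BY condition.
# lvl plays the role of LEVEL; UNION drops repeated (edge, level) rows.
QUERY = """
WITH RECURSIVE walk(node_from, node_to, lvl) AS (
    SELECT node_from, node_to, 1 FROM nodes WHERE node_from = :start
    UNION
    SELECT n.node_from, n.node_to, w.lvl + 1
    FROM nodes AS n
    JOIN walk AS w
      ON n.node_from = w.node_to OR n.node_to = w.node_from
    WHERE w.lvl < :max_level
)
SELECT node_from, node_to, lvl FROM walk
"""


def walk(edges, start, max_level):
    """All (node_from, node_to, level) rows reached from `start`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_from TEXT, node_to TEXT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", edges)
    rows = conn.execute(QUERY, {"start": start, "max_level": max_level}).fetchall()
    conn.close()
    return rows


def hops(edges, start, max_level, exact=False):
    """Distinct edges at level <= max_level, or exactly max_level if exact."""
    keep = {
        (f, t)
        for f, t, lvl in walk(edges, start, max_level)
        if (lvl == max_level if exact else lvl <= max_level)
    }
    return sorted(keep)


def first_level_reaching(edges, start, target, max_level=10):
    """Smallest level at which some edge ends at `target`, or None."""
    levels = [lvl for _, t, lvl in walk(edges, start, max_level) if t == target]
    return min(levels) if levels else None


print(hops(WITH_TG_GT, "AT", 2))                     # level <= 2
print(hops(WITH_TG_GT, "AT", 2, exact=True))         # level = 2
print(first_level_reaching(ORIGINAL, "AT", "GT"))    # 4
print(first_level_reaching(WITH_TG_GT, "AT", "GT"))  # 2
```

With TG->GT added, the level <= 2 call returns the four rows listed above, and the level = 2 call drops AT->TG. On the original data, the first row ending at GT (CG->GT) only appears at level 4; once TG->GT exists, GT is reached at level 2.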
Remember that START WITH/CONNECT BY only works if there is a path between the nodes. In your graph (before I added the new edge above), there is no path AT->TG->GT, so that’s why you’re not getting the result back.
Now, if you added the edge TG->AT, then we would have the path GT->TG->AT. In that case AT is 2 hops away from GT (i.e. we’re going the reverse way now, starting from GT and ending at AT). But to find those paths, you would need to set START WITH node_from = 'GT'.
If your goal is to find all paths from a start node to any target node that is at most 2 hops away (level <= 2), then the above should work.
However, if you want to find all paths from some target node back to a source node (i.e. the reverse example I gave, GT->TG->AT), then that’s not going to work here. You’d have to run the query for every node in the graph.
Think of START WITH/CONNECT BY as doing a depth-first search. It’s going to go everywhere it can from a starting node, but it’s not going to do any more than that.
Summary:
I think the query works fine, given the constraints above. I’ve explained why the GT->TG path is not returned, so I hope that makes sense.
Keep in mind, however, that if you are trying to traverse reverse paths as well, you’ll have to loop over every node and run the query, changing the START WITH node each time.
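The loop described above can be sketched in SQLite via Python's `sqlite3`, under a few stated assumptions: the walk is forward-only (it keeps just PRIOR node_to = node_from and drops the question's OR branch), it includes the hypothetical TG->AT edge from the reverse example, and a recursive CTE stands in for CONNECT BY. It runs the walk once per distinct node_from, i.e. once per START WITH value, and keeps the start nodes whose walk reaches a target within N hops. `sources_reaching` is a hypothetical name.

```python
import sqlite3

# The question's edges plus the hypothetical TG -> AT edge from the answer.
EDGES = [
    ("GT", "TG"), ("GG", "GC"), ("AT", "TG"), ("TG", "GC"),
    ("GC", "CG"), ("TG", "GG"), ("GC", "CA"), ("CG", "GT"),
    ("TG", "AT"),
]

# Forward-only walk, like CONNECT BY PRIOR node_to = node_from.
# The depth bound keeps the recursion finite despite the cycles.
QUERY = """
WITH RECURSIVE walk(node_from, node_to, lvl) AS (
    SELECT node_from, node_to, 1 FROM nodes WHERE node_from = :start
    UNION
    SELECT n.node_from, n.node_to, w.lvl + 1
    FROM nodes AS n
    JOIN walk AS w ON n.node_from = w.node_to
    WHERE w.lvl < :max_level
)
SELECT DISTINCT node_to FROM walk
"""


def sources_reaching(target, max_level):
    """Start nodes (other than target) that reach target within max_level hops."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE nodes (node_from TEXT, node_to TEXT)")
    conn.executemany("INSERT INTO nodes VALUES (?, ?)", EDGES)
    starts = [row[0] for row in conn.execute("SELECT DISTINCT node_from FROM nodes")]
    found = []
    for start in starts:  # one "START WITH node_from = start" per node
        if start == target:
            continue
        params = {"start": start, "max_level": max_level}
        reached = {row[0] for row in conn.execute(QUERY, params)}
        if target in reached:
            found.append(start)
    conn.close()
    return sorted(found)


print(sources_reaching("AT", 2))  # ['GT', 'TG']
```

For target AT and 2 hops it finds GT (via GT->TG->AT) and TG (directly via TG->AT). The cost is one query per node, which is fine for a graph this size but grows with the number of nodes.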