I would like to find a subgraph in Neo4j DB with only strong two way relationships.
Let’s say the relationship is LOVES and attribute is Kisses then I would like to only find a subgraph where both sides have kissed the other more than 2 times.
START n=node(*)
MATCH n-[r1:LOVES]->friend<-[r2:LOVES]-n
WHERE r1.Kisses > 2 and r2.Kisses > 2
RETURN n, friend, r1, r2
LIMIT 20
the problem is that the query seems to run forever on a 3M node 30M relationship graph, on a 32GB RAM, quad core system with 16GB max heap for neo4j(neo4j calculator suggested 8GB).
I suspect, there is an endless loop hiding somewhere.
OS: Ubuntu 12.04.1 LTS server
Soft: neo4j-community-1.8.1
java version “1.7.0_10” (neo4j start says to use JDK6)
Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
EDIT: match is incorrect it should be
MATCH n-[r1:LOVES]->friend-[r2:LOVES]->n
UPDATE:
After correcting semantics above, I am still unable to get a full result in 5 minutes.
LOVES is the only relationship type and about 10-20% or relationships have a corresponding one going the other way.
My ultimate goal is to find appropriate Kiss values so that I am left with <100k nodes (and all appropriate LOVES relationships) and can export this subgraph.
Here’s pseudocode for algorithm for what I am trying to do:
let E be edge.list to be processed
let myedgelist be empty list
for e in E:
if e.n1 > e.n2: # so we do not look twice
continue
if not exist(e[n2][n1]): # this is where lookup can be a problem O(1) for hash, O(logn) for sorted, O(n) for something random)
continue
if e.kisses > 2 and e[n2][n1].kisses > 2:
add e to myedgelist
add e[n2][n1] to myedgelist
return myedgelist
This algorithm should run at most edgecount * log(edgecount), unless there is no effective way to lookup existence of reverse relationship in neo4j, which would seem rather inconceivable.
Can you try to make
friendthe target nodeyou can try to split the match into two parts
or alternatively only do half of the query with a match and the second half with a filter
here’s a console to try the queries: http://console.neo4j.org/r/tqvb1p
But after all you’re processing 3M nodes with a match that potentially blows up to 3M^2