I was wondering how we can use the Python module NetworkX to implement SimRank to compare the similarity of two nodes. I understand that NetworkX provides methods for looking at neighbors, and link analysis algorithms such as PageRank and HITS, but is there one for SimRank?
Examples and tutorials are welcome too!
Update
I implemented a networkx_addon library, and SimRank is included in it. Check out https://github.com/hhchen1105/networkx_addon for details.
Sample Usage:
You may obtain the similarity score between two nodes (say, node ‘a’ and node ‘b’) by
SimRank is a vertex similarity measure. It computes the similarity between two nodes on a graph based on the topology, i.e., the nodes and the links of the graph. To illustrate SimRank, let’s consider the following graph, in which a, b, and c are connected to each other, and d is connected to c. How similar node a is to node d depends on how similar a’s neighbors, b and c, are to d’s neighbor, c.
As seen, this is a recursive definition, so SimRank is computed iteratively until the similarity values converge. Note that SimRank introduces a constant r to represent the relative importance of indirect neighbors versus direct neighbors. The formal equation of SimRank can be found here.
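For reference, here is how I would write the recurrence in the notation of this post. This is a sketch rather than a quote from the paper: N(a) is my shorthand for the set of neighbors of a (the paper uses in-neighbors on directed graphs), and r plays the role of the paper's decay constant.

$$
S(a,b) = \frac{r}{|N(a)|\,|N(b)|} \sum_{i \in N(a)} \sum_{j \in N(b)} S(i,j) \quad (a \neq b), \qquad S(a,a) = 1
$$

When either node has no neighbors, S(a,b) is taken to be 0.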
The following function takes a networkx graph G and the relative importance parameter r as input, and returns the SimRank similarity value sim between any two nodes in G. The return value sim is a dictionary of dictionaries of floats. To access the similarity between node a and node b in graph G, simply read sim[a][b].
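A minimal sketch of such a function is given here. It is a reconstruction from the description, not necessarily the original code: the parameter names `max_iter` and `eps`, the stopping rule, and the choice to score isolated nodes as 0 are all my assumptions.

```python
import itertools

import networkx as nx


def simrank(G, r=0.9, max_iter=100, eps=1e-4):
    """Return sim[u][v] for every node pair of an undirected graph G (sketch)."""
    nodes = list(G.nodes())
    # Start from the identity: each node is fully similar to itself only.
    sim = {u: {v: 1.0 if u == v else 0.0 for v in nodes} for u in nodes}

    for _ in range(max_iter):
        prev = sim
        sim = {u: {} for u in nodes}
        for u in nodes:
            nbrs_u = list(G.neighbors(u))
            for v in nodes:
                if u == v:
                    sim[u][v] = 1.0
                    continue
                nbrs_v = list(G.neighbors(v))
                if not nbrs_u or not nbrs_v:
                    # Assumption: no neighbors means no structural evidence.
                    sim[u][v] = 0.0
                    continue
                # Average the previous scores over all neighbor pairs, scaled by r.
                total = sum(prev[x][y] for x, y in itertools.product(nbrs_u, nbrs_v))
                sim[u][v] = r * total / (len(nbrs_u) * len(nbrs_v))
        # Stop once no score moved by more than eps in this round.
        delta = max(
            (abs(sim[u][v] - prev[u][v]) for u in nodes for v in nodes),
            default=0.0,
        )
        if delta < eps:
            break
    return sim


# Example graph as described in this post: a, b, c form a triangle, d hangs off c.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])
sim = simrank(G, r=0.9)
print(sim["a"]["b"])
```

Each round is computed only from the previous round's scores, so the result does not depend on the order in which nodes are visited. On the example graph, sim["a"]["b"] should come out close to the 0.6538 used in the verification further down.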
To calculate the similarity values between nodes in the above graph, you can try this.
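As a cross-check, here is a sketch that uses the `simrank_similarity` function built into NetworkX itself (available in release 2.4 and later, as far as I know) rather than the function from this post. The edge list is my reading of the example graph (a, b, c form a triangle and d attaches to c), and `importance_factor` corresponds to r.

```python
import networkx as nx

# Edges assumed from the description: a-b, b-c, c-a, and c-d.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")])

# With no source/target, the result is a dict of dicts: sim[u][v].
sim = nx.simrank_similarity(
    G, importance_factor=0.9, max_iterations=1000, tolerance=1e-4
)

# With both source and target, the result is a single float.
score_ab = nx.simrank_similarity(
    G, source="a", target="b",
    importance_factor=0.9, max_iterations=1000, tolerance=1e-4,
)

print(sim["a"]["b"], score_ab)
```

The built-in treats neighbors as the "in-links" on an undirected graph, so it should agree with the hand calculation below up to the convergence tolerance.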
You’ll get
Let’s verify the result by calculating similarity between, say, node a and node b, denoted by S(a,b).
S(a,b) = r * (S(b,a)+S(b,c)+S(c,a)+S(c,c))/(2*2) = 0.9 * (0.6538+0.6261+0.6261+1)/4 = 0.6538,
which is the same as our calculated S(a,b) above.
For more details, you may want to check out the following paper:
G. Jeh and J. Widom. SimRank: a measure of structural-context similarity. In KDD’02 pages 538-543. ACM Press, 2002.