I have a weighted graph with (in practice) up to 50,000 vertices. Given a vertex, I want to randomly choose an adjacent vertex based on the relative weights of all adjacent edges.
How should I store this graph in memory so that making the selection is efficient? What is the best algorithm? It could be as simple as a key-value store for each vertex, but that might not lend itself to the most efficient algorithm. I'll also need to be able to update the network.
Note that I’d like to take only one “step” at a time.
More formally: Given a weighted, directed, and potentially complete graph, let W(a,b) be the weight of edge a->b and let Wa be the sum of the weights of all edges leaving a. Given an input vertex v, I want to choose a vertex randomly, where the likelihood of choosing vertex x is W(v,x) / Wv.
Example:
Say W(v,a) = 2, W(v,b) = 1, W(v,c) = 1.
Given input v, the function should return a with probability 0.5, and b or c with probability 0.25 each.
If you are concerned about the performance of generating the random walk, you can use the alias method to build a data structure that fits your requirement of choosing a random outgoing edge quite well. The only overhead is that you have to assign each directed edge a probability weight and a so-called alias edge.
So for each node you have a vector of outgoing edges, each stored together with its probability weight and its alias edge. You can then choose a random edge in constant time (only building the data structure takes linear time, with respect to the total number of edges or the number of edges of the node). In the example, an edge is denoted by `->[NODE]`, and node `v` corresponds to the example given above: edge 0 is `->a` with `p = 1` (so its alias is never used), edge 1 is `->b` with `p = 3/4` and alias `->a`, and edge 2 is `->c` with `p = 3/4` and alias `->a`.
If you want to choose an outgoing edge (i.e. the next node), you just have to generate a single random number `r`, uniform on the interval [0,1).
You then get `no = floor(N[v] * r)` and `pv = frac(N[v] * r)`, where `N[v]` is the number of outgoing edges. That is, you first pick each edge with exactly the same probability (namely 1/3 in the example of node `v`).
Then you compare the assigned probability `p` of this edge with the generated value `pv`. If `pv` is less, you keep the edge selected before; otherwise you choose its alias edge.
If, for example, we have `r = 0.6` from our random number generator, we get `no = floor(3 * 0.6) = floor(1.8) = 1` and `pv = frac(1.8) = 0.8`. Therefore we choose the second outgoing edge (note that the index starts at zero), which is `->b` with `p = 3/4` and alias `->a`, and switch to the alias edge `->a` since `p = 3/4 < pv`.
For the example of node `v` we therefore choose

- `b` with probability `1/3 * 3/4` (i.e. whenever `no = 1` and `pv < 3/4`)
- `c` with probability `1/3 * 3/4` (i.e. whenever `no = 2` and `pv < 3/4`)
- `a` with probability `1/3 + 1/3 * 1/4 + 1/3 * 1/4` (i.e. whenever `no = 0` or `pv >= 3/4`)
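
To make the steps above concrete, here is a minimal Python sketch. The function names `build_alias_table` and `step` are made up for illustration, and the table is built with Vose's variant of the alias method, which is an assumption on my part since the construction is not spelled out above. The sampling step follows the `no` / `pv` description directly.

```python
import random

def build_alias_table(edges):
    """Build the per-node table from a list of (target, weight) pairs.

    Returns a list of (target, p, alias_target) triples, one per outgoing edge.
    Construction is Vose's variant of the alias method (an assumption here).
    """
    n = len(edges)
    total = sum(w for _, w in edges)
    # Scale the weights so that an "average" edge has value exactly 1.
    scaled = [w * n / total for _, w in edges]
    prob = [1.0] * n
    alias = list(range(n))  # by default every edge is its own alias
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s_i = small.pop()
        l_i = large.pop()
        prob[s_i] = scaled[s_i]          # keep the small edge with this probability
        alias[s_i] = l_i                 # otherwise fall through to the large edge
        scaled[l_i] -= 1.0 - scaled[s_i]  # the large edge donated the remainder
        (small if scaled[l_i] < 1.0 else large).append(l_i)
    # Anything left over keeps p = 1.0 (only rounding error can leave "small" items).
    return [(edges[i][0], prob[i], edges[alias[i]][0]) for i in range(n)]

def step(table, r=None):
    """Pick the next node using a single uniform random number r in [0, 1)."""
    if r is None:
        r = random.random()
    x = len(table) * r
    no = min(int(x), len(table) - 1)  # floor; the min() only guards against rounding
    pv = x - no                       # fractional part
    target, p, alias_target = table[no]
    return target if pv < p else alias_target

# Node v from the example: W(v,a) = 2, W(v,b) = 1, W(v,c) = 1.
table_v = build_alias_table([("a", 2), ("b", 1), ("c", 1)])
print(table_v)             # [('a', 1.0, 'a'), ('b', 0.75, 'a'), ('c', 0.75, 'a')]
print(step(table_v, 0.6))  # a  (no = 1, pv = 0.8 >= 3/4, so the alias is used)
```

For a whole graph you would keep one such table per vertex, for example in a dict keyed by vertex. Note that with this sketch, changing any weight on a node means rebuilding that node's table, which is linear in the number of its outgoing edges.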