I’m using the Walktrap community detection method to return a number (19 in this case) of clusters. I have a list of members which belong to one or more of these clusters.
- I need a method to search each cluster for the presence of the members and return the percentage of matches found (e.g. cluster[0] = 0%, cluster[1] = Y%, ..., cluster[18] = Z%), thus selecting the optimum cluster that represents the members on the list.
- Once the optimum cluster is found, I need a method to count the number of members of the optimum cluster and, from the original (19-1) clusters, select another cluster that is nearest in size (number of members).

    library(igraph)
    edges <- read.csv('http://dl.dropbox.com/u/23776534/Facebook%20%5BEdges%5D.csv')
    list <- read.csv("http://dl.dropbox.com/u/23776534/knownlist.csv")
    all <- graph.data.frame(edges)
    summary(all)
    all_wt <- walktrap.community(all, steps=6, modularity=TRUE, labels=TRUE)
    all_wt_memb <- community.to.membership(all, all_wt$merges, steps=which.max(all_wt$modularity)-1)
    all_wt_memb$csize
    # [1] 176 13 204 24 9 263 16 2 8 4 12 8 9 19 15 3 6 2 1
The `%in%` function, when used like `a %in% b`, will determine which of the elements in vector `a` are also present in vector `b`. So for each cluster, I would:

1. Test your list with `%in%` against this cluster, which will return a Boolean vector.
2. Call `sum()` on the Boolean vector to count the number of true elements (i.e. the number of elements in your initial vector which are present in this cluster).

You can loop through each cluster using `for()` or an `apply` variant.

Then given `all_wt_memb$csize`, you’ll have a given value which is your target, and you’ll want to find the nearest number. See this link, but you’re just calculating the minimum absolute difference.
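Here is a minimal sketch of both steps in base R. It runs on toy data rather than your files: `vertex_names`, `membership` and `known` are hypothetical stand-ins. With your objects they would presumably be `V(all)$name`, `all_wt_memb$membership` and the relevant column of your known-list data frame, but I have not checked that against your data. The percentage here is taken as a share of the known list; divide by the cluster size instead if that is what you mean by "percentage of matches".

```r
# Toy stand-ins (hypothetical data, not the poster's files)
vertex_names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j")
membership   <- c(0, 0, 0, 1, 1, 1, 1, 2, 2, 3)   # cluster id of each vertex
known        <- c("d", "e", "f", "h")             # names on the known list

# Group vertex names by cluster id; the list names are the cluster ids
clusters <- split(vertex_names, membership)

# Step 1: for each cluster, count the known members it contains
# and express that count as a percentage of the known list
pct_match <- sapply(clusters, function(cl) 100 * sum(known %in% cl) / length(known))

# The "optimum" cluster is the one with the highest percentage
best <- names(which.max(pct_match))

# Step 2: cluster sizes (same information as all_wt_memb$csize)
csize <- sapply(clusters, length)

# Among the other clusters, pick the one whose size has the
# minimum absolute difference from the optimum cluster's size
others  <- csize[names(csize) != best]
nearest <- names(which.min(abs(others - csize[[best]])))

print(pct_match)
print(best)
print(nearest)
```

Note that `which.max()` and `which.min()` return the first match when there are ties, so if two clusters are equally close in size you only get one of them back.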