I’m trying to use parallel processing with the wordnet package for R on a Windows 7 computer. Specifically, I’m trying to find the synonyms for a list of nouns. I’ve made some sample code below to show what I’m trying to do, but it doesn’t seem to execute properly in parallel: it starts up the workers and calculates on one of them, but not on the others.

The list I’ve made below has a length of 4, with 3 words in each slot. I’m attempting to divide the list by the number of cores available and send a subset of the list to each core. The sapply function then gets the synonyms for the 3 words in each slot (within the parallel loop).

I’ve also tried doing this with Snowfall, but I couldn’t get it to export the dictionary (sfExport didn’t seem to do it). I’m not using “.export” within the foreach loop because it also gave errors about the dictionary not being located, but setting up the dictionary inside the parallel loop seems to make it work. Any help would be much appreciated.
library(wordnet)
library(foreach)
library(doSMP)
library(rJava)
NbrOfCores <- 2
workers <- startWorkers(NbrOfCores) # number of cores
registerDoSMP(workers)
getDoParName() # check name of parallel backend
getDoParVersion() # check version of parallel backend
getDoParWorkers() # check number of workers
set.seed(1)
setDict<-setDict("C:\\Program Files (x86)\\WordNet\\2.1\\dict\\")
initDict<-initDict("C:\\Program Files (x86)\\WordNet\\2.1\\dict\\")
dict<-getDictInstance()
words <- list(c("cat", "dog", "bird"), c("mouse", "iguana", "fish"), c("car", "tree", "house"), c("shoe", "shirt", "hat"))
rows <- length(words) # 4
prow <- floor(rows / NbrOfCores) # 2 list slots per core
# .packages takes a single character vector of package names
nouns <- foreach(i = 1:NbrOfCores, .combine = c, .packages = c("wordnet", "rJava")) %dopar% {
  # set up the dictionary on each worker
  setDict <- setDict("C:\\Program Files (x86)\\WordNet\\2.1\\dict\\")
  initDict <- initDict("C:\\Program Files (x86)\\WordNet\\2.1\\dict\\")
  dict <- getDictInstance()
  # worker i handles list slots (prow*(i-1)+1) through (prow*i)
  foreach(j = (prow * (i - 1) + 1):(prow * i)) %do% sapply(words[[j]], synonyms, "NOUN")
}
I think your problem is in how you set up the i variable in your foreach. What this should be looping through is the words object, not the number of cores. This code works:

It looks like the doSMP package isn’t available for my version of R, so I just switched it to snow, but you could use whatever backend you want.
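A minimal sketch of looping over words directly, assuming the snow backend is registered through the doSNOW package and the dictionary lives at the path used in the question. The names cl and dict_path are illustrative, and this is not necessarily the exact code the answer ran:

```r
library(foreach)
library(doSNOW)  # assumption: snow backend registered via doSNOW

# Dictionary path taken from the question; adjust for your install.
dict_path <- "C:\\Program Files (x86)\\WordNet\\2.1\\dict\\"

# Start two snow workers and register them with foreach.
cl <- makeCluster(2)
registerDoSNOW(cl)

words <- list(c("cat", "dog", "bird"), c("mouse", "iguana", "fish"),
              c("car", "tree", "house"), c("shoe", "shirt", "hat"))

# Iterate over the list itself: each task receives one slot of words,
# and foreach hands the tasks out to whichever workers are available.
nouns <- foreach(w = words, .packages = c("wordnet", "rJava")) %dopar% {
  # The dictionary is set up on the worker, as in the question.
  setDict(dict_path)
  initDict(dict_path)
  sapply(w, synonyms, "NOUN")
}

stopCluster(cl)
```

With no .combine argument, nouns comes back as a list with one element per slot of words, so the number of tasks no longer depends on the number of cores.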