I have R code that I need to get to A “parallelization” stage. Im new at this so please forgive me if I use the wrong terms. I have a process that just has to chug through individual by individual one at a time and then average across individuals in the end. The process is the exact same for each individual (its a Brownian Bridge), I just have to do this for >300 individuals. So, I was hoping someone here might know how to change my code so that it can be spawned? or parallelized? or whatever the word is to make sure that the 48 CPU’s I now have access to can help reduce the 58 days it will take to compute this with my little laptop. In my head I would just send out 1 individual to one processor. Have it run through the script and then send another one….if that makes sense.
Below is my code. I have tried to comment in it and have indicated where I think the code needs to be changed.
for (n in 1:(length(IDNames))){ #THIS PROCESSES THROUGH EACH INDIVIDUAL
#THIS FIRST PART IS JUST EXTRACTING THE DATA FROM MY TWO INPUT FILES.
#I HAVE ONE FILE WITH ALL THE LOCATIONS AND THEN ANOTHER FILE WITH A DATE RANGE.
#EACH INDIVIDUAL HAS DIFFERENT DATE RANGES, THUS IT HAS TO PULL OUT EACH INDIVIDUALS
#DATA SET SEPARATELY AND THEN RUN THE FUNCTION ON IT.
IndivData = MovData[MovData$ID==IDNames[n],]
IndivData = IndivData[1:(nrow(IndivData)-1),]
if (UseTimeWindow==T){
IndivDates = dates[dates$ID==IDNames[n],]
IndivData = IndivData[IndivData$DateTime>IndivDates$Start[1]&IndivData$DateTime<IndivDates$End[1],]
}
IndivData$TimeDif[nrow(IndivData)]=NA
########################
#THIS IS THE PROCESS WHERE I THINK I NEED THAT HAS TO HAVE EACH INDIVIDUAL RUN THROUGH IT
BBMM <- brownian.bridge(x=IndivData$x, y=IndivData$y,
time.lag = IndivData$TimeDif[1:(nrow(IndivData)-1)], location.error=20,
area.grid = Grid, time.step = 0.1)
#############################
# BELOW IS JUST CODE TO BIND THE RESULTS INTO A GRID DATA FRAME I ALREADY CREATED.
#I DO NOT UNDERSTAND HOW THE MULTICORE PROCESSED CODE WOULD JOIN THE DATA BACK
#WHICH IS WHY IVE INCLUDED THIS PART OF THE CODE.
if(n==1){ #creating a data fram with the x, y, and probabilities for the first individual
BBMMProbGrid = as.data.frame(1:length(BBMM[[2]]))
BBMMProbGrid = cbind(BBMMProbGrid,BBMM[[2]],BBMM[[3]],BBMM[[4]])
colnames(BBMMProbGrid)=c("GrdId","X","Y",paste(IDNames[n],"_Prob", sep=""))
} else { #For every other individual just add the new information to the dataframe
BBMMProbGrid = cbind(BBMMProbGrid,BBMM[[4]])
colnames(BBMMProbGrid)[n*2+2]=paste(IDNames[n],"_Prob", sep ="")
}# end if
} #end loop through individuals
Not sure why this has been voted down either. I think the foreach package is what you’re after. Those first few pdfs have very clear useful information in them. Basically write what you want done for each person as a function. Then use foreach to send the data for one person out to a node to run the function (while sending another persons to another node etc) and then it compiles all the results using something like rbind. I’ve used this a few times with great results.
Edit: I didn’t look to rework your code as I figure given you’ve got that far you’ll easily have the skills to wrap it into a function and then use the one liner foreach.
Edit 2: This was too long for a comment to reply to you.
I thought since you had got that far with the code that you would be able to get it into a function 🙂 If you’re still working on this, it might help to think of writing a for loop to loop over your subjects and do the calculations required for that subject. Then, that for loop is what you want in your function. I think in your code that is everything down to ‘area.grid’. Then you can get rid of most of your [n]’s since the data is only subset once per iteration.
Perhaps:
Then something like:
Hard to say whether that is it without some test data, let me know how you get on.