I have a data.frame of cells, values and coordinates. It resides in the global environment.
> head(cont.values)
cell value x y
1 11117 NA -34 322
2 11118 NA -30 322
3 11119 NA -26 322
4 11120 NA -22 322
5 11121 NA -18 322
6 11122 NA -14 322
Because my custom function takes almost a second to calculate individual cell (and I have tens of thousands of cells to calculate) I don’t want to duplicate calculations for cells that already have a value. My following solution tries to avoid that. Each cell can be calculated independently, screaming for parallel execution.
What my function actually does is check if there’s a value for a specified cell number and if it’s NA, it calculates it and inserts it in place of NA.
I can run my magic function (result is value for a corresponding cell) using apply family of functions and from within apply, I can read and write cont.values without a problem (it’s in global environment).
Now, I want to run this in parallel (using snowfall) and I’m unable to read or write from/to this variable from individual core.
Question: What solution would be able to read/write from/to a dynamic variable residing in global environment from within worker (core) when executing a function in parallel. Is there a better approach of doing this?
The pattern of a central store that workers consult for values is implemented in the rredis package on CRAN. The idea is that the Redis server maintains a store of key-value pairs (your global data frame, re-implemented). Workers query the server to see if the value has been calculated (
redisGet) and if not do the calculation and store it (redisSet) so that other workers can re-use it. Workers can be R scripts, so it’s easy to expand the work force. It’s a very nice alternative parallel paradigm. Here’s an example that uses the notion of ‘memoizing’ each result. We have a function that is slow (sleeps for a second)We write a ‘memoizer’ that returns a variant of
funthat first checks to see if the value forxhas already been calculated, and if so uses thatWe then memoize our function
and go
This does require a redis server running outside R but very easy to install; see the documentation that comes with the rredis package.
A within-R parallel version might be