Hey there,
I want to evaluate a user-defined mathematical function that returns several values in an array (a vector-valued function f: R^n -> R^m with n input coordinates and m output components) in C++ for certain parameters, e.g.:
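#include <cmath>
#include "mex.h" // mxArray, mxGetPr

// Returns a newly allocated array of 3 function values; the caller must delete[] it.
double *my_func(const mxArray *point)
{
double *dat = mxGetPr(point); // input coordinates
double *vals = new double[ 3 ];
vals[0] = dat[0]*dat[0]*dat[0]*dat[0]*dat[0];
vals[1] = std::sin(dat[0])*dat[1]*dat[2]*dat[2]*std::cos(dat[1]);
vals[2] = std::exp(dat[0])*std::sin(dat[0])*dat[3];
return vals;
}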
Currently I do this on the CPU. So I call the function once and get back an array with all function values. As I want to parallelize it now on the GPU, I thought about how to do it.
I assume it would be kind of stupid to evaluate my_func() completely in each thread, since then each thread would calculate the whole function array. Is this the right assumption?
Would there be a convenient way to calculate only the n-th element of the function array and return it, so that 5 threads could easily calculate the function array in parallel instead of one CPU calculating it completely ‘alone’?
The only way I could think of was:
double my_func0(const mxArray *point)
{
double *dat = mxGetPr(point);
return dat[0]*dat[0]*dat[0]*dat[0]*dat[0];
}
double my_func1(const mxArray *point)
{
double *dat = mxGetPr(point);
return sin(dat[0])*dat[1]*dat[2]*dat[2]*cos(dat[1]);
}
double my_func2(const mxArray *point)
{
double *dat = mxGetPr(point);
return exp(dat[0])*sin(dat[0])*dat[3];
}
And so on. But this would be quite inconvenient for whoever uses the program later, because they would have to write a new C++ function every time they want to extend the function array, instead of just adapting ONE single C++ function. A further problem: the number of functions is dynamic, so I would have to call them dynamically, with something like a call to my_func_%%i%%, and I don’t know if this is a good way to do it. So the question is: is there a better way to deal with this problem?
When you say “user_defined” I presume you mean that someone else writes my_func() and then your code calls it?
If this is the case, consider running many calls to my_func() in parallel rather than trying to break the function up. This means whoever writes my_func() only needs to write one function, and you will be responsible for delegating multiple calls, ensuring they have the correct data to work on, and gathering up the results.
Update Based on Comments
In your situation, if the operation required to calculate each member of vals is different, then the user would either have to parameterise my_func() by the index required, as you suggested: double my_func(const mxArray *point, const unsigned & index) (note how it now returns a single double value as opposed to the whole result array), or provide a different my_func() for each index: double my_func_n(const mxArray *point).
You could then call this function or set of functions from as many different threads as you like and get a single result for further computation. We are ignoring many concurrency issues here, however, to do with reading and writing data simultaneously, and these need thinking about.
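As a rough sketch of the index-parameterised variant, assuming C++11 std::thread is available: to keep the example self-contained, my_func() here takes a plain const double * rather than an mxArray, and evaluate_all() is a hypothetical helper name, not something from the original code. Each thread writes only to its own slot of the result, which is why this particular case gets away without any locking.

```cpp
#include <cassert>
#include <cmath>
#include <thread>
#include <vector>

// Hypothetical stand-in for the user's function: takes a raw pointer instead
// of an mxArray and returns only the component selected by `index`.
double my_func(const double *dat, unsigned index)
{
    switch (index) {
    case 0: return dat[0]*dat[0]*dat[0]*dat[0]*dat[0];
    case 1: return std::sin(dat[0])*dat[1]*dat[2]*dat[2]*std::cos(dat[1]);
    case 2: return std::exp(dat[0])*std::sin(dat[0])*dat[3];
    default:
        assert(false && "index out of range");
        return 0.0;
    }
}

// Hypothetical helper: evaluates every component, one thread per component.
std::vector<double> evaluate_all(const double *dat, unsigned num_outputs)
{
    std::vector<double> vals(num_outputs);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < num_outputs; ++i) {
        // Each thread writes only vals[i], so no two threads share a slot.
        workers.emplace_back([&vals, dat, i] { vals[i] = my_func(dat, i); });
    }
    for (auto &w : workers) {
        w.join(); // wait for all components before returning
    }
    return vals;
}
```

The user still edits one function and only adds a case to extend it. For components this cheap, starting a thread per component will likely cost more than the arithmetic itself, so this shows the structure rather than a speed-up. Depending on the toolchain, linking may need -pthread.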
General Multitasking Advice
Before looking into multitasking with your GPU have a look at standard multithreading on a CPU (I recommend Boost Thread Libraries to help: http://www.boost.org/). Once you see how threads are created and used you may find you better understand what you can do with them and how you’d go about doing it.
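To get a feel for this on the CPU, here is a minimal sketch of the "many calls in parallel" idea from earlier in this answer. It uses C++11 std::thread in place of Boost.Thread (the two interfaces are very similar). The names my_func_all() and evaluate_points(), the flat row-major layout of the points, and the fixed sizes n = 4 and m = 3 are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <thread>
#include <vector>

constexpr std::size_t kInputs = 4;  // n: coordinates per point (assumed)
constexpr std::size_t kOutputs = 3; // m: function values per point (assumed)

// Hypothetical whole-vector version: writes all m outputs for one point.
void my_func_all(const double *dat, double *vals)
{
    vals[0] = dat[0]*dat[0]*dat[0]*dat[0]*dat[0];
    vals[1] = std::sin(dat[0])*dat[1]*dat[2]*dat[2]*std::cos(dat[1]);
    vals[2] = std::exp(dat[0])*std::sin(dat[0])*dat[3];
}

// points holds num_points * kInputs doubles, one point after another.
// Returns num_points * kOutputs doubles in the same order.
std::vector<double> evaluate_points(const std::vector<double> &points,
                                    unsigned num_threads)
{
    assert(points.size() % kInputs == 0);
    assert(num_threads > 0);
    const std::size_t num_points = points.size() / kInputs;
    std::vector<double> results(num_points * kOutputs);

    // Split the points into contiguous chunks, one chunk per thread.
    const std::size_t chunk = (num_points + num_threads - 1) / num_threads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(num_points, t * chunk);
        const std::size_t end = std::min(num_points, begin + chunk);
        if (begin == end) continue; // more threads than points
        workers.emplace_back([&points, &results, begin, end] {
            // Chunks do not overlap, so each thread writes its own range.
            for (std::size_t p = begin; p < end; ++p)
                my_func_all(&points[p * kInputs], &results[p * kOutputs]);
        });
    }
    for (auto &w : workers) {
        w.join();
    }
    return results;
}
```

Here the user's function stays in one piece and the parallelism comes from the number of points, which is also how the work would typically map onto a GPU (one thread per point).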
Multitasking with a GPU becomes more useful if you are applying mathematical functions to very large matrices or vectors and it is possible to use hardware implementations of certain graphical functions to achieve the mathematical result. There are further libraries to support GPGPU (General Purpose GPU) programming, such as OpenCL, Nvidia’s CUDA, or ATI’s Stream. Have a look at what these libraries provide to give you an idea of how applicable they are to your situation.