I’m getting started with OpenCL. I have gone through the vector addition example and understand it, but now I am thinking about the trapezium method. This is the C code for calculating the integral of x^2 over [a,b].
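#define INC 0.001 /* step width; example value, the post does not give one */

/* Function to integrate. */
double f(double x)
{
    return x*x;
}

/* Area of one trapezium over [a,b]. */
double Simple_Trap(double a, double b)
{
    double fA, fB;
    fA = f(a);
    fB = f(b);
    return ((fA + fB) * (b-a)) / 2;
}

/* Composite rule: add one trapezium per step of width INC from a to b. */
double Comp_Trap(double a, double b)
{
    double Suma = 0;
    double i = a;
    while (i < b)
    {
        /* Clamp the last step so the sum never runs past b. */
        double next = (i + INC < b) ? i + INC : b;
        Suma += Simple_Trap(i, next);
        i = next;
    }
    return Suma;
}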
The question is: how can I write a kernel that calculates the integral using the trapezium method?
My idea is to compute partials[i] = integrate(a,a+offset) in one kernel, and then write a second kernel that computes the sum of the partials, as Patrick87 mentioned.
But is this the best way?
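To make the partials idea concrete, here is a rough sketch in plain C rather than OpenCL. It assumes the interval is cut into n equal slices, one per work item. The names trap_partial, fill_partials and sum_partials are made up for illustration: the loop body in fill_partials stands in for what a single work item would do, and sum_partials stands in for the second (reduction) kernel.

```c
#include <stddef.h>

/* Integrand from the post. */
static double f(double x) { return x * x; }

/* Stand-in for the first kernel's body: work item i integrates only
 * its own slice [a + i*h, a + (i+1)*h] with a single trapezium. */
static double trap_partial(double a, double h, size_t i)
{
    double x0 = a + (double)i * h;
    double x1 = x0 + h;
    return (f(x0) + f(x1)) * h / 2.0;
}

/* Emulates launching n work items: each one writes exactly one
 * element of partials[], so no two items touch the same slot. */
static void fill_partials(double a, double b, size_t n, double *partials)
{
    double h = (b - a) / (double)n;
    for (size_t i = 0; i < n; ++i)
        partials[i] = trap_partial(a, h, i);
}

/* Stand-in for the second kernel: add up all partial results. */
static double sum_partials(const double *partials, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += partials[i];
    return sum;
}
```

Calling fill_partials(0.0, 1.0, 1000, partials) followed by sum_partials(partials, 1000) should give a value close to 1/3. The cost of this layout is the partials array itself, which needs one element per work item.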
Here’s what I came up with. I didn’t get to do an end-to-end test of this kernel. I will post an update when I get a bit more time.
comp_trap is the basic divide & conquer method based on the code you have above.
comp_trap_multi boosts accuracy by having each work item subdivide its own sub-section further.
On the host, you only need to allocate an array of doubles with one element per work group, so that each group has one double in which to return its result. This should help cut down the vector allocation you wanted to avoid.
Please let me know if there are any problems with this.
Updated:
1) Changed all double references to float, because double precision support is optional in OpenCL.
2) Hard-coded the work group size to 64. This value is optimal on my system; the best value for another system should be determined experimentally. I prefer hard-coding this value over passing in a local array of floats, because the host program will eventually use only the optimal value for the target system anyway.
3) Fixed an incorrect calculation (a1 was wrong; it should be better now).
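For anyone who wants to see the overall scheme without a device, here is a host-only emulation in plain C. It is not the kernel from this post, just a sketch under assumptions: the helper names item_integral, group_integral and integrate_groups are hypothetical, steps is an assumed per-item subdivision count, and the tree reduction is one common way to combine a work group's values, not necessarily the one used in the kernel. It follows the update notes in using float and a fixed work group size of 64.

```c
#include <stddef.h>

#define WG_SIZE 64   /* work group size, hard-coded as in the update notes */

/* Integrand from the original question, in float. */
static float f(float x) { return x * x; }

/* What one work item would do: integrate its own sub-section
 * [lo, lo + width] with `steps` trapeziums instead of one. */
static float item_integral(float lo, float width, int steps)
{
    float h = width / (float)steps;
    float sum = 0.5f * (f(lo) + f(lo + width));  /* end points count half */
    for (int k = 1; k < steps; ++k)
        sum += f(lo + (float)k * h);             /* interior points count fully */
    return sum * h;
}

/* Emulates one work group: WG_SIZE items fill a scratch array (this
 * would be __local memory on a device), then a tree reduction leaves
 * the group total in scratch[0]. */
static float group_integral(float a, float width, size_t group, int steps)
{
    float scratch[WG_SIZE];
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        size_t gid = group * WG_SIZE + lid;      /* global work item index */
        scratch[lid] = item_integral(a + (float)gid * width, width, steps);
    }
    /* Each pass halves the number of active items. On a device, a
     * barrier(CLK_LOCAL_MEM_FENCE) would separate the passes. */
    for (size_t stride = WG_SIZE / 2; stride > 0; stride /= 2)
        for (size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride];
    return scratch[0];
}

/* Host side: one float per work group, summed at the end.
 * group_results must have room for num_groups floats. */
static float integrate_groups(float a, float b, size_t num_groups,
                              int steps, float *group_results)
{
    float width = (b - a) / (float)(num_groups * WG_SIZE);
    float total = 0.0f;
    for (size_t g = 0; g < num_groups; ++g)
        group_results[g] = group_integral(a, width, g, steps);
    for (size_t g = 0; g < num_groups; ++g)
        total += group_results[g];
    return total;
}
```

With 4 groups and 4 steps per item, integrate_groups(0.0f, 1.0f, 4, 4, results) uses 1024 trapeziums in total but only needs a 4-element result array on the host, which is the saving described above.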