I’m getting started with OpenCL, I could see the add vector example and understand

Question

Asked: May 28, 2026 (2026-05-28T04:30:55+00:00)

I’m getting started with OpenCL. I could follow the vector addition example and understand it, but I was thinking about the trapezium method. This is the C code that computes the integral of x^2 over [a,b].

double f(double x)
{
    return x*x;
}

double Simple_Trap(double a, double b)
{
    double fA, fB;
    fA = f(a);
    fB = f(b);
    return ((fA + fB) * (b-a)) / 2;
}

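/* Step width. The original code does not define INC; 0.001 is an assumed value. */
#define INC 0.001

double Comp_Trap(double a, double b)
{
    double Suma = 0;
    double i = a;
    /* Sum adjacent trapezoids [i, i+INC]; the last one is clamped to b
       so no interval is skipped and the sum never runs past b. */
    while(i < b)
    {
        double next = (i + INC < b) ? i + INC : b;
        Suma += Simple_Trap(i, next);
        i = next;
    }
    return Suma;
}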

The question is: how do I write a kernel that computes the integral using the trapezium method?

My idea so far is to compute partials[i] = integrate(a,a+offset) for each slice, and then write a second kernel that computes the sum of the partials, as Patrick87 mentioned.

But is this the best way?
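To make the two-step idea concrete, here is a minimal CPU-only sketch in plain C, not an OpenCL kernel. The function names compute_partials and sum_partials, and the choice of n equal slices, are assumptions made for illustration. Each loop iteration in compute_partials is independent of the others, which is what would let one work item handle one slice.

```c
#include <assert.h>

static double f(double x) { return x * x; }

/* Area of a single trapezoid over [a, b]. */
static double simple_trap(double a, double b)
{
    return (f(a) + f(b)) * (b - a) / 2.0;
}

/* Step 1 (hypothetical helper): partials[i] holds the integral over the
   i-th of n equal slices of [a, b]. No iteration depends on another. */
static void compute_partials(double a, double b, double *partials, int n)
{
    assert(n > 0);
    double width = (b - a) / n;
    for (int i = 0; i < n; i++) {
        double lo = a + width * i;
        partials[i] = simple_trap(lo, lo + width);
    }
}

/* Step 2 (hypothetical helper): reduce the partial results to one value. */
static double sum_partials(const double *partials, int n)
{
    double total = 0.0;
    for (int i = 0; i < n; i++) {
        total += partials[i];
    }
    return total;
}
```

With a = 0, b = 3 and n = 1000 the summed result should land very close to the exact value 9.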

1 Answer

Editorial Team · Answer 1 · 2026-05-28T04:30:56+00:00

Here’s what I came up with. I didn’t get to do an end-to-end test of this kernel. I will do an update when I get a bit more time.

comp_trap is the basic divide-and-conquer method based on the code you have above.
comp_trap_multi improves accuracy by having each work item divide its sub-section further into 'divisions' equal parts.

You only need to allocate an array of floats on the host, with one element per work group to hold that group's result. This should help cut down the vector allocation you wanted to avoid.
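Because each work group returns only one value, the host still has to add those values together after reading the buffer back (for example with clEnqueueReadBuffer). A minimal sketch of that last step follows; the function name sum_group_results is hypothetical, and accumulating in double is a design choice of this sketch, not something the kernels require.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side helper: adds up the per-group results that the
   kernel wrote into sums[0 .. num_groups-1]. The accumulator is a double
   to limit rounding error when there are many groups. */
static float sum_group_results(const float *sums, size_t num_groups)
{
    assert(sums != NULL || num_groups == 0);
    double total = 0.0;
    for (size_t g = 0; g < num_groups; g++) {
        total += sums[g];
    }
    return (float)total;
}
```

With a work group size of 64, num_groups would be the global work size divided by 64, so the host array stays small even for large global sizes.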

Please let me know if there are any problems with this.

Updated:

1) Changed all double references to float, because double support is optional in OpenCL.

2) Hard-coded the work group size to 64. This value is optimal on my system and should be determined experimentally. I prefer hard-coding this value over passing in a local array of floats, because the host program will eventually use only the optimal value on the target system anyway.

3) Fixed an incorrect calculation (a1 was wrong; it should be better now).

/*
numerical-integration.cl
*/

float f(float x)
{
    return x*x;
}

float simple_trap(float a, float b)
{
    float fA, fB;
    fA = f(a);
    fB = f(b);
    return ((fA + fB) * (b-a)) / 2;
}

__kernel void comp_trap(
    float a,
    float b,
    __global float* sums)
{
/*
- assumes 1D global and local work dimensions
- each work unit will calculate 1/get_global_size of the total sum
- the 0th work unit of each group then accumulates the sum for the
group and stores it in __global * sums
- memory allocation: sizeof(sums) = get_num_groups(0) * sizeof(float)
- assumes local scratchpad size is at least sizeof(float) = 4 bytes per
work unit in the group, ie sizeof(wiSums) = get_local_size(0) * sizeof(float)
- wiSums is hard-coded to 64 entries, so get_local_size(0) must not exceed 64
*/
    __local float wiSums[64];
    int l_id = get_local_id(0);

    //compute the range for this work item: [a1, b1]
    float a1 = a+((b-a)/get_global_size(0))*get_global_id(0);
    float b1 = a1+(b-a)/get_global_size(0);

    wiSums[l_id] = simple_trap(a1,b1);

    barrier(CLK_LOCAL_MEM_FENCE);

    int i;
    if(l_id == 0){
        for(i=1;i<get_local_size(0);i++){
            wiSums[0] += wiSums[i];
        }
        sums[get_group_id(0)] = wiSums[0];
    }
}

__kernel void comp_trap_multi(
    float a,
    float b,
    __global float* sums,
    int divisions)
{
/*
- same as above, but each work unit further divides its range into
'divisions' equal parts, yielding a more accurate result
- work units still store only one sum in the local array, which is
used later for the final group accumulation
*/
    __local float wiSums[64];
    int l_id = get_local_id(0);

    float a1 = a+((b-a)/get_global_size(0))*get_global_id(0);
    float b1 = a1+(b-a)/get_global_size(0);
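    // treat divisions < 1 as 1 so the loop always covers [a1, b1];
    // the original left the sum at 0 when divisions was 0 or negative
    int n = (divisions > 0) ? divisions : 1;
    float range = (b1-a1)/n;

    int i;
    wiSums[l_id] = 0;
    for(i=0;i<n;i++){
        wiSums[l_id] += simple_trap(a1+range*i,a1+range*(i+1));
    }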

    barrier(CLK_LOCAL_MEM_FENCE);

    if(l_id == 0){
        for(i=1;i<get_local_size(0);i++){
            wiSums[0] += wiSums[i];
        }
        sums[get_group_id(0)] = wiSums[0];
    }
}
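Since the kernels above were not tested end to end, a CPU reference can help check their output. The sketch below mirrors the partitioning used by comp_trap_multi in plain C. The function names comp_trap_multi_reference and exact_integral are hypothetical, and treating divisions below 1 as 1 is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

static float f(float x) { return x * x; }

static float simple_trap(float a, float b)
{
    return ((f(a) + f(b)) * (b - a)) / 2.0f;
}

/* Hypothetical CPU reference: splits [a, b] into global_size slices (one
   per emulated work item) and each slice into 'divisions' trapezoids. */
static float comp_trap_multi_reference(float a, float b,
                                       size_t global_size, int divisions)
{
    assert(global_size > 0);
    int n = (divisions > 0) ? divisions : 1;   /* assumption: at least 1 */
    float slice = (b - a) / (float)global_size;
    float range = slice / (float)n;
    double total = 0.0;                        /* stands in for the group and host sums */
    for (size_t gid = 0; gid < global_size; gid++) {
        float a1 = a + slice * (float)gid;
        float item_sum = 0.0f;
        for (int i = 0; i < n; i++) {
            item_sum += simple_trap(a1 + range * i, a1 + range * (i + 1));
        }
        total += item_sum;
    }
    return (float)total;
}

/* Closed form for f(x) = x*x, used only to check the approximation. */
static float exact_integral(float a, float b)
{
    return (b * b * b - a * a * a) / 3.0f;
}
```

Comparing the summed kernel output against both functions for the same a, b, global size and divisions should show whether a mismatch comes from the kernel logic or just from the trapezoid approximation itself.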
