The problem is solved (if you're interested, the original question is in the second part, below the line). Now I have a new question: why does #define BLOCK_DIM 16; cause errors in the function below? Using a literal number directly works fine.
Here are the errors:
expected a "]"
__local float2 block[BLOCK_DIM * (BLOCK_DIM + 1)] ;
^
line 110: error: expected a ")"
__local float2 block[BLOCK_DIM * (BLOCK_DIM + 1)] ;
^
line 110: error: operand of "*" must be a pointer
__local float2 block[BLOCK_DIM * (BLOCK_DIM + 1)] ;
error: expected a ";"
int Idout = get_local_id(0)*(BLOCK_DIM+1)+get_local_id(1);
^
And here is the function:
__kernel void transpose(
    __global float2* dataout,
    __global float2* datain,
    int width, int height)
// width = N (signal length)
// height = batch_size (number of signals in a batch)
{
    // read the matrix tile into shared memory
    __local float2 block[32 * (32 + 1)];
    unsigned int xIndex = get_global_id(0);
    unsigned int yIndex = get_global_id(1);
    if ((xIndex < width) && (yIndex < height))
    {
        unsigned int index_in = yIndex * width + xIndex;
        int Idin = get_local_id(1) * (32 + 1) + get_local_id(0);
        block[Idin] = datain[index_in];
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // write the transposed matrix tile to global memory
    xIndex = get_group_id(1) * 32 + get_local_id(0);
    yIndex = get_group_id(0) * 32 + get_local_id(1);
    if ((xIndex < height) && (yIndex < width))
    {
        unsigned int index_out = yIndex * height + xIndex;
        int Idout = get_local_id(0) * (32 + 1) + get_local_id(1);
        dataout[index_out] = block[Idout];
    }
}
===============================
I'm working on improving the performance of a 2D FFT on images. After benchmarking, I realized that the transpose function was what made the program slow, so I replaced it with a more optimized one.
But after that, all the functions that worked fine before started returning the error code CL_INVALID_KERNEL_NAME. Apart from the transpose function and the clSetKernelArg calls in the host code, I didn't change anything else, so I'm out of ideas. I hope you can help me out 🙂
UPDATE: Here are the errors. Don't mind the line numbers 🙂 Those lines look normal to me. Is anything wrong?
error: expected a "]"
__local float2 block[BLOCK_DIM * (BLOCK_DIM + 1)] ;
^
line 110: error: expected a ")"
__local float2 block[BLOCK_DIM * (BLOCK_DIM + 1)] ;
^
line 110: error: operand of "*" must be a pointer
__local float2 block[BLOCK_DIM * (BLOCK_DIM + 1)] ;
error: expected a ";"
int Idout = get_local_id(0)*(BLOCK_DIM+1)+get_local_id(1);
^
Here is the kernel file.
The new one:
#define BLOCK_DIM 16
__kernel void transpose(
    __global float2* dataout,
    __global float2* datain,
    int width, int height)
// width = N (signal length)
// height = batch_size (number of signals in a batch)
{
    // read the matrix tile into shared memory
    __local float2 block[BLOCK_DIM * (BLOCK_DIM + 1)];
    unsigned int xIndex = get_global_id(0);
    unsigned int yIndex = get_global_id(1);
    if ((xIndex < width) && (yIndex < height))
    {
        unsigned int index_in = yIndex * width + xIndex;
        int Idin = get_local_id(1) * (BLOCK_DIM + 1) + get_local_id(0);
        block[Idin] = datain[index_in];
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    // write the transposed matrix tile to global memory
    xIndex = get_group_id(1) * BLOCK_DIM + get_local_id(0);
    yIndex = get_group_id(0) * BLOCK_DIM + get_local_id(1);
    if ((xIndex < height) && (yIndex < width))
    {
        unsigned int index_out = yIndex * height + xIndex;
        int Idout = get_local_id(0) * (BLOCK_DIM + 1) + get_local_id(1);
        dataout[index_out] = block[Idout];
    }
}
About your #define question: preprocessor directives don't take a trailing semicolon. Basically, #define X Y replaces every occurrence of “X” with “Y” in the code before it is compiled, so if you add a semicolon at the end, it becomes part of “Y” and creates lots of syntax errors. With #define BLOCK_DIM 16; the declaration block[BLOCK_DIM * (BLOCK_DIM + 1)] turns into block[16; * (16; + 1)], which is why the compiler expects a "]" right after the first 16. A #define is not a statement.
Actually, that's a simplified explanation, but it suffices for the scope of this question (if you want to learn more, I recommend looking at preprocessor tutorials and documentation).