I'm looking at this implementation of the DCT using CUDA: http://www.cse.nd.edu/courses/cse60881/www/source_code/dct8x8/dct8x8_kernel1.cu
The part in question is here:
__shared__ float CurBlockLocal1[BLOCK_SIZE2];
__global__ void CUDAkernel1DCT(float *Dst, int ImgWidth, int OffsetXBlocks, int OffsetYBlocks)
{
    // Block index
    const int bx = blockIdx.x + OffsetXBlocks;
    const int by = blockIdx.y + OffsetYBlocks;
    // Thread index (current coefficient)
    const int tx = threadIdx.x;
    const int ty = threadIdx.y;
    // Texture coordinates
    const float tex_x = (float)( (bx << BLOCK_SIZE_LOG2) + tx ) + 0.5f;
    const float tex_y = (float)( (by << BLOCK_SIZE_LOG2) + ty ) + 0.5f;
    // Copy current image pixel to the first block
    CurBlockLocal1[ (ty << BLOCK_SIZE_LOG2) + tx ] = tex2D(TexSrc, tex_x, tex_y);
    // Synchronize threads to make sure the block is copied
    __syncthreads();
where the block size is 8, so BLOCK_SIZE_LOG2 is 3.
Why are the texture coordinates defined the way they are? Why do we need to use texture coordinates? What is the "<<" in CUDA?
To answer your questions in reverse order:
a << b is equivalent to a * 2^b where a and b are both positive integers. So the code you are asking about is basically shorthand for integer power-of-two multiplication. The code you have asked about could probably have been written as
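A minimal host-side sketch of that equivalence, not taken from the linked source. It assumes BLOCK_SIZE is 8 and BLOCK_SIZE_LOG2 is 3, as stated in the question. The helper names (pixel_coord_shift, pixel_coord_mul, tex_coord, local_index) are hypothetical and exist only for illustration. It is plain C++, so it can be compiled without a GPU, and the same expressions would be valid inside the kernel.

```cpp
// Assumed values from the question: 8x8 blocks, so log2(8) = 3.
constexpr int BLOCK_SIZE      = 8;
constexpr int BLOCK_SIZE_LOG2 = 3;

// Pixel coordinate of a thread, written with a shift as in the kernel.
constexpr int pixel_coord_shift(int block, int thread) {
    return (block << BLOCK_SIZE_LOG2) + thread;
}

// The same coordinate written with an ordinary multiplication.
constexpr int pixel_coord_mul(int block, int thread) {
    return block * BLOCK_SIZE + thread;
}

// Coordinate as the kernel passes it to tex2D: pixel index plus 0.5f.
constexpr float tex_coord(int block, int thread) {
    return static_cast<float>(pixel_coord_mul(block, thread)) + 0.5f;
}

// Index into the 64-element shared block: row * 8 + column.
constexpr int local_index(int ty, int tx) {
    return ty * BLOCK_SIZE + tx;
}

// Compile-time checks that both spellings agree for a sample thread.
static_assert(pixel_coord_shift(2, 5) == pixel_coord_mul(2, 5),
              "shift and multiply must give the same pixel coordinate");
static_assert(local_index(3, 4) == ((3 << BLOCK_SIZE_LOG2) + 4),
              "shift and multiply must give the same shared-memory index");
```

For example, block 2 and thread 5 give pixel coordinate 2 * 8 + 5 = 21 with either spelling. Modern compilers generally turn a multiplication by a constant power of two into a shift themselves, so the multiplication form is mostly a readability choice.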