Editorial Team
Asked: May 29, 2026

I am trying to declare a variable for matrix multiplication as follows:

__shared__ float As[BLOCK_SIZE][BLOCK_SIZE];

I want to let the user input the size of the matrix to calculate, but that means changing BLOCK_SIZE at run time. I changed it, but I am getting a compiler error:

error: constant value is not known

I’ve looked into it and it’s similar to this thread. So I tried:

__shared__ int buf [];

But then I get:

error: incomplete type is not allowed

Thanks,
Dan

Update with code (I pretty much followed this guide and the starting out with CUDA guide):
The block size comes from asking the user for the size of the matrix. They enter x and y. The block size is just x, and for now x and y have to be the same.

__global__ void matrixMul( float* C, float* A, float* B, int wA, int wB,size_t block_size)
{
    // Block index
    int bx = blockIdx.x;
    int by = blockIdx.y;
    
    // Thread index
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    
    // Index of the first sub-matrix of A processed 
    // by the block
    int aBegin = wA * block_size * by;
 
    // Index of the last sub-matrix of A processed 
    // by the block
    int aEnd   = aBegin + wA - 1;
 
    // Step size used to iterate through the 
    // sub-matrices of A
    int aStep  = block_size;
 
    // Index of the first sub-matrix of B processed 
    // by the block
    int bBegin = block_size * bx;
 
    // Step size used to iterate through the 
    // sub-matrices of B
    int bStep  = block_size * wB;
    float Csub=0;
    // Loop over all the sub-matrices of A and B
    // required to compute the block sub-matrix
    for (int a = aBegin, b = bBegin; a <= aEnd; a += aStep, b += bStep) 
    {
        // Dynamically sized shared memory holding both tiles:
        // the sub-matrix of A occupies the first block_size*block_size
        // floats and the sub-matrix of B follows it. Separate
        // extern __shared__ As[] and Bs[] declarations would start at
        // the same address, so a single buffer is used instead.
        extern __shared__ float smem[];

        // Load the matrices from global memory
        // to shared memory; each thread loads
        // one element of each matrix
        smem[ty*block_size+tx] = A[a + wA * ty + tx];
        //cuPrintf("\n\nWhat are the memory locations?\n");
        //cuPrintf("The shared memory(A) is: %.2f\n",smem[ty*block_size+tx]);
        smem[block_size*block_size+ty*block_size+tx]  = B[b + wB * ty + tx];
        //cuPrintf("The shared memory(B) is: %.2f\n",smem[block_size*block_size+ty*block_size+tx]);
        // Synchronize to make sure the matrices 
        // are loaded
        __syncthreads();
 
        // Multiply the two matrices together;
        // each thread computes one element
        // of the block sub-matrix
        for (int k = 0; k < block_size; ++k)
        {
        
            Csub += smem[ty*block_size+k] * smem[block_size*block_size+k*block_size+tx] ;
            //cuPrintf("Csub is currently: %.2f\n",Csub);
        }
        //cuPrintf("\n\n\n");
        // Synchronize to make sure that the preceding
        // computation is done before loading two new
        // sub-matrices of A and B in the next iteration
        //cuPrintf("the results are csub: %.2f\n",Csub);
        __syncthreads();
    }
    // Write the block sub-matrix to device memory;
    // each thread writes one element
    int c = wB * block_size * by + block_size * bx;
    C[c + wB * ty + tx] = Csub;
    
    
}
1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 12:10 pm
    Declare the shared array as extern and leave it unsized:

    extern __shared__ int buf[];
    

    When you launch the kernel, pass the size of the shared array in bytes as the third launch parameter:

    kernel<<<blocks,threads,numbytes_for_shared>>>(...);
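
As a minimal sketch of that launch pattern, the example below uses a shared buffer whose size is only known at run time. The kernel `reverseInBlock` and the host helper `reverseOnDevice` are hypothetical names made up for illustration, and the sketch assumes `n` fits in one block (at most 1024 threads on current devices).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: reverses n ints in place through a shared buffer
// whose size is supplied at launch time.
__global__ void reverseInBlock(int *data, int n)
{
    // Unsized extern array; its byte size is the third launch parameter.
    extern __shared__ int buf[];

    int t = threadIdx.x;
    if (t < n) buf[t] = data[t];
    __syncthreads();                      // all loads done before any read
    if (t < n) data[t] = buf[n - 1 - t];
}

// Hypothetical host helper: returns true if allocation, launch and
// execution all succeeded. Assumes 1 <= n <= 1024.
bool reverseOnDevice(int *host, int n)
{
    int *dev = nullptr;
    size_t bytes = n * sizeof(int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;
    cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice);

    // One block of n threads with n * sizeof(int) bytes of shared memory.
    reverseInBlock<<<1, n, bytes>>>(dev, n);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // run errors

    cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return err == cudaSuccess;
}
```

Calling `reverseOnDevice(v, 4)` on a four-element host array should leave the elements in reverse order. Note that the third launch parameter is a byte count, not an element count.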
    

    If you have multiple extern __shared__ declarations:

    extern __shared__ float As[];
    // ...
    extern __shared__ float Bs[];
    

    As and Bs will both point to the same address, because every extern __shared__ array starts at the beginning of the block's dynamic shared memory.
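
The aliasing can be seen with a small experiment. This is only a sketch: `aliasDemo` and `readThroughAlias` are hypothetical names, and the value 42.0f is arbitrary.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes through As, then reads the same slot
// back through Bs.
__global__ void aliasDemo(float *out)
{
    extern __shared__ float As[];
    extern __shared__ float Bs[];   // starts at the same address as As

    As[threadIdx.x] = 42.0f;
    __syncthreads();
    out[threadIdx.x] = Bs[threadIdx.x];
}

// Hypothetical host helper: returns the value read through Bs,
// or -1.0f if the device allocation failed.
float readThroughAlias()
{
    float *dev = nullptr;
    float host = 0.0f;
    if (cudaMalloc(&dev, sizeof(float)) != cudaSuccess) return -1.0f;

    // One thread, room for a single float in shared memory.
    aliasDemo<<<1, 1, sizeof(float)>>>(dev);
    cudaMemcpy(&host, dev, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

On a working device, `readThroughAlias()` is expected to return the value that was stored through As, which is why two separate extern arrays cannot hold two different tiles.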

    You will need to keep both As and Bs inside a single 1D array:

    extern __shared__ float smem[];
    

    When launching the kernel, request 2*BLOCK_SIZE*BLOCK_SIZE*sizeof(float) bytes of shared memory.

    When indexing into As, use smem[y*BLOCK_SIZE+x], and when indexing into Bs, use smem[BLOCK_SIZE*BLOCK_SIZE+y*BLOCK_SIZE+x].
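
Putting the pieces together, a tiled multiplication with a run-time tile size might look like the sketch below. It is not the code from the question: `matMulTiled`, `matMul`, `n` and `tile` are hypothetical names, the matrices are assumed square and row-major, and `n` is assumed to be a multiple of `tile`. Two pointers into `smem` stand in for As and Bs so the indexing stays readable.

```cuda
#include <cuda_runtime.h>

// Sketch: C = A * B for n x n row-major matrices, n a multiple of tile.
__global__ void matMulTiled(float *C, const float *A, const float *B,
                            int n, int tile)
{
    extern __shared__ float smem[];
    float *As = smem;                 // first tile*tile floats
    float *Bs = smem + tile * tile;   // next tile*tile floats

    int tx = threadIdx.x, ty = threadIdx.y;
    int row = blockIdx.y * tile + ty;
    int col = blockIdx.x * tile + tx;

    float sum = 0.0f;
    for (int t = 0; t < n / tile; ++t) {
        // Each thread loads one element of each tile.
        As[ty * tile + tx] = A[row * n + t * tile + tx];
        Bs[ty * tile + tx] = B[(t * tile + ty) * n + col];
        __syncthreads();
        for (int k = 0; k < tile; ++k)
            sum += As[ty * tile + k] * Bs[k * tile + tx];
        __syncthreads();              // finish reads before the next load
    }
    C[row * n + col] = sum;
}

// Hypothetical host wrapper; returns false on bad sizes or CUDA errors.
bool matMul(float *C, const float *A, const float *B, int n, int tile)
{
    // tile*tile threads per block, so tile may be at most 32 (1024 threads).
    if (tile <= 0 || tile > 32 || n <= 0 || n % tile != 0) return false;

    size_t bytes = (size_t)n * n * sizeof(float);
    float *dA = nullptr, *dB = nullptr, *dC = nullptr;
    bool ok = cudaMalloc(&dA, bytes) == cudaSuccess &&
              cudaMalloc(&dB, bytes) == cudaSuccess &&
              cudaMalloc(&dC, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(dA, A, bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, B, bytes, cudaMemcpyHostToDevice);

        dim3 threads(tile, tile);
        dim3 grid(n / tile, n / tile);
        // Room for both tiles: 2 * tile * tile floats.
        size_t shared = 2 * (size_t)tile * tile * sizeof(float);
        matMulTiled<<<grid, threads, shared>>>(dC, dA, dB, n, tile);

        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(C, dC, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return ok;
}
```

Validating `tile` on the host matters once the user chooses it, since a value above 32 would exceed the usual limit of 1024 threads per block and the launch would fail.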
