The Archive Base

Editorial Team
Asked: May 17, 2026

I’m writing code that performs N x N matrix multiplication using thread-level parallelism.

To compute C = A x B, I first transpose matrix B and then divide the matrices into blocks.

Each thread takes one block from A and one block from B, multiplies them,
and then adds the result to the corresponding block of C.
All matrices are allocated on the heap using malloc().
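
For comparison, here is a minimal single-threaded sketch of the same blocked scheme, which can serve as a reference when checking the threaded output. It is not the code from the question: the name matmul_blocked_ref and the flat row-major layout are assumptions made for illustration (the question uses double** instead). The sketch zeroes C first because the kernel accumulates with +=, and malloc() does not zero memory.

```c
#include <assert.h>
#include <stddef.h>

/* Reference (single-threaded) blocked multiply: C = A x B, where Bt holds
 * B transposed. All matrices are n x n, row-major, stored flat (an
 * assumption of this sketch). bs is the block size and must divide n. */
void matmul_blocked_ref(const double *A, const double *Bt, double *C,
                        size_t n, size_t bs)
{
    assert(bs > 0 && n % bs == 0);
    size_t blocks = n / bs;

    /* The kernel accumulates with +=, so C has to start at zero. */
    for (size_t i = 0; i < n * n; i++)
        C[i] = 0.0;

    for (size_t cr = 0; cr < blocks; cr++)
        for (size_t cc = 0; cc < blocks; cc++)
            for (size_t kb = 0; kb < blocks; kb++)
                /* One (cr, cc, kb) triple is the work of one thread
                 * in the threaded version. */
                for (size_t i = 0; i < bs; i++)
                    for (size_t j = 0; j < bs; j++) {
                        double sum = 0.0;
                        for (size_t k = 0; k < bs; k++)
                            sum += A[(cr * bs + i) * n + kb * bs + k] *
                                   Bt[(cc * bs + j) * n + kb * bs + k];
                        C[(cr * bs + i) * n + cc * bs + j] += sum;
                    }
}
```

Running the threaded version and this reference on the same input and comparing the results element by element is one way to narrow down which blocks come out wrong.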

The problem is that, for the same input,
the answer is sometimes correct and sometimes incorrect.

I’m not quite sure why this happens,
but I guess the code needs to be improved in terms of thread safety.

Here is part of my code.
blocks is the number of blocks per row and per column, i.e. N / block size,
so the total number of threads is blocks^3.

while (thread_loaded < total_thread)
{
    if (thread_count < MAX_THREAD)
    {
        p[thread_loaded].idx = thread_idx;
        p[thread_loaded].matinfo = &mi;
        threads[thread_loaded] = CreateThread(NULL, 0, matmul_tlp, &p[thread_loaded], 0, &threadid[thread_loaded]);
        if (threads[thread_loaded] != NULL)
        {
            thread_loaded++;
            thread_count++;
            thread_idx += blocks;
            if (thread_idx >= total_thread)
                thread_idx = (thread_idx % total_thread) + 1;
        }
    }
}
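
The launch loop above hands out thread_idx values with a stride of blocks, which appears intended to make consecutive threads work on different blocks of C. A quick way to rule out the stride logic as a source of wrong results is to replay it without any threads and confirm that every index in 0 .. blocks^3 - 1 is issued exactly once. The helper name launch_order_is_permutation is made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Replays the thread_idx sequence of the launch loop and returns true
 * if it visits each index in [0, blocks^3) exactly once. */
bool launch_order_is_permutation(int blocks)
{
    assert(blocks > 0);
    int total = blocks * blocks * blocks;
    bool *seen = calloc((size_t)total, sizeof *seen);
    if (seen == NULL)
        return false;

    bool ok = true;
    int thread_idx = 0;
    for (int loaded = 0; loaded < total; loaded++) {
        if (thread_idx < 0 || thread_idx >= total || seen[thread_idx]) {
            ok = false; /* out of range or issued twice */
            break;
        }
        seen[thread_idx] = true;

        /* Same update as in the launch loop. */
        thread_idx += blocks;
        if (thread_idx >= total)
            thread_idx = (thread_idx % total) + 1;
    }
    free(seen);
    return ok;
}
```

If this returns true for the block counts in use, each block product is scheduled once, and the launch order itself can be set aside as a suspect.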

And here is the thread function:

int i, j, k;
param* p = (param*)arg;

/* idx encodes (Cr, Cc, k) as Cr * blocks^2 + Cc * blocks + k. */
int blocks = BLOCKS;
int block_size = BLOCK_SIZE;
int Ar = p->idx / (blocks * blocks);
int Ac = p->idx % blocks;
/* B holds the transpose, so its block row Br matches Cc. */
int Br = (p->idx / blocks) % blocks;
int Bc = p->idx % blocks;
int Cr = p->idx / (blocks * blocks);
int Cc = (p->idx % (blocks * blocks)) / blocks;

double** A = p->matinfo->A;
double** B = p->matinfo->B;
double** C = p->matinfo->C;

/* One mutex per block of C. */
DWORD res = WaitForSingleObject(mutex[Cr * blocks + Cc], INFINITE);
if (res != WAIT_OBJECT_0)
    perror("cannot acquire mutex.");

/* Accumulate the block product into C. */
for (i = 0; i < block_size; i++)
{
    for (j = 0; j < block_size; j++)
    {
        for (k = 0; k < block_size; k++)
        {
            C[Cr * block_size + i][Cc * block_size + j] +=
                A[Ar * block_size + i][Ac * block_size + k] *
                B[Br * block_size + j][Bc * block_size + k];
        }
    }
}

ReleaseMutex(mutex[Cr * blocks + Cc]);

/* Tell the launch loop that a slot is free again. */
thread_count--;
return NULL;

I would appreciate it if anyone could spot the flaw. 🙂

1 Answer

  1. Editorial Team
    Added an answer on May 17, 2026 at 11:10 pm

    There are two shared values that are written in your thread function: the C matrix and the thread_count variable. I don’t see anything wrong with how the C matrix is synchronised, but it’s worth double-checking that the Cc and Cr values are computed properly, since your choice of mutex is based on them.
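
One way to do that double-check is to run the index formulas from the thread function over every idx, without threads, and confirm that the A, B and C blocks line up. This is only a sketch: block_coords, decode_idx and indices_consistent are names invented here, and the formulas are copied from the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Block coordinates derived from a flat task index, using the same
 * formulas as the thread function in the question. */
typedef struct { int Ar, Ac, Br, Bc, Cr, Cc; } block_coords;

block_coords decode_idx(int idx, int blocks)
{
    assert(blocks > 0 && idx >= 0 && idx < blocks * blocks * blocks);
    block_coords c;
    c.Ar = idx / (blocks * blocks);
    c.Ac = idx % blocks;
    c.Br = (idx / blocks) % blocks;
    c.Bc = idx % blocks;
    c.Cr = idx / (blocks * blocks);
    c.Cc = (idx % (blocks * blocks)) / blocks;
    return c;
}

/* Returns true if every (Cr, Cc, inner index) triple is produced exactly
 * once and the A and B blocks match the C block they contribute to. */
bool indices_consistent(int blocks)
{
    int total = blocks * blocks * blocks;
    int *hits = calloc((size_t)total, sizeof *hits);
    if (hits == NULL)
        return false;

    bool ok = true;
    for (int idx = 0; idx < total; idx++) {
        block_coords c = decode_idx(idx, blocks);
        /* With B transposed, block row Br of B^T is block column Cc of B. */
        if (c.Ar != c.Cr || c.Br != c.Cc || c.Ac != c.Bc) {
            ok = false;
            break;
        }
        int slot = (c.Cr * blocks + c.Cc) * blocks + c.Ac;
        if (hits[slot]++ != 0) { /* same partial product issued twice */
            ok = false;
            break;
        }
    }
    free(hits);
    return ok;
}
```

If indices_consistent() holds for your block count, the mutex index Cr * blocks + Cc protects exactly the block each thread writes to.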

    The most likely source of error is the thread_count variable, which is not synchronized. Remember that a-- is equivalent to a = a - 1 and consists of one read followed by one write (it’s NOT atomic). A thread can therefore be preempted between the read and the write, and some decrements of the variable can be lost. For example:

    Thread A reads the value 10 from thread_count.
    Thread B reads the value 10 from thread_count.
    Thread B writes the value 10-1 into thread_count.
    Thread A writes the value 10-1 into thread_count.
    thread_count is now equal to 9 when it should be 8.
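
The interleaving above can be replayed deterministically in plain C, with each thread's private copy of the value spelled out as a local variable. This is a single-threaded illustration of the schedule, not a reproduction of a real race; replay_lost_update is a made-up name.

```c
#include <assert.h>

/* Replays the schedule: A reads, B reads, B writes, A writes.
 * Returns the final value of the shared counter. */
int replay_lost_update(int start)
{
    int thread_count = start;

    int a_tmp = thread_count;   /* Thread A reads */
    int b_tmp = thread_count;   /* Thread B reads */
    assert(a_tmp == b_tmp);     /* both threads saw the same stale value */

    thread_count = b_tmp - 1;   /* Thread B writes */
    thread_count = a_tmp - 1;   /* Thread A writes, overwriting B's update */

    return thread_count;        /* one decrement short of start - 2 */
}
```

Calling replay_lost_update(10) gives 9 rather than 8, which is the lost decrement described above.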
    

    So you’ve just lost one of your signals to the thread creation function. Each lost decrement leaves thread_count one higher than the number of threads actually running, so your algorithm runs with fewer and fewer threads, and if enough decrements are lost the creation loop could stall completely. Sadly, unless I missed something, I don’t see this explaining why you get bad results. Your program will just run slower than it should.

    The easiest way to fix this would be to use a condition variable, which is made specifically for this type of situation and avoids the lost-wakeup problem that you have here.

    By the way, creating threads is usually a fairly costly operation, so a thread pool would be preferable to creating and destroying large numbers of threads. If a thread pool is more complex than necessary, you could use condition variables to make your threads wait for a new task once they’re done with their current one.
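
One small building block of such a pool, sketched here under assumptions: a fixed set of worker threads share a counter and each worker repeatedly claims the next block index until none are left. The sketch uses portable C11 atomics rather than the Windows API, and task_counter, task_counter_init and claim_task are hypothetical names. Creating the workers themselves is left out.

```c
#include <assert.h>
#include <stdatomic.h>

/* Shared dispenser of task indices 0 .. total - 1. */
typedef struct {
    atomic_int next;  /* next index to hand out */
    int total;        /* number of tasks, e.g. blocks^3 */
} task_counter;

void task_counter_init(task_counter *tc, int total)
{
    assert(total >= 0);
    atomic_init(&tc->next, 0);
    tc->total = total;
}

/* Returns the next unclaimed task index, or -1 once all tasks have been
 * handed out. atomic_fetch_add makes the claim a single indivisible
 * step, so two workers can never receive the same index. */
int claim_task(task_counter *tc)
{
    int idx = atomic_fetch_add(&tc->next, 1);
    return idx < tc->total ? idx : -1;
}
```

Each worker would loop on claim_task() and run the block product for the returned idx. Workers that land on the same block of C still need the per-block mutex.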

    EDIT: I wrote the solution a little too quickly and I ended up confusing two different issues.

    First, you want to synchronise your accesses to the thread_count variable in both the thread creation function and the calculation function. To do that you can either use a mutex or use atomic operations such as InterlockedDecrement and InterlockedIncrement (the Windows API functions for this). Note that, while I don’t see any issues with using atomic operations in this case, they are not as simple to use as they might seem. Make sure you fully understand them before using them.
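
As a rough illustration of the atomic approach, here is a sketch in portable C11 (stdatomic.h) rather than the Windows API; on Windows, InterlockedIncrement and InterlockedDecrement on a volatile LONG play the corresponding role. The function names and the value of MAX_THREAD are assumptions for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_THREAD 4  /* hypothetical limit for this sketch */

static atomic_int thread_count = 0;

/* Called by the launcher before creating a thread. Returns false when
 * the limit is reached. The compare-exchange loop keeps the check and
 * the increment together as one atomic step. */
bool try_reserve_slot(void)
{
    int cur = atomic_load(&thread_count);
    while (cur < MAX_THREAD) {
        if (atomic_compare_exchange_weak(&thread_count, &cur, cur + 1))
            return true;
        /* On failure cur holds the fresh value; re-check the limit. */
    }
    return false;
}

/* Called by a worker as its last action, replacing thread_count--. */
void release_slot(void)
{
    int prev = atomic_fetch_sub(&thread_count, 1);
    assert(prev > 0);  /* releasing a slot that was never reserved is a bug */
    (void)prev;        /* silence unused warning when NDEBUG is defined */
}

int slots_in_use(void)
{
    return atomic_load(&thread_count);
}
```

With this shape no decrement can be lost, although the launcher still spins while it waits for a free slot.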

    The second issue is that you’re spinning in your thread creation function while waiting for one of the threads to finish. You can avoid that by using a condition variable: each worker signals the main thread once its calculation is complete, and the main thread sleeps until then. That way you don’t end up taking processor time for no reason.

    One more thing: as long as you keep the spinning loop, make sure your thread_count variable is declared volatile so that the compiler re-reads it from memory on every pass instead of keeping a cached copy in a register. Note that volatile on its own does not make thread_count-- atomic; you still need the mutex or the Interlocked functions for that.
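
A minimal sketch of that idea, written with POSIX threads for portability instead of the Windows API (Windows offers CONDITION_VARIABLE with SleepConditionVariableCS and WakeConditionVariable for the same pattern). The slot_gate type and the gate_* functions are hypothetical names for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Limits how many workers run at once. The launcher sleeps in
 * gate_acquire() instead of spinning when all slots are taken. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  freed;   /* signalled whenever a slot is released */
    int in_use;
    int limit;
} slot_gate;

int gate_init(slot_gate *g, int limit)
{
    assert(limit > 0);
    g->in_use = 0;
    g->limit = limit;
    if (pthread_mutex_init(&g->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&g->freed, NULL) != 0) {
        pthread_mutex_destroy(&g->lock);
        return -1;
    }
    return 0;
}

/* Launcher: call before creating a worker thread. */
void gate_acquire(slot_gate *g)
{
    pthread_mutex_lock(&g->lock);
    /* Re-check in a loop: condition waits may wake spuriously. */
    while (g->in_use >= g->limit)
        pthread_cond_wait(&g->freed, &g->lock);
    g->in_use++;
    pthread_mutex_unlock(&g->lock);
}

/* Worker: call as the last action before returning. */
void gate_release(slot_gate *g)
{
    pthread_mutex_lock(&g->lock);
    assert(g->in_use > 0);
    g->in_use--;
    pthread_cond_signal(&g->freed);
    pthread_mutex_unlock(&g->lock);
}
```

Because the counter is only touched while the mutex is held, this also takes care of the unsynchronised decrement, so no separate atomic is needed in this variant.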

© 2021 The Archive Base. All Rights Reserved