The Archive Base

Editorial Team
Asked: June 1, 2026

I was playing with Go language concurrency and found something which is kinda opaque

I was playing with concurrency in Go and found something that is rather opaque to me.

I wrote a parallel matrix multiplication in which each task computes a single row of the product matrix, multiplying the corresponding rows and columns of the source matrices.

Here is the Java program:

public static double[][] parallelMultiply(int nthreads, final double[][] m1, final double[][] m2) {
    final int n = m1.length, m = m1[0].length, l = m2[0].length;
    assert m1[0].length == m2.length;

    double[][] r = new double[n][];

    ExecutorService e = Executors.newFixedThreadPool(nthreads);
    List<Future<double[]>> results = new LinkedList<Future<double[]>>();
    for (int ii = 0; ii < n; ++ii) {
        final int i = ii;
        Future<double[]> result = e.submit(new Callable<double[]>() {
            public double[] call() throws Exception {
                double[] row = new double[l];
                for (int j = 0; j < l; ++j) {
                    for (int k = 0; k < m; ++k) {
                        row[j] += m1[i][k]*m2[k][j];
                    }
                }
                return row;
            }
        });
        results.add(result);
    }
    try {
        e.shutdown();
        e.awaitTermination(1, TimeUnit.HOURS);
        int i = 0;
        for (Future<double[]> result : results) {
            r[i] = result.get();
            ++i;
        }
    } catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }

    return r;
}

and this is the Go program:

type Matrix struct {
    n, m int
    data [][]float64
}

func New(n, m int) *Matrix {
    data := make([][]float64, n)
    for i := range data {
        data[i] = make([]float64, m)
    }
    return &Matrix{n, m, data}
}

func (m *Matrix) Get(i, j int) float64 {
    return m.data[i][j]
}

func (m *Matrix) Set(i, j int, v float64) {
    m.data[i][j] = v
}

func MultiplyParallel(m1, m2 *Matrix) *Matrix {
    r := New(m1.n, m2.m)

    c := make(chan interface{}, m1.n)
    for i := 0; i < m1.n; i++ {
        go func(i int) {
            innerLoop(r, m1, m2, i)
            c <- nil
        }(i)
    }

    for i := 0; i < m1.n; i++ {
        <-c
    }

    return r
}

func innerLoop(r, m1, m2 *Matrix, i int) {
    for j := 0; j < m2.m; j++ {
        s := 0.0
        for k := 0; k < m1.m; k++ {
            s = s + m1.Get(i, k) * m2.Get(k, j)
        }
        r.Set(i, j, s)
    }
}

When I run the Java program with nthreads=1 and nthreads=2, there is a nearly twofold speedup on my dual-core N450 Atom netbook.
When I run the Go program with GOMAXPROCS=1 and GOMAXPROCS=2, there is no speedup at all!

Even though the Java code uses additional storage for the Futures and then collects their values into the result matrix, instead of updating the array directly in the worker code (which is what the Go version does), it runs much faster on several cores than the Go version.

What is especially odd is that the Go version with GOMAXPROCS=2 loads both cores (htop displays 100% load on both processors while the program runs), yet the computation time is the same as with GOMAXPROCS=1 (where htop displays 100% load on only one core).
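One quick sanity check for a comparison like this is to have the program report the parallelism it actually runs with. The following is only a minimal sketch: the helper name `parallelism` is hypothetical and not part of the program above, while `runtime.GOMAXPROCS` and `runtime.NumCPU` are standard library calls.

```go
package main

import (
	"fmt"
	"runtime"
)

// parallelism reports how many OS threads may execute Go code at once
// and how many logical CPUs the runtime sees. (Hypothetical helper name.)
func parallelism() (maxProcs, numCPU int) {
	// GOMAXPROCS(0) only queries the current setting; it does not change it.
	return runtime.GOMAXPROCS(0), runtime.NumCPU()
}

func main() {
	maxProcs, numCPU := parallelism()
	fmt.Printf("GOMAXPROCS=%d, logical CPUs=%d\n", maxProcs, numCPU)
}
```

Run with `env GOMAXPROCS=2`, this should confirm that the environment variable reached the runtime.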

Another concern is that the Java program is faster than the Go one even for a simple single-threaded multiplication, but that is not exactly unexpected (taking the benchmarks from here into account) and should not affect the multicore speedup factor.

What am I doing wrong here? Is there a way to speed up the Go program?

UPD:
It seems I have found what I was doing wrong. I was timing the Java program with System.currentTimeMillis() and the Go program with the time shell command. I mistakenly took the ‘user’ time from the zsh output as the program's running time instead of the ‘total’ one. I have now double-checked the computation speed, and Go gives me a nearly twofold speedup too (though it is slightly smaller than Java’s):

% time env GOMAXPROCS=2 ./4-2-go -n 500 -q
env GOMAXPROCS=2 ./4-2-go -n 500 -q  22,34s user 0,04s system 99% cpu 22,483 total
% time env GOMAXPROCS=2 ./4-2-go -n 500 -q -p
env GOMAXPROCS=2 ./4-2-go -n 500 -q -p  24,09s user 0,10s system 184% cpu 13,080 total

Seems I have to be more attentive.
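To measure the Go program the same way as the Java one (wall-clock time rather than CPU time), the timing can be taken inside the program. This is a minimal sketch, not the code of 4-2-go: `timeIt` and `speedup` are hypothetical helper names, and the sleep merely stands in for the multiplication.

```go
package main

import (
	"fmt"
	"time"
)

// timeIt returns the wall-clock time taken by f, which is the counterpart
// of the System.currentTimeMillis() difference in the Java version.
func timeIt(f func()) time.Duration {
	start := time.Now()
	f()
	return time.Since(start)
}

// speedup is the ratio of serial to parallel wall-clock time.
func speedup(serial, parallel float64) float64 {
	return serial / parallel
}

func main() {
	// Placeholder workload; the real program would call the multiplication here.
	elapsed := timeIt(func() { time.Sleep(10 * time.Millisecond) })
	fmt.Println("elapsed:", elapsed)

	// 'total' values from the two runs above, in seconds.
	fmt.Printf("speedup: %.2fx\n", speedup(22.483, 13.080)) // speedup: 1.72x
}
```

The ‘user’ figure sums CPU time over all cores, which is why it stays around 22 to 24 seconds in both runs: (24.09 + 0.10) / 13.080 ≈ 1.85, matching the 184% cpu that zsh reports for the parallel run.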

Still, the Java program is about five times faster on the same case. But that is a matter for another question, I think.

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 10:03 am

    You are probably experiencing the effects of false sharing. In a nutshell, if two pieces of data happen to fall onto the same CPU cache line, modifying these two pieces of data from threads that execute on different CPU cores will trigger the expensive cache coherency protocol.

    This kind of cache “ping-pong” is extremely hard to diagnose, and can happen on logically completely unrelated data, just because they happen to be placed close enough in memory. The 100% CPU load is typical of false sharing – your cores really are working 100%, they are just not working on your program – they are working on synchronizing their caches.
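The effect described above can be probed with a small experiment. The sketch below is illustrative only: it assumes a 64-byte cache line (common, but not guaranteed), and the names `packed`, `padded` and `bump` are hypothetical. Two goroutines each increment their own counter, once with the counters adjacent in memory and once with them a full cache line apart.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Assumed cache line size in bytes; an assumption, not a queried value.
const cacheLine = 64

// packed keeps both counters adjacent, most likely on one cache line.
type packed struct {
	a, b int64
}

// padded places the counters 64 bytes apart, so under the assumed line
// size they can never share a cache line.
type padded struct {
	a int64
	_ [cacheLine - 8]byte
	b int64
}

// bump increments each counter n times from its own goroutine and returns
// the elapsed wall-clock time. Atomic adds force a memory write per step.
func bump(a, b *int64, n int) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	for _, p := range []*int64{a, b} {
		wg.Add(1)
		go func(p *int64) {
			defer wg.Done()
			for i := 0; i < n; i++ {
				atomic.AddInt64(p, 1)
			}
		}(p)
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	const n = 10000000
	var p packed
	var q padded
	fmt.Println("adjacent:", bump(&p.a, &p.b, n))
	fmt.Println("padded:  ", bump(&q.a, &q.b, n))
}
```

Whether the padded run is noticeably faster depends on the hardware and on both goroutines really running on different cores, so the timings are something to observe rather than assume.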

    The fact that in the Java program each thread works on thread-private data until the time comes to “integrate” it into the final result is what saves you from false sharing. I’m not familiar with Go, but judging by your own words, the threads write directly to the common array, which is exactly the kind of thing that could trigger false sharing. This is an example of how perfectly valid single-threaded reasoning can have exactly the opposite effect in a multi-threaded environment!
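For comparison, the thread-private approach of the Java version could look roughly like this in Go. It is a sketch under stated assumptions, not the asker's code: it uses plain `[][]float64` instead of the `Matrix` type, and `multiplyRows` is a hypothetical name. Each goroutine fills a row it allocated itself and stores it into the shared result exactly once.

```go
package main

import (
	"fmt"
	"sync"
)

// multiplyRows computes a*b with one goroutine per result row. Each
// goroutine accumulates into its own freshly allocated row and publishes
// it with a single write. (Hypothetical helper; assumes non-empty,
// rectangular inputs with len(a[0]) == len(b).)
func multiplyRows(a, b [][]float64) [][]float64 {
	n, m, l := len(a), len(b), len(b[0])
	r := make([][]float64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			row := make([]float64, l) // private to this goroutine
			for j := 0; j < l; j++ {
				s := 0.0
				for k := 0; k < m; k++ {
					s += a[i][k] * b[k][j]
				}
				row[j] = s
			}
			r[i] = row // the only write to shared memory
		}(i)
	}
	wg.Wait()
	return r
}

func main() {
	a := [][]float64{{1, 2}, {3, 4}}
	b := [][]float64{{5, 6}, {7, 8}}
	fmt.Println(multiplyRows(a, b)) // [[19 22] [43 50]]
}
```

Note that the Go program in the question already allocates each row separately and accumulates into the local variable s before a single Set call per element, so whether this variant changes its timing is something to measure rather than assume.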

    For a more in-depth discussion of the topic, I warmly recommend Herb Sutter’s article “Eliminate False Sharing” or the lecture “Machine Architecture: Things Your Programming Language Never Told You” (and the associated PDF slides).
