I found a link to this article while reading Parallel Patterns.
But I’m a little confused now.
What if every iteration of Parallel.For produces a result and stores it as an item of an array?
There’s no race condition and no need for synchronization. But the cache line must be kept in sync across all threads, which degrades performance (if I’m not mistaken).
So I’m interested in whether there are ways to improve performance.
For false sharing to occur, different threads need to access array items that are close enough to each other to sit on the same cache line, with at least one of the threads writing.
In practice, you have a small number of threads (let’s call it C) processing a large array of size N, with N >> C. This means that each thread gets a reasonably large number of items to process. Assuming they can be processed independently, the ideal way to do it is a contiguous split, so that each thread gets consecutive positions in the array. This avoids false sharing pretty well, because only the cache lines at the boundaries between two threads’ chunks can be touched by more than one thread.
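To make the contiguous split concrete, here is a minimal sketch in Python. It only shows the index arithmetic and is not TPL code; the function name `contiguous_ranges` is made up for illustration.

```python
def contiguous_ranges(n, c):
    """Split the indices 0..n-1 into c consecutive half-open ranges.

    The range sizes differ by at most one item. Hypothetical helper,
    used only to illustrate the contiguous strategy.
    """
    base, extra = divmod(n, c)
    ranges = []
    start = 0
    for t in range(c):
        # The first `extra` threads take one additional item each.
        size = base + (1 if t < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


# Thread t would process exactly the indices in ranges[t].
ranges = contiguous_ranges(10, 3)
```

In .NET, `Partitioner.Create(0, n)` from `System.Collections.Concurrent` produces index ranges of this kind, and they can be passed to `Parallel.ForEach` so that each delegate call loops over one consecutive range.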
It wouldn’t make sense to process the array elements in an interleaved fashion, for example, because that would indeed cause false sharing. It may not always be possible to use the contiguous strategy, however, because sometimes load balancing comes into play. In that case you have to see which is more detrimental: occasional false sharing or bad load balancing. It is a long discussion. I’m sure the underlying TPL scheduler is designed well enough to obtain the best trade-off.
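The difference between the two strategies can be estimated with a small model. This Python sketch counts how many cache lines are written by more than one thread under each assignment. The numbers are assumptions chosen for illustration: 1000 elements, 4 threads, and 16 elements per cache line (64-byte lines, 4-byte elements, array starting on a line boundary). It models which thread owns which index and measures nothing on real hardware.

```python
from collections import defaultdict

N = 1000             # array length (assumed)
C = 4                # number of threads (assumed)
ITEMS_PER_LINE = 16  # assumes 64-byte cache lines and 4-byte elements


def shared_lines(owner_of, n, items_per_line):
    """Count cache lines whose elements are owned by more than one thread.

    owner_of(i) returns the id of the thread that writes element i.
    """
    owners = defaultdict(set)
    for i in range(n):
        owners[i // items_per_line].add(owner_of(i))
    return sum(1 for threads in owners.values() if len(threads) > 1)


def contiguous_owner(i):
    # Thread t owns the consecutive block [t*N/C, (t+1)*N/C).
    return i * C // N


def interleaved_owner(i):
    # Thread t owns every C-th element: t, t+C, t+2C, ...
    return i % C


contiguous_shared = shared_lines(contiguous_owner, N, ITEMS_PER_LINE)
interleaved_shared = shared_lines(interleaved_owner, N, ITEMS_PER_LINE)
```

Under these assumptions the contiguous split shares only the 3 cache lines that straddle chunk boundaries, while the interleaved split shares all 63. A load-balancing scheme that hands out many smaller contiguous chunks would fall somewhere in between.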