I’m writing an N-Body simulation, and for computational simplification I’ve divided the whole space into a number of uniformly-sized regions.
For each body, I compute the force of all other bodies in the same region, and for the other regions I aggregate the mass and distances together so there’s less work to be done.
I have a List<Region> and Region defines public void Index() which sums the total mass at this iteration.
I have two variants of my Space.Tick() function:
public void Tick()
{
foreach (Region r in Regions)
r.Index();
}
This is very quick. For 20x20x20 = 8000 regions with 100 bodies each = 800000 bodies in total, it only takes about 0.1 seconds to do this. The CPU graph shows 25% utilisation on my quad-core, which is exactly what I would expect.
Now I write this multi-threaded variant:
public void Tick()
{
Thread[] threads = new Thread[Environment.ProcessorCount];
foreach (Region r in Regions)
while (true)
{
bool queued = false;
for (int i = 0; i < threads.Length; i++)
if (threads[i] == null || !threads[i].IsAlive)
{
Region s = r;
threads[i] = new Thread(s.Index);
threads[i].Start();
queued = true;
break;
}
if (queued)
break;
}
}
So a quick explanation in case it’s not obvious: threads is an array of 4, in the case of my CPU. It starts off being 4xnull. For each region, I loop through all 4 Thread objects (which could be null). When I find one that’s either null or isn’t IsAlive, I queue up the Index() of that Region and Start() it. I set queued to true so that I can tell that the region has started indexing.
This code takes about 7 seconds. That’s 70x slower. I understand that there’s a bit of overhead involved with setting up the threads, finding a thread that’s vacant, etc. But I would still expect that I would have at least some sort of performance gain.
What am I doing wrong?
Why not try PLINQ?
PLINQ is usually SUPER fast for me, and it scales dependent on your environment.. If it shouldn’t be Parallel, it does single thread.
So, if you had to have a multidimensional array come into the function, you could just do this: