I’m working on a project that spawns several hundred threads. All of these threads are “sleeping” (they are blocked on a Monitor object). I have noticed that as I increase the number of “sleeping” threads, the program slows down considerably. The “funny” thing is that, according to Task Manager, the more threads there are, the more idle the processor is. I have narrowed the problem down to object creation.
Can someone explain it to me?
I have written a small sample to test it. It’s a console program. It creates one thread per processor and measures each thread’s speed with a simple test (a “new Object()”). No, the “new Object()” isn’t jitted away (try it if you don’t trust me). The main thread shows the speed of each thread. Each time you press CTRL-C, the program spawns 50 more “sleeping” threads. The slowdown begins with just 50 threads. With around 250, it’s clearly visible in Task Manager that the CPU isn’t 100% used (on mine it’s 82%).
I have tried three ways of blocking the “sleeping” threads: Thread.CurrentThread.Suspend() (bad, bad, I know 🙂 ), a lock on an already locked object, and Thread.Sleep(Timeout.Infinite). The result is the same. If I comment out the line with the new Object() and replace it with a Math.Sqrt (or with nothing), the problem goes away: the speed doesn’t change with the number of threads.
Can someone else check it? Does anyone know where the bottleneck is?
Ah… you should test it in Release mode WITHOUT launching it from Visual Studio.
I’m using XP SP3 on a dual-processor machine (no HT). I have tested it with .NET 3.5 and 4.0 (to test the different framework runtimes).

    namespace TestSpeed
    {
        using System;
        using System.Collections.Generic;
        using System.Threading;

        class Program
        {
            private const long ticksInSec = 10000000;
            private const long ticksInMs = ticksInSec / 1000;
            private const int threadsTime = 50;              // sleeping threads added per Ctrl-C
            private const int stackSizeBytes = 256 * 1024;
            private const int waitTimeMs = 1000;

            private static List<int> collects = new List<int>();
            private static int[] objsCreated;

            static void Main(string[] args)
            {
                objsCreated = new int[Environment.ProcessorCount];

                // Held forever by the main thread, so "lock (objsCreated)" below never succeeds.
                Monitor.Enter(objsCreated);

                // One busy worker per processor.
                for (int i = 0; i < objsCreated.Length; i++)
                {
                    new Thread(Worker).Start(i);
                }

                int[] oldCount = new int[objsCreated.Length];
                DateTime last = DateTime.UtcNow;
                Console.Clear();
                int numThreads = 0;
                Console.WriteLine("Press Ctrl-C to generate {0} sleeping threads, Ctrl-Break to end.", threadsTime);

                Console.CancelKeyPress += (sender, e) =>
                {
                    if (e.SpecialKey != ConsoleSpecialKey.ControlC)
                    {
                        return;
                    }

                    for (int i = 0; i < threadsTime; i++)
                    {
                        new Thread(() =>
                        {
                            /* The same for all the three "ways" to lock forever a thread */
                            //Thread.CurrentThread.Suspend();
                            //Thread.Sleep(Timeout.Infinite);
                            lock (objsCreated) { }
                        }, stackSizeBytes).Start();
                        Interlocked.Increment(ref numThreads);
                    }

                    e.Cancel = true;
                };

                // Once per second, print how many iterations each worker completed.
                while (true)
                {
                    Thread.Sleep(waitTimeMs);
                    Console.SetCursorPosition(0, 1);
                    DateTime now = DateTime.UtcNow;
                    long ticks = (now - last).Ticks;
                    Console.WriteLine("Slept for {0}ms", ticks / ticksInMs);
                    Thread.MemoryBarrier();

                    for (int i = 0; i < objsCreated.Length; i++)
                    {
                        int count = objsCreated[i];
                        Console.WriteLine("{0} [{1} Threads]: {2}/sec ", i, numThreads, ((long)(count - oldCount[i])) * ticksInSec / ticks);
                        oldCount[i] = count;
                    }

                    Console.WriteLine();
                    CheckCollects();
                    last = now;
                }
            }

            private static void Worker(object obj)
            {
                int ix = (int)obj;

                while (true)
                {
                    /* First and second are slowed by threads, third, fourth, fifth and "nothing" aren't */
                    new Object();
                    //if (new Object().Equals(null)) return;
                    //Math.Sqrt(objsCreated[ix]);
                    //if (Math.Sqrt(objsCreated[ix]) < 0) return;
                    //Interlocked.Add(ref objsCreated[ix], 0);
                    Interlocked.Increment(ref objsCreated[ix]);
                }
            }

            private static void CheckCollects()
            {
                // GC.MaxGeneration is the highest generation index, so there are
                // MaxGeneration + 1 generations to track (0 through MaxGeneration).
                int generations = GC.MaxGeneration + 1;

                while (collects.Count < generations)
                {
                    collects.Add(0);
                }

                for (int i = 0; i < collects.Count; i++)
                {
                    int newCol = GC.CollectionCount(i);

                    if (newCol != collects[i])
                    {
                        collects[i] = newCol;
                        Console.WriteLine("Collect gen {0}: {1}", i, newCol);
                    }
                }
            }
        }
    }

My guess is that the problem is that garbage collection requires a certain amount of cooperation between threads – something either needs to check that they’re all suspended, or ask them to suspend themselves and wait for it to happen, etc. (And even if they are suspended, it has to tell them not to wake up!)
This describes a “stop the world” garbage collector, of course. I believe there are at least two or three different GC implementations which differ in the details around parallelism… but I suspect that all of them are going to have some work to do in terms of getting threads to cooperate.
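
One way to probe this guess is to count gen 0 collections alongside allocation throughput, and to print which GC flavour the process is running. The following is only a rough sketch, not a proper benchmark: the `GcProbe` class, its method names, the 2-second window and the 250 parked threads are all made up for illustration, and it does not prove where the time goes. It only shows whether the allocation rate drops while the collection count stays in the same range.

```csharp
using System;
using System.Diagnostics;
using System.Runtime;
using System.Threading;

// Hypothetical helper, not part of the original sample.
static class GcProbe
{
    // Counts how many objects the calling thread allocates within the window.
    // The clock is checked once per 1000 allocations to keep its overhead small.
    public static long CountAllocations(TimeSpan window)
    {
        long count = 0;
        Stopwatch clock = Stopwatch.StartNew();
        while (clock.Elapsed < window)
        {
            for (int i = 0; i < 1000; i++)
            {
                GC.KeepAlive(new object()); // keeps the allocation from being optimized away
            }
            count += 1000;
        }
        return count;
    }

    // Starts n background threads that block on an event that is never signalled.
    public static void ParkThreads(int n, ManualResetEvent never)
    {
        for (int i = 0; i < n; i++)
        {
            Thread t = new Thread(() => never.WaitOne(), 256 * 1024);
            t.IsBackground = true; // lets the process exit when Main returns
            t.Start();
        }
    }

    static void Report(int parked, TimeSpan window)
    {
        int gen0Before = GC.CollectionCount(0);
        long allocated = CountAllocations(window);
        int gen0 = GC.CollectionCount(0) - gen0Before;
        Console.WriteLine("{0} parked threads: {1:N0} allocations, {2} gen 0 collections",
            parked, allocated, gen0);
    }

    static void Main()
    {
        // Which GC flavour is in use (workstation vs. server, and the latency mode).
        Console.WriteLine("Server GC: {0}, latency mode: {1}",
            GCSettings.IsServerGC, GCSettings.LatencyMode);

        ManualResetEvent never = new ManualResetEvent(false);
        TimeSpan window = TimeSpan.FromSeconds(2); // arbitrary measurement window

        Report(0, window);
        ParkThreads(250, never);
        Thread.Sleep(500); // give the parked threads time to reach WaitOne
        Report(250, window);
    }
}
```

If the second line shows far fewer allocations per collection than the first, that would be consistent with each collection costing more as the number of threads grows. As with the original sample, it should be run in Release mode without a debugger attached.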