I am experimenting with the new parallelism tools in .NET 4, by calculating Pi using Monte Carlo methods.
(The actual algorithm is not so important, but for clarity's sake, here it is:
- Pick numIterations random points inside a unit square.
- Count the number of these points that lie within the circle inscribed in that square (i.e. the points whose distance from the centre of the square is at most 0.5).
- Then, for very large numIterations, PI ≈ 4 * iterationsInsideCircle / numIterations.)
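For reference, the factor of 4 follows from the ratio of the two areas, assuming the points are uniformly distributed over the square (side 1, inscribed circle of radius 0.5):

```latex
\frac{A_{\text{circle}}}{A_{\text{square}}}
  = \frac{\pi \,(0.5)^2}{1^2}
  = \frac{\pi}{4}
\quad\Longrightarrow\quad
\pi \approx 4 \cdot \frac{\text{iterationsInsideCircle}}{\text{numIterations}}
```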
I have a method int ThrowDarts(int iterations) which picks that many random points inside the unit square (described above) and returns the number of points which lie within the inscribed circle:
    protected static int ThrowDarts(int iterations)
    {
        int dartsInsideCircle = 0;
        Random random = new Random();
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            double pointX = random.NextDouble() - 0.5;
            double pointY = random.NextDouble() - 0.5;
            double distanceFromOrigin = Math.Sqrt(pointX*pointX + pointY*pointY);
            bool pointInsideCircle = distanceFromOrigin <= 0.5;
            if (pointInsideCircle)
            {
                dartsInsideCircle++;
            }
        }
        return dartsInsideCircle;
    }
Essentially, each of my implementations (each using a different parallel mechanism) is a different way of throwing the darts and counting those that land inside the circle.
For example, my single threaded implementation is simply:
    protected override int CountInterationsInsideCircle()
    {
        return ThrowDarts(_numInterations);
    }
I also have this method for one of my parallel algorithms:
    protected override int CountInterationsInsideCircle()
    {
        Task<int>[] tasks = new Task<int>[_numThreads];
        for (int i = 0; i < _numThreads; i++)
        {
            tasks[i] = Task.Factory.StartNew(() => ThrowDarts(_numInterations/_numThreads));
        }
        int iterationsInsideCircle = 0;
        for (int i = 0; i < _numThreads; i++)
        {
            iterationsInsideCircle += tasks[i].Result;
        }
        return iterationsInsideCircle;
    }
Hopefully you get the picture.
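One detail of the task-based version above: the integer division _numInterations/_numThreads silently drops the remainder, so slightly fewer darts are thrown than requested whenever the count is not a multiple of the thread count. A minimal sketch of one way to hand out the leftover darts, where DartSplitter and SplitDarts are hypothetical names introduced only for illustration:

```csharp
using System;

public static class DartSplitter
{
    // Hypothetical helper: splits totalDarts into workerCount shares whose
    // sum is exactly totalDarts. The first (totalDarts % workerCount)
    // workers each receive one extra dart.
    public static int[] SplitDarts(int totalDarts, int workerCount)
    {
        int[] shares = new int[workerCount];
        int baseShare = totalDarts / workerCount;
        int remainder = totalDarts % workerCount;
        for (int i = 0; i < workerCount; i++)
        {
            shares[i] = baseShare + (i < remainder ? 1 : 0);
        }
        return shares;
    }
}
```

Each task would then throw its own share of darts instead of the truncated quotient.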
Here, I get to my conundrum. The Parallel.For version I am writing causes massive amounts of context switching. The code is below:
    protected override int CountInterationsInsideCircle()
    {
        ConcurrentBag<int> results = new ConcurrentBag<int>();
        int result = 0;
        Parallel.For(0, _numInterations,
            // initialise each thread by setting its hit count to 0
            () => 0,
            // in the body, we throw one dart and see whether it hit or not
            (iteration, state, localState) => localState + ThrowDarts(1),
            // finally, we collect (in a thread-safe way) the hit count of each thread
            results.Add);
        // sum the per-thread hit counts
        foreach (var threadresult in results)
        {
            result += threadresult;
        }
        return result;
    }
The version using Parallel.For does work, but very, very slowly, because of the aforementioned context switching (which does not occur in the previous two methods).
Is anyone able to enlighten me as to why this may be happening?
I’ve actually found the solution to the question.
Previously, in my ThrowDarts method, I was creating a new Random with every call (this was because the Random class is not thread safe).
However, it turns out that this is relatively expensive. (At least, it is when performing only one dart throw per call, such that we create a new Random for each iteration.)
Thus, I have modified my ThrowDarts method to take a Random which the caller creates, and modified my LoopState to contain its own Random.
Therefore, each thread in the Parallel.For has its own Random. I updated my Parallel.For implementation accordingly.
I guess the context switching metric was a bit of a red herring, and a simple profile would have done the trick. Nice curve ball, .NET, nice. Anyway, lesson learned!
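A minimal sketch of the change described above; this is not the original listing. It assumes a ThrowDarts overload that takes a caller-supplied Random and a per-worker state object holding that Random together with a running hit count. The names ParallelDarts, DartState, and CountHits are hypothetical, and seeding each Random from a Guid hash is my own assumption, made so that instances created at almost the same moment do not share a time-based seed.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDarts
{
    // Hypothetical per-worker state: a private Random plus a running hit count.
    private sealed class DartState
    {
        public readonly Random Random;
        public int Hits;
        public DartState(Random random) { Random = random; }
    }

    // ThrowDarts variant that reuses a Random supplied by the caller
    // instead of constructing a new one on every call.
    public static int ThrowDarts(Random random, int iterations)
    {
        int dartsInsideCircle = 0;
        for (int iteration = 0; iteration < iterations; iteration++)
        {
            double pointX = random.NextDouble() - 0.5;
            double pointY = random.NextDouble() - 0.5;
            double distanceFromOrigin = Math.Sqrt(pointX * pointX + pointY * pointY);
            if (distanceFromOrigin <= 0.5)
            {
                dartsInsideCircle++;
            }
        }
        return dartsInsideCircle;
    }

    public static int CountHits(int numIterations)
    {
        int total = 0;
        Parallel.For(0, numIterations,
            // localInit: runs once per worker task, so each worker gets its own Random
            () => new DartState(new Random(Guid.NewGuid().GetHashCode())),
            // body: throw one dart using the worker's own Random
            (iteration, loopState, local) =>
            {
                local.Hits += ThrowDarts(local.Random, 1);
                return local;
            },
            // localFinally: fold each worker's hit count into the shared total
            local => Interlocked.Add(ref total, local.Hits));
        return total;
    }
}
```

With this shape, a Random is constructed once per worker rather than once per dart, and 4.0 * ParallelDarts.CountHits(n) / n gives the estimate of Pi.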
Thanks all,
Alex