The following short but complete example program
    const long iterations = 1000000000;

    // Allocate and populate the array before timing starts.
    T[] array = new T[1 << 20];
    for (int i = 0; i < array.Length; i++)
    {
        array[i] = new T();
    }

    // Time one billion writes to the first field, cycling through the array.
    Stopwatch sw = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        array[i % array.Length].Value0 = i;
    }

    Console.WriteLine("{0,-15} {1} {2:n0} iterations/s",
        typeof(T).Name, sw.Elapsed, iterations * 1000d / sw.ElapsedMilliseconds);
with T replaced by the following types
    class SimpleClass            struct SimpleStruct
    {                            {
        public int Value0;           public int Value0;
    }                            }

    class ComplexClass           struct ComplexStruct
    {                            {
        public int Value0;           public int Value0;
        public int Value1;           public int Value1;
        public int Value2;           public int Value2;
        public int Value3;           public int Value3;
        public int Value4;           public int Value4;
        public int Value5;           public int Value5;
        public int Value6;           public int Value6;
        public int Value7;           public int Value7;
        public int Value8;           public int Value8;
        public int Value9;           public int Value9;
        public int Value10;          public int Value10;
        public int Value11;          public int Value11;
    }                            }
yields the following interesting results on my machine (Windows 7, .NET 4.5, 32-bit):
    SimpleClass     00:00:10.4471717 95,721,260 iterations/s
    ComplexClass    00:00:37.8199150 26,441,736 iterations/s
    SimpleStruct    00:00:12.3075100 81,254,571 iterations/s
    ComplexStruct   00:00:32.6140182 30,661,679 iterations/s
Question 1: Why is ComplexClass so much slower than SimpleClass? The elapsed time seems to increase linearly with the number of fields in the class. Writing to the first field of a class with a lot of fields shouldn’t be very different from writing to the first field of a class with only one field, no?
Question 2: Why is ComplexStruct slower than SimpleStruct? A look at the IL code shows that i is written directly to the array, not to a local instance of ComplexStruct that is then copied into the array. So there should be no overhead caused by copying more fields.
Bonus question: Why is ComplexStruct faster than ComplexClass?
Edit: Updated test results with a smaller array, T[] array = new T[1 << 8];:
    SimpleClass     00:00:13.5091446 74,024,724 iterations/s
    ComplexClass    00:00:13.2505217 75,471,698 iterations/s
    SimpleStruct    00:00:14.8397693 67,389,986 iterations/s
    ComplexStruct   00:00:13.4821834 74,172,971 iterations/s
So virtually no difference between SimpleClass and ComplexClass, and only a small difference between SimpleStruct and ComplexStruct. However, the performance significantly decreased for SimpleClass and SimpleStruct.
Edit: And now with T[] array = new T[1 << 16];:
    SimpleClass     00:00:09.7477715 102,595,670 iterations/s
    ComplexClass    00:00:10.1279081 98,745,927 iterations/s
    SimpleStruct    00:00:12.1539631 82,284,210 iterations/s
    ComplexStruct   00:00:10.5914174 94,419,790 iterations/s
The result for 1<<15 is like 1<<8, and the result for 1<<17 is like 1<<20.
Possible answer to Question 1:
Your CPU reads memory into its cache one cache line at a time (typically 64 bytes on x86), not one value at a time.
With the larger data type, fewer objects fit into each cache line. Even though you’re only writing one 32-bit value, the whole line containing that value still has to be in the CPU cache. With the smaller objects, you can get through more loop iterations before you next need to read from main memory.
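As a rough plausibility check of that argument, here is the memory touched by one pass over the 1 << 20 array. The post states none of the following, so treat them as assumptions: a 4-byte `int`, a 4-byte reference in the array for the class types, 8 bytes of per-object overhead on the 32-bit CLR, and objects laid out back to back on the heap.

```latex
\begin{align*}
\text{SimpleStruct}  &: 2^{20} \times 4\,\text{B} = 4\,\text{MiB} \\
\text{ComplexStruct} &: 2^{20} \times (12 \times 4)\,\text{B} = 48\,\text{MiB} \\
\text{SimpleClass}   &: 2^{20} \times (4 + 8 + 4)\,\text{B} = 16\,\text{MiB} \\
\text{ComplexClass}  &: 2^{20} \times (4 + 8 + 48)\,\text{B} = 60\,\text{MiB}
\end{align*}
```

Under these assumptions ComplexClass touches 60 / 16 = 3.75 times as much memory as SimpleClass, close to the measured time ratio of about 3.6 (37.8 s versus 10.4 s). ComplexStruct touches less memory than ComplexClass, which is consistent with it being somewhat faster. At 1 << 16 the largest footprint is 2^16 × 60 B = 3.75 MiB, which may fit in the last-level cache of many desktop CPUs. The post does not give the machine's cache size, so this only shows the explanation is plausible, not that it is confirmed.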