It seems that C# is faster at adding two arrays of UInt16[] than it is at adding two arrays of int[]. This makes no sense to me, since I would have assumed the arrays would be word-aligned, and thus int[] would require less work from the CPU, no?
I ran the test-code below, and got the following results:
Int for 1000 took 9896625613 tick (4227 msec)
UInt16 for 1000 took 6297688551 tick (2689 msec)
The test code does the following:
- Creates two arrays named
aandb, once. - Fills them with random data, once.
- Starts a stopwatch.
- Adds
aandb, item-by-item. This is done 1000 times. - Stops the stopwatch.
- Reports how long it took.
This is done for int[] a, b and for UInt16 a,b. And every time I run the code, the tests for the UInt16 arrays take 30%-50% less time than the int arrays. Can you explain this to me?
Here’s the code, if you want to try if for yourself:
public static UInt16[] GenerateRandomDataUInt16(int length)
{
UInt16[] noise = new UInt16[length];
Random random = new Random((int)DateTime.Now.Ticks);
for (int i = 0; i < length; ++i)
{
noise[i] = (UInt16)random.Next();
}
return noise;
}
public static int[] GenerateRandomDataInt(int length)
{
int[] noise = new int[length];
Random random = new Random((int)DateTime.Now.Ticks);
for (int i = 0; i < length; ++i)
{
noise[i] = (int)random.Next();
}
return noise;
}
public static int[] AddInt(int[] a, int[] b)
{
int len = a.Length;
int[] result = new int[len];
for (int i = 0; i < len; ++i)
{
result[i] = (int)(a[i] + b[i]);
}
return result;
}
public static UInt16[] AddUInt16(UInt16[] a, UInt16[] b)
{
int len = a.Length;
UInt16[] result = new UInt16[len];
for (int i = 0; i < len; ++i)
{
result[i] = (ushort)(a[i] + b[i]);
}
return result;
}
public static void Main()
{
int count = 1000;
int len = 128 * 6000;
int[] aInt = GenerateRandomDataInt(len);
int[] bInt = GenerateRandomDataInt(len);
Stopwatch s = new Stopwatch();
s.Start();
for (int i=0; i<count; ++i)
{
int[] resultInt = AddInt(aInt, bInt);
}
s.Stop();
Console.WriteLine("Int for " + count
+ " took " + s.ElapsedTicks + " tick ("
+ s.ElapsedMilliseconds + " msec)");
UInt16[] aUInt16 = GenerateRandomDataUInt16(len);
UInt16[] bUInt16 = GenerateRandomDataUInt16(len);
s = new Stopwatch();
s.Start();
for (int i=0; i<count; ++i)
{
UInt16[] resultUInt16 = AddUInt16(aUInt16, bUInt16);
}
s.Stop();
Console.WriteLine("UInt16 for " + count
+ " took " + s.ElapsedTicks + " tick ("
+ s.ElapsedMilliseconds + " msec)");
}
What happens is that you see a leaky abstraction. UInt16 takes half of memory that int does (16 vs. 32 bit).
This means that the memory area occupied by int16 array takes half of area that the int32 does. So more of that area can fit in processor cache and thus be accessed very quickly.
You could try that code on a processor that has more cache and the difference is likely to be smaller.
Also try with much larger arrays.