I have a simple method which converts an array from one type to another. I wanted to find out which method is the fastest. But so far I get differing results from which I cannot conclude which method is really faster by which margin.
Since the conversion is only about allocating memory, reading the array and converting values I am surprised that the values are not more stable. I wanted to know how I can make accurate measurements which are meaningful and to not change from one day to the other.
The differences are about 20% from one day to the other.
There are of course differences between the JITer of .NET 3.5 and 4.0, debug and release mode, not running the executable under a debugger (disables JIT optimizations until you disable it), code generation of the C# compiler between DEBUG and RELEASE (mainly nop operations and more temporary variables in the IL code).
using System;
using System.Collections.Generic;
using System.Diagnostics;
namespace PerfTest
{
class Program
{
const int RUNS = 10 * 1000 * 1000;
static void Main(string[] args)
{
int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43 };
var s2 = Stopwatch.StartNew();
for (int i = 0; i < RUNS; i++)
{
float[] arr = Cast(array);
}
s2.Stop();
GC.Collect();
var s3 = Stopwatch.StartNew();
for (int i = 0; i < RUNS; i++)
{
float[] arr = Cast2(array);
}
s3.Stop();
GC.Collect();
var s4 = Stopwatch.StartNew();
for (int i = 0; i < RUNS; i++)
{
var arr = CastSafe(array);
}
s4.Stop();
Console.WriteLine("Times: {0} {1} {2}", s2.ElapsedMilliseconds, s3.ElapsedMilliseconds, s4.ElapsedMilliseconds);
}
// Referece cast implementation to check performance
public static unsafe float[] Cast(int[] input)
{
int N = input.Length;
float[] output = new float[N];
fixed (int* pIStart = &input[0])
{
int* pI = pIStart;
fixed (float* pOStart = &output[0])
{
float* pO = pOStart;
for (int i = 0; i < N; i++)
{
*pO = (float)*pI;
pI++;
pO++;
}
}
}
return output;
}
// Referece cast implementation to check performance
public static unsafe float[] Cast2(int[] input)
{
int N = input.Length;
float[] output = new float[N];
fixed (int* pIStart = &input[0])
{
int* pI = pIStart;
fixed (float* pOStart = &output[0])
{
float* pO = pOStart;
for (int i = 0; i < N; i++)
{
pO[i] = (float) pI[i];
}
}
}
return output;
}
public static float[] CastSafe(int[] input)
{
int N = input.Length;
float[] output = new float[N];
for (int i = 0; i < input.Length; i++)
{
output[i] = (float)input[i];
}
return output;
}
}
}
I do get then
- Times: 1257 1388 1180
- Times: 1331 1428 1267
- Times: 1337 1435 1267
- Times: 1208 1414 1145
From this it does look like the dumb safe variant is faster than any unsafe variant although bounds check elimination of the unsafe methods should make it at least as fast if not faster.
Just for fun I did also compile the same IL code via LCG (DynamicMethod) which seem to be even slower than any of these methods although the additional cost of the delegate invocation does not seem to play such a big role here.
The for loop does execute this code 10 million times which should produce stable results. Why am I seeing any differences here at all? Using realtime as process priority did also not help (psexec -realtime executable). How can I get reliable numbers?
My tests did include
- Dual Quad Core machines
- Windows 7 32 / 64 bit editions
- .NET Framework 3.5/4.0
- 32/64 bit versions of the executable.
If I use a profiler I am not sure if he will distort the measurements even more. Since he does interrupt my application from time to time to get the call stacks he will certainly destroy any cache locality which might aid performance. If there is any approach with better (data) cache locality I will not be able to find it out with a profiler.
Edit1:
To take into account that I do not have a real time OS I do now sample my measurements. Since for one thread I have a 15ms time window granted the the Windows Scheduler I can keep out the Scheduler if I measure shorter than 15ms. If I do measure too shortly I will end up with very small tick counts which will not tell me much.
To get stable values I need a time span long enough to let the OS do whatever it does on a regular basis. Empiric tests have shown that 30+ seconds is a good time span one measurment should take.
This time span is then divided into sample time spans which are well below 15ms. Then I will get timing information for each sample. From the samples I can extract min/max and average. This way I can also see first time initialization effects.
The code looks now like this
class Program
{
const int RUNS = 100 * 1000 * 1000; // 100 million runs will take about 30s
const int RunsPerSample = 100; // 100 runs for on sample is about 0,01ms << 15ms
static void Main(string[] args)
{
int[] array = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43 };
long[] sampleTimes = new long [RUNS/RunsPerSample];
int sample = 0;
for (int i = 0; i < RUNS; i+=RunsPerSample)
{
var sw = Stopwatch.StartNew();
for (int j = i; j < i+RunsPerSample; j++)
{
float[] arr = Cast(array);
}
sw.Stop();
sampleTimes[sample] = sw.ElapsedTicks;
sample++;
}
Console.WriteLine("SampleSize: {0}, Min {1}, Max {2}, Average {3}",
RunsPerSample, sampleTimes.Min(), sampleTimes.Max(), sampleTimes.Average());
The values from these tests do still vary (<10%) but I think that if I create a histogram chart of my values and drop the 10% highest values which are likely caused by the OS, GC, … I can get really stable numbers which I can trust.
SampleSize: 100, Min 25, Max 86400, Average 28,614631
- SampleSize: 100, Min 24, Max 86027, Average 28,762608
- SampleSize: 100, Min 25, Max 49523, Average 32,102037
- SampleSize: 100, Min 24, Max 48687, Average 32,030088
Edit2:
The histograms show that the measured values are not random. They look like a Landau distribution which should give me with the right approximation algorithms stable values. I wish in .NET would exist something like ROOT where I can interactivly fit the right distribution function to my data and get the results.

The code to generate the histogram with the MSChart controls is below:
using System.Collections.Generic;
using System.Drawing;
using System.Linq;
using System.Windows.Forms;
using System.Windows.Forms.DataVisualization.Charting;
namespace ConsoleApplication4
{
public partial class Histogram : Form
{
public Histogram(long [] sampleTimes)
{
InitializeComponent();
Series histogramSeries = cHistogram.Series.Add("Histogram");
// Set new series chart type and other attributes
histogramSeries.ChartType = SeriesChartType.Column;
histogramSeries.BorderColor = Color.Black;
histogramSeries.BorderWidth = 1;
histogramSeries.BorderDashStyle = ChartDashStyle.Solid;
var filtered = RemoveHighValues(sampleTimes, 40);
KeyValuePair<long,int>[] histoData = GenerateHistogram(filtered);
ChartArea chartArea = cHistogram.ChartAreas[histogramSeries.ChartArea];
chartArea.AxisY.Title = "Frequency";
chartArea.AxisX.Minimum = histoData.Min( x=>x.Key );
chartArea.AxisX.Maximum = histoData.Max( x=>x.Key );
foreach (var v in histoData)
{
histogramSeries.Points.Add(new DataPoint(v.Key, v.Value));
}
chartArea.AxisY.Minimum = 0;
chartArea.AxisY.Maximum = histoData[0].Value + 100;
}
// Count the occurence of each value of input and return an array with the value as key and its count as value
// as ordered list starting with the highest counts.
KeyValuePair<long,int>[] GenerateHistogram(long [] input)
{
Dictionary<long, int> counts = new Dictionary<long, int>();
foreach (var value in input)
{
int old = 0;
if (!counts.TryGetValue(value, out old))
{
counts[value] = 0;
}
counts[value] = ++old;
}
var orderedCounts = (from x in counts
orderby x.Value descending
select x).ToArray();
return orderedCounts;
}
long[] RemoveHighValues(long[] input, int maxDifference)
{
var min = input.Min();
var max = input.Max();
long[] filtered = input;
while (max - min > maxDifference) // remove all values wich differ by more than maxDifference ticks
{
filtered = input.Where(x => x < max).ToArray();
max = filtered.Max();
}
return filtered;
}
}
}
I have amended my original question with the finding that the numbers were not random but follow a distribution (looks like a Landau distribution) where I can use fitting algorithms to get the peak value with the most likely true timing.