I’ve been doing some profiling lately and I’ve run into one case which is driving me nuts. The following is a piece of unsafe C# code which basically copies a source sample buffer to a target buffer at a different sample rate. As it is now, it takes up ~17% of the total processing time per frame. What I don’t get is that if I use floats instead of doubles, the processing time rises to ~38%. Could someone please explain what’s going on here?
Fast version (~17%)
double rateIncr = ...
double readOffset = ...
double offsetIncr = ...
float v = ... // volume
// Source and target buffers.
float* src = ...
float* tgt = ...
for( var c = 0; c < chunkCount; ++c)
{
for( var s = 0; s < chunkSampleSize; ++s )
{
// Source sample
var iReadOffset = (int)readOffset;
// Interpolate factor
var k = (float)readOffset - iReadOffset;
// Linearly interpolate 2 contiguous samples and write result to target.
*tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
// Increment source offset.
readOffset += offsetIncr;
}
// Increment the read rate.
offsetIncr += rateIncr;
}
Slow version (~38%)
float rateIncr = ...
float readOffset = ...
float offsetIncr = ...
float v = ... // volume
// Source and target buffers.
float* src = ...
float* tgt = ...
for( var c = 0; c < chunkCount; ++c)
{
for( var s = 0; s < chunkSampleSize; ++s )
{
var iReadOffset = (int)readOffset;
// The cast to float is removed
var k = readOffset - iReadOffset;
*tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
readOffset += offsetIncr;
}
offsetIncr += rateIncr;
}
Odd version (~22%)
float rateIncr = ...
float readOffset = ...
float offsetIncr = ...
float v = ... // volume
// Source and target buffers.
float* src = ...
float* tgt = ...
for( var c = 0; c < chunkCount; ++c)
{
for( var s = 0; s < chunkSampleSize; ++s )
{
var iReadOffset = (int)readOffset;
var k = readOffset - iReadOffset;
// Just adding this test drops the cost from 38% to 22%,
// and the condition is NEVER met.
if( (k != 0) && Math.Abs( k ) < 1e-38 )
{
Console.WriteLine( "Denormalized float?" );
}
*tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
readOffset += offsetIncr;
}
offsetIncr += rateIncr;
}
All I know by now is that I know nothing
Are you running this on a 64-bit or a 32-bit processor? In my experience there are edge cases where the CPU can apply optimisations to low-level code like this when the size of your data matches the size of the registers (and even though you might assume that two floats would fit neatly in a 64-bit register, you may still lose the optimisation benefit). You may find the situation reversed if you run it on a 32-bit system…
After a quick search, the best citation I can offer is a couple of posts on C++ game-development forums (it was during my one year in game dev that I noticed this myself, but that was the only time I was profiling at this level). This post has some interesting disassembly results from a C++ method that may be applicable at a very low level.
Another thought:
This article from MSDN goes into a lot of the internal specifics of using floats in .NET, primarily to address the problematic issue of float comparison. One interesting paragraph sums up the CLR spec's handling of float values: an implementation is allowed to use an internal representation with greater range and precision than the variable's nominal type.
So your floats may not actually be floats while operations are being performed on them; instead they could be 80-bit numbers on an x87 FPU, or anything else the compiler thinks is an optimisation or is required for calculation accuracy. Without looking at the IL you won’t know for sure, but there could be many costly conversions when you are working with floats that you don’t hit when you are using doubles. It’s a shame that you can’t specify the required precision for floating-point operations in C# as you can through the fp switches in C++, since that would stop the compiler from putting everything into a larger container before operating on it.