I’ve been doing some profiling lately and I’ve run into one case which is driving me nuts. The following is a piece of unsafe C# code which basically copies a source sample buffer to a target buffer at a different sample rate. As it is now, it takes up ~17% of the total processing time per frame. What I don’t get is that if I use floats instead of doubles, the processing time rises to ~38%. Could someone please explain what’s going on here?
Fast version (~17%)
double rateIncr = ...
double readOffset = ...
double offsetIncr = ...
float v = ... // volume
// Source and target buffers.
float* src = ...
float* tgt = ...
for( var c = 0; c < chunkCount; ++c)
{
for( var s = 0; s < chunkSampleSize; ++s )
{
// Source sample
var iReadOffset = (int)readOffset;
// Interpolate factor
var k = (float)readOffset - iReadOffset;
// Linearly interpolate 2 contiguous samples and write result to target.
*tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
// Increment source offset.
readOffset += offsetIncr;
}
// Increment the read rate.
offsetIncr += rateIncr;
}
Slow version (~38%)
float rateIncr = ...
float readOffset = ...
float offsetIncr = ...
float v = ... // volume
// Source and target buffers.
float* src = ...
float* tgt = ...
for( var c = 0; c < chunkCount; ++c)
{
for( var s = 0; s < chunkSampleSize; ++s )
{
var iReadOffset = (int)readOffset;
// The cast to float is removed
var k = readOffset - iReadOffset;
*tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
readOffset += offsetIncr;
}
offsetIncr += rateIncr;
}
Odd version (~22%)
float rateIncr = ...
float readOffset = ...
float offsetIncr = ...
float v = ... // volume
// Source and target buffers.
float* src = ...
float* tgt = ...
for( var c = 0; c < chunkCount; ++c)
{
for( var s = 0; s < chunkSampleSize; ++s )
{
var iReadOffset = (int)readOffset;
var k = readOffset - iReadOffset;
// Just adding this test drops the cost from 38% to 22%,
// and the condition is NEVER met.
if( (k != 0) && Math.Abs( k ) < 1e-38 )
{
Console.WriteLine( "Denormalized float?" );
}
*tgt++ += (src[ iReadOffset ] * (1f - k) + src[ iReadOffset + 1 ] * k) * v;
readOffset += offsetIncr;
}
offsetIncr += rateIncr;
}
All I know by now is that I know nothing
Are you running this on a 64-bit or a 32-bit processor? In my experience there are edge cases where the CPU can apply optimisations to low-level code like this when the size of your data matches the size of the registers (and even though you might assume that two floats would fit neatly in a 64-bit register, you may still lose the optimisation benefit). You may find the situation reversed if you run it on a 32-bit system…
After a quick search, the best citation I can offer is a couple of posts on C++ game-development forums (it was during my one year in game dev that I noticed this myself, but that was the only time I was profiling at this level). This post has some interesting disassembly results from a C++ method that may be applicable at a very low level.
Another thought:
This article from MSDN goes into a lot of the internal specifics of using floats in .NET, primarily to address the problematic issue of float comparison. One interesting paragraph sums up the CLR spec's handling of float values: an implementation is allowed to use an internal representation with greater range and precision than the variable's nominal type.
So your floats may not actually be floats while operations are being performed on them; instead they could be 80-bit numbers on an x87 FPU, or anything else the compiler thinks is an optimisation or is required for calculation accuracy. Without looking at the IL you won’t know for sure, but there could be many costly conversions when you are working with floats that you don’t hit when you are using doubles. It’s a shame that you can’t specify the required precision for floating-point operations in C# as you can through the fp switches in C++, since that would stop the compiler from putting everything into a larger container before operating on it.