The Archive Base

Editorial Team
Asked: May 11, 2026

I am looking for the fastest way to de/interleave a buffer.
To be more specific, I am dealing with audio data, so I am trying to optimize the time I spend on splitting/combining channels and FFT buffers.

Currently I am using a for loop with two index variables, one per array, so the indexing costs only additions, but with all the managed array bounds checks it cannot compete with a C pointer approach.

I like the Buffer.BlockCopy and Array.Copy methods, which cut a lot of time when I process channels, but there is no way for an array to have a custom indexer.

I tried to build an array mask, a fake array with a custom indexer, but it proved to be twice as slow when used in my FFT operation. I guess the compiler can pull a lot of optimization tricks when accessing an array directly, but access through a class indexer cannot be optimized the same way.

I do not want an unsafe solution, although from the looks of it, that might be the only way to optimize this type of operation.

Thanks.

Here is the type of thing I’m doing right now:

private float[][] DeInterleave(float[] buffer, int channels)
{
    float[][] tempbuf = new float[channels][];
    int length = buffer.Length / channels;
    for (int c = 0; c < channels; c++)
    {
        tempbuf[c] = new float[length];
        for (int i = 0, offset = c; i < tempbuf[c].Length; i++, offset += channels)
            tempbuf[c][i] = buffer[offset];
    }
    return tempbuf;
}
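
For the opposite direction, here is a minimal sketch of interleaving under the same layout (sample 0 of every channel, then sample 1, and so on). The class name `ChannelBuffers` and the method name `Interleave` are hypothetical and not part of the original post. It assumes all channels have the same length.

```csharp
using System;

public static class ChannelBuffers
{
    // Hypothetical counterpart to DeInterleave: merges per-channel arrays
    // into one interleaved buffer.
    public static float[] Interleave(float[][] channels)
    {
        if (channels == null || channels.Length == 0)
            return new float[0];

        int count = channels.Length;
        int length = channels[0].Length;
        float[] buffer = new float[length * count];

        for (int c = 0; c < count; c++)
        {
            float[] source = channels[c];
            if (source.Length != length)
                throw new ArgumentException("All channels must have the same length.");

            // Channel c occupies positions c, c + count, c + 2 * count, ...
            for (int i = 0, offset = c; i < source.Length; i++, offset += count)
                buffer[offset] = source[i];
        }
        return buffer;
    }
}
```

Passing the result of DeInterleave back through this method should reproduce the original buffer whenever its length is a multiple of the channel count.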
1 Answer

  1. Editorial Team
    Added an answer on May 11, 2026 at 7:35 pm

    I ran some tests and here is the code I tested:

    delegate(float[] inout)
    { // My Original Code
        float[][] tempbuf = new float[2][];
        int length = inout.Length / 2;
        for (int c = 0; c < 2; c++)
        {
            tempbuf[c] = new float[length];
            for (int i = 0, offset = c; i < tempbuf[c].Length; i++, offset += 2)
                tempbuf[c][i] = inout[offset];
        }
    }
    delegate(float[] inout)
    { // jerryjvl's recommendation: loop unrolling
        float[][] tempbuf = new float[2][];
        int length = inout.Length / 2;
        for (int c = 0; c < 2; c++)
            tempbuf[c] = new float[length];
        for (int ix = 0, i = 0; ix < length; ix++)
        {
            tempbuf[0][ix] = inout[i++];
            tempbuf[1][ix] = inout[i++];
        }
    
    }
    delegate(float[] inout)
    { // Unsafe Code
        unsafe
        {
            float[][] tempbuf = new float[2][];
            int length = inout.Length / 2;
            fixed (float* buffer = inout)
                for (int c = 0; c < 2; c++)
                {
                    tempbuf[c] = new float[length];
                    float* offset = buffer + c;
                    fixed (float* buffer2 = tempbuf[c])
                    {
                        float* p = buffer2;
                        for (int i = 0; i < length; i++, offset += 2)
                            *p++ = *offset;
                    }
                }
        }
    }
    delegate(float[] inout)
    { // Modifying my original code to see if the compiler is not as smart as I think it is.
        float[][] tempbuf = new float[2][];
        int length = inout.Length / 2;
        for (int c = 0; c < 2; c++)
        {
            float[] buf = tempbuf[c] = new float[length];
            for (int i = 0, offset = c; i < buf.Length; i++, offset += 2)
                buf[i] = inout[offset];
        }
    }
    

    Results, with tests numbered in the order of the delegates above (buffer size = 2^17, 200 timed iterations per test):

    Average for test #1:      0.001286 seconds +/- 0.000026
    Average for test #2:      0.001193 seconds +/- 0.000025
    Average for test #3:      0.000686 seconds +/- 0.000009
    Average for test #4:      0.000847 seconds +/- 0.000008
    
    Average for test #1:      0.001210 seconds +/- 0.000012
    Average for test #2:      0.001048 seconds +/- 0.000012
    Average for test #3:      0.000690 seconds +/- 0.000009
    Average for test #4:      0.000883 seconds +/- 0.000011
    
    Average for test #1:      0.001209 seconds +/- 0.000015
    Average for test #2:      0.001060 seconds +/- 0.000013
    Average for test #3:      0.000695 seconds +/- 0.000010
    Average for test #4:      0.000861 seconds +/- 0.000009
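
    The post does not show the harness that produced these averages. As a rough sketch of how such figures could be collected, the hypothetical `Bench.Measure` below times a delegate with `Stopwatch` and reports the mean and the sample standard deviation. Treating the +/- figure as a standard deviation is an assumption; the post does not say how it was derived.

    ```csharp
    using System;
    using System.Diagnostics;

    public static class Bench
    {
        // Hypothetical harness: runs `test` once as a warm-up, then times it
        // `iterations` times and returns mean and sample standard deviation
        // in seconds.
        public static void Measure(Action<float[]> test, float[] input, int iterations,
                                   out double mean, out double stdDev)
        {
            if (iterations < 2)
                throw new ArgumentOutOfRangeException("iterations", "Need at least two runs.");

            test(input); // warm-up so JIT compilation is not included in the timings

            double[] seconds = new double[iterations];
            Stopwatch watch = new Stopwatch();
            for (int n = 0; n < iterations; n++)
            {
                watch.Reset();
                watch.Start();
                test(input);
                watch.Stop();
                seconds[n] = watch.Elapsed.TotalSeconds;
            }

            double sum = 0;
            foreach (double s in seconds) sum += s;
            mean = sum / iterations;

            double squares = 0;
            foreach (double s in seconds) squares += (s - mean) * (s - mean);
            stdDev = Math.Sqrt(squares / (iterations - 1));
        }
    }
    ```

    Each delegate above could be passed as `test` with a `new float[1 << 17]` input and 200 iterations. Absolute numbers will differ by machine and runtime, so only the relative ordering is meaningful.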
    

    I got similar results on every run. Obviously the unsafe code is the fastest, but I was surprised to see that the CLR couldn’t figure out that it can drop the index checks when dealing with a jagged array. Maybe someone can think of more ways to optimize my tests.

    Edit:
    I tried loop unrolling with the unsafe code and it didn’t have an effect.
    I also tried optimizing the loop unrolling method:

    delegate(float[] inout)
    {
        float[][] tempbuf = new float[2][];
        int length = inout.Length / 2;
        float[] tempbuf0 = tempbuf[0] = new float[length];
        float[] tempbuf1 = tempbuf[1] = new float[length];
    
        for (int ix = 0, i = 0; ix < length; ix++)
        {
            tempbuf0[ix] = inout[i++];
            tempbuf1[ix] = inout[i++];
        }
    }
    

    The results are hit-or-miss compared with test #4, within about 1% either way. Test #4 is my best option so far.

    As I told jerryjvl, the problem is getting the CLR to skip the index check on the input buffer, since adding a second condition (&& offset < inout.Length) to the loop only slows it down…

    Edit 2:
    The earlier tests were run inside the IDE, so here are the results from running outside it:

    2^17 items, repeated 200 times
    ******************************************
    Average for test #1:      0.000533 seconds +/- 0.000017
    Average for test #2:      0.000527 seconds +/- 0.000016
    Average for test #3:      0.000407 seconds +/- 0.000008
    Average for test #4:      0.000374 seconds +/- 0.000008
    Average for test #5:      0.000424 seconds +/- 0.000009
    
    2^17 items, repeated 200 times
    ******************************************
    Average for test #1:      0.000547 seconds +/- 0.000016
    Average for test #2:      0.000732 seconds +/- 0.000020
    Average for test #3:      0.000423 seconds +/- 0.000009
    Average for test #4:      0.000360 seconds +/- 0.000008
    Average for test #5:      0.000406 seconds +/- 0.000008
    
    
    2^18 items, repeated 200 times
    ******************************************
    Average for test #1:      0.001295 seconds +/- 0.000036
    Average for test #2:      0.001283 seconds +/- 0.000020
    Average for test #3:      0.001085 seconds +/- 0.000027
    Average for test #4:      0.001035 seconds +/- 0.000025
    Average for test #5:      0.001130 seconds +/- 0.000025
    
    2^18 items, repeated 200 times
    ******************************************
    Average for test #1:      0.001234 seconds +/- 0.000026
    Average for test #2:      0.001319 seconds +/- 0.000023
    Average for test #3:      0.001309 seconds +/- 0.000025
    Average for test #4:      0.001191 seconds +/- 0.000026
    Average for test #5:      0.001196 seconds +/- 0.000022
    
    Test#1 = My Original Code
    Test#2 = Optimized safe loop unrolling
    Test#3 = Unsafe code - loop unrolling
    Test#4 = Unsafe code
    Test#5 = My Optimized Code
    

    Looks like loop unrolling is not favorable. My optimized code is still my best option, and it is within roughly 10% of the unsafe code. If only I could tell the compiler that (i < buf.Length) implies (offset < inout.Length), it would drop the check on inout[offset] and I would basically get the unsafe performance.
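
    For reference, here is a sketch of the optimized safe variant applied to the question's original `DeInterleave` signature, with any channel count and a returned result. The class name `AudioChannels` and the argument checks are additions for illustration, not part of the original tests. Caching the inner array in a local means the loop condition tests the same array that is indexed, which is the pattern the JIT can generally recognize; the check on `buffer[offset]` is still expected to remain.

    ```csharp
    using System;

    public static class AudioChannels
    {
        // Sketch: the question's DeInterleave with the inner array cached in a
        // local variable, as in the "My Optimized Code" test.
        public static float[][] DeInterleave(float[] buffer, int channels)
        {
            if (buffer == null) throw new ArgumentNullException("buffer");
            if (channels <= 0) throw new ArgumentOutOfRangeException("channels");

            float[][] result = new float[channels][];
            int length = buffer.Length / channels; // trailing partial frame is ignored
            for (int c = 0; c < channels; c++)
            {
                float[] buf = result[c] = new float[length];
                // offset walks channel c's samples: c, c + channels, ...
                for (int i = 0, offset = c; i < buf.Length; i++, offset += channels)
                    buf[i] = buffer[offset];
            }
            return result;
        }
    }
    ```

    The largest offset reached is (channels - 1) + (length - 1) * channels, which is always below buffer.Length, so the remaining index check never fails; it only costs time.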
