I have written a function which reads an input buffer of bytes and produces

Question

Asked: May 15, 2026 (2026-05-15T05:50:39+00:00)

I have written a function which reads an input buffer of bytes and produces an output buffer of words, where each word is 0x0081 for an ON bit of the input buffer or 0x007F for an OFF bit. The length of the input buffer is given. Both arrays have enough space allocated. I also have about 2Kbyte free RAM which I can use for lookup tables or so.

Now I have found that this function is the bottleneck in a real-time application, where it is called very frequently. Can you please suggest a way to optimize it? One possibility I see is to use only one buffer and do in-place substitution.

void inline BitsToWords(int8    *pc_BufIn, 
                        int16   *pw_BufOut, 
                        int32   BufInLen)
{
 int32 i,j,z=0;

 for(i=0; i<BufInLen; i++)
 {
  for(j=0; j<8; j++, z++)
  {
   pw_BufOut[z] = 
                    ( ((pc_BufIn[i] >> (7-j))&0x01) == 1? 
                    0x0081: 0x007f );
  }
 }
}

Please do not offer any library-specific, compiler-specific, or CPU/hardware-specific optimizations, because it is a multi-platform project.

1 Answer

Editorial Team · Answer 1 · 2026-05-15T05:50:39+00:00

"I also have about 2Kbyte free RAM which I can use for lookup tables"

Your lookup tables can be placed in a const array at compile time, so they could live in ROM. Does this give you room for the straightforward 4KB table (256 entries of eight 16-bit words each)?

If you can afford 4KB of ROM space, the only problem is building the table as an initialized array in a .c file – but that only has to be done once, and you can write a script to do it (which may help ensure it’s correct and may also help if you decide that the table needs to change for some reason in the future).
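The table-building script mentioned above could itself be a small host-side C program. What follows is only a sketch under assumptions: the names ExpandByte, EmitTable and BitsToWordsTable are hypothetical, the fixed-width types come from C99 <stdint.h> rather than the project's own int8/int16 typedefs, and bits are expanded most significant first to match the question's loop.

```c
#include <stdio.h>
#include <stdint.h>

#define WORD_ON  0x0081u  /* word emitted for a 1 bit */
#define WORD_OFF 0x007Fu  /* word emitted for a 0 bit */

/* Expand one byte value into its 8 output words, most significant bit first. */
void ExpandByte(unsigned byte, uint16_t row[8])
{
    int bit;
    for (bit = 0; bit < 8; bit++) {
        row[bit] = (uint16_t)((byte & (0x80u >> bit)) ? WORD_ON : WORD_OFF);
    }
}

/* Write the full 256 x 8 table to `out` as C source text.
   The array name BitsToWordsTable is a hypothetical choice. */
void EmitTable(FILE *out)
{
    unsigned byte;
    int bit;
    uint16_t row[8];

    fprintf(out, "const unsigned short BitsToWordsTable[256][8] = {\n");
    for (byte = 0; byte < 256; byte++) {
        ExpandByte(byte, row);
        fprintf(out, "    {");
        for (bit = 0; bit < 8; bit++) {
            fprintf(out, "0x%04X%s", (unsigned)row[bit], bit < 7 ? ", " : "");
        }
        fprintf(out, "}, /* 0x%02X */\n", byte);
    }
    fprintf(out, "};\n");
}
```

A host-side main would call EmitTable(stdout) and the output would be redirected into the .c file that holds the table. Regenerating the file after changing WORD_ON or WORD_OFF keeps the table consistent with the constants.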

You’d have to profile to ensure that the copy from ROM to the destination array is actually faster than calculating what needs to go into the destination – I wouldn’t be surprised if something along the lines of:

/* untested code - please forgive any bonehead errors */
void inline BitsToWords(int8    *pc_BufIn, 
                        int16   *pw_BufOut, 
                        int32   BufInLen)
{
    while (BufInLen--) {
        unsigned int tmp = *pc_BufIn++;

        *pw_BufOut++ = (tmp & 0x80) ? 0x0081 : 0x007f;
        *pw_BufOut++ = (tmp & 0x40) ? 0x0081 : 0x007f;
        *pw_BufOut++ = (tmp & 0x20) ? 0x0081 : 0x007f;
        *pw_BufOut++ = (tmp & 0x10) ? 0x0081 : 0x007f;
        *pw_BufOut++ = (tmp & 0x08) ? 0x0081 : 0x007f;
        *pw_BufOut++ = (tmp & 0x04) ? 0x0081 : 0x007f;
        *pw_BufOut++ = (tmp & 0x02) ? 0x0081 : 0x007f;
        *pw_BufOut++ = (tmp & 0x01) ? 0x0081 : 0x007f; 
    }
}

ends up being faster. I’d expect that an optimized build of that function would have everything in registers or encoded into the instructions except for a single read of each input byte and a single write of each output word. Or pretty close to that.
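For the profiling comparison, a table-driven variant is needed to measure against the unrolled code. The sketch below is a scaled-down variant, not the 4KB table discussed above: it indexes by 4-bit nibble, so the table takes 16 x 4 x 2 = 128 bytes and would also fit in the 2Kbyte of RAM from the question. The names NibbleTable and BitsToWordsNibble are hypothetical, and <stdint.h> types are assumed in place of the project's typedefs.

```c
#include <stddef.h>
#include <stdint.h>

/* Entry n holds the four output words for the 4-bit value n,
   most significant bit first. 16 x 4 x 2 bytes = 128 bytes. */
static const uint16_t NibbleTable[16][4] = {
    {0x007F, 0x007F, 0x007F, 0x007F}, /* 0000 */
    {0x007F, 0x007F, 0x007F, 0x0081}, /* 0001 */
    {0x007F, 0x007F, 0x0081, 0x007F}, /* 0010 */
    {0x007F, 0x007F, 0x0081, 0x0081}, /* 0011 */
    {0x007F, 0x0081, 0x007F, 0x007F}, /* 0100 */
    {0x007F, 0x0081, 0x007F, 0x0081}, /* 0101 */
    {0x007F, 0x0081, 0x0081, 0x007F}, /* 0110 */
    {0x007F, 0x0081, 0x0081, 0x0081}, /* 0111 */
    {0x0081, 0x007F, 0x007F, 0x007F}, /* 1000 */
    {0x0081, 0x007F, 0x007F, 0x0081}, /* 1001 */
    {0x0081, 0x007F, 0x0081, 0x007F}, /* 1010 */
    {0x0081, 0x007F, 0x0081, 0x0081}, /* 1011 */
    {0x0081, 0x0081, 0x007F, 0x007F}, /* 1100 */
    {0x0081, 0x0081, 0x007F, 0x0081}, /* 1101 */
    {0x0081, 0x0081, 0x0081, 0x007F}, /* 1110 */
    {0x0081, 0x0081, 0x0081, 0x0081}  /* 1111 */
};

/* Convert `len` input bytes into 8 * len output words using two
   table lookups per byte (high nibble first, then low nibble). */
void BitsToWordsNibble(const uint8_t *in, uint16_t *out, size_t len)
{
    while (len--) {
        unsigned byte = *in++;
        const uint16_t *hi = NibbleTable[byte >> 4];
        const uint16_t *lo = NibbleTable[byte & 0x0Fu];

        out[0] = hi[0]; out[1] = hi[1]; out[2] = hi[2]; out[3] = hi[3];
        out[4] = lo[0]; out[5] = lo[1]; out[6] = lo[2]; out[7] = lo[3];
        out += 8;
    }
}
```

This version trades the compares for table reads, so whether it beats the unrolled code depends on the target's memory speed and can only be settled by measurement.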

You might be able to further optimize by acting on more than one input byte at a time, but then you have to deal with alignment issues and how to handle input buffers that aren’t a multiple of the chunk size you’re dealing with. Those aren’t problems that are too hard to deal with, but they do complicate things, and it’s unclear what kind of improvement you might be able to expect.
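As a rough illustration of the chunk-plus-tail structure described above, the sketch below handles four input bytes per iteration and then finishes any leftover bytes one at a time. It assembles each chunk from individual bytes rather than casting the input pointer, which is one way to sidestep the alignment question, though possibly at the cost of the speed-up. The function names are hypothetical, <stdint.h> types are assumed, and there is no claim that this is faster without profiling.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand a single byte into 8 words, most significant bit first. */
static void ExpandOneByte(unsigned byte, uint16_t *out)
{
    unsigned mask;
    for (mask = 0x80u; mask != 0; mask >>= 1) {
        *out++ = (uint16_t)((byte & mask) ? 0x0081u : 0x007Fu);
    }
}

void BitsToWordsChunked(const uint8_t *in, uint16_t *out, size_t len)
{
    /* Main loop: 4 input bytes become 32 output words. */
    while (len >= 4) {
        uint32_t chunk = ((uint32_t)in[0] << 24) |
                         ((uint32_t)in[1] << 16) |
                         ((uint32_t)in[2] << 8)  |
                          (uint32_t)in[3];
        uint32_t mask;

        for (mask = 0x80000000u; mask != 0; mask >>= 1) {
            *out++ = (uint16_t)((chunk & mask) ? 0x0081u : 0x007Fu);
        }
        in += 4;
        len -= 4;
    }

    /* Tail: 0 to 3 bytes that did not fill a whole chunk. */
    while (len--) {
        ExpandOneByte(*in++, out);
        out += 8;
    }
}
```

The tail loop is what keeps the function correct for buffer lengths that are not a multiple of four.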
