The Archive Base


Editorial Team
Asked: May 15, 2026


I have written a function that reads an input buffer of bytes and produces an output buffer of words, where each word is 0x0081 for an ON bit of the input buffer or 0x007F for an OFF bit. The length of the input buffer is given. Both arrays are large enough. I also have about 2Kbyte of free RAM which I can use for lookup tables or similar.

Now I have found that this function is the bottleneck in a real-time application. It is called very frequently. Can you please suggest a way to optimize it? One possibility I see would be to use only one buffer and do in-place substitution.

void inline BitsToWords(int8    *pc_BufIn,
                        int16   *pw_BufOut,
                        int32   BufInLen)
{
    int32 i, j, z = 0;

    for (i = 0; i < BufInLen; i++)
    {
        for (j = 0; j < 8; j++, z++)
        {
            pw_BufOut[z] =
                (((pc_BufIn[i] >> (7 - j)) & 0x01) == 1 ?
                 0x0081 : 0x007f);
        }
    }
}

Please do not suggest any library-specific, compiler-specific, or CPU/hardware-specific optimizations, because this is a multi-platform project.


1 Answer

  1. Editorial Team
    Added an answer on May 15, 2026 at 5:50 am

    “I also have about 2Kbyte free RAM which I can use for lookup tables”

    Your lookup tables can be placed in a const array at compile time, so they could live in ROM. Does that give you room for the straightforward 4KB table (256 possible input bytes × 8 output words × 2 bytes per word)?

    If you can afford 4KB of ROM space, the only problem is building the table as an initialized array in a .c file. That only has to be done once, and you can write a script to do it, which may help ensure it’s correct and may also help if you decide that the table needs to change for some reason in the future.
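    As a rough illustration of such a generator, here is a minimal host-side sketch in C. The names g_BitsToWordsTable, format_table_row and write_table_source are hypothetical, and the emitted element type (unsigned short) is an assumption that would need to be swapped for the project's own 16-bit typedef.

```c
#include <stdio.h>

#define WORD_ON  0x0081u  /* emitted for a set bit */
#define WORD_OFF 0x007Fu  /* emitted for a clear bit */

/* Format the initializer row for input byte b (MSB first) into buf.
   buf must hold at least 96 characters. */
static void format_table_row(unsigned b, char *buf)
{
    int n = sprintf(buf, "    {");
    unsigned bit;

    for (bit = 0; bit < 8; bit++) {
        unsigned on = (b >> (7 - bit)) & 1u;
        n += sprintf(buf + n, "%s0x%04X", bit ? ", " : " ",
                     on ? WORD_ON : WORD_OFF);
    }
    sprintf(buf + n, " },  /* 0x%02X */", b);
}

/* Write the whole 256-row table as C source to out.
   The array name is a placeholder, not part of the original code. */
static void write_table_source(FILE *out)
{
    char row[96];
    unsigned b;

    fputs("const unsigned short g_BitsToWordsTable[256][8] = {\n", out);
    for (b = 0; b < 256; b++) {
        format_table_row(b, row);
        fprintf(out, "%s\n", row);
    }
    fputs("};\n", out);
}
```

    Calling write_table_source(stdout) from a small main and redirecting the output into a .c file would produce the initialized array. The generator only runs on the build host, so it does not have to fit the target's constraints.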

    You’d have to profile to ensure that the copy from ROM to the destination array is actually faster than calculating what needs to go into the destination – I wouldn’t be surprised if something along the lines of:

    /* untested code - please forgive any bonehead errors */
    void inline BitsToWords(int8    *pc_BufIn,
                            int16   *pw_BufOut,
                            int32   BufInLen)
    {
        /* "> 0" keeps a negative length from looping, as in the original */
        while (BufInLen-- > 0) {
            /* read each input byte once; only the low 8 bits are tested */
            unsigned int tmp = *pc_BufIn++;

            /* one output word per bit, most significant bit first */
            *pw_BufOut++ = (tmp & 0x80) ? 0x0081 : 0x007f;
            *pw_BufOut++ = (tmp & 0x40) ? 0x0081 : 0x007f;
            *pw_BufOut++ = (tmp & 0x20) ? 0x0081 : 0x007f;
            *pw_BufOut++ = (tmp & 0x10) ? 0x0081 : 0x007f;
            *pw_BufOut++ = (tmp & 0x08) ? 0x0081 : 0x007f;
            *pw_BufOut++ = (tmp & 0x04) ? 0x0081 : 0x007f;
            *pw_BufOut++ = (tmp & 0x02) ? 0x0081 : 0x007f;
            *pw_BufOut++ = (tmp & 0x01) ? 0x0081 : 0x007f;
        }
    }
    

    ends up being faster. I’d expect that an optimized build of that function would have everything in registers or encoded into the instructions except for a single read of each input byte and a single write of each output word. Or pretty close to that.
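    For the profiling comparison, the table-based variant might look roughly like the sketch below. It is only an illustration: g_Table, InitBitsToWordsTable and BitsToWordsTable are hypothetical names, the <stdint.h> types stand in for the project's int8/int16/int32 typedefs, and the table is filled at run time purely to keep the example self-contained. On the target it would be a const array in ROM, since 4KB does not fit in the 2KB of free RAM.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical table: eight output words for each possible input byte. */
static uint16_t g_Table[256][8];

/* Fill the table once. With a ROM-resident const table this step
   would be replaced by a generated initializer. */
static void InitBitsToWordsTable(void)
{
    unsigned b, bit;

    for (b = 0; b < 256; b++)
        for (bit = 0; bit < 8; bit++)
            g_Table[b][bit] = ((b >> (7 - bit)) & 1u) ? 0x0081 : 0x007F;
}

/* Table-based variant: copies one eight-word row per input byte. */
static void BitsToWordsTable(const uint8_t *pc_BufIn,
                             uint16_t *pw_BufOut,
                             int32_t BufInLen)
{
    while (BufInLen-- > 0) {
        memcpy(pw_BufOut, g_Table[*pc_BufIn++], sizeof g_Table[0]);
        pw_BufOut += 8;
    }
}
```

    Whether the row copy beats the eight conditional stores depends on the platform's memory speed and on how the compiler expands memcpy, which is why measuring both is the only reliable answer.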

    You might be able to further optimize by acting on more than one input byte at a time, but then you have to deal with alignment issues and how to handle input buffers that aren’t a multiple of the chunk size you’re dealing with. Those aren’t problems that are too hard to deal with, but they do complicate things, and it’s unclear what kind of improvement you might be able to expect.
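    To show what the extra bookkeeping looks like, here is a minimal sketch that handles four input bytes per iteration and then deals with the leftover bytes. ExpandByte and BitsToWordsChunked are hypothetical names and the <stdint.h> types are assumed stand-ins for the project's typedefs. The chunk is assembled with shifts instead of a 32-bit load, which sidesteps alignment and byte order at the cost of some of the potential gain, so this is not claimed to be faster.

```c
#include <stdint.h>

/* Expand one byte, MSB first; used for the leftover bytes. */
static uint16_t *ExpandByte(unsigned b, uint16_t *out)
{
    unsigned mask;

    for (mask = 0x80u; mask != 0; mask >>= 1)
        *out++ = (b & mask) ? 0x0081 : 0x007F;
    return out;
}

static void BitsToWordsChunked(const uint8_t *in, uint16_t *out, int32_t len)
{
    /* Main loop: four input bytes per iteration. The bytes are combined
       with shifts rather than a 32-bit load, so input alignment and
       endianness do not matter. */
    while (len >= 4) {
        uint32_t chunk = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
                       | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
        uint32_t mask;

        for (mask = 0x80000000u; mask != 0; mask >>= 1)
            *out++ = (chunk & mask) ? 0x0081 : 0x007F;
        in  += 4;
        len -= 4;
    }

    /* Tail: zero to three bytes that do not fill a whole chunk. */
    while (len-- > 0)
        out = ExpandByte(*in++, out);
}
```

    A version that loads the chunk directly would need the input pointer to be suitably aligned and would have to account for the platform's byte order, which is the complication mentioned above.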
