Editorial Team
Asked: June 6, 2026

The game I’m converting is operating on 8-bit palette texture, and nearly every frame

The game I’m converting operates on an 8-bit palettized texture, and nearly every frame I have to convert parts of that texture into an OpenGL texture for rendering. The code looks like this:

unsigned short RGB565PaletteLookupTable[256];   // Lookup table

unsigned char* Src;                             // Source data
unsigned short* Dst;                            // Destination buffer
int SrcPitch;                                   // Source data row length
int OriginX, OriginY, Width, Height;            // Subrectangle to copy

assert( Width % 4 == 0 );

int SrcOffset = SrcPitch-Width;
Src += OriginY*SrcPitch+OriginX;

int x, y;

for( y = OriginY; y < OriginY+Height; ++y, Src += SrcOffset )
{
    for( x = OriginX; x < OriginX+Width; x += 4 )
    {
        *Dst++ = RGB565PaletteLookupTable[*Src++];
        *Dst++ = RGB565PaletteLookupTable[*Src++];
        *Dst++ = RGB565PaletteLookupTable[*Src++];
        *Dst++ = RGB565PaletteLookupTable[*Src++];
    }
}

This code takes 17% of main thread time during the game, so I’m looking for ways to speed it up. The data goes directly to glTexSubImage2D(), so I can’t change the format of the destination buffer. The source data comes from game code that is ancient and undocumented, and no one knows how it works anymore, so I can’t change much there either. The lookup table is provided by the same ancient code and can change during the game.

Would it be possible to speed up this code using the Accelerate framework, assembly instructions, or any other means? I have read examples of direct conversion from RGB888 to RGB565, but those didn’t need lookup tables. Where should I look to learn how to speed this up?

UPDATE: I found that OriginX is also 4-aligned, and was able to refine the code in this way:

#include <stdint.h>

uint32_t RGB565PaletteLookupTable[256];         // Lookup table (RGB565 value in the low 16 bits)

unsigned char* Src;                             // Source data, 4-byte aligned
uint32_t* Dst;                                  // Destination buffer, 4-byte aligned
int SrcPitch;                                   // Source data row length, a multiple of 4
int OriginX, OriginY, Width, Height;            // Subrectangle to copy

assert( Width % 4 == 0 );
assert( OriginX % 4 == 0 );
assert( SrcPitch % 4 == 0 );

int SrcOffset = SrcPitch-Width;
Src += OriginY*SrcPitch+OriginX;
SrcOffset >>= 2;                                // Row skip, counted in 32-bit words

int x, y;

uint32_t* LSrc = (uint32_t*)Src;                // Four palette indexes per word

for( y = OriginY; y < OriginY+Height; ++y, LSrc += SrcOffset )
{
    for( x = OriginX; x < OriginX+Width; x += 4 )
    {
        uint32_t Indexes = *LSrc++;             // Little-endian: the low byte is the leftmost pixel
        uint32_t Result = RGB565PaletteLookupTable[ Indexes & 0xFF ];
        Indexes >>= 8;
        Result |= ( RGB565PaletteLookupTable[ Indexes & 0xFF ] << 16 );
        *Dst++ = Result;
        Indexes >>= 8;
        Result = RGB565PaletteLookupTable[ Indexes & 0xFF ];
        Indexes >>= 8;
        Result |= ( RGB565PaletteLookupTable[ Indexes & 0xFF ] << 16 );
        *Dst++ = Result;
    }
}

As far as I can tell, this code doesn’t use any unaligned memory accesses. It improved performance a bit: it now takes 15.5% of main thread time. I was hoping for more of a speedup, though.
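The alignment assumptions behind the word-at-a-time loop can be written down as a small check. This is only a sketch: `can_use_word_loop` and its parameter names are hypothetical, and it assumes that both the 32-bit loads from the source and the 32-bit stores to the destination must be 4-byte aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when every 32-bit access made by a
   word-at-a-time conversion loop would be 4-byte aligned. */
static bool can_use_word_loop(const uint8_t *src, const void *dst,
                              int src_pitch, int origin_x, int origin_y,
                              int width)
{
    assert(src != NULL && dst != NULL && src_pitch > 0);

    /* Address of the first source pixel of the subrectangle. */
    const uint8_t *first = src + (size_t)origin_y * (size_t)src_pitch
                               + (size_t)origin_x;

    return ((uintptr_t)first % 4 == 0)   /* first source word is aligned    */
        && (src_pitch % 4 == 0)          /* so every following row is too   */
        && (width % 4 == 0)              /* rows are whole 4-pixel groups   */
        && ((uintptr_t)dst % 4 == 0);    /* 32-bit stores are aligned       */
}
```

If the check fails for a given update rectangle, falling back to the byte-wise loop keeps the result correct.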

In theory, each of these lookup table operations is independent of the previous and subsequent ones (apart from the fact that they all read from the same lookup table), so I was expecting there to be some SIMD instruction, or perhaps assembly instructions, that would allow looking up many pixels in parallel. Something like

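_mm_movemask_ps( _mm_cmpneq_ps( _mm_loadu_ps( cmp1 ), _mm_loadu_ps( cmp2 ) ) )

which on Macs does the same job as memcmp( cmp1, cmp2, 16 ) (as long as the data contains no NaN or negative-zero bit patterns, since this is a floating-point comparison), only 8 times faster.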

I’ll continue looking for it now.

UPDATE: I determined that there seems to be no way of speeding up the table lookup using the NEON instruction set. The table needs to be 512 bytes, and there’s no way to fit it entirely in ARM registers: the VTBX NEON instruction can index a table of at most 32 bytes at a time, and it also requires the lookup result to be the same size as the index. The thread at http://forums.arm.com/index.php?/topic/15521-8bit-look-up-table-by-neon-code/ describes something that might serve as a solution to a similar problem, but it won’t fit mine. So making sure that all operands are correctly aligned seems to be the best available answer to this problem.

1 Answer

  1. Editorial Team
    Added an answer on June 6, 2026 at 11:22 pm

    The problem is with the cache. You do a lot of reads from Src, and if the start of a row is not aligned to four bytes (which might be the case, since OriginX is most likely arbitrary), the *Src++ reads waste cycles on unaligned access.

    Try to enforce OriginX % 4 == 0 for the main loop and copy the leftover pixels at the unaligned edges outside it.

    The same goes for “*Dst++ = ”: if Dst is unaligned, that is bad. Try to combine each pair of RGB565 values (two sequential *Dst writes) into one 32-bit write. You may even overwrite a few extra pixels to keep the loop simple and then handle the border pixels separately.

    Hope you get the idea.
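One way to read the advice above is a per-row routine with a scalar head, a word-at-a-time body, and a scalar tail. The following is a minimal sketch under stated assumptions, not a tuned implementation: `convert_row` is a hypothetical name, the palette holds plain 16-bit RGB565 entries, and the CPU is little-endian. It uses `memcpy` for the 32-bit accesses, which compilers typically reduce to single loads and stores while avoiding aliasing problems.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert one row of 8-bit palette indexes to RGB565.
   Assumes a little-endian CPU, so the lowest byte of a loaded word is
   the leftmost pixel. */
static void convert_row(const uint8_t *src, uint16_t *dst, int width,
                        const uint16_t palette[256])
{
    assert(width >= 0);

    /* Head: single pixels until src reaches a 4-byte boundary. */
    while (width > 0 && ((uintptr_t)src & 3) != 0) {
        *dst++ = palette[*src++];
        --width;
    }

    /* Body: one 32-bit load yields four indexes; the four results are
       packed into two 32-bit words and stored together. */
    while (width >= 4) {
        uint32_t idx, out[2];
        memcpy(&idx, src, sizeof idx);
        out[0] = (uint32_t)palette[idx & 0xFF]
               | ((uint32_t)palette[(idx >> 8) & 0xFF] << 16);
        out[1] = (uint32_t)palette[(idx >> 16) & 0xFF]
               | ((uint32_t)palette[idx >> 24] << 16);
        memcpy(dst, out, sizeof out);
        src += 4;
        dst += 4;
        width -= 4;
    }

    /* Tail: the remaining 0 to 3 pixels. */
    while (width-- > 0)
        *dst++ = palette[*src++];
}
```

Calling it once per row, with `src` pointing at `OriginY*SrcPitch + OriginX` plus the row offset, removes the requirement that OriginX and Width be multiples of 4. Note that an odd-length head leaves `dst` only 2-byte aligned in the body, so whether this beats the byte-wise loop is something to measure on the target device.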

    The second way: offload the conversion to the GPU.

    Create a 1D texture for RGB565PaletteLookupTable and write a simple fragment shader that takes Src and RGB565PaletteLookupTable as inputs and outputs Dst. glTexImage2D will then update the Src texture rather than the Dst texture you upload now.
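A rough sketch of the shader side, held as a C string that could be passed to glShaderSource. Several things here are assumptions rather than part of the answer: OpenGL ES 2.0 has no 1D textures, so the palette is assumed to be a 256x1 2D texture; the index texture is assumed to be uploaded as 8-bit GL_LUMINANCE; both textures are assumed to use GL_NEAREST filtering; and the names `u_indices`, `u_palette`, and `v_texcoord` are hypothetical. The C function below the shader models the coordinate math on the CPU, so the index-to-texel mapping can be checked without a GL context.

```c
#include <assert.h>
#include <stdint.h>

/* GLSL ES 1.00 fragment shader; uniform and varying names are hypothetical. */
static const char *const kPaletteFragmentShader =
    "precision mediump float;\n"
    "uniform sampler2D u_indices;\n"   /* 8-bit palette indexes          */
    "uniform sampler2D u_palette;\n"   /* 256x1 palette texture          */
    "varying vec2 v_texcoord;\n"
    "void main() {\n"
    "    float index = texture2D(u_indices, v_texcoord).r;\n"
    "    float u = (index * 255.0 + 0.5) / 256.0;\n"
    "    gl_FragColor = texture2D(u_palette, vec2(u, 0.5));\n"
    "}\n";

/* CPU model of the shader's coordinate math with nearest filtering:
   returns the palette texel that would be selected for an 8-bit index. */
static int palette_texel_for_index(uint8_t index)
{
    float sampled = (float)index / 255.0f;          /* normalized texel value   */
    float u = (sampled * 255.0f + 0.5f) / 256.0f;   /* same formula as shader   */
    int texel = (int)(u * 256.0f);                  /* nearest: floor(u * 256)  */
    assert(texel >= 0 && texel < 256);
    return texel;
}
```

The half-texel offset (`+ 0.5`) aims each lookup at the center of a palette texel, which keeps rounding from selecting a neighboring entry. With this layout, a palette change only requires re-uploading the 256 palette entries, and the per-frame upload shrinks to the 8-bit index data.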
