The Archive Base

Editorial Team
Asked: May 20, 2026

I have made a lookup table that allows you to blend two single-byte channels

I have made a lookup table that lets you blend two single-byte channels (256 levels per channel) using a single-byte alpha channel, with no floating-point values (hence no float-to-int conversions). The table is indexed by a channel value and an alpha value; each entry holds that channel value scaled by that alpha fraction.
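The table itself is not included in the post, so the following is only a minimal sketch of how such a table could be built. It assumes round-to-nearest and a flat layout where entry `[c][a]` lives at index `c * 256 + a`; the name `build_blend_table` is hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: builds a 256 x 256 table where entry [c][a] holds
// channel value c scaled by alpha a / 255, rounded to the nearest integer.
// Stored flat (index c * 256 + a), so the table occupies 65536 bytes.
std::vector<std::uint8_t> build_blend_table()
{
    std::vector<std::uint8_t> table(256 * 256);
    for (int c = 0; c < 256; ++c) {
        for (int a = 0; a < 256; ++a) {
            // Computed in double once, up front; lookups need no floats.
            table[c * 256 + a] =
                static_cast<std::uint8_t>(c * a / 255.0 + 0.5);
        }
    }
    return table;
}
```

With this layout, `_lookup[r1][rem]` in the example below corresponds to `table[r1 * 256 + rem]`.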

In all, a full 3-channel RGB blend requires two lookups into the array per channel plus one addition, for a total of 6 lookups and 3 additions. In the example below, I split the colors into separate values for ease of demonstration. It blends three channels, R, G, and B, by an alpha value ranging from 0 to 255.

BYTE r1, r2, rDest;
BYTE g1, g2, gDest;
BYTE b1, b2, bDest;

BYTE av; // Alpha value
BYTE rem = 255 - av; // Remaining fraction

rDest = _lookup[r1][rem] + _lookup[r2][av];
gDest = _lookup[g1][rem] + _lookup[g2][av];
bDest = _lookup[b1][rem] + _lookup[b2][av];

It works great, and it is as precise as you can get with 256 levels per channel. As far as I can tell, you get the same values as the actual floating-point calculation; the lookup table itself was calculated using doubles. The table is too big to fit in this post (65536 bytes). (If you would like a copy of it, email me at ten.turtle.toes@gmail.com, but don’t expect a reply until tomorrow because I am going to sleep now.)
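One way to check how closely the table-based blend tracks a floating-point blend is to compare the two over every input. The sketch below assumes a round-to-nearest table and uses a hypothetical `lut_entry` function in place of the real lookup. Under that assumption the two separately rounded terms can differ by one from a single rounding at the end (for example `c1 = 2`, `c2 = 1`, `av = 191`), but never by more.

```cpp
#include <cstdint>
#include <cstdlib>

// Stand-in for one table lookup: the value a round-to-nearest table would
// hold at [c][a]. Assumption: the real table's rounding mode is not shown.
std::uint8_t lut_entry(std::uint8_t c, std::uint8_t a)
{
    return static_cast<std::uint8_t>(c * a / 255.0 + 0.5);
}

// Table-style blend of one channel: two lookups and one addition.
std::uint8_t blend_lut(std::uint8_t c1, std::uint8_t c2, std::uint8_t av)
{
    std::uint8_t rem = static_cast<std::uint8_t>(255 - av);
    return static_cast<std::uint8_t>(lut_entry(c1, rem) + lut_entry(c2, av));
}

// Reference blend: one floating-point expression, rounded once at the end.
std::uint8_t blend_ref(std::uint8_t c1, std::uint8_t c2, std::uint8_t av)
{
    double v = (c1 * (255.0 - av) + c2 * static_cast<double>(av)) / 255.0;
    return static_cast<std::uint8_t>(v + 0.5);
}

// Largest absolute difference between the two over all 2^24 inputs.
int max_lut_error()
{
    int worst = 0;
    for (int c1 = 0; c1 < 256; ++c1)
        for (int c2 = 0; c2 < 256; ++c2)
            for (int av = 0; av < 256; ++av) {
                auto a = static_cast<std::uint8_t>(c1);
                auto b = static_cast<std::uint8_t>(c2);
                auto w = static_cast<std::uint8_t>(av);
                int diff = std::abs(int(blend_lut(a, b, w)) -
                                    int(blend_ref(a, b, w)));
                if (diff > worst) worst = diff;
            }
    return worst;
}
```

A difference of one level is rarely visible, so whether it matters depends on how strict "exact" needs to be for the application.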

So… what do you think? Is it worth it or not?

1 Answer

  1. Editorial Team
    Added an answer on May 20, 2026 at 8:19 pm

    I would be interested in seeing some benchmarks.

    There is an algorithm that can do perfect alpha blending without any floating-point calculations or lookup tables. You can find more info in the following document (the algorithm and code are described at the end).
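The linked document is not reproduced here, but the formula in the comments of the SSE2 code in this answer (`T = S*D[A]+0x80`, then `((T>>8)+T)>>8`) is a known integer-only way to compute a rounded division by 255. A minimal scalar sketch, with the hypothetical name `mul_div255`:

```cpp
#include <cstdint>

// Returns x * a / 255 rounded to nearest for 8-bit x and a,
// using only an integer multiply, two adds, and two shifts.
std::uint8_t mul_div255(std::uint8_t x, std::uint8_t a)
{
    unsigned t = static_cast<unsigned>(x) * a + 0x80;  // product plus rounding bias
    return static_cast<std::uint8_t>((t + (t >> 8)) >> 8);
}

// Blend of one channel with two multiplies and no table (illustrative only).
std::uint8_t blend_channel(std::uint8_t c1, std::uint8_t c2, std::uint8_t av)
{
    std::uint8_t rem = static_cast<std::uint8_t>(255 - av);
    return static_cast<std::uint8_t>(mul_div255(c1, rem) + mul_div255(c2, av));
}
```

This does the same work as one table entry per call, so a benchmark against the 64 KB table would mostly be measuring cache behavior against a multiply and a few shifts.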

    I also did an SSE implementation of this long ago, if you are interested…

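    #include <emmintrin.h>  // SSE2 intrinsics (also provides _mm_prefetch)
    #include <cassert>
    #include <cstddef>
    
    // Assumed typedefs; the original project presumably defined these elsewhere.
    typedef unsigned int u32;
    typedef char s8;
    
    void PreOver_SSE2(void* dest, const void* source1, const void* source2, size_t size)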
    {
        static const size_t STRIDE = sizeof(__m128i)*4;
        static const u32 PSD = 64;
    
        static const __m128i round = _mm_set1_epi16(128);
        static const __m128i lomask = _mm_set1_epi32(0x00FF00FF);
    
        assert(source1 != NULL && source2 != NULL && dest != NULL);
        assert(size % STRIDE == 0);
    
        const __m128i* source128_1 = reinterpret_cast<const __m128i*>(source1);
        const __m128i* source128_2 = reinterpret_cast<const __m128i*>(source2);
        __m128i*       dest128 = reinterpret_cast<__m128i*>(dest);  
    
        __m128i d, s, a, rb, ag, t;
    
        for(size_t k = 0, length = size/STRIDE; k < length; ++k)    
        {
            // TODO: put prefetch between calculations?(R.N)
            _mm_prefetch(reinterpret_cast<const s8*>(source128_1+PSD), _MM_HINT_NTA);
            _mm_prefetch(reinterpret_cast<const s8*>(source128_2+PSD), _MM_HINT_NTA);   
    
            // work on entire cacheline before next prefetch
            for(int n = 0; n < 4; ++n, ++dest128, ++source128_1, ++source128_2)
            {
                // TODO: assembly optimization use PSHUFD on moves before calculations, lower latency than MOVDQA (R.N) http://software.intel.com/en-us/articles/fast-simd-integer-move-for-the-intel-pentiumr-4-processor/
    
                // TODO: load entire cacheline at the same time? are there enough registers? 32 bit mode (special compile for 64bit?) (R.N)
                s = _mm_load_si128(source128_1);        // AABBGGRR
                d = _mm_load_si128(source128_2);        // AABBGGRR
    
                // PRELERP(S, D) = S+D - ((((S*D[A]+0x80)>>8)+(S*D[A]+0x80))>>8)
                // T = S*D[A]+0x80 => PRELERP(S,D) = S+D - (((T>>8)+T)>>8)
    
                // set alpha to lo16 from dest_
                a = _mm_srli_epi32(d, 24);          // 000000AA 
                rb = _mm_slli_epi32(a, 16);         // 00AA0000
                a = _mm_or_si128(rb, a);            // 00AA00AA
    
                rb = _mm_and_si128(lomask, s);      // 00BB00RR     
                rb = _mm_mullo_epi16(rb, a);        // BBBBRRRR 
                rb = _mm_add_epi16(rb, round);      // BBBBRRRR
                t = _mm_srli_epi16(rb, 8);          
                t = _mm_add_epi16(t, rb);
                rb = _mm_srli_epi16(t, 8);          // 00BB00RR 
    
                ag = _mm_srli_epi16(s, 8);          // 00AA00GG     
                ag = _mm_mullo_epi16(ag, a);        // AAAAGGGG     
                ag = _mm_add_epi16(ag, round);
                t = _mm_srli_epi16(ag, 8);
                t = _mm_add_epi16(t, ag);
                ag = _mm_andnot_si128(lomask, t);   // AA00GG00     
    
                rb = _mm_or_si128(rb, ag);          // AABBGGRR      pack
    
                rb = _mm_sub_epi8(s, rb);           // sub S-[(D[A]*S)/255]
                d = _mm_add_epi8(d, rb);            // add D+[S-(D[A]*S)/255]
    
                _mm_stream_si128(dest128, d);
            }
        }   
        _mm_mfence();   //ensure last WC buffers get flushed to memory      
    }
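For readers who want to follow the arithmetic without the intrinsics, here is a scalar sketch of what the loop above appears to do to each 32-bit pixel. It assumes premultiplied AABBGGRR pixels with alpha in the top byte, as the code comments suggest; `pre_over_pixel` is a hypothetical name, not part of the original code.

```cpp
#include <cstdint>

// Scalar sketch of one pixel of the SSE2 loop:
// result = D + (S - S * D[A] / 255) for each 8-bit channel,
// where S comes from source1 and D from source2.
std::uint32_t pre_over_pixel(std::uint32_t s, std::uint32_t d)
{
    std::uint32_t alpha = d >> 24;  // D[A], the alpha of the second source
    std::uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint32_t sc = (s >> shift) & 0xFF;
        std::uint32_t dc = (d >> shift) & 0xFF;
        std::uint32_t t = sc * alpha + 0x80;
        std::uint32_t scaled = (t + (t >> 8)) >> 8;   // round(sc * alpha / 255)
        std::uint32_t c = (dc + sc - scaled) & 0xFF;  // wraps like _mm_add_epi8
        out |= c << shift;
    }
    return out;
}
```

For example, a fully opaque `d` returns `d` unchanged, and a fully transparent `d` (all zero) returns `s`. A routine like this can also serve as a reference when testing the SIMD version.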
    
© 2021 The Archive Base. All Rights Reserved