The Archive Base

Editorial Team
Asked: June 9, 2026

In my code the following lines are currently the hotspot:

int table1[256] = /*...*/;
int table2[512] = /*...*/;
int table3[512] = /*...*/;

int* result = /*...*/;
for(int r = 0; r < r_end; ++r)
{
    std::uint64_t bits = bit_reader.value(); // 64 bits, no assumption regarding bits.

    // The get_ functions are table lookups from the highest word of the bits variable.

    struct entry
    {
        int sign_offset : 5;
        int r_offset    : 4;        
        int x           : 7;        
    };

    // NOTE: We are only interested in the highest word in the bits variable.

    entry e;
    if(is_in_table1(bits)) // branch prediction should work well here since table1 will be hit more often than 2 or 3, and 2 more often than 3.
        e = reinterpret_cast<const entry&>(table1[get_table1_index(bits)]);
    else if(is_in_table2(bits))
        e = reinterpret_cast<const entry&>(table2[get_table2_index(bits)]);
    else
        e = reinterpret_cast<const entry&>(table3[get_table3_index(bits)]);

    r                 += e.r_offset; // r is 18 bits, top 14 bits are always 0.
    int x              = e.x; // x is 14 bits, top 18 bits are always 0.        
    int sign_offset    = e.sign_offset;

    assert(sign_offset <= 16 && sign_offset > 0);

    // The following is the hotspot.

    int sign    = 1 - (bits >> (63 - sign_offset) & 0x2);
    (*result++) = ((x << 18) * sign) | r; // 32 bits

    // End of hotspot

    bit_reader.skip(sign_offset); // sign_offset is the last bit used.
}

I haven't figured out how to optimize this further. Maybe one of the intrinsics for operations at bit granularity, such as __shiftleft128 or _rot, could be useful?

Note that I also process the resulting data on the GPU, so the important thing is to get something into result that the GPU can then use to calculate the correct value.

Suggestions?

EDIT: Added table look-up.

EDIT: Disassembly of the sign computation:

int sign = 1 - (bits >> (63 - e.sign_offset) & 0x2);
000000013FD6B893  and         ecx,1Fh  
000000013FD6B896  mov         eax,3Fh  
000000013FD6B89B  sub         eax,ecx  
000000013FD6B89D  movzx       ecx,al  
000000013FD6B8A0  shr         r8,cl  
000000013FD6B8A3  and         r8d,2  
000000013FD6B8A7  mov         r14d,1  
000000013FD6B8AD  sub         r14d,r8d  

1 Answer

Editorial Team
Added an answer on June 9, 2026 at 5:16 pm

    I think this is the fastest solution:

    *result++ = (_rotl64(bits, sign_offset) << 31) | (x << 18) | (r << 0); // 32 bits
    

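    Then, on the GPU, negate x when the sign bit (bit 31) is set. Rotating left by sign_offset moves the bit tested in the question, bit 64 - sign_offset, down to bit 0, and the shift by 31 places it in bit 31. All other rotated bits end up above bit 31 and are discarded when the value is stored into the 32-bit result. Note that bit 31 is also the top bit of a 14-bit x shifted left by 18, so this only works if x really fits in 13 bits.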
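To check that the rotate-based packing carries the same information as the multiplication in the question, a small sketch like the following may help. It is only an illustration under stated assumptions: rotl64 is a portable stand-in for the MSVC intrinsic _rotl64, the names pack, decode_x, decode_r and reference_sign are hypothetical, x is assumed to fit in 13 bits, and r in 18 bits. The decode functions show what the GPU side would have to do; they are written in plain C++ here rather than as a kernel.

```cpp
#include <cassert>
#include <cstdint>

// Portable stand-in for the MSVC intrinsic _rotl64 (valid for 0 < s < 64).
inline std::uint64_t rotl64(std::uint64_t v, unsigned s)
{
    return (v << s) | (v >> (64 - s));
}

// Sign as computed in the question: +1 or -1, taken from bit (64 - sign_offset).
inline int reference_sign(std::uint64_t bits, unsigned sign_offset)
{
    return 1 - static_cast<int>((bits >> (63 - sign_offset)) & 0x2);
}

// Packing from the answer: sign flag in bit 31, x in bits 18..30, r in bits 0..17.
// Assumption: x < 2^13 and r < 2^18, so the fields do not overlap.
inline std::uint32_t pack(std::uint64_t bits, unsigned sign_offset,
                          std::uint32_t x, std::uint32_t r)
{
    // Truncating to 32 bits keeps only the rotated bit 0, now sitting in bit 31.
    return static_cast<std::uint32_t>(rotl64(bits, sign_offset) << 31) | (x << 18) | r;
}

// Hypothetical GPU-side correction: recover the signed x from the packed word.
inline std::int32_t decode_x(std::uint32_t packed)
{
    const std::int32_t magnitude = static_cast<std::int32_t>((packed >> 18) & 0x1FFF);
    return (packed >> 31) ? -magnitude : magnitude;
}

// Recover r from the low 18 bits.
inline std::uint32_t decode_r(std::uint32_t packed)
{
    return packed & 0x3FFFF;
}
```

For any sign_offset between 1 and 16, decode_x(pack(bits, sign_offset, x, r)) should equal x * reference_sign(bits, sign_offset), and decode_r should return r unchanged.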

