Editorial Team
Asked: June 2, 2026

I am currently experimenting with the creation of highly-optimized, reusable functions for a library

I am currently experimenting with writing highly optimized, reusable functions for a library of mine. For instance, I wrote an “is power of 2” function as follows:

template<class IntType>  
inline bool is_power_of_two( const IntType x )
{
    return (x != 0) && ((x & (x - 1)) == 0);
}

This is a portable, low-maintenance implementation as an inline C++ template. VC++ 2008 compiles it to the following code, which contains branches:

is_power_of_two PROC
    test    rcx, rcx
    je  SHORT $LN3@is_power_o
    lea rax, QWORD PTR [rcx-1]
    test    rax, rcx
    jne SHORT $LN3@is_power_o
    mov al, 1
    ret 0
$LN3@is_power_o:
    xor al, al
    ret 0
is_power_of_two ENDP
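The two jumps in this listing follow the short-circuit structure of `&&`. As a rough illustration, the template can be expanded by hand into the same control flow. The name `is_power_of_two_expanded` is hypothetical and the instruction comments are only an approximate mapping. Neither operand has side effects, so a compiler is free to evaluate both and skip the jumps, but this build did not.

```cpp
// Hypothetical hand-expanded form of is_power_of_two, written to mirror
// the control flow of the compiler listing (illustrative name).
template<class IntType>
inline bool is_power_of_two_expanded(const IntType x)
{
    if (x == 0)               // test rcx, rcx / je
        return false;         // xor al, al
    if ((x & (x - 1)) != 0)   // lea rax, [rcx-1] / test rax, rcx / jne
        return false;         // xor al, al
    return true;              // mov al, 1
}
```

For example, `is_power_of_two_expanded(64u)` is true, while `is_power_of_two_expanded(0u)` and `is_power_of_two_expanded(6u)` are false.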

I also found an implementation in “The bit twiddler”, which would be coded in x64 assembly as follows:

is_power_of_two_fast PROC
    test rcx, rcx
    je  SHORT NotAPowerOfTwo
    lea rax, [rcx-1]
    and rax, rcx
    neg rax
    sbb rax, rax
    inc rax
    ret
NotAPowerOfTwo:
    xor rax, rax
    ret
is_power_of_two_fast ENDP
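To make the `neg` / `sbb` / `inc` sequence easier to follow, here is a minimal C++ model of what the routine computes, step by step. The function name and the `carry` variable are illustrative and not part of the original code. The model assumes the usual x86 semantics: `neg` sets the carry flag when its operand is nonzero, and `sbb rax, rax` then yields 0 or all ones.

```cpp
#include <cstdint>

// Illustrative C++ model of is_power_of_two_fast; each step is annotated
// with the instruction it stands for. The name is hypothetical.
inline std::uint64_t is_power_of_two_fast_model(std::uint64_t x)
{
    if (x == 0)                              // test rcx, rcx / je
        return 0;                            // xor rax, rax
    std::uint64_t rax = (x - 1) & x;         // lea rax, [rcx-1] / and rax, rcx
    const std::uint64_t carry = (rax != 0);  // neg rax: CF = 1 if rax was nonzero
    rax = 0 - carry;                         // sbb rax, rax: 0 or all ones
    return rax + 1;                          // inc rax: 1 or 0
}
```

The routine still contains one jump for the zero check, so it has fewer branches than the compiler output but is not branch-free.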

I tested both subroutines, written in an assembly module (.asm file) separate from the C++ code, and the second one runs about 20% faster!

Yet the function call overhead is considerable: when I compare the second assembly implementation “is_power_of_two_fast” with the inlined version of the template function, the latter is faster despite its branches!

Unfortunately, VC++ does not support inline assembly when targeting x64. One should use “intrinsic functions” instead.

Now the question: can I implement the faster version “is_power_of_two_fast” as a custom intrinsic function or something similar, so that it can be used inline? Or alternatively, is it possible to somehow force the compiler to produce the low-branch version of the function?

1 Answer

  1. Editorial Team
    Added an answer on June 2, 2026 at 3:59 am

    Even VC 2005 can generate branch-free code that uses the sbb instruction.

    The following code

    bool __declspec(noinline) IsPowOf2(unsigned int a)
    {
        return (a>=1)&((a&(a-1))<1);
    }
    

    compiles to the following branch-free code:

    00401000  lea         eax,[ecx-1] 
    00401003  and         eax,ecx 
    00401005  cmp         eax,1 
    00401008  sbb         eax,eax 
    0040100A  neg         eax  
    0040100C  cmp         ecx,1 
    0040100F  sbb         ecx,ecx 
    00401011  add         ecx,1 
    00401014  and         eax,ecx 
    00401016  ret          
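The same expression can be carried back into the template from the question. The following is a sketch only: the name `is_power_of_two_branchless` and the restriction to unsigned types are assumptions added here, not part of the answer. Whether a particular compiler emits jump-free code for it depends on the compiler version and optimization settings, so the disassembly is worth checking.

```cpp
#include <type_traits>

// Sketch only: the answer's expression wrapped in a template.
// The unsigned-only restriction is an assumption of this sketch; it avoids
// signed overflow in x - 1 and keeps the >= 1 and < 1 comparisons meaningful.
template<class UIntType>
inline bool is_power_of_two_branchless(const UIntType x)
{
    static_assert(std::is_unsigned<UIntType>::value,
                  "this sketch assumes an unsigned integer type");
    // Bitwise & evaluates both comparisons unconditionally, so there is
    // no short-circuit that the compiler has to express as a jump.
    return (x >= 1) & ((x & (x - 1)) < 1);
}
```

Because the function stays an inline template, it avoids the call overhead that the separate assembly routine incurred in the question's measurements.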
    