Editorial Team
Asked: May 23, 2026 (2026-05-23T16:24:53+00:00)

I’m using a 128-bit integer counter in the innermost loops of my C++ code. (Irrelevant background: the actual application evaluates finite difference equations on a regular grid, which involves repeatedly incrementing large integers, and even 64 bits isn’t enough precision because small rounding errors accumulate enough to affect the answers.)

I’ve represented the integer as two 64-bit unsigned longs. I now need to increment those values by a 128-bit constant. This isn’t hard, but you have to manually catch the carry from the low word to the high word.

I have working code something like this:

inline void increment128(unsigned long &hiWord, unsigned long &loWord)
{
    const unsigned long hiAdd=0x0000062DE49B5241;
    const unsigned long loAdd=0x85DC198BCDD714BA;

    loWord += loAdd;
    if (loWord < loAdd) ++hiWord; // test_and_add_carry
    hiWord += hiAdd;
}

This is tight and simple code. It works.

Unfortunately this is about 20% of my runtime. The killer line is that loWord test. If I remove it, I obviously get the wrong answers but the runtime overhead drops from 20% to 4%! So that carry test is especially expensive!

My question: Does C++ expose the hardware carry flag, even as a GCC extension?
It seems the additions could be done without the test-and-add-carry line above if the compiled code used an add-with-carry instruction for the hiWord addition.
Is there a way to rewrite the test-and-add-carry line so that the compiler emits that instruction?

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
    Editorial Team, added an answer on May 23, 2026 at 4:24 pm (2026-05-23T16:24:54+00:00)

    Actually gcc will use the carry automatically if you write your code carefully…

    Current GCC can optimize hiWord += (loWord < loAdd); into add/adc (adc is x86’s add-with-carry). This optimization was introduced in GCC 5.3.

    • With separate uint64_t chunks in 64-bit mode: https://godbolt.org/z/S2kGRz.
    • And the same thing in 32-bit mode with uint32_t chunks: https://godbolt.org/z/9FC9vc

    (editor’s note: Of course the hard part is writing a correct full-adder with carry in and carry out; that’s hard in C and GCC doesn’t know how to optimize any that I’ve seen.)
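To make the editor’s note concrete, here is a minimal sketch of what a full adder with carry-in and carry-out looks like in portable C++. The names add_with_carry and add_limbs are hypothetical, not from any library, and nothing here guarantees that a given compiler turns the chain into adc instructions; check the generated assembly for your own compiler and flags.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical helper: computes a + b + carry_in (carry_in must be 0 or 1).
// Returns the low 64 bits and writes the carry-out (0 or 1) to carry_out.
inline std::uint64_t add_with_carry(std::uint64_t a, std::uint64_t b,
                                    unsigned carry_in, unsigned &carry_out)
{
    const std::uint64_t partial = a + b;          // may wrap around
    const unsigned c1 = partial < a;              // carry out of a + b
    const std::uint64_t sum = partial + carry_in; // wraps only if partial is all ones
    const unsigned c2 = sum < partial;            // carry out of adding carry_in
    carry_out = c1 | c2;                          // both cannot be set at once
    return sum;
}

// Hypothetical helper: dst += src for n little-endian 64-bit limbs.
// Returns the carry out of the most significant limb.
inline unsigned add_limbs(std::uint64_t *dst, const std::uint64_t *src, std::size_t n)
{
    unsigned carry = 0;
    for (std::size_t i = 0; i < n; ++i) {
        unsigned next = 0;
        dst[i] = add_with_carry(dst[i], src[i], carry, next);
        carry = next;
    }
    return carry;
}
```

The two-comparison form is what makes this awkward: the carry-out can come either from a + b or from adding the incoming carry, which is the pattern the note says GCC does not recognise.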

    Also related: https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html can give you carry-out from unsigned, or signed-overflow detection.
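As a rough sketch of how that builtin could apply to the question’s function: __builtin_add_overflow is a GCC/Clang extension that stores the wrapped sum and returns true if the addition overflowed, which for unsigned operands is the carry-out. The function name increment128_builtin is made up for this example, and whether the result compiles to add/adc depends on the compiler version.

```cpp
#include <cstdint>

// Hypothetical variant of the question's increment128, assuming GCC or Clang.
inline void increment128_builtin(std::uint64_t &hiWord, std::uint64_t &loWord)
{
    const std::uint64_t hiAdd = 0x0000062DE49B5241ULL;
    const std::uint64_t loAdd = 0x85DC198BCDD714BAULL;

    // Returns true when the unsigned addition wrapped, i.e. the carry-out.
    const bool carry = __builtin_add_overflow(loWord, loAdd, &loWord);

    // bool converts to 0 or 1, so this adds the carry without a branch in the source.
    hiWord += hiAdd + carry;
}
```

This keeps the two-word representation from the question, so it should slot into existing code without changing call sites.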


    Older GCC, such as GCC 4.5, will branch or use setc on the carry-out from an add instead of using adc. It only used adc (add-with-carry) on the flag result from an add if you used __int128 (or uint64_t on a 32-bit target). See also "Is there a 128 bit integer in gcc?": __int128 is available only on 64-bit targets and has been supported since GCC 4.1.

    I compiled this code with gcc -O2 -Wall -Werror -S:

    void increment128_1(unsigned long &hiWord, unsigned long &loWord)
    {
        const unsigned long hiAdd=0x0000062DE49B5241;
        const unsigned long loAdd=0x85DC198BCDD714BA;
    
        loWord += loAdd;
        if (loWord < loAdd) ++hiWord; // test_and_add_carry
        hiWord += hiAdd;
    }
    
    void increment128_2(unsigned long &hiWord, unsigned long &loWord)
    {
        const unsigned long hiAdd=0x0000062DE49B5241;
        const unsigned long loAdd=0x85DC198BCDD714BA;
    
        loWord += loAdd;
        hiWord += hiAdd;
        hiWord += (loWord < loAdd); // test_and_add_carry
    }
    

    This is the assembly for increment128_1:

    .cfi_startproc
            movabsq     $-8801131483544218438, %rax
            addq        (%rsi), %rax
            movabsq     $-8801131483544218439, %rdx
            cmpq        %rdx, %rax
            movq        %rax, (%rsi)
            ja  .L5
            movq        (%rdi), %rax
            addq        $1, %rax
    .L3:
            movabsq     $6794178679361, %rdx
            addq        %rdx, %rax
            movq        %rax, (%rdi)
            ret
    

    …and this is the assembly for increment128_2:

            movabsq     $-8801131483544218438, %rax
            addq        %rax, (%rsi)
            movabsq     $6794178679361, %rax
            addq        (%rdi), %rax
            movabsq     $-8801131483544218439, %rdx
            movq        %rax, (%rdi)
            cmpq        %rdx, (%rsi)
            setbe       %dl
            movzbl      %dl, %edx
            leaq        (%rdx,%rax), %rax
            movq        %rax, (%rdi)
            ret
    

    Note the lack of conditional branches in the second version.

    [edit]

    Also, references are often bad for performance, because GCC has to worry about aliasing… It is often better to just pass things by value. Consider:

    struct my_uint128_t {
        unsigned long hi;
        unsigned long lo;
    };
    
    my_uint128_t increment128_3(my_uint128_t x)
    {
        const unsigned long hiAdd=0x0000062DE49B5241;
        const unsigned long loAdd=0x85DC198BCDD714BA;
    
        x.lo += loAdd;
        x.hi += hiAdd + (x.lo < loAdd);
        return x;
    }
    

    Assembly:

            .cfi_startproc
            movabsq     $-8801131483544218438, %rdx
            movabsq     $-8801131483544218439, %rax
            movabsq     $6794178679362, %rcx
            addq        %rsi, %rdx
            cmpq        %rdx, %rax
            sbbq        %rax, %rax
            addq        %rcx, %rax
            addq        %rdi, %rax
            ret
    

    This is actually the tightest code of the three.

    …OK so none of them actually used the carry automatically :-). But they do avoid the conditional branch, which I bet is the slow part (since the branch prediction logic will get it wrong half the time).
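To see why the branch is hard to predict, note that loAdd is a little over half of 2^64 (roughly 0.52), so the carry fires on slightly more than half of the increments. The sketch below simply counts carries over n increments starting from zero; count_carries is a hypothetical helper written for this illustration, and it measures nothing about the actual predictor.

```cpp
#include <cstdint>
#include <cstddef>

// Hypothetical helper: counts how many of the first n increments of the low
// word (starting from zero) produce a carry into the high word.
inline std::size_t count_carries(std::size_t n)
{
    const std::uint64_t loAdd = 0x85DC198BCDD714BAULL;
    std::uint64_t lo = 0;
    std::size_t carries = 0;
    for (std::size_t i = 0; i < n; ++i) {
        lo += loAdd;
        carries += (lo < loAdd);   // 1 when the addition wrapped around
    }
    return carries;
}
```

A taken/not-taken ratio this close to even is about the worst case for a conditional branch, which is consistent with the branchless versions doing better.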

    [edit 2]

    And one more, which I stumbled across doing a little searching. Did you know GCC has built-in support for 128-bit integers?

    typedef unsigned long my_uint128_t __attribute__ ((mode(TI)));
    
    my_uint128_t increment128_4(my_uint128_t x)
    {
        const my_uint128_t hiAdd=0x0000062DE49B5241;
        const unsigned long loAdd=0x85DC198BCDD714BA;
    
        return x + (hiAdd << 64) + loAdd;
    }
    

    The assembly for this one is about as good as it gets:

            .cfi_startproc
            movabsq     $-8801131483544218438, %rax
            movabsq     $6794178679361, %rdx
            pushq       %rbx
            .cfi_def_cfa_offset 16
            addq        %rdi, %rax
            adcq        %rsi, %rdx
            popq        %rbx
            .cfi_offset 3, -16
            .cfi_def_cfa_offset 8
            ret
    

    (Not sure where the push/pop of rbx came from, but this is still not bad.)

    All of these are with GCC 4.5.2, by the way.
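If existing code still stores the counter as two separate words, the 128-bit type can be used just for the arithmetic. The following is a sketch assuming a 64-bit GCC or Clang target where unsigned __int128 exists; make_u128, high64, low64 and increment128_native are hypothetical names chosen for this example.

```cpp
#include <cstdint>

// Assumes a 64-bit GCC or Clang target, where unsigned __int128 is available.
using u128 = unsigned __int128;

// Hypothetical helpers for moving between the two-word and 128-bit forms.
inline u128 make_u128(std::uint64_t hi, std::uint64_t lo)
{
    return (static_cast<u128>(hi) << 64) | lo;
}
inline std::uint64_t high64(u128 x) { return static_cast<std::uint64_t>(x >> 64); }
inline std::uint64_t low64(u128 x)  { return static_cast<std::uint64_t>(x); }

// Adds the question's 128-bit constant; the compiler handles the carry.
inline u128 increment128_native(u128 x)
{
    const u128 add = make_u128(0x0000062DE49B5241ULL, 0x85DC198BCDD714BAULL);
    return x + add;
}
```

Keeping the counter in a u128 variable for the whole inner loop and splitting it only when the words are needed avoids repeated packing and unpacking.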
