Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7805061
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 2, 20262026-06-02T02:06:10+00:00 2026-06-02T02:06:10+00:00

For a hobby project I’m working on, I need to emulate certain 64-bit integer

  • 0

For a hobby project I’m working on, I need to emulate certain 64-bit integer operations on a x86 CPU, and it needs to be fast.

Currently, I’m doing this via MMX instructions, but that’s really a pain to work with, because I have to flush the fp register state all the time (and because most MMX instructions deal with signed integers, and I need unsigned behavior).

So I’m wondering if the SSE/optimization gurus here on SO can come up with a better implementation using SSE.

The operations I need are the following (quite specific) ones:

uint64_t X, Y;

X = 0;
X = 1;
X << 1;
X != Y;
X + 1;
X & 0x1 // get lsb
X | 0x1 // set lsb
X > Y;

Specifically, I don’t need general-purpose addition or shifting, for example, just add one and left-shift one. Really, just the exact operations shown here.

Except, of course, on x86, uint64_t is emulated by using two 32-bit scalars, which is slow (and, in my case, simply doesn’t work, because I need loads/stores to be atomic, which they won’t be when loading/storing two separate registers).

Hence, I need a SIMD solution.
Some of these operations are trivial, supported by SSE2 already. Others (!= and <) require a bit more work.

Suggestions?
SSE and SSE2 are fine. It’d take some persuasion to permit SSE3, and SSE4 is probably out of the question (A CPU which supports SSE4 is likely to run 64-bit anyway, and so I don’t need these workarounds)

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-02T02:06:11+00:00Added an answer on June 2, 2026 at 2:06 am

    SSE2 has direct support for some 64-bit integer operations:

    Set both elements to 0:

    __m128i z = _mm_setzero_si128();
    

    Set both elements to 1:

    __m128i z = _mm_set1_epi64x(1);      // also works for variables.
    __m128i z = _mm_set_epi64x(hi, lo);  // elements can be different
    
    __m128i z = _mm_set_epi32(0,1,0,1);  // if any compilers refuse int64_t in 32-bit mode.  (None of the major ones do.)
    

    Set/load the low 64 bits, zero-extending to __m128i

    // supported even in 32-bit mode, and listed as an intrinsic for MOVQ
    // so it should be atomic on aligned integers.
    _mm_loadl_epi64((const __m128i*)p);     // movq or movsd 64-bit load
    
    _mm_cvtsi64x_si128(a);      // only ICC, others refuse in 32-bit mode
    _mm_loadl_epi64((const __m128i*)&a);  // portable for a value instead of pointer
    

    Things based on _mm_set_epi32 can get compiled into a mess by some compilers, so _mm_loadl_epi64 appears to be the best bet across MSVC and ICC as well as gcc/clang, and should actually be safe for your requirement of atomic 64-bit loads in 32-bit mode. See it on the Godbolt compiler explorer

    Vertically add/subtract each 64-bit integer:

    __m128i z = _mm_add_epi64(x,y)
    __m128i z = _mm_sub_epi64(x,y)
    

    http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_integer_arithmetic.htm#intref_sse2_integer_arithmetic

    Left Shift:

    __m128i z = _mm_slli_epi64(x,i)   // i must be an immediate
    

    http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_int_shift.htm

    Bitwise operators:

    __m128i z = _mm_and_si128(x,y)
    __m128i z = _mm_or_si128(x,y)
    

    http://software.intel.com/sites/products/documentation/studio/composer/en-us/2011/compiler_c/intref_cls/common/intref_sse2_integer_logical.htm

    SSE doesn’t have increments, so you’ll have to use a constant with 1.


    Comparisons are harder since there’s no 64-bit support until SSE4.1 pcmpeqq and SSE4.2 pcmpgtq

    Here’s the one for equality:

    __m128i t = _mm_cmpeq_epi32(a,b);
    __m128i z = _mm_and_si128(t,_mm_shuffle_epi32(t,177));
    

    This will set the each 64-bit element to 0xffffffffffff (aka -1) if they are equal. If you want it as a 0 or 1 in an int, you can pull it out using _mm_cvtsi32_si128() and add 1. (But sometimes you can do total -= cmp_result; instead of converting and adding.)

    And Less-Than: (not fully tested)

    a = _mm_xor_si128(a,_mm_set1_epi32(0x80000000));
    b = _mm_xor_si128(b,_mm_set1_epi32(0x80000000));
    __m128i t = _mm_cmplt_epi32(a,b);
    __m128i u = _mm_cmpgt_epi32(a,b);
    __m128i z = _mm_or_si128(t,_mm_shuffle_epi32(t,177));
    z = _mm_andnot_si128(_mm_shuffle_epi32(u,245),z);
    

    This will set the each 64-bit element to 0xffffffffffff if the corresponding element in a is less than b.


    Here’s are versions of “equals” and “less-than” that return a bool. They return the result of the comparison for the bottom 64-bit integer.

    inline bool equals(__m128i a,__m128i b){
        __m128i t = _mm_cmpeq_epi32(a,b);
        __m128i z = _mm_and_si128(t,_mm_shuffle_epi32(t,177));
        return _mm_cvtsi128_si32(z) & 1;
    }
    inline bool lessthan(__m128i a,__m128i b){
        a = _mm_xor_si128(a,_mm_set1_epi32(0x80000000));
        b = _mm_xor_si128(b,_mm_set1_epi32(0x80000000));
        __m128i t = _mm_cmplt_epi32(a,b);
        __m128i u = _mm_cmpgt_epi32(a,b);
        __m128i z = _mm_or_si128(t,_mm_shuffle_epi32(t,177));
        z = _mm_andnot_si128(_mm_shuffle_epi32(u,245),z);
        return _mm_cvtsi128_si32(z) & 1;
    }
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm currently working on a hobby project, written in Java, containing about two different
I'm working on a hobby project involving a rather CPU-intensive calculation. The problem is
I'm working on a hobby project where I need to call a managed delegate
I'm currently working on a hobby project in Rails, and stumbled upon a what
I am working on a simple hobby project that checks when users connect/disconnect from
I'm working on a Minecraft-like engine as a hobby project to see how far
I'm writing an iPhone app as a hobby project and it will need a
I'm new to JavaScript, and working on a hobby project with a few developers.
I'm working on a small hobby project for personal use, which requires a way
I'm working on a hobby project of the game Baroque Chess . For those

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.