The Archive Base

Editorial Team
Asked: May 29, 2026

Assume I have two vectors represented by two arrays of type double, each

Assume I have two vectors represented by two arrays of type double, each of size 2. I’d like to add corresponding positions. So assume vectors i0 and i1, I’d like to add i0[0] + i1[0] and i0[1] + i1[1] together.

Since the type is double, I would need two registers. The trick would be to put i0[0] and i1[0] in one register, and i0[1] and i1[1] in another, and just add the register with itself.

My question is, if I call _mm_load_ps(i0[0]) and then _mm_load_ps(i1[0]), will that place them in the lower and upper 64 bits separately, or will the second load replace the contents of the register? How would I place both doubles in the same register, so I can call add_ps afterwards?

1 Answer

  1. Editorial Team
    Added an answer on May 29, 2026 at 12:09 pm

    I think what you want is this:

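    #include <emmintrin.h>  // SSE2: __m128d, _mm_load_pd, _mm_add_pd

    // i0 and i1 must be 16-byte aligned for _mm_load_pd;
    // use _mm_loadu_pd instead if alignment is not guaranteed
    double i0[2];
    double i1[2];
    
    __m128d x1 = _mm_load_pd(i0);   // x1 = { i0[0] (low), i0[1] (high) }
    __m128d x2 = _mm_load_pd(i1);   // x2 = { i1[0] (low), i1[1] (high) }
    __m128d sum = _mm_add_pd(x1, x2);
    // do whatever you want to with "sum" now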
    

    When you do a _mm_load_pd, it puts the first double into the lower 64 bits of the register and the second into the upper 64 bits. So, after the loads above, x1 holds the two double values i0[0] and i0[1] (and similar for x2). The call to _mm_add_pd vertically adds the corresponding elements in x1 and x2, so after the addition, sum holds i0[0] + i1[0] in its lower 64 bits and i0[1] + i1[1] in its upper 64 bits.
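To make the lane layout concrete, here is a minimal sketch that wraps the same three intrinsics in a function and writes the result back to memory with _mm_store_pd. The helper name add_pairs and the alignment checks are illustrative assumptions, not part of the original answer; the sketch assumes an x86 target with SSE2 and 16-byte-aligned pointers.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2: __m128d, _mm_load_pd, _mm_add_pd, _mm_store_pd */
#include <stdint.h>

/* Hypothetical helper: computes out[0] = a[0] + b[0] and out[1] = a[1] + b[1].
   All three pointers are assumed to be 16-byte aligned, because
   _mm_load_pd and _mm_store_pd require aligned addresses. */
void add_pairs(const double *a, const double *b, double *out)
{
    assert((uintptr_t)a % 16 == 0);
    assert((uintptr_t)b % 16 == 0);
    assert((uintptr_t)out % 16 == 0);

    __m128d x1 = _mm_load_pd(a);      /* low lane = a[0], high lane = a[1] */
    __m128d x2 = _mm_load_pd(b);      /* low lane = b[0], high lane = b[1] */
    __m128d sum = _mm_add_pd(x1, x2); /* lane-wise (vertical) addition */

    _mm_store_pd(out, sum);           /* out[0] = low lane, out[1] = high lane */
}
```

Calling add_pairs on aligned arrays {1.0, 2.0} and {10.0, 20.0} leaves 11.0 in out[0] and 22.0 in out[1]. If the arrays might not be aligned, _mm_loadu_pd and _mm_storeu_pd can be substituted for the aligned variants.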

    Edit: I should point out that there is no benefit to using _mm_load_pd instead of _mm_load_ps. As the function names indicate, the pd variety loads two packed doubles and the ps version loads four packed single-precision floats. Since both are purely bit-for-bit 128-bit memory moves and both use the SSE floating-point unit, there is no penalty to using _mm_load_ps to load double data. There is also a small benefit to _mm_load_ps: in the legacy SSE encoding, its instruction is one byte shorter than the one for _mm_load_pd, so it takes slightly less space in the instruction cache (and potentially helps instruction decoding; I’m not an expert on all of the intricacies of modern x86 processors). The above code using _mm_load_ps would look like:

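    double i0[2];
    double i1[2];
    
    __m128d x1 = _mm_castps_pd(_mm_load_ps((const float *) i0));
    __m128d x2 = _mm_castps_pd(_mm_load_ps((const float *) i1));
    __m128d sum = _mm_add_pd(x1, x2);
    // do whatever you want to with "sum" now
    

    There is no function call implied by the casts; _mm_castps_pd generates no instructions and simply makes the compiler reinterpret the SSE register’s contents as holding doubles instead of floats, so that the value can be passed into the double-precision arithmetic function _mm_add_pd. (A C-style cast such as (__m128d) between vector types is accepted by GCC and Clang as an extension, but _mm_castps_pd is the portable spelling.)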
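As a quick sanity check of the claim that the two loads are bit-for-bit equivalent, the following sketch loads the same 16 bytes both ways and compares the results byte by byte. The function name same_bits_either_load is a hypothetical helper introduced for illustration; the sketch assumes SSE2 and a 16-byte-aligned input pointer.

```c
#include <assert.h>
#include <emmintrin.h> /* SSE2; also provides the SSE _mm_load_ps */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if loading p[0..1] with _mm_load_pd and
   with _mm_load_ps + _mm_castps_pd yields identical register contents. */
int same_bits_either_load(const double *p)
{
    assert((uintptr_t)p % 16 == 0); /* both aligned loads require this */

    __m128d via_pd = _mm_load_pd(p);
    __m128d via_ps = _mm_castps_pd(_mm_load_ps((const float *)p));

    double a[2], b[2];
    _mm_storeu_pd(a, via_pd); /* unaligned stores: a and b are plain locals */
    _mm_storeu_pd(b, via_ps);

    /* Compare raw bytes rather than values, so NaNs would also compare equal. */
    return memcmp(a, b, sizeof a) == 0;
}
```

For any aligned pair of doubles this is expected to return 1, since neither load interprets the data it moves.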
