Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • Home
  • SEARCH
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 3394894
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 18, 20262026-05-18T04:13:29+00:00 2026-05-18T04:13:29+00:00

I have this function which uses SSE2 to add some values together it’s supposed

  • 0

I have this function which uses SSE2 to add some values together it’s supposed to add lhs and rhs together and store the result back into lhs:

template<typename T>
void simdAdd(T *lhs,T *rhs)
{
    asm volatile("movups %0,%%xmm0"::"m"(lhs));
    asm volatile("movups %0,%%xmm1"::"m"(rhs));

    switch(sizeof(T))
    {
        case sizeof(uint8_t):
        asm volatile("paddb %%xmm0,%%xmm1":);
        break;

        case sizeof(uint16_t):
        asm volatile("paddw %%xmm0,%%xmm1":);
        break;

        case sizeof(float):
        asm volatile("addps %%xmm0,%%xmm1":);
        break;

        case sizeof(double):
        asm volatile("addpd %%xmm0,%%xmm1":);
        break;

        default:
        std::cout<<"error"<<std::endl;
        break;
    }

    asm volatile("movups %%xmm0,%0":"=m"(lhs));
}

and my code uses the function like this:

float *values=new float[4];
float *values2=new float[4];

values[0]=1.0f;
values[1]=2.0f;
values[2]=3.0f;
values[3]=4.0f;

values2[0]=1.0f;
values2[1]=2.0f;
values2[2]=3.0f;
values2[3]=4.0f;

simdAdd(values,values2);
for(uint32_t count=0;count<4;count++) std::cout<<values[count]<<std::endl;

However this isn’t working because when the code runs it outputs 1,2,3,4 instead of 2,4,6,8

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-18T04:13:30+00:00Added an answer on May 18, 2026 at 4:13 am

    I’ve found that inline assembly support isn’t reliable in most modern compilers (as in, the implementations are just plain buggy). You are generally better off using compiler intrinsics which are declarations that look like C functions, but actually compile to a specific opcode.

    Intrinsics let you specify an exact sequence of opcodes, but leave the register coloring to the compiler. It’s much more reliable than trying to move data between C variables and asm registers, which is where inline assemblers have always fallen down for me. It also lets the compiler schedule your instructions, which can provide better performance if it works around pipeline hazards. Ie, in this case you could do

    void simdAdd(float *lhs,float *rhs)
    {
       _mm_storeu_ps( lhs, _mm_add_ps(_mm_loadu_ps( lhs ), _mm_loadu_ps( rhs )) );
    }
    

    In your case, anyway, you’ve two problems:

    1. The terrible GCC inline assembly syntax which makes great confusion of the difference between pointers and values. Use *lhs and *rhs instead of just lhs and rhs; apparently the “=m” syntax means “implicitly use a pointer to this thing that I’m passing you instead of the thing itself.”
    2. GCC has a source,destination syntax — The addps stores its result in the second parameter, so you you need to output xmm1, not xmm0.

    I’ve put a fixed example on codepad (to avoid cluttering up this answer, and to demonstrate that it works).

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

This is a Windows Forms application. I have a function which captures some mouse
I have this function: RegisterGlobalHotKey(Keys.F6, MOD_SHIFT | MOD_CONTROL); which call an API to register
I have this jQuery which works fine $(li[id^='shop_id']).click( function () { alert(I clicked on
I have a bash script file which starts with a function definition, like this:
I have this function in my Javascript Code that updates html fields with their
I have this function in my head: <head> window.onload = function(){ var x =
I have this function from a plugin (from a previous post) // This method
I have this function... private string dateConvert(string datDate) { System.Globalization.CultureInfo cultEnGb = new System.Globalization.CultureInfo(en-GB);
I have this function to read in all ints from the file. The problem
I have this little function function makewindows(){ child1 = window.open (about:blank); child1.document.write(<?php echo htmlspecialchars(json_encode($row2['ARTICLE_DESC']),

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.