Editorial Team
Asked: May 27, 2026 (2026-05-27T05:39:40+00:00)

Sorry I don’t have a good title… I was reading this thread: Vector Matrix

Sorry I don’t have a good title…

I was reading this thread: Vector Matrix Multiplication In SSE

The original poster had the following code

// xmm0 = (v0,v1,v2,v3)
movups xmm0, [eax]

// xmm0 = (v0,v0,v0,v0)
// xmm1 = (v1,v1,v1,v1)
// xmm2 = (v2,v2,v2,v2)
// xmm3 = (v3,v3,v3,v3)
shufps xmm3, xmm0, 255
shufps xmm2, xmm0, 170
shufps xmm1, xmm0, 85
shufps xmm0, xmm0, 0

Someone replied with the following:

But what really happens according to the manual: (a, b, c, d) means a are bits 0 to 31, b are bits 32 to 63 and so on

// xmm0 = (v0,v1,v2,v3)
movups xmm0, [eax]

// xmm0 = (v0, v0, v0, v0)
shufps xmm0, xmm0, 0

This makes sense to me, since in a linear array model [elt0, elt1, elt2, …] elt0 is Array[0].

What confuses me is that, according to the manual, the bit layout of an xmm register is written as [127…0] (see the picture below).

Like the original poster, I looked at the bit layout and thought the leftmost element of [elt0, elt1, elt2, elt3] was the one selected by the bits “11”.

So if I want xmm0 to contain only v0, I would write:

shufps xmm0, xmm0, 0xFF  // 11 11 11 11  === 0xFF

Which explanation is correct?

[Image: XMM register bit layout from the manual, bits numbered 127…0]

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 5:39 am

    The confusion probably comes from the fact that bits in xmm registers (as in all other registers) are conventionally numbered right-to-left, i.e. the lowest bit is written on the right and the highest bit on the left:

    xmm0 = [bit 127, bit 126, ..., bit 1, bit 0]
    

    If you view the contents of an xmm register as four 32-bit dwords, they are also written right-to-left:

    xmm0 = [dword 3, dword 2, dword 1, dword 0]
    

    The source of this confusion is that if you have an array in memory

    float A[4] = { 0.0f, 1.0f, 2.0f, 3.0f };
    

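    and you load this array into an xmm register, the elements appear in reversed order when the register is written in this right-to-left notation, because A[0] lands in the lowest dword (bits 0 to 31) and A[3] in the highest (bits 96 to 127):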

    ; xmm0 = (A3 = 3.0f, A2 = 2.0f, A1 = 1.0f, A0 = 0.0f) after the load
    movups xmm0, [A]
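
The layout after the load can be checked from C. The following is a minimal sketch, assuming an x86 target with SSE2 and a compiler that provides the standard intrinsics header `<emmintrin.h>`; the function names `lowest_dword` and `highest_dword` are made up for this illustration. It loads four floats with an unaligned load, then reads dword 0 directly and dword 3 by shifting the whole register right by 12 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE ones */

/* Hypothetical helper: value in bits 0..31 after loading a[0..3]. */
float lowest_dword(const float *a)
{
    assert(a != NULL);
    __m128 v = _mm_loadu_ps(a);      /* unaligned load, like movups xmm, [a] */
    return _mm_cvtss_f32(v);         /* reads dword 0 of the register */
}

/* Hypothetical helper: value in bits 96..127 after loading a[0..3]. */
float highest_dword(const float *a)
{
    assert(a != NULL);
    __m128i bits = _mm_castps_si128(_mm_loadu_ps(a));
    bits = _mm_srli_si128(bits, 12); /* byte shift right: dword 3 moves to dword 0 */
    return _mm_cvtss_f32(_mm_castsi128_ps(bits));
}
```

With the array A from above, `lowest_dword(A)` gives A[0] and `highest_dword(A)` gives A[3], which matches the right-to-left picture: the first array element sits at the low end of the register.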
    

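    Therefore, the right way to copy the first dword (dword 0, i.e. A[0]) into all dwords of an xmm register is an immediate of 0, which selects dword 0 for every result position; an immediate of 0xFF would broadcast dword 3 (A[3]) instead: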

    shufps xmm0, xmm0, 0
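
How the immediate is decoded can be sketched in portable C. This is only a model of the documented behaviour of `shufps dst, src, imm8`, not actual SIMD code: the function name `shufps_model` is made up, and registers are represented as plain arrays where index 0 stands for bits 0 to 31. Each 2-bit field of the immediate picks one source dword; the two low result dwords are taken from the destination operand and the two high ones from the source operand.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of `shufps dst, src, imm8`.
   Index 0 of each array is the lowest dword (bits 0..31). */
void shufps_model(float dst[4], const float src[4], unsigned imm8)
{
    float r[4];
    assert(imm8 <= 0xFF);
    r[0] = dst[(imm8 >> 0) & 3];  /* result dword 0: selected from dst */
    r[1] = dst[(imm8 >> 2) & 3];  /* result dword 1: selected from dst */
    r[2] = src[(imm8 >> 4) & 3];  /* result dword 2: selected from src */
    r[3] = src[(imm8 >> 6) & 3];  /* result dword 3: selected from src */
    memcpy(dst, r, sizeof r);     /* write back only after all reads (dst may alias src) */
}
```

With dst and src both holding (v0, v1, v2, v3) in array order, an immediate of 0 yields four copies of v0 and 0xFF yields four copies of v3. The model also shows why `shufps xmm3, xmm0, 255` from the question does not fully broadcast: its two low result dwords come from whatever xmm3 held before.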
    

    Also, if you want to load a single float and broadcast it into all elements of an xmm register, it is better for performance to use

    ; MOVSS can be much faster than MOVUPS, and is never slower
    ; Load A[0] into low dword of xmm0
    movss xmm0, [A]
    ; Copy low dword of xmm0 to all dwords of xmm0
    shufps xmm0, xmm0, 0
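
The same two-step sequence can be written with SSE intrinsics. This is a sketch under the assumption of an x86 target and a compiler shipping `<xmmintrin.h>`; the wrapper name `broadcast_first` is invented for the example, and the compiler is free to pick different instructions than the ones named in the comments.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: writes four copies of a[0] to out[0..3]. */
void broadcast_first(const float *a, float out[4])
{
    assert(a != NULL && out != NULL);
    __m128 v = _mm_load_ss(a);    /* like movss: dword 0 = a[0], upper dwords zeroed */
    v = _mm_shuffle_ps(v, v, 0);  /* like shufps v, v, 0: dword 0 copied to all dwords */
    _mm_storeu_ps(out, v);        /* unaligned store of the four dwords */
}
```

In practice `_mm_set1_ps(a[0])` or `_mm_load1_ps(a)` expresses the same intent and leaves the instruction choice to the compiler.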
    

    The AVX instruction set (supported since Intel Sandy Bridge and AMD Bulldozer CPUs) has a dedicated instruction, vbroadcastss, which performs load-and-broadcast:

    ; xmm0 = (A[0], A[0], A[0], A[0]) after execution of vbroadcastss
    vbroadcastss xmm0, [A]
    

    The SSE3 instruction set includes a similar instruction, MOVDDUP, which, however, only works for doubles:

    const double B = 2.718281828459045;
    
    ; xmm0 = ( 2.718281828459045, 2.718281828459045 ) after execution of movddup
    movddup xmm0, [B]
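
For doubles, a load-and-broadcast can be sketched from C as well. Note the substitution: the example below uses the SSE2 intrinsic `_mm_load1_pd`, not MOVDDUP itself, so that it builds without SSE3 enabled; a compiler may lower it to movddup when SSE3 is available, and `_mm_loaddup_pd` from `<pmmintrin.h>` is the intrinsic that corresponds to MOVDDUP directly. The wrapper name `broadcast_double` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: writes two copies of *b to out[0..1]. */
void broadcast_double(const double *b, double out[2])
{
    assert(b != NULL && out != NULL);
    __m128d v = _mm_load1_pd(b);  /* both qwords of the register = *b */
    _mm_storeu_pd(out, v);        /* unaligned store of the two doubles */
}
```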
    