The Archive Base

Editorial Team
Asked: June 4, 2026


The problem can be described as follows.

Input

__m256d a, b, c, d

Output

__m256d s = {a[0]+a[1]+a[2]+a[3], b[0]+b[1]+b[2]+b[3], 
             c[0]+c[1]+c[2]+c[3], d[0]+d[1]+d[2]+d[3]}
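For reference, the desired output can be written in plain scalar C. This is only a sketch to pin down the specification: the name `hsum4_ref` and the array-based interface are made up for illustration and do not come from any library. Such a function can serve as a baseline when checking a vectorized version.

```c
#include <stddef.h>

/* Hypothetical scalar reference: out[i] is the sum of the four
   elements of the i-th input (a, b, c, d in that order). */
void hsum4_ref(const double a[4], const double b[4],
               const double c[4], const double d[4], double out[4])
{
    const double *in[4] = { a, b, c, d };
    for (size_t i = 0; i < 4; ++i) {
        /* Pairwise order, matching what two rounds of horizontal adds compute. */
        out[i] = (in[i][0] + in[i][1]) + (in[i][2] + in[i][3]);
    }
}
```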

Work I have done so far

It seemed easy enough: two VHADDs with some shuffling in between. In fact, combining all the permutations AVX provides does not seem to yield the very permutation needed to achieve that goal. Let me explain:

VHADD x, a, b => x = {a[0]+a[1], b[0]+b[1], a[2]+a[3], b[2]+b[3]}
VHADD y, c, d => y = {c[0]+c[1], d[0]+d[1], c[2]+c[3], d[2]+d[3]}

Were I able to permute x and y in the same manner to get

x1 = {a[0]+a[1], a[2]+a[3], c[0]+c[1], c[2]+c[3]}
y1 = {b[0]+b[1], b[2]+b[3], d[0]+d[1], d[2]+d[3]}

then

VHADD s, x1, y1 => s = {a[0]+a[1]+a[2]+a[3], b[0]+b[1]+b[2]+b[3], 
                        c[0]+c[1]+c[2]+c[3], d[0]+d[1]+d[2]+d[3]}

which is the result I wanted.

Thus I just need to find how to perform

x,y => {x[0], x[2], y[0], y[2]}, {x[1], x[3], y[1], y[3]}

Unfortunately I came to the conclusion that this is provably impossible using any combination of VSHUFPD, VBLENDPD, VPERMILPD, VPERM2F128, VUNPCKHPD, VUNPCKLPD. The crux of the matter is that it is impossible to swap u[1] and u[2] in an instance u of __m256d.

Question

Is this really a dead end? Or have I missed a permutation instruction?

1 Answer

    Editorial Team
    Added an answer on June 4, 2026 at 5:52 pm

    VHADD instructions are meant to be followed by regular VADD. The following code should give you what you want:

    // {a[0]+a[1], b[0]+b[1], a[2]+a[3], b[2]+b[3]}
    __m256d sumab = _mm256_hadd_pd(a, b);
    // {c[0]+c[1], d[0]+d[1], c[2]+c[3], d[2]+d[3]}
    __m256d sumcd = _mm256_hadd_pd(c, d);
    
    // {a[0]+a[1], b[0]+b[1], c[2]+c[3], d[2]+d[3]}
    __m256d blend = _mm256_blend_pd(sumab, sumcd, 0b1100);
    // {a[2]+a[3], b[2]+b[3], c[0]+c[1], d[0]+d[1]}
    __m256d perm = _mm256_permute2f128_pd(sumab, sumcd, 0x21);
    
    __m256d sum = _mm256_add_pd(perm, blend);
    

    This gives the result in 5 instructions. I hope I got the constants right.
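One way to check the constants is to wrap the five instructions in a function and run it on small inputs. The sketch below makes a few assumptions that are not part of the answer: the function name `hsum4_pd` and its pointer-based interface are invented for illustration, the masks are written in hex (`0xC` equals `0b1100`) because binary literals are a compiler extension in older C standards, and the `target("avx")` attribute is GCC/Clang-specific (with other compilers, drop it and enable AVX through the usual build flag).

```c
#include <immintrin.h>

/* Hypothetical wrapper around the five-instruction sequence above.
   Assumption: GCC or Clang; the attribute enables AVX for this function only. */
__attribute__((target("avx")))
void hsum4_pd(const double *a, const double *b,
              const double *c, const double *d, double *out)
{
    /* {a[0]+a[1], b[0]+b[1], a[2]+a[3], b[2]+b[3]} */
    __m256d sumab = _mm256_hadd_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b));
    /* {c[0]+c[1], d[0]+d[1], c[2]+c[3], d[2]+d[3]} */
    __m256d sumcd = _mm256_hadd_pd(_mm256_loadu_pd(c), _mm256_loadu_pd(d));

    /* Low half from sumab, high half from sumcd. */
    __m256d blend = _mm256_blend_pd(sumab, sumcd, 0xC);
    /* High half of sumab, then low half of sumcd. */
    __m256d perm = _mm256_permute2f128_pd(sumab, sumcd, 0x21);

    /* Each lane now adds the two partial sums of one input vector. */
    _mm256_storeu_pd(out, _mm256_add_pd(perm, blend));
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40}, c = {0.5, 0.5, 0.5, 0.5} and d = {-1, -2, -3, -4}, working through the lanes by hand gives {10, 100, 2, -10}, which is what the constants should produce if they are right.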

    The permutation that you proposed is certainly possible to accomplish, but it takes multiple instructions. Sorry that I’m not answering that part of your question.

    Edit: I couldn’t resist, so here is the complete permutation (again, I did my best to get the constants right). As you can see, swapping u[1] and u[2] is possible; it just takes a bit of work, because crossing the 128-bit lane boundary is difficult in first-generation AVX. I also want to point out that VADD is preferable to VHADD because VADD has twice the throughput, even though it performs the same number of additions.

    // {x[0],x[1],x[2],x[3]}
    __m256d x;
    
    // {x[1],x[0],x[3],x[2]}
    __m256d xswap = _mm256_permute_pd(x, 0b0101);
    
    // {x[3],x[2],x[1],x[0]}
    __m256d xflip128 = _mm256_permute2f128_pd(xswap, xswap, 0x01);
    
    // {x[0],x[2],x[1],x[3]} -- not impossible to swap x[1] and x[2]
    __m256d xblend = _mm256_blend_pd(x, xflip128, 0b0110);
    
    // repeat the same for y
    // {y[0],y[2],y[1],y[3]}
    __m256d yblend;
    
    // {x[0],x[2],y[0],y[2]}
    __m256d x02y02 = _mm256_permute2f128_pd(xblend, yblend, 0x20);
    
    // {x[1],x[3],y[1],y[3]}
    __m256d x13y13 = _mm256_permute2f128_pd(xblend, yblend, 0x31);
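Putting the permutation together with the two-VHADD plan from the question gives something like the sketch below. The names `swap_middle_pd` and `hsum4_two_hadd` and the pointer-based interface are hypothetical, the masks are written in hex (`0x5` equals `0b0101`, `0x6` equals `0b0110`), and the `target("avx")` attribute assumes GCC or Clang. It is meant to show that the permutation works, not to be the fastest route; the blend-and-add version needs fewer instructions.

```c
#include <immintrin.h>

/* Hypothetical helper: {v0,v1,v2,v3} -> {v0,v2,v1,v3}. */
__attribute__((target("avx")))
static __m256d swap_middle_pd(__m256d v)
{
    __m256d swapped = _mm256_permute_pd(v, 0x5);                      /* {v1,v0,v3,v2} */
    __m256d flipped = _mm256_permute2f128_pd(swapped, swapped, 0x01); /* {v3,v2,v1,v0} */
    return _mm256_blend_pd(v, flipped, 0x6);                          /* {v0,v2,v1,v3} */
}

/* Hypothetical wrapper: VHADD, permute, VHADD, as proposed in the question. */
__attribute__((target("avx")))
void hsum4_two_hadd(const double *a, const double *b,
                    const double *c, const double *d, double *out)
{
    /* x = {a01, b01, a23, b23}, y = {c01, d01, c23, d23} */
    __m256d x = _mm256_hadd_pd(_mm256_loadu_pd(a), _mm256_loadu_pd(b));
    __m256d y = _mm256_hadd_pd(_mm256_loadu_pd(c), _mm256_loadu_pd(d));

    __m256d xblend = swap_middle_pd(x); /* {a01, a23, b01, b23} */
    __m256d yblend = swap_middle_pd(y); /* {c01, c23, d01, d23} */

    /* x1 = {x[0],x[2],y[0],y[2]}, y1 = {x[1],x[3],y[1],y[3]} */
    __m256d x1 = _mm256_permute2f128_pd(xblend, yblend, 0x20);
    __m256d y1 = _mm256_permute2f128_pd(xblend, yblend, 0x31);

    /* The second horizontal add yields {sum(a), sum(b), sum(c), sum(d)}. */
    _mm256_storeu_pd(out, _mm256_hadd_pd(x1, y1));
}
```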
    