The Archive Base

Editorial Team
Asked: June 14, 2026

I am trying to convert the following code to SSE/AVX:

float x1, x2, x3;
float a1[N], a2[N], a3[N], b1[N], b2[N], b3[N];
for (int i = 0; i < N; i++)
{
    if (x1 > a1[i] && x2 > a2[i] && x3 > a3[i] && x1 < b1[i] && x2 < b2[i] && x3 < b3[i])
    {
        // do something with i
    }
}

Here N is a small constant, let’s say 8. The if(…) statement evaluates to false most of the time.

First attempt:

__m128 x; // x1, x2, x3, 0
__m128 a[N]; // packed a1[i], a2[i], a3[i], 0 
__m128 b[N]; // packed b1[i], b2[i], b3[i], 0

for (int i = 0; i < N; i++)
{
    __m128 gt_mask = _mm_cmpgt_ps(x, a[i]);
    __m128 lt_mask = _mm_cmplt_ps(x, b[i]);
    __m128 mask = _mm_and_ps(gt_mask, lt_mask);
    if (_mm_movemask_epi8 (_mm_castps_si128(mask)) == 0xfff0)
    {
        // do something with i
    }
}

This works and is reasonably fast. Is there a more efficient way of doing this? In particular, if a register holds the results of SSE or AVX comparisons on floats (which set each 32-bit lane to 0xffffffff or 0x00000000), how can the results of all the comparisons be AND-ed or OR-ed together, in general? Is PMOVMSKB (or the corresponding _mm_movemask_epi8 intrinsic) the standard way to do this?

Also, how can AVX 256-bit registers be used instead of SSE in the code above?

EDIT:

Tested and benchmarked a version using VPTEST (from _mm_test* intrinsic) as suggested below.

__m128 x; // x1, x2, x3, 0
__m128 a[N]; // packed a1[i], a2[i], a3[i], 0
__m128 b[N]; // packed b1[i], b2[i], b3[i], 0
__m128i ref_mask = _mm_set_epi32(-1, -1, -1, 0); // all 32 bits set in the three used lanes (0xffff would set only the low 16)

for (int i = 0; i < N; i++)
{
    __m128 gt_mask = _mm_cmpgt_ps(x, a[i]);
    __m128 lt_mask = _mm_cmplt_ps(x, b[i]);
    __m128 mask = _mm_and_ps(gt_mask, lt_mask);
    if (_mm_testc_si128(_mm_castps_si128(mask), ref_mask))
    {
        // do stuff with i
    }
}

This also works and is fast. Benchmarking (Intel i7-2630QM, Windows 7, Cygwin 1.7, Cygwin gcc 4.5.3 or MinGW x86_64 gcc 4.5.3, N=8) shows it runs at the same speed as the code above on 64-bit, with a difference of less than 0.1%. Either version of the inner loop averages about 6.8 clocks on data that is entirely in cache and for which the comparison always returns false.

Interestingly, on 32-bit, the _mm_test version runs about 10% slower. It turns out that the compiler spills the masks after loop unrolling and has to reload them; this is probably unnecessary and could be avoided in hand-coded assembly.

Which method to choose? There seems to be no compelling reason to prefer VPTEST over VMOVMSKPS. If anything, there is a slight reason to prefer VMOVMSKPS: it frees up an XMM register that would otherwise hold the reference mask.

1 Answer

Editorial Team
Added an answer on June 14, 2026 at 3:15 pm

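If you’re working with floats, you generally want to use MOVMSKPS (and the corresponding AVX instruction VMOVMSKPS) instead of PMOVMSKB. It extracts one bit per 32-bit lane (the sign bit) rather than one bit per byte, so a 128-bit register yields a 4-bit mask.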
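As a rough illustration of the MOVMSKPS approach, the question's test could be written along the following lines. This is only a sketch: `inside_box` is a hypothetical helper name, and it assumes the question's layout, with lane 0 as zero padding and the three coordinates in lanes 1 to 3.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_cmpgt_ps, _mm_cmplt_ps, _mm_movemask_ps */

/* Hypothetical helper (not from the question): returns nonzero when every
 * coordinate of x lies strictly between lo and hi.
 * Assumed layout: lane 0 is padding, lanes 1..3 hold the coordinates. */
static inline int inside_box(__m128 x, __m128 lo, __m128 hi)
{
    __m128 gt   = _mm_cmpgt_ps(x, lo);  /* lane is all ones where x > lo */
    __m128 lt   = _mm_cmplt_ps(x, hi);  /* lane is all ones where x < hi */
    __m128 both = _mm_and_ps(gt, lt);

    /* MOVMSKPS packs the sign bit of each 32-bit lane into bits 0..3.
     * Masking with 0xE keeps lanes 1..3 and ignores the padding lane. */
    return (_mm_movemask_ps(both) & 0xE) == 0xE;
}
```

In the question's loop the condition would then read `if (inside_box(x, a[i], b[i]))`. Masking with `0xE` before comparing means the result does not depend on what the padding lane happens to contain.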

That aside, yes, this is one standard way of doing this; you can also use PTEST (VPTEST), available from SSE4.1 onwards, to set the condition flags directly from the AND or ANDNOT of two SSE or AVX registers.
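The question also asks how 256-bit AVX registers could be used. One possible approach, sketched here under assumptions rather than benchmarked, is to test the point against two boxes per iteration and read both results with a single VMOVMSKPS. The function name, the array-based interface, and the memory layout (index 0 of each group of four floats is padding) are all assumptions made for illustration. The `target` attribute is GCC/Clang-specific and can be dropped when compiling with `-mavx`.

```c
#include <immintrin.h>  /* AVX: __m256, _mm256_cmp_ps, _mm256_movemask_ps */

/* Hypothetical helper: tests one point against two boxes at once.
 *   x  : 4 floats, x[0] is padding, x[1..3] are the coordinates
 *   lo : 8 floats, lower bounds of box A in lo[0..3] and box B in lo[4..7]
 *   hi : 8 floats, upper bounds, same layout as lo
 * Returns bit 0 set if the point is inside box A, bit 1 if inside box B. */
__attribute__((target("avx")))
static int inside_two_boxes(const float *x, const float *lo, const float *hi)
{
    /* Duplicate the point into both 128-bit halves of a 256-bit register. */
    __m128 p  = _mm_loadu_ps(x);
    __m256 xx = _mm256_insertf128_ps(_mm256_castps128_ps256(p), p, 1);

    /* Ordered, non-signaling compares: false whenever an operand is NaN. */
    __m256 gt = _mm256_cmp_ps(xx, _mm256_loadu_ps(lo), _CMP_GT_OQ);
    __m256 lt = _mm256_cmp_ps(xx, _mm256_loadu_ps(hi), _CMP_LT_OQ);

    /* VMOVMSKPS: one bit per 32-bit lane, 8 bits in total. */
    int bits = _mm256_movemask_ps(_mm256_and_ps(gt, lt));

    int in_a = (bits & 0x0E) == 0x0E;  /* lanes 1..3: box A */
    int in_b = (bits & 0xE0) == 0xE0;  /* lanes 5..7: box B */
    return in_a | (in_b << 1);
}
```

With N = 8 this would halve the number of loop iterations, and since the test is false most of the time, a caller could check for a zero return value first and only then look at the individual bits.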
