The Archive Base Latest Questions

Editorial Team
Asked: May 13, 2026

I need some idea how to write a C++ cross platform implementation of a


I need some ideas on how to write a cross-platform C++ implementation of a few parallelizable problems so that I can take advantage of SIMD (SSE, SPU, etc.) if it is available. I also want to be able to switch between SIMD and non-SIMD at run time.

How would you suggest I approach this problem?
(Of course, I don't want to implement the problem multiple times for all possible options.)

I can see how this might not be an easy task in C++, but I believe I'm missing something. So far my idea looks like this:
A class cStream will be an array of a single field. Using multiple cStreams, I can achieve SoA (Structure of Arrays). Then, using a few functors, I can fake the lambda function that I need executed over the whole cStream.

// just for example I'm not expecting this code to compile
cStream a; // something like float[1024]
cStream b;
cStream c;

void Foo()
{
    for_each(
        AssignSIMD(c, MulSIMD(AddSIMD(a, b), a)));
}

Here for_each is responsible for advancing the current pointer of each stream and for inlining the functors' bodies, both with and without SIMD.

Something like this:

// just for example I'm not expecting this code to compile
for_each(functor<T> f)
{
#ifdef USE_SIMD
    if (simdEnabled)
        real_for_each(f<true>()); // true means use SIMD
    else
#endif
        real_for_each(f<false>());
}

Note that whether SIMD is enabled is checked only once, and that the loop wraps the main functor.
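
The check-once dispatch described above can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: `mul_add`, `mul_add_dispatch`, and the global `simdEnabled` flag are hypothetical names, the SSE path is compiled only when the compiler defines `__SSE__` (as GCC and Clang do on x86 targets), and unaligned loads are used to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

// The kernel is written once. The bool template parameter selects the code
// path at compile time, so the loop body has no per-element branch.
template <bool UseSimd>
void mul_add(const float* a, const float* b, float* c, std::size_t n)
{
    std::size_t i = 0;
#if defined(__SSE__)
    if (UseSimd)
    {
        // Four floats per iteration: c = (a + b) * a
        for (; i + 4 <= n; i += 4)
        {
            __m128 va = _mm_loadu_ps(a + i);
            __m128 vb = _mm_loadu_ps(b + i);
            _mm_storeu_ps(c + i, _mm_mul_ps(_mm_add_ps(va, vb), va));
        }
    }
#endif
    // Scalar path. It also handles the tail of the SIMD path, and it is the
    // only path when the build has no SSE support.
    for (; i < n; ++i)
        c[i] = (a[i] + b[i]) * a[i];
}

// Hypothetical run-time switch. A real program would set it from a CPU
// feature check or a configuration option.
bool simdEnabled = true;

// The flag is tested once per call, outside the loop.
void mul_add_dispatch(const float* a, const float* b, float* c, std::size_t n)
{
    assert(n == 0 || (a && b && c));
    if (simdEnabled)
        mul_add<true>(a, b, c, n);
    else
        mul_add<false>(a, b, c, n);
}
```

Calling `mul_add_dispatch(a, b, c, n)` with `simdEnabled` set either way should produce the same values. Both instantiations come from one source of the kernel, which is the property the question asks for.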

1 Answer

  1. Editorial Team
    Added an answer on May 13, 2026 at 1:28 pm

    If anyone is interested, this is the rough code I came up with to test a new idea I had while reading about the library that Paul posted.

    Thanks Paul!

    // This is just a conceptual test.
    // I haven't profiled the code and I haven't verified that the result is correct.
    #include <xmmintrin.h>
    
    
    // This class is doing all the math
    template <bool SIMD>
    class cStreamF32
    {
    private:
        void*       m_data;
        void*       m_dataEnd;
        __m128*     m_current128;
        float*      m_current32;
    
    public:
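        cStreamF32(int size)
        {
            // size is assumed to be a multiple of 4 so the SIMD path ends on a whole __m128
            if (SIMD)
                m_data = _mm_malloc(sizeof(float) * size, 16);
            else
                m_data = new float[size];
            // Record the end of the buffer; Next() compares against it
            m_dataEnd = (float*)m_data + size;
            // Zero the buffer so Exec() never reads uninitialized values
            for (int i = 0; i < size; ++i)
                ((float*)m_data)[i] = 0.0f;
        }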
        ~cStreamF32()
        {
            if (SIMD)
                _mm_free(m_data);
            else
                delete[] (float*)m_data;
        }
    
        inline void Begin()
        {
            if (SIMD)
                m_current128 = (__m128*)m_data;
            else
                m_current32 = (float*)m_data;
        }
    
        inline bool Next()
        {
            if (SIMD)
            {
                m_current128++;
                return m_current128 < m_dataEnd;
            }
            else
            {
                m_current32++;
                return m_current32 < m_dataEnd;
            }
        }
    
        inline void operator=(const __m128 x)
        {
            *m_current128 = x;
        }
        inline void operator=(const float x)
        {
            *m_current32 = x;
        }
    
        inline __m128 operator+(const cStreamF32<true>& x)
        {
            return _mm_add_ps(*m_current128, *x.m_current128); // packed add: all four lanes
        }
        inline float operator+(const cStreamF32<false>& x)
        {
            return *m_current32 + *x.m_current32;
        }
    
        inline __m128 operator+(const __m128 x)
        {
            return _mm_add_ps(*m_current128, x); // packed add: all four lanes
        }
        inline float operator+(const float x)
        {
            return *m_current32 + x;
        }
    
        inline __m128 operator*(const cStreamF32<true>& x)
        {
            return _mm_mul_ps(*m_current128, *x.m_current128); // packed multiply: all four lanes
        }
        inline float operator*(const cStreamF32<false>& x)
        {
            return *m_current32 * *x.m_current32;
        }
    
        inline __m128 operator*(const __m128 x)
        {
            return _mm_mul_ps(*m_current128, x); // packed multiply: all four lanes
        }
        inline float operator*(const float x)
        {
            return *m_current32 * x;
        }
    };
    
    // Executes both functors
    template<class T1, class T2>
    void Execute(T1& functor1, T2& functor2)
    {
        functor1.Begin();
        do
        {
            functor1.Exec();
        }
        while (functor1.Next());
    
        functor2.Begin();
        do
        {
            functor2.Exec();
        }
        while (functor2.Next());
    }
    
    // This is the implementation of the problem
    template <bool SIMD>
    class cTestFunctor
    {
    private:
        cStreamF32<SIMD> a;
        cStreamF32<SIMD> b;
        cStreamF32<SIMD> c;
    
    public:
        cTestFunctor() : a(1024), b(1024), c(1024) { }
    
        inline void Exec()
        {
            c = a + b * a;
        }
    
        inline void Begin()
        {
            a.Begin();
            b.Begin();
            c.Begin();
        }
    
        inline bool Next()
        {
            a.Next();
            b.Next();
            return c.Next();
        }
    };
    
    
    int main (int argc, char * const argv[]) 
    {
        cTestFunctor<true> functor1;
        cTestFunctor<false> functor2;
    
        Execute(functor1, functor2);
    
        return 0;
    }
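
Since the result of the test above has not been verified, one way to check it is to compare each output stream against a plain scalar reference. The following is a minimal sketch: `reference_exec` and `streams_match` are hypothetical helpers, and the tolerance is an assumption to be adjusted for the data at hand.

```cpp
#include <cmath>
#include <cstddef>

// Scalar reference for the expression in cTestFunctor::Exec: c = a + b * a.
void reference_exec(const float* a, const float* b, float* c, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i] * a[i];
}

// Returns true if every element of got is within tol of want.
// Written as !(diff <= tol) so that a NaN in either stream counts as a mismatch.
bool streams_match(const float* got, const float* want, std::size_t n,
                   float tol = 1e-6f)
{
    for (std::size_t i = 0; i < n; ++i)
        if (!(std::fabs(got[i] - want[i]) <= tol))
            return false;
    return true;
}
```

Running both the SIMD and the non-SIMD functor over the same input and passing each result to `streams_match` alongside the reference output would show whether the two paths agree.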
    