Asked by Editorial Team on May 26, 2026


I have some classes implementing some computations which I have
to optimize for different SIMD implementations, e.g. Altivec and
SSE. I don’t want to pollute the code with #ifdef ... #endif blocks
for each method I have to optimize, so I tried a couple of other
approaches. Unfortunately I’m not very satisfied with how they turned
out, for reasons I’ll try to clarify. So I’m looking for some advice
on how I could improve what I have already done.

1. Different implementation files with crude includes

I have the same header file describing the class interface with different
“pseudo” implementation files for plain C++, Altivec and SSE only for the
relevant methods:

// Algo.h
#ifndef ALGO_H_INCLUDED_
#define ALGO_H_INCLUDED_
class Algo
{
public:
    Algo();
    ~Algo();

    void process();
protected:
    void computeSome();
    void computeMore();
};
#endif

// Algo.cpp
#include "Algo.h"
Algo::Algo() { }

Algo::~Algo() { }

void Algo::process()
{
    computeSome();
    computeMore();
}

#if defined(ALTIVEC)
#include "Algo_Altivec.cpp" 
#elif defined(SSE)
#include "Algo_SSE.cpp"
#else
#include "Algo_Scalar.cpp"
#endif

// Algo_Altivec.cpp
void Algo::computeSome()
{
}
void Algo::computeMore()
{
}
// ... same for the other implementation files (Algo_SSE.cpp, Algo_Scalar.cpp)

Pros:

  • the split is quite straightforward and easy to do
  • there is no “overhead” (I don’t know how to say it better) to objects of my class,
    by which I mean no extra inheritance, no added member variables, etc.
  • much cleaner than #ifdef-ing all over the place

Cons:

  • I have three additional files to maintain; I could put the Scalar
    implementation in the Algo.cpp file, though, and end up with just two, but the
    inclusion part would look and feel a bit dirtier
  • they are not compilable units per se and have to be excluded from the
    project structure
  • if I do not have the specific optimized implementation yet for, let’s say,
    SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation file
  • I cannot fall back to the plain C++ implementation if needed; is it even possible
    to do that in the described scenario?
  • I do not feel any structural cohesion in the approach

2. Different implementation files with private inheritance

// Algo.h
#include "AlgoImpl.h"

class Algo : private AlgoImpl
{
    // ... as before
};

// AlgoImpl.h
#ifndef ALGOIMPL_H_INCLUDED_
#define ALGOIMPL_H_INCLUDED_
class AlgoImpl
{
protected:
    AlgoImpl();
    ~AlgoImpl();

    void computeSomeImpl();
    void computeMoreImpl();
};
#endif

// Algo.cpp
// ... constructor, destructor and process() as before
void Algo::computeSome()
{
    computeSomeImpl();
}
void Algo::computeMore()
{
    computeMoreImpl();
}

// Algo_SSE.cpp
#include "AlgoImpl.h"

AlgoImpl::AlgoImpl()
{
}
AlgoImpl::~AlgoImpl()
{
}
void AlgoImpl::computeSomeImpl()
{
}
void AlgoImpl::computeMoreImpl()
{
}

Pros:

  • the split is quite straightforward and easy to do
  • much cleaner than #ifdef-ing all over the place
  • still there is no “overhead” to my class: EBCO (the empty base class optimization) should kick in
  • the semantics of the class are much cleaner, at least compared to the above,
    since private inheritance means “is implemented in terms of”
  • the different files are compilable, can be included in the project
    and selected via the build system

Cons:

  • I have three additional files to maintain
  • if I do not have the specific optimized implementation yet for, let’s say,
    SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation file
  • I cannot fall back to the plain C++ implementation if needed

3. Basically method 2, but with virtual functions in the AlgoImpl class. That
would let me avoid duplicating the plain C++ code if needed,
by providing an implementation in the base class and overriding it in the derived
one, although I would have to disable that behavior once I actually implement the optimized
version. Also, the virtual functions will bring some “overhead” to objects of my class
(a vtable pointer per object, plus potentially indirect calls).
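A rough sketch of method 3 as described: the plain C++ versions live in AlgoImpl as virtual defaults, and an SSE variant overrides only what has been optimized so far. The template wrapper AlgoT and the string return values are hypothetical, added only so both variants can be instantiated and compared side by side.

```cpp
#include <string>

// Base: the plain C++ (scalar) versions double as the fallback.
class AlgoImpl {
protected:
    virtual ~AlgoImpl() = default;
    virtual std::string computeSomeImpl() { return "scalar computeSome"; }
    virtual std::string computeMoreImpl() { return "scalar computeMore"; }
};

// SSE variant: overrides only what has been optimized so far.
// computeMoreImpl() is inherited unchanged from AlgoImpl.
class AlgoImplSSE : public AlgoImpl {
protected:
    std::string computeSomeImpl() override { return "SSE computeSome"; }
};

// Hypothetical template wrapper so both variants can be instantiated;
// the base is still chosen at compile time, as in method 2.
template <typename Impl>
class AlgoT : private Impl {
public:
    std::string process() {
        return this->computeSomeImpl() + ", " + this->computeMoreImpl();
    }
};

#if defined(SSE)
using Algo = AlgoT<AlgoImplSSE>;
#else
using Algo = AlgoT<AlgoImpl>;
#endif
```

Here AlgoT<AlgoImplSSE>().process() runs the SSE computeSomeImpl() and falls back to the scalar computeMoreImpl(). Because the base is fixed at compile time, ordinary name lookup would give the same fallback even if the functions were not virtual; the virtual keyword is what adds the vtable pointer to each object.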

4. A form of tag dispatching via enable_if<>

Pros:

  • the split is quite straightforward and easy to do
  • much cleaner than #ifdef-ing all over the place
  • still there is no “overhead” to my class
  • will eliminate the need for different files for different implementations

Cons:

  • templates will be a bit more “cryptic” and seem to bring unnecessary
    overhead (at least for some people in some contexts)
  • if I do not have the specific optimized implementation yet for, let’s say,
    SSE, I would have to duplicate some code from the plain (Scalar) C++ implementation
  • I cannot fall back to the plain C++ implementation if needed
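A minimal sketch of method 4, assuming C++17 and hypothetical tag types (scalar_tag, sse_tag, altivec_tag) selected by the same ALTIVEC/SSE macros as method 1. The negated enable_if condition makes the scalar overload the default for every tag that has no dedicated version, but that condition has to be kept in sync as more tags get optimized.

```cpp
#include <string>
#include <type_traits>

// Hypothetical tags, one per SIMD flavour.
struct scalar_tag {};
struct sse_tag {};
struct altivec_tag {};

#if defined(ALTIVEC)
using active_tag = altivec_tag;
#elif defined(SSE)
using active_tag = sse_tag;
#else
using active_tag = scalar_tag;
#endif

class Algo {
public:
    // The tag defaults to the build's active_tag but can be chosen explicitly.
    template <typename Tag = active_tag>
    std::string process() {
        return computeSome<Tag>() + ", " + computeMore<Tag>();
    }

private:
    // Optimized overload, enabled only for the SSE tag.
    template <typename Tag>
    std::enable_if_t<std::is_same_v<Tag, sse_tag>, std::string> computeSome() {
        return "SSE computeSome";
    }

    // Scalar overload, enabled for every tag without a dedicated version.
    template <typename Tag>
    std::enable_if_t<!std::is_same_v<Tag, sse_tag>, std::string> computeSome() {
        return "scalar computeSome";
    }

    // Not optimized anywhere yet: one generic version serves all tags.
    template <typename Tag>
    std::string computeMore() { return "scalar computeMore"; }
};
```

For example, Algo{}.process<sse_tag>() yields "SSE computeSome, scalar computeMore", while altivec_tag and scalar_tag both get the scalar versions of both methods.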

What I couldn’t figure out yet for any of the variants is how to properly and
cleanly fall back to the plain C++ implementation.

Also, I don’t want to over-engineer things, and in that respect the first variant
seems the most “KISS”-like, even considering the disadvantages.

1 Answer

    Editorial Team, added an answer on May 26, 2026 at 3:20 am

    As requested in the comments, here’s a summary of what I did:

    Set up policy_list helper template utility

    This maintains a list of policies and gives each one a “runtime check” call before calling the first suitable implementation.

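    #include <cassert>
    
    // Try policy P; if its runtime check fails, fall through to the rest of the list N.
    template <typename P, typename N=void>
    struct policy_list {
      static void apply() {
        if (P::runtime_check()) {
          P::impl();
        }
        else {
          N::apply();
        }
      }
    };
    
    // Last entry in the list: there is nothing left to fall back to,
    // so it must always pass its runtime check.
    template <typename P>
    struct policy_list<P,void> {
      static void apply() {
        assert(P::runtime_check());
        P::impl();
      }
    };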
    

    Set up specific policies

    These policies implement both a runtime test and an actual implementation of the algorithm in question. In my actual problem, impl() took another template parameter that specified what exactly it was implementing; here, though, the example assumes there is only one thing to be implemented. The runtime tests are cached in a static bool because for some (e.g. the Altivec one I used) the test was really slow. For others (e.g. the OpenCL one) the test is actually “is this function pointer NULL?” after one attempt at setting it with dlsym().

    #include <iostream>
    
    // runtime SSE detection (That's another question!)
    extern bool have_sse();
    
    struct sse_policy {
      static void impl() {
        std::cout << "SSE" << std::endl;
      }
    
      static bool runtime_check() {
        static bool result = have_sse();
        // have_sse lives in another TU and does some cpuid asm stuff
        return result;
      }
    };
    
    // Runtime OpenCL detection
    extern bool have_opencl();
    
    struct opencl_policy {
      static void impl() {
        std::cout << "OpenCL" << std::endl;
      }
    
      static bool runtime_check() {
        static bool result = have_opencl();
        // have_opencl lives in another TU and does some LoadLibrary or dlopen()
        return result;
      }
    };
    
    struct basic_policy {
      static void impl() {
        std::cout << "Standard C++ policy" << std::endl;
      }
    
      static bool runtime_check() { return true; } // All implementations do this
    };
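The answer leaves have_sse() in another translation unit. A minimal sketch of one, assuming GCC or Clang (__get_cpuid and bit_SSE2 from <cpuid.h>) or MSVC (__cpuid from <intrin.h>), and taking “SSE” to mean SSE2; these compiler-provided wrappers avoid writing the CPUID instruction in compiler-specific inline assembly.

```cpp
// Hypothetical have_sse(): "SSE" is taken to mean SSE2 here.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>

bool have_sse() {
  unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
  // CPUID leaf 1 returns the feature flags; __get_cpuid() returns 0 if
  // this CPU does not support that leaf.
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) return false;
  return (edx & bit_SSE2) != 0;
}

#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>

bool have_sse() {
  int info[4] = {0, 0, 0, 0};
  __cpuid(info, 1);                   // info[3] holds EDX for leaf 1
  return (info[3] & (1 << 26)) != 0;  // EDX bit 26 = SSE2
}

#else
bool have_sse() { return false; }     // not x86: no SSE
#endif
```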
    

    Set up a per-architecture policy_list

    This trivial example sets one of two possible lists based on the ARCH_HAS_SSE preprocessor macro. You might generate this from your build script, use a series of typedefs, or hack in support for “holes” in the policy_list that might be void on some architectures, skipping straight to the next one without trying to check for support. GCC sets some preprocessor macros for you that might help, e.g. __SSE2__.

    #ifdef ARCH_HAS_SSE
    typedef policy_list<opencl_policy,
            policy_list<sse_policy,
            policy_list<basic_policy
                        > > > active_policy;
    #else
    typedef policy_list<opencl_policy,
            policy_list<basic_policy
                        > > active_policy;
    #endif
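One way the “holes” idea could work, sketched with a hypothetical no_policy placeholder and stub policies (opencl_stub, sse_stub, basic_stub) standing in for the real ones. A partial specialization skips a hole at compile time, so no runtime check is emitted for it, and a single typedef then serves every architecture.

```cpp
#include <cassert>
#include <string>

std::string last_used;  // records which implementation ran (demonstration only)

// Hypothetical placeholder for "this policy does not exist on this architecture".
struct no_policy {};

// Same policy_list as above.
template <typename P, typename N = void>
struct policy_list {
  static void apply() {
    if (P::runtime_check()) P::impl();
    else N::apply();
  }
};

template <typename P>
struct policy_list<P, void> {
  static void apply() {
    assert(P::runtime_check());
    P::impl();
  }
};

// A hole: skip straight to the rest of the list, with no runtime check at all.
template <typename N>
struct policy_list<no_policy, N> {
  static void apply() { N::apply(); }
};

// A hole cannot be the final fallback. Declared but never defined, so using
// it is a compile error; it also resolves the ambiguity between the two
// partial specializations above.
template <>
struct policy_list<no_policy, void>;

// Stub policies standing in for opencl_policy, sse_policy and basic_policy.
struct opencl_stub {
  static bool runtime_check() { return false; }  // pretend no OpenCL device
  static void impl() { last_used = "OpenCL"; }
};
struct sse_stub {
  static bool runtime_check() { return true; }
  static void impl() { last_used = "SSE"; }
};
struct basic_stub {
  static bool runtime_check() { return true; }
  static void impl() { last_used = "basic"; }
};

#ifdef ARCH_HAS_SSE
typedef sse_stub maybe_sse_policy;
#else
typedef no_policy maybe_sse_policy;
#endif

// One list for every architecture; the SSE slot becomes a hole where unavailable.
typedef policy_list<opencl_stub,
        policy_list<maybe_sse_policy,
        policy_list<basic_stub> > > active_policy;
```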
    

    You can use this to compile multiple variants on the same platform too, e.g. an SSE and a non-SSE binary on x86.

    Use the policy list

    This is fairly straightforward: call the apply() static method on the policy_list, and trust that it will call the impl() method on the first policy that passes the runtime test.

    int main() {
      active_policy::apply();
    }
    

    If you take the “per-operation template” approach I mentioned earlier, it might look more like:

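    // Sketch only: assumes apply() and impl() take the operation as a template
    // parameter, and that Matrix, Vector, matrix_mult_t and vector_mult_t exist.
    int main() {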
      Matrix m1, m2;
      Vector v1;
    
      active_policy::apply<matrix_mult_t>(m1, m2);
      active_policy::apply<vector_mult_t>(m1, v1);
    }
    

    In that case you end up making your Matrix and Vector types aware of the policy_list so that they can decide how and where to store the data. You can also use heuristics, e.g. “small vectors/matrices live in main memory no matter what”, and make runtime_check() or another function test whether a particular implementation is appropriate for a specific instance.
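A sketch of the per-operation variant, assuming C++17. Here the operation is passed to impl() as a tag argument rather than as a template parameter (an assumption, not necessarily how the original code did it), and a compile-time check skips any policy that does not implement the requested operation yet. An unoptimized operation therefore falls back to basic_policy without duplicating its code. The tags scale_t and sum_t, fast_policy, type_list, has_impl and last_used are all hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

std::string last_used;  // records which policy ran (demonstration only)

// Hypothetical operation tags standing in for matrix_mult_t / vector_mult_t.
struct scale_t {};
struct sum_t {};

template <typename...> struct type_list {};

// has_impl<P, Op, type_list<Args...>>: can P::impl(Op{}, args...) be called?
template <typename P, typename Op, typename ArgList, typename = void>
struct has_impl : std::false_type {};

template <typename P, typename Op, typename... Args>
struct has_impl<P, Op, type_list<Args...>,
                std::void_t<decltype(P::impl(Op{}, std::declval<Args>()...))>>
    : std::true_type {};

template <typename P, typename N = void>
struct policy_list {
  template <typename Op, typename... Args>
  static void apply(Args&&... args) {
    // Policies without an impl for Op are skipped at compile time.
    if constexpr (has_impl<P, Op, type_list<Args&&...>>::value) {
      if (P::runtime_check()) {
        P::impl(Op{}, std::forward<Args>(args)...);
        return;
      }
    }
    N::template apply<Op>(std::forward<Args>(args)...);
  }
};

template <typename P>
struct policy_list<P, void> {
  template <typename Op, typename... Args>
  static void apply(Args&&... args) {
    static_assert(has_impl<P, Op, type_list<Args&&...>>::value,
                  "the final fallback policy must implement every operation");
    assert(P::runtime_check());
    P::impl(Op{}, std::forward<Args>(args)...);
  }
};

// Only scale_t has been "optimized" so far.
struct fast_policy {
  static bool runtime_check() { return true; }  // stand-in for have_sse() etc.
  static void impl(scale_t, std::vector<float>& v, float k) {
    for (float& x : v) x *= k;  // SIMD code would go here
    last_used = "fast";
  }
};

struct basic_policy {
  static bool runtime_check() { return true; }
  static void impl(scale_t, std::vector<float>& v, float k) {
    for (float& x : v) x *= k;
    last_used = "basic";
  }
  static void impl(sum_t, const std::vector<float>& v, float& out) {
    out = 0.0f;
    for (float x : v) out += x;
    last_used = "basic";
  }
};

typedef policy_list<fast_policy, policy_list<basic_policy> > active_policy;
```

With this list, active_policy::apply<scale_t>(v, 2.0f) runs fast_policy, while active_policy::apply<sum_t>(v, total) compiles straight through to basic_policy because fast_policy has no sum_t overload.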

    I also had a custom allocator for containers, which always produced suitably aligned memory on any SSE/Altivec-enabled build, regardless of whether the specific machine supported Altivec. It was just easier that way, although the allocator could instead be a typedef in each policy, on the assumption that the highest-priority policy has the strictest allocator needs.
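A minimal sketch of such an allocator, assuming C++17 (aligned operator new). The names aligned_allocator, aligned_vector and is_aligned are hypothetical, and the default of 16 bytes is chosen because SSE and Altivec vector registers are 16 bytes wide.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <vector>

// Hypothetical allocator: every allocation is aligned to Alignment bytes.
template <typename T, std::size_t Alignment = 16>
struct aligned_allocator {
  static_assert((Alignment & (Alignment - 1)) == 0, "Alignment must be a power of two");
  static_assert(Alignment >= alignof(T), "Alignment must not be weaker than alignof(T)");

  using value_type = T;

  // Needed because of the non-type template parameter: the default
  // allocator_traits rebinding only handles type parameters.
  template <typename U>
  struct rebind { using other = aligned_allocator<U, Alignment>; };

  aligned_allocator() noexcept = default;
  template <typename U>
  aligned_allocator(const aligned_allocator<U, Alignment>&) noexcept {}

  T* allocate(std::size_t n) {
    if (n > std::numeric_limits<std::size_t>::max() / sizeof(T))
      throw std::bad_array_new_length();
    return static_cast<T*>(::operator new(n * sizeof(T), std::align_val_t{Alignment}));
  }

  void deallocate(T* p, std::size_t) noexcept {
    ::operator delete(p, std::align_val_t{Alignment});
  }
};

// Stateless, so any two instances can free each other's memory.
template <typename T, typename U, std::size_t A>
bool operator==(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return true; }
template <typename T, typename U, std::size_t A>
bool operator!=(const aligned_allocator<T, A>&, const aligned_allocator<U, A>&) noexcept { return false; }

template <typename T>
using aligned_vector = std::vector<T, aligned_allocator<T>>;

// True if p is a multiple of alignment.
inline bool is_aligned(const void* p, std::size_t alignment) {
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Using it on every SIMD-enabled build, as the answer describes, just means declaring containers as aligned_vector<float> instead of std::vector<float>.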

    Example have_altivec():

    I’ve included a sample have_altivec() implementation for completeness, simply because it’s the shortest and therefore the most appropriate for posting here. The x86/x86_64 CPUID one is messy because you have to support the compiler-specific ways of writing inline ASM. The OpenCL one is messy because we check some of the implementation limits and extensions too.

    #include <signal.h>
    #ifdef __APPLE__
    #include <sys/types.h>
    #include <sys/sysctl.h>
    #elif HAVE_SETJMP_H
    #include <setjmp.h>
    
    // sigsetjmp()/siglongjmp() also restore the signal mask, so SIGILL is not
    // left blocked after jumping out of the handler.
    static sigjmp_buf jmpbuf;
    
    static void illegal_instruction(int /*sig*/) {
       // Bad in general - https://www.securecoding.cert.org/confluence/display/seccode/SIG32-C.+Do+not+call+longjmp%28%29+from+inside+a+signal+handler
       // But actually Ok on this platform in this scenario
       siglongjmp(jmpbuf, 1);
    }
    #endif
    
    bool have_altivec()
    {
        volatile sig_atomic_t altivec = 0;
    #ifdef __APPLE__
        // Ask the kernel whether the CPU has a vector unit.
        int selectors[2] = { CTL_HW, HW_VECTORUNIT };
        int hasVectorUnit = 0;
        size_t length = sizeof(hasVectorUnit);
        int error = sysctl(selectors, 2, &hasVectorUnit, &length, NULL, 0);
        if (0 == error)
            altivec = (hasVectorUnit != 0);
    #elif HAVE_SETJMP_H
        // Execute an Altivec instruction; if it raises SIGILL, the handler
        // jumps back here and altivec stays 0.
        void (*handler) (int sig);
        handler = signal(SIGILL, illegal_instruction);
        if (sigsetjmp(jmpbuf, 1) == 0) {
            asm volatile ("mtspr 256, %0\n\t" "vand %%v0, %%v0, %%v0"::"r" (-1));
            altivec = 1;
        }
        signal(SIGILL, handler);
    #endif
    
        return altivec;
    }
    

    Conclusion

    Basically, you pay no penalty for platforms that can never support an implementation (the compiler generates no code for them), and only a small penalty for platforms that could support something but don’t (potentially just a test/jump pair that the CPU predicts very well, if your compiler is half-decent at optimising). You pay no extra cost on platforms where the first-choice implementation runs. The details of the runtime tests vary with the technology in question.


Related Questions

I have some classes which extends a superclass, and in the JSP I want
I have some classes in my app that don't require an ID to be
If I have some php classes inside a namespace com\test and want to import
I have: - an interface : IMyType - some classes implementing it : MyType1
I'm implementing a DAL using the Entity Framework. We have some DAL classes (I
I have some classes layed out like this class A { public virtual void
I have some classes that will do something based on some conditions . The
I have some classes that represent immutable objects (Quantity, Price, Probability). Is there some
I have some classes that, for one reason or another, cannot be or need
Let's say I have some classes like this: abstract class View(val writer: XMLStreamWriter) {

© 2021 The Archive Base. All Rights Reserved