Editorial Team
Asked: May 12, 2026


Summary: I want to take advantage of compiler optimizations and processor instruction sets, but still have a portable application (running on different processors). Normally I could indeed compile 5 times and let the user choose the right one to run.

My question is: how can I automate this, so that the processor is detected at runtime and the right executable is run without the user having to choose it?


I have an application with a lot of low level math calculations. These calculations will typically run for a long time.

I would like to take advantage of as much optimization as possible, preferably also of (not always supported) instruction sets. On the other hand I would like my application to be portable and easy to use (so I would not like to compile 5 different versions and let the user choose).

Is it possible to compile 5 different versions of my code and dynamically run the most optimized version that the machine supports at execution time? By 5 different versions I mean builds with different instruction sets and different processor-specific optimizations.

I don’t care about the size of the application.

At this moment I’m using gcc on Linux (my code is in C++), but I’m also interested in this for the Intel compiler and for the MinGW compiler for compilation to Windows.

The executable doesn’t have to run on different OSes, but ideally it would also be possible to select between 32-bit and 64-bit automatically.

Edit: Please give clear pointers how to do it, preferably with small code examples or links to explanations. From my point of view I need a super generic solution, which is applicable on any random C++ project I have later.

Edit: I assigned the bounty to ShuggyCoUk, who had a great number of pointers to look out for. I would have liked to split it between multiple answers, but that is not possible. I have not implemented this yet, so the question is still ‘open’! Please keep adding and/or improving answers, even though there is no bounty to be given anymore.

Thanks everybody!


1 Answer

  1. Editorial Team
    Added an answer on May 12, 2026 at 9:57 am

    If you want this to work cleanly on Windows and, on 64-bit capable platforms, to take full advantage of (1) the larger address space and (2) the additional registers (likely of more use to you), you must at a minimum have a separate process for the 64-bit builds.

    You can achieve this by having a separate executable with the relevant 64-bit PE (PE32+) header. Simply using CreateProcess will launch it with the correct bitness (unless the executable being launched sits in some redirected location, there is no need to worry about WoW64 folder redirection).

    Given this limitation on Windows, simply ‘chaining along’ to the relevant executable is likely to be the simplest approach for all the variants, and it also makes testing an individual variant simpler.
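As a rough illustration of ‘chaining along’ on Linux, the sketch below picks a variant from the CPU's feature flags and replaces the launcher process with it. It is only a sketch under stated assumptions: the tier names, the `calc.avx2`-style file naming and the helper functions are hypothetical, `__builtin_cpu_supports` assumes GCC or Clang on x86, and `execv` is POSIX only (on Windows, CreateProcess would take its place).

```cpp
#include <cassert>
#include <string>
#include <unistd.h>  // execv (POSIX only)

// Hypothetical capability tiers, ordered from least to most capable.
enum class CpuLevel { Baseline, Sse2, Sse42, Avx2 };

// Pure decision logic, kept separate so it can be tested on any machine.
CpuLevel level_from_flags(bool sse2, bool sse42, bool avx2) {
    if (avx2)  return CpuLevel::Avx2;
    if (sse42) return CpuLevel::Sse42;
    if (sse2)  return CpuLevel::Sse2;
    return CpuLevel::Baseline;
}

// Query the running CPU. Assumes GCC or Clang on x86;
// on any other toolchain or architecture it falls back to the baseline.
CpuLevel detect_level() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return level_from_flags(__builtin_cpu_supports("sse2") != 0,
                            __builtin_cpu_supports("sse4.2") != 0,
                            __builtin_cpu_supports("avx2") != 0);
#else
    return CpuLevel::Baseline;
#endif
}

// Map a tier to a file name such as "calc.avx2" (naming scheme is an assumption).
std::string variant_path(const std::string& base, CpuLevel level) {
    switch (level) {
        case CpuLevel::Avx2:     return base + ".avx2";
        case CpuLevel::Sse42:    return base + ".sse42";
        case CpuLevel::Sse2:     return base + ".sse2";
        case CpuLevel::Baseline: break;
    }
    return base + ".baseline";
}

// Replace the current process with the chosen variant, forwarding argv unchanged.
// execv only returns if the launch failed, in which case -1 is returned.
int chain_to(const std::string& path, char* const argv[]) {
    assert(argv != nullptr);
    execv(path.c_str(), argv);
    return -1;
}
```

A launcher's `main` could then end with something like `return chain_to(variant_path(argv[0], detect_level()), argv);`, with each variant built from the same sources using different `-march`/`-m…` flags.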

    It also means your ‘main’ executable is free to be completely separate for each target operating system (detecting the CPU/OS capabilities is, by its nature, very OS specific), while most of the rest of your code lives in shared objects/DLLs.
    You can also ‘share’ the same files between two different architectures if you currently see no point in exploiting their differing capabilities.

    I would suggest that the main executable is capable of being forced into making a specific choice so you can see what happens with ‘lesser’ versions on a more capable machine (or what errors come up if you try something different).
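One way to make the choice forcible is an override that wins over detection, for example an environment variable. The following is a minimal sketch; the variable name `MYAPP_FORCE_VARIANT`, the variant names and both helper functions are made up for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical list of variants the launcher ships with.
const std::vector<std::string>& known_variants() {
    static const std::vector<std::string> names = {"baseline", "sse2", "sse42", "avx2"};
    return names;
}

// Return the forced variant if it names a known build, otherwise the detected one.
// An unknown or missing override is ignored rather than treated as an error.
std::string choose_variant(const char* forced, const std::string& detected) {
    assert(!detected.empty());
    if (forced != nullptr) {
        const std::vector<std::string>& names = known_variants();
        if (std::find(names.begin(), names.end(), forced) != names.end()) {
            return forced;
        }
    }
    return detected;
}

// Convenience wrapper reading the (hypothetical) MYAPP_FORCE_VARIANT variable.
std::string choose_variant_from_env(const std::string& detected) {
    return choose_variant(std::getenv("MYAPP_FORCE_VARIANT"), detected);
}
```

Running with `MYAPP_FORCE_VARIANT=sse2` on an AVX2 machine would then exercise the ‘lesser’ build. Forcing a build the CPU cannot run is expected to fail, typically with an illegal instruction error, which is exactly the kind of failure this switch lets you observe.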

    Other possibilities given this model are:

    • Statically linking to different versions of the standard runtimes (with/without thread safety) and using the appropriate one if you are running without any SMP/SMT capabilities.
    • Detecting whether multiple cores are present and whether they are real or hyper-threaded (and whether the OS knows how to schedule effectively in those cases).
    • Checking the performance of things like the system timer/high performance timers and using code optimized for that behaviour, for example if you wait for a certain amount of time to expire and therefore need to know your best possible granularity.
    • Optimizing your choice of code based on cache sizes or other load on the box. If you use unrolled loops, more aggressive unrolling options may depend on having a certain amount of level 1/2 cache.
    • Compiling conditionally to use doubles or floats depending on the architecture. This matters less on Intel hardware, but if you are targeting ARM CPUs, some have hardware floating point support and others require emulation. The optimal code would change heavily, even to the extent that you rely on conditional compilation rather than on the optimizing compiler (1).
    • Making use of co-processor hardware like CUDA capable graphics cards.
    • Detecting virtualization and altering behaviour (perhaps trying to avoid file system writes).
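To make the cache-size point concrete, here is a small sketch that sizes a working tile from the L1 data cache. It assumes a POSIX system; `_SC_LEVEL1_DCACHE_SIZE` is a glibc extension that is not available everywhere, and the 32 KiB fallback and the "half of L1" rule are arbitrary assumptions chosen for illustration, not tuned values.

```cpp
#include <cassert>
#include <cstddef>
#include <unistd.h>  // sysconf (POSIX)

// L1 data cache size in bytes, or 0 when the system does not report it.
long l1_data_cache_bytes() {
#ifdef _SC_LEVEL1_DCACHE_SIZE
    const long bytes = sysconf(_SC_LEVEL1_DCACHE_SIZE);
    return bytes > 0 ? bytes : 0;
#else
    return 0;
#endif
}

// Number of doubles per tile so that one tile occupies at most half of L1.
// Falls back to an assumed 32 KiB cache when the size is unknown.
std::size_t tile_elems_for_cache(long l1_bytes) {
    const long assumed_default = 32 * 1024;
    if (l1_bytes <= 0) {
        l1_bytes = assumed_default;
    }
    assert(l1_bytes > 0);
    return static_cast<std::size_t>(l1_bytes) / 2 / sizeof(double);
}
```

A blocked loop could call `tile_elems_for_cache(l1_data_cache_bytes())` once at start-up and use the result as its block length. Whether that actually beats a fixed size is something to measure.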

    As for doing this check, you have a few options, the most useful one on Intel being the cpuid instruction.

    • Windows
      • Use someone else’s implementation but you’ll have to pay
      • Use a free open source one
    • Linux
      • Use the built in one
      • You could also look at open source software doing the same thing
      • Pixman does a fair amount of this and is under a permissive licence.

    Alternatively re-implement/update an existing one using available documentation on the features you need.

    It takes quite a lot of separate documents to work out how to detect things:

    • Intel:
      • SSE 4.1/4.2
      • SSE3
      • MMX
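For the features listed above, a hand-rolled check can be quite small. The sketch below reads CPUID leaf 1 through the `__get_cpuid` helper from `<cpuid.h>`, which assumes GCC or Clang on x86; the struct and function names are made up for illustration. The bit positions are the ones documented for leaf 1 (EDX bit 23 MMX, bit 25 SSE, bit 26 SSE2; ECX bit 0 SSE3, bit 19 SSE4.1, bit 20 SSE4.2).

```cpp
#include <cassert>  // assert, handy when checking decode_leaf1 by hand
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>  // __get_cpuid (GCC/Clang, x86 only)
#define DEMO_HAVE_CPUID 1
#endif

// Hypothetical container for the feature bits of interest.
struct X86Features {
    bool mmx, sse, sse2, sse3, sse41, sse42;
};

// Decode the ECX/EDX registers returned by CPUID leaf 1.
X86Features decode_leaf1(unsigned ecx, unsigned edx) {
    X86Features f;
    f.mmx   = ((edx >> 23) & 1u) != 0;
    f.sse   = ((edx >> 25) & 1u) != 0;
    f.sse2  = ((edx >> 26) & 1u) != 0;
    f.sse3  = ((ecx >> 0)  & 1u) != 0;
    f.sse41 = ((ecx >> 19) & 1u) != 0;
    f.sse42 = ((ecx >> 20) & 1u) != 0;
    return f;
}

// Query the running CPU; reports no features where cpuid is unavailable.
X86Features detect_features() {
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
#ifdef DEMO_HAVE_CPUID
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        ecx = 0;
        edx = 0;
    }
#endif
    (void)eax;
    (void)ebx;
    return decode_leaf1(ecx, edx);
}
```

Newer extensions such as AVX need more than a CPUID bit (the OS must also save the wider registers), which is one of the ‘nasty little issues’ an existing library handles for you.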

    A large part of what you would be paying for in the CPU-Z library is someone doing all this (and the nasty little issues involved) for you.


    1. Be careful with this: it is hard to beat a decent optimizing compiler here.