The Archive Base

Editorial Team
Asked: June 7, 2026

Can the compiler make automatic use of SSE2 while optimisations are disabled?

When optimisations are disabled, does the /arch:SSE2 flag mean anything?

I’ve been given the task of squeezing more performance out of our software. Unfortunately, release builds are done using the debug settings, and attempts to argue for the case of optimisation have been unsuccessful so far.

I'm compiling for x86 with the compiler flags /ZI /Od /arch:SSE2 /FAs. The generated assembly shows that the compiler is not making use of SSE2. Is this because optimisation is disabled?

In the code, there are a few situations similar to this:

char* begin  = (char*)&bufferObject;   // byte view of the object
char* end    = begin + sizeof(bufferObject);
char  result = 0;                      // must start at zero before XOR-ing
while ( begin != end ) {
    result ^= *begin++;
}

I’d like to have the compiler vectorise this operation for me, but it doesn’t; I suspect optimisation needs to be enabled.

I hand-coded two solutions: one using an inline __asm block, and the other using the SSE2 intrinsicts defined in <emmintrin.h>. I’d prefer not to rely on this.

Update

Further to the questions above, I would like calls to library functions, like memcpy, to use the provided vectorised versions when appropriate. Looking at the assembly code for memcpy, I can see that there is a function called _VEC_memcpy which makes use of SSE2 for faster copying. The block which decides whether to branch to this routine or not is this:

    ; First, see if we can use a "fast" copy SSE2 routine
    ; block size greater than min threshold?
    cmp     ecx,080h
    jb      Dword_align
    ; SSE2 supported?
    cmp     DWORD PTR __sse2_available,0
    je      Dword_align
    ; alignments equal?
    push    edi
    push    esi
    and     edi,15
    and     esi,15
    cmp     edi,esi
    pop     esi
    pop     edi
    jne     Dword_align

    ; do fast SSE2 copy, params already set
    jmp     _VEC_memcpy

I don’t think that _VEC_memcpy is being called… ever.

Should the /arch:SSE2 flag be defining this __sse2_available symbol?
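To check which memcpy calls could reach _VEC_memcpy at all, the three tests in the listing above can be restated as a small predicate. This is only a sketch: takes_sse2_path is a hypothetical helper, not part of the CRT, and it assumes that ecx holds the byte count, edi the destination and esi the source, as is usual for this routine. Whether __sse2_available is set is a separate matter, so the sketch takes it as a parameter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not a CRT function): mirrors the three checks in the
// assembly listing. Assumes ecx = byte count, edi = destination, esi = source.
bool takes_sse2_path(const void* dst, const void* src, std::size_t count,
                     bool sse2_available)
{
    if (count < 0x80) return false;       // cmp ecx,080h / jb Dword_align
    if (!sse2_available) return false;    // cmp __sse2_available,0 / je Dword_align
    const std::uintptr_t d = reinterpret_cast<std::uintptr_t>(dst) & 15;
    const std::uintptr_t s = reinterpret_cast<std::uintptr_t>(src) & 15;
    return d == s;                        // jne Dword_align when they differ
}
```

Logging this predicate next to the hot memcpy calls would show whether the copies are simply too small (under 128 bytes) or whether source and destination have different offsets within a 16-byte block.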

1 Answer

  1. Editorial Team
    Added an answer on June 7, 2026 at 7:25 am

    Visual Studio 2010 and earlier have no support for automatic vectorization at all.

    The purpose of /arch:SSE2 is to allow the compiler to use scalar SSE for floating-point operations instead of the x87 FPU.

    So you may get some speedup with /arch:SSE2, since scalar SSE code works with a flat set of XMM registers instead of the x87 register stack. But keep in mind that this is not from vectorization.

    If you want vectorization on VS2010, you pretty much have to do it manually with intrinsics.
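As a rough illustration of the manual route, here is a minimal sketch of the byte-wise XOR loop from the question written with SSE2 intrinsics. The name xor_bytes and the XOR_HAVE_SSE2 macro are hypothetical, and the preprocessor guard is a conservative assumption so that the same code falls back to the scalar loop where SSE2 is not enabled.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define XOR_HAVE_SSE2 1
#else
#define XOR_HAVE_SSE2 0
#endif

// Hypothetical helper: XOR of every byte in [data, data + size).
unsigned char xor_bytes(const void* data, std::size_t size)
{
    assert(data != nullptr || size == 0);
    const unsigned char* p = static_cast<const unsigned char*>(data);
    unsigned char result = 0;
#if XOR_HAVE_SSE2
    __m128i acc = _mm_setzero_si128();
    while (size >= 16) {
        // Unaligned load, so the buffer needs no special alignment.
        acc = _mm_xor_si128(acc, _mm_loadu_si128(reinterpret_cast<const __m128i*>(p)));
        p += 16;
        size -= 16;
    }
    // Fold the 16 accumulated lanes into a single byte.
    unsigned char lanes[16];
    _mm_storeu_si128(reinterpret_cast<__m128i*>(lanes), acc);
    for (int i = 0; i < 16; ++i) result ^= lanes[i];
#endif
    // Scalar tail (or the whole buffer when SSE2 is not enabled).
    while (size--) result ^= *p++;
    return result;
}
```

A call such as xor_bytes(&bufferObject, sizeof(bufferObject)) would replace the original loop. Intrinsics are emitted even with /Od, but the unoptimised code around them is likely to limit the gain.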


    Visual Studio 2012 has support for auto-vectorization:

    http://msdn.microsoft.com/en-us/library/hh872235%28v=vs.110%29.aspx
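For reference, the kind of loop the auto-vectorizer is designed for looks like the sketch below: a unit-stride loop over contiguous arrays with independent iterations. The function add_arrays is a hypothetical example. The assumption here is a build with Visual Studio 2012 or later using /O2, where the auto-vectorizer is on by default, optionally with /Qvec-report:1 to list the loops that were vectorized. With /Od the loop is expected to stay scalar.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example: element-wise work over contiguous arrays,
// with a simple counted loop and no dependency between iterations.
void add_arrays(const float* a, const float* b, float* out, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
    }
}
```

Whether a particular loop, such as the byte-wise XOR reduction in the question, is actually vectorized depends on the compiler version, so the report output is the thing to check.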
