Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 7622245
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: May 31, 20262026-05-31T04:24:13+00:00 2026-05-31T04:24:13+00:00

I tried to compare the performance of inline assembly language and C++ code, so

  • 0

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that add two arrays of size 2000 for 100000 times. Here’s the code:

#define TIMES 100000
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}


void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES
        start:
        mov esi,0
        mov ecx,lengthOfArray
        label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
    };
}

Here’s main():

int main() {
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    time_t start = clock();
    calcuC(xC,yC,2000);

    //    calcuAsm(xAsm,yAsm,2000);
    //    for(int i = 0; i < 2000; i++)
    //    {
    //        if(xC[i] != xAsm[i])
    //        {
    //            cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //            errorOccured = true;
    //            break;
    //        }
    //    }
    //    if(errorOccured)
    //        cout<<"Error occurs!"<<endl;
    //    else
    //        cout<<"Works fine!"<<endl;

    time_t end = clock();

    //    cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";

    cout<<"time = "<<end - start<<endl;
    return 0;
}

Then I run the program five times to get the cycles of processor, which could be seen as time. Each time I call one of the function mentioned above only.

And here comes the result.

Function of assembly version:

Debug   Release
---------------
732        668
733        680
659        672
667        675
684        694
Average:   677

Function of C++ version:

Debug     Release
-----------------
1068      168
 999      166
1072      231
1002      166
1114      183
Average:  182

The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?

I guess that the assembly code I wrote is not as effective as those generated by GCC. It’s hard for a common programmer like me to wrote code faster than its opponent generated by a compiler.Does that mean I should not trust the performance of assembly language written by my hands, focus on C++ and forget about assembly language?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-05-31T04:24:14+00:00Added an answer on May 31, 2026 at 4:24 am

    Yes, most times.

    First of all you start from wrong assumption that a low-level language (assembly in this case) will always produce faster code than high-level language (C++ and C in this case). It’s not true. Is C code always faster than Java code? No because there is another variable: programmer. The way you write code and knowledge of architecture details greatly influence performance (as you saw in this case).

    You can always produce an example where handmade assembly code is better than compiled code but usually it’s a fictional example or a single routine not a true program of 500.000+ lines of C++ code). I think compilers will produce better assembly code 95% times and sometimes, only some rare times, you may need to write assembly code for few, short, highly used, performance critical routines or when you have to access features your favorite high-level language does not expose. Do you want a touch of this complexity? Read this awesome answer here on SO.

    Why this?

    First of all because compilers can do optimizations that we can’t even imagine (see this short list) and they will do them in seconds (when we may need days).

    When you code in assembly you have to make well-defined functions with a well-defined call interface. However they can take in account whole-program optimization and inter-procedural optimization such
    as register allocation, constant propagation, common subexpression elimination, instruction scheduling and other complex, not obvious optimizations (Polytope model, for example). On RISC architecture guys stopped worrying about this many years ago (instruction scheduling, for example, is very hard to tune by hand) and modern CISC CPUs have very long pipelines too.

    For some complex microcontrollers even system libraries are written in C instead of assembly because their compilers produce a better (and easy to maintain) final code.

    Compilers sometimes can automatically use some MMX/SIMDx instructions by themselves, and if you don’t use them you simply can’t compare (other answers already reviewed your assembly code very well).
    Just for loops this is a short list of loop optimizations of what is commonly checked for by a compiler (do you think you could do it by yourself when your schedule has been decided for a C# program?) If you write something in assembly, I think you have to consider at least some simple optimizations. The school-book example for arrays is to unroll the cycle (its size is known at compile time). Do it and run your test again.

    These days it’s also really uncommon to need to use assembly language for another reason: the plethora of different CPUs. Do you want to support them all? Each has a specific microarchitecture and some specific instruction sets. They have different number of functional units and assembly instructions should be arranged to keep them all busy. If you write in C you may use PGO but in assembly you will then need a great knowledge of that specific architecture (and rethink and redo everything for another architecture). For small tasks the compiler usually does it better, and for complex tasks usually the work isn’t repaid (and compiler may do better anyway).

    If you sit down and you take a look at your code probably you’ll see that you’ll gain more to redesign your algorithm than to translate to assembly (read this great post here on SO), there are high-level optimizations (and hints to compiler) you can effectively apply before you need to resort to assembly language. It’s probably worth to mention that often using intrinsics you will have performance gain your’re looking for and compiler will still be able to perform most of its optimizations.

    All this said, even when you can produce a 5~10 times faster assembly code, you should ask your customers if they prefer to pay one week of your time or to buy a 50$ faster CPU. Extreme optimization more often than not (and especially in LOB applications) is simply not required from most of us.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I tried to compare RedGate performance profile on two different machines and to my
I tried profiling the EIG function on MATLAB and NumPy to compare performance on
I am trying to compare two assembly files where one was written all caps
I just have tried to compare performance of lambda expressions in C++11, so I
I tried to compare the performance of a hash table lookup and linear search.
I tried to compare date and time of two timeStamp long values as follows.
Tried this: $('.link').click(function(e) { $.getScript('http://www.google.com/uds/api?file=uds.js&amp;v=1.0', function() { $('body').append('<p>GOOGLE API (UDS) is loaded</p>'); }); return
tried to research on that but sometimes I seem to lack some googling skills...
I'am trying to measure the performance of a computer vision program that tries to
How can I compare the performance of an Index on a table using the

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.