The Archive Base

Editorial Team
Asked: May 23, 2026

I am writing a compiler (more for fun than anything else), but I want

I am writing a compiler (more for fun than anything else), but I want to try to make it as efficient as possible. For example I was told that on Intel architecture the use of any register other than EAX for performing math incurs a cost (presumably because it swaps into EAX to do the actual piece of math). Here is at least one source that states the possibility (http://www.swansontec.com/sregisters.html).

I would like to verify and measure these differences in performance characteristics. Thus, I have written this program in C++:

#include "stdafx.h"
#include <intrin.h>
#include <iostream>

using namespace std;

int _tmain(int argc, _TCHAR* argv[])
{
    __int64 startval;
    __int64 stopval;
    unsigned int value; // Keep the result so the computation is not optimized out

    startval = __rdtsc(); // Get the CPU Tick Counter using assembly RDTSC opcode

    // Simple Math: a = (a << 3) + 0x0054E9
    _asm {
        mov ebx, 0x1E532 // Seed
        shl ebx, 3
        add ebx, 0x0054E9
        mov value, ebx
    }

    stopval = __rdtsc();
    __int64 val = (stopval - startval);
    cout << "Result: " << value << " -> " << val << endl;

    int i;
    cin >> i;

    return 0;
}

I tried this code swapping eax and ebx but I’m not getting a “stable” number. I would hope that the test would be deterministic (the same number every time) because it’s so short that it’s unlikely a context switch is occurring during the test. As it stands there is no statistical difference but the number fluctuates so wildly that it would be impossible to make that determination. Even if I take a large number of samples the number is still impossibly varied.

I’d also like to test xor eax, eax vs mov eax, 0, but have the same problem.

Is there any way to do these kinds of performance tests on Windows (or anywhere else)? When I used to program Z80 for my TI-Calc I had a tool where I could select some assembly and it would tell me how many clock cycles the code would take to execute. Can that not be done with our newfangled modern processors?

EDIT: There are a lot of answers indicating to run the loop a million times. To clarify, this actually makes things worse. The CPU is much more likely to context switch and the test becomes about everything but what I am testing.

1 Answer

    Editorial Team
    Added an answer on May 23, 2026 at 11:58 am

    To have any hope of repeatable, deterministic timing at the level that RDTSC gives, you need to take some extra steps. First, RDTSC is not a serializing instruction, so it can be executed out of order, which will usually render it meaningless in a snippet like the one above.

    You normally want to use a serializing instruction, then your RDTSC, then the code in question, another serializing instruction, and the second RDTSC.

    Nearly the only serializing instruction available in user mode is CPUID. That, however, adds one more minor wrinkle: CPUID is documented by Intel as requiring varying amounts of time to execute — the first couple of executions can be slower than others.

    As such, the normal timing sequence for your code would be something like this:

    xor eax, eax
    cpuid
    xor eax, eax
    cpuid
    xor eax, eax
    cpuid             ; Intel says by the third execution, the timing will be stable.
    rdtsc             ; read the clock
    push eax          ; save the start time (low half)
    push edx          ; save the start time (high half)

        mov ebx, 0x1E532  ; seed: start of the test sequence
        shl ebx, 3
        add ebx, 0x0054E9
        mov value, ebx

    xor eax, eax      ; serialize
    cpuid
    rdtsc             ; get end time
    pop ecx           ; start time, high half
    pop esi           ; start time, low half (ESI rather than EBP, so the frame pointer stays intact)
    sub eax, esi      ; find end - start
    sbb edx, ecx
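
For readers who prefer compiler intrinsics to hand-written assembly, the serialize, RDTSC, test, serialize, RDTSC pattern above might be sketched in C++ roughly as follows. This is only an illustration under some assumptions: the helper names `serialize`, `read_ticks`, and `min_ticks` are made up for this example, the `std::chrono` fallback for non-x86 builds is not cycle-accurate, and keeping the minimum over many short samples is a common way to filter out interrupts and cache misses rather than something the answer itself prescribes.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <limits>

#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define HAVE_TSC 1
// CPUID is a serializing instruction; the register contents are ignored.
static inline void serialize() {
    int regs[4];
    __cpuid(regs, 0);
}
#elif defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <x86intrin.h>
#define HAVE_TSC 1
// volatile keeps the compiler from deleting the CPUID whose outputs are unused.
static inline void serialize() {
    unsigned a = 0, b, c, d;
    __asm__ __volatile__("cpuid" : "+a"(a), "=b"(b), "=c"(c), "=d"(d) : : "memory");
}
#else
#define HAVE_TSC 0  // assumption: fall back to a coarse clock on non-x86 targets
#endif

// Serialize, then read the tick counter (TSC on x86, steady_clock otherwise).
static inline std::uint64_t read_ticks() {
#if HAVE_TSC
    serialize();
    return __rdtsc();
#else
    return static_cast<std::uint64_t>(
        std::chrono::steady_clock::now().time_since_epoch().count());
#endif
}

// Time `body` `samples` times and return the smallest tick count observed.
template <class F>
std::uint64_t min_ticks(F&& body, int samples = 1000) {
    assert(samples > 0);
    // Warm up: the first few CPUID executions can be slower than later ones.
    for (int i = 0; i < 3; ++i) (void)read_ticks();

    std::uint64_t best = std::numeric_limits<std::uint64_t>::max();
    for (int i = 0; i < samples; ++i) {
        const std::uint64_t start = read_ticks();
        body();
        const std::uint64_t stop = read_ticks();
        best = std::min(best, stop - start);
    }
    return best;
}
```

A hypothetical call would be `min_ticks([&] { sink = (0x1E532u << 3) + 0x0054E9u; })` with `sink` declared `volatile`. The result still includes the cost of the second CPUID, so it is better suited to comparing two variants than to reading off an absolute cycle count.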
    

    We’re starting to get close, but there’s one last point that’s difficult to deal with using inline code on most compilers: there can also be some effects from crossing cache lines, so you normally want to force your code to be aligned to a 16-byte (paragraph) boundary. Any decent assembler will support that, but inline assembly in a compiler usually won’t.
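
One possible workaround when the inline assembler cannot align code is to move the test sequence into its own function and ask the compiler to align that function. The sketch below assumes GCC or Clang, whose `aligned` function attribute sets a minimum alignment for the function's first instruction. The names `ALIGN16`, `test_sequence`, and `starts_on_16_byte_boundary` are invented for this example, and on other compilers the macro expands to nothing, so no alignment is requested there.

```cpp
#include <cstdint>

#if defined(__GNUC__)
// GCC/Clang: request at least 16-byte alignment for the function entry point.
#define ALIGN16 __attribute__((aligned(16)))
#else
// Assumption: other compilers get no explicit request and use their default.
#define ALIGN16
#endif

// The test sequence from the question, a = (a << 3) + 0x0054E9, as a function.
ALIGN16 unsigned test_sequence() {
    unsigned v = 0x1E532u;   // seed
    v = (v << 3) + 0x0054E9u;
    return v;
}

// Report whether a function's entry address is a multiple of 16.
bool starts_on_16_byte_boundary(unsigned (*fn)()) {
    return reinterpret_cast<std::uintptr_t>(fn) % 16 == 0;
}
```

This only aligns the start of the function, not an arbitrary point inside it, so the timed instructions would need to begin at or near the function entry for the alignment to matter.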

    Having said all that, I think you’re wasting your time. As you can guess, I’ve done a fair amount of timing at this level, and I’m quite certain what you’ve heard is an outright myth. In reality, all recent x86 CPUs use a set of what are called “rename registers”. To make a long story short, this means the name you use for a register doesn’t really matter much — the CPU has a much larger set of registers (e.g., around 40 for Intel) that it uses for the actual operations, so your putting a value in EBX vs. EAX has little effect on the register that the CPU is really going to use internally. Either could be mapped to any rename register, depending primarily on which rename registers happen to be free when that instruction sequence starts.
