The Archive Base



Editorial Team
Asked: June 9, 2026

I am learning assembly and writing inline assembly with the Digital Mars C++ compiler.


I am learning assembly and writing some inline assembly with the Digital Mars C++ compiler. I searched for ways to make a program faster and found this list of things to tune:

Use a better C++ compiler (I am thinking of GCC or the Intel compiler).

Use assembly only in the critical parts of the program.

Find a better algorithm.

Cache miss, cache contention.

Loop-carried dependency chain.

Instruction fetching time.

Instruction decoding time.

Instruction retirement.

Register read stalls.

Execution port throughput.

Execution unit throughput.

Suboptimal reordering and scheduling of micro-ops.

Branch misprediction.

Floating point exception.

I understood all of these except “register read stalls”.

Question 1: Can anybody explain how register read stalls happen in the CPU, and what the “superscalar” form of “out-of-order execution” is?
Plain out-of-order execution seemed logical, but I couldn't find a logical explanation of the superscalar form.

Question 2: Can someone also recommend a good instruction listing for SSE, SSE2, and newer CPUs, preferably with micro-op tables, port throughputs, execution units, and latency tables, so that I can work out the real bottleneck of a piece of code?

I would be happy with a small example like this:

// Breaking a loop-carried dependency chain:
__asm
{
loop_begin:
    // ....
    // ....
    sub edx, 05h    // instead of computing i*5 in each iteration, subtract 5 each time
    sub ecx, 01h    // i-- (loop counter)
    // ...
    // ...
    jnz loop_begin  // sub ecx must come after sub edx so that jnz tests the counter's flags
}
// sub edx gets rid of a multiplication and also makes edx independent of ecx.
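For comparison, the same idea can be written in plain C++. This is only a sketch: the function names are hypothetical, the loop body (summing the values) is made up for illustration, and `n` is assumed to be non-negative. An optimizing compiler will often perform this transformation on its own.

```cpp
#include <cstdint>

// Hypothetical helper: recomputes i * 5 in every iteration, so each
// product depends on the current value of the loop counter.
std::int64_t sum_with_multiply(std::int32_t n) {
    std::int64_t total = 0;
    for (std::int32_t i = n; i > 0; --i) {
        total += static_cast<std::int64_t>(i) * 5;
    }
    return total;
}

// Hypothetical helper: keeps a running value that shrinks by 5 per
// iteration (the "sub edx, 05h" idea), so no multiply sits inside the loop.
std::int64_t sum_with_running_value(std::int32_t n) {
    std::int64_t total = 0;
    std::int64_t scaled = static_cast<std::int64_t>(n) * 5;  // i * 5 for i == n
    for (std::int32_t i = n; i > 0; --i) {
        total += scaled;
        scaled -= 5;  // replaces the multiplication
    }
    return total;
}
```

Both functions return the same result for any non-negative `n`. The second only trades the multiply for a subtraction, and `scaled` still forms its own short chain from one iteration to the next.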

Thank you.

Computer: Pentium M 2 GHz, Windows XP 32-bit


1 Answer

  1. Editorial Team
    Added an answer on June 9, 2026 at 8:48 am

    You should take a look at Agner Fog's optimization manuals: "Optimizing software in C++: An optimization guide for Windows, Linux and Mac platforms" and "Optimizing subroutines in assembly language: An optimization guide for x86 platforms".

    But to really be able to outsmart a modern compiler, you need good background knowledge of the architecture you want to optimize for: "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers".
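As a small illustration of the loop-carried dependency chains mentioned in the question, here is a hedged C++ sketch of one common way to shorten such a chain: splitting a single accumulator into two independent ones. The function names are hypothetical, and whether this helps on a given CPU is an assumption that has to be measured.

```cpp
#include <cstddef>
#include <vector>

// One accumulator: every addition has to wait for the previous one,
// so the loop forms a single long dependency chain.
double sum_single_chain(const std::vector<double>& v) {
    double total = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        total += v[i];
    }
    return total;
}

// Two accumulators: the additions into `even` and `odd` do not depend
// on each other, so an out-of-order core is free to overlap them.
double sum_two_chains(const std::vector<double>& v) {
    double even = 0.0;
    double odd = 0.0;
    std::size_t i = 0;
    for (; i + 1 < v.size(); i += 2) {
        even += v[i];
        odd += v[i + 1];
    }
    if (i < v.size()) {
        even += v[i];  // leftover element when the size is odd
    }
    return even + odd;
}
```

Note that reordering floating-point additions can change the rounding of the result in general, which is one reason a compiler may not do this by itself under default settings.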

