The Archive Base


Editorial Team
Asked: May 22, 2026

In 32bit, we had 8 general purpose registers. With 64bit, the amount doubles, but


In 32-bit, we had 8 “general purpose” registers. With 64-bit, the number doubles, but that seems independent of the 64-bit change itself.
Now, if registers are so fast (no memory access), why aren’t there more of them naturally? Shouldn’t CPU builders work as many registers as possible into the CPU? What is the logical restriction that explains why we only have the number we have?


1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 3:53 pm

    There are many reasons you don’t just have a huge number of registers:

    • They’re tightly coupled to most pipeline stages. For starters, you need to track their lifetimes and forward results back to earlier stages. The complexity becomes intractable very quickly, and the number of wires (literally) involved grows at the same rate. That is expensive in area, which past a certain point means it is expensive in power, price and performance.
    • They take up instruction encoding space. With 16 registers, each register operand takes 4 bits: 8 bits for a source and a destination, and another 4 if you have 3-operand instructions (e.g. ARM). That’s an awful lot of instruction encoding space spent just on naming registers. It eventually affects decoding, code size and, again, complexity.
    • There are better ways to achieve the same result…
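The encoding-space arithmetic in the second point can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name `register_field_bits` is made up for illustration, and it assumes every register operand is encoded as a plain binary index, which real instruction sets do not always do.

```python
def register_field_bits(num_registers: int, operands: int) -> int:
    """Bits needed to name `operands` registers out of `num_registers`.

    Assumes each operand is a plain binary register index.
    """
    bits_per_operand = (num_registers - 1).bit_length()
    return bits_per_operand * operands


if __name__ == "__main__":
    # Compare 2-operand and 3-operand encodings for a few register counts.
    for regs in (8, 16, 32, 256):
        two = register_field_bits(regs, 2)
        three = register_field_bits(regs, 3)
        print(f"{regs:>3} registers: {two:>2} bits (2 operands), {three:>2} bits (3 operands)")
```

With 16 registers and three operands, 12 bits go to register fields; with 256 directly addressable registers, three operands would need 24 bits, which is most of a fixed 32-bit instruction word.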

    These days we really do have lots of registers; they’re just not explicitly programmed. This is “register renaming”. While you only access a small architectural set (8-32 registers), it is actually backed by a much larger physical set (e.g. 64-256). The CPU tracks the visibility of each register and allocates it an entry in the renamed set. For example, you can load into, modify, then store from the same register many times in a row, and have each of these sequences actually performed independently, depending on cache misses etc. In ARM:

    ldr r0, [r4]
    add r0, r0, #1
    str r0, [r4]
    ldr r0, [r5]
    add r0, r0, #1
    str r0, [r5]
    

    Cortex-A9 cores do register renaming, so the first load to “r0” actually goes to a renamed virtual register; let’s call it “v0”. The load, increment and store happen on “v0”. Meanwhile, we also perform a load/modify/store on r0 again, but that gets renamed to “v1” because it is an entirely independent sequence that happens to use r0. (The pointers in r4 and r5 are renamed too, shown here as “v2” and “v3”.) Let’s say the load from the pointer in “r4” stalled due to a cache miss. That’s OK: we don’t need to wait for “r0” to be ready. Because it’s renamed, we can run the next sequence with “v1” (also mapped to r0), and perhaps that’s a cache hit and we just had a huge performance win.

    ldr v0, [v2]
    add v0, v0, #1
    str v0, [v2]
    ldr v1, [v3]
    add v1, v1, #1
    str v1, [v3]
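
The renaming step can be sketched as a toy model in Python. Everything here is an illustrative assumption rather than how the Cortex-A9 (or any real core) implements it: the tuple format, the names `rename` and `registers_used`, and the `p0`, `p1`, … physical register names are invented, and there is no free list or register reclamation. Unlike the simplified listing above, the sketch gives every register write a fresh physical register, so the `add` gets a new destination as well.

```python
from itertools import count


def rename(program):
    """Give every register write a fresh physical register.

    `program` is a list of (opcode, dest, sources) tuples. `dest` is None for
    instructions that write memory instead of a register (e.g. a store).
    """
    fresh = (f"p{i}" for i in count())
    mapping = {}  # architectural name -> physical register holding its latest value
    renamed = []
    for op, dest, sources in program:
        phys_sources = []
        for src in sources:
            if src not in mapping:
                mapping[src] = next(fresh)  # value that was live before the program
            phys_sources.append(mapping[src])
        phys_dest = None
        if dest is not None:
            mapping[dest] = next(fresh)  # new write, new physical register
            phys_dest = mapping[dest]
        renamed.append((op, phys_dest, tuple(phys_sources)))
    return renamed


def registers_used(instructions):
    """Set of physical registers read or written by the given instructions."""
    used = set()
    for _, dest, sources in instructions:
        used.update(sources)
        if dest is not None:
            used.add(dest)
    return used


# The two load/increment/store sequences from the ARM listing.
program = [
    ("ldr", "r0", ("r4",)),
    ("add", "r0", ("r0",)),
    ("str", None, ("r0", "r4")),
    ("ldr", "r0", ("r5",)),
    ("add", "r0", ("r0",)),
    ("str", None, ("r0", "r5")),
]

renamed = rename(program)

if __name__ == "__main__":
    for instruction in renamed:
        print(instruction)
    # After renaming, the two sequences share no registers at all.
    print(registers_used(renamed[:3]).isdisjoint(registers_used(renamed[3:])))
```

Both sequences name r0 in the original code, but after renaming they touch disjoint sets of physical registers. That is what would let the second sequence proceed while the first is stalled on a cache miss.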
    

    I think x86 is up to a gigantic number of renamed registers these days (ballpark 256). If all of those were directly addressable, every instruction would need 8 bits times 2 just to say what the source and destination are. It would massively increase the number of wires needed across the core, and its size. So there’s a sweet spot around 16-32 registers which most designers have settled on, and for out-of-order CPU designs, register renaming is the way to get around that limit.

    Edit: A note on the importance of out-of-order (OOO) execution and register renaming here. Once you have OOO, the number of architected registers doesn’t matter so much, because they’re just “temporary tags” that get renamed to the much larger virtual register set. You don’t want the number to be too small, though, because it gets difficult to write small code sequences. This is a problem for x86-32, because the limited 8 registers mean a lot of temporaries end up going through the stack, and the core needs extra logic to forward reads/writes to memory. If you don’t have OOO, you’re usually talking about a small core, in which case a large register set gives a poor cost/performance benefit.
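To make the point about temporaries going through the stack concrete, here is a rough register-pressure sketch in Python. The live ranges are made-up data and the function names are hypothetical; a real compiler's register allocator is far more involved. The sketch only counts how many values are alive at once and how many of them cannot fit in a given number of registers at the peak.

```python
def max_live(intervals):
    """Largest number of values alive at the same time.

    `intervals` is a list of (start, end) instruction indices, end exclusive.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    live = peak = 0
    # Tuples sort so that, at the same index, an end (-1) comes before a start (+1).
    for _, delta in sorted(events):
        live += delta
        peak = max(peak, live)
    return peak


def values_spilled_at_peak(intervals, num_registers):
    """How many live values do not fit in registers at the point of peak pressure."""
    return max(0, max_live(intervals) - num_registers)


# Hypothetical workload: 12 temporaries, each alive for 10 instructions.
intervals = [(i, i + 10) for i in range(12)]

if __name__ == "__main__":
    print("peak live values:", max_live(intervals))
    for regs in (8, 16):
        print(regs, "registers ->", values_spilled_at_peak(intervals, regs), "spilled at peak")
```

For this invented workload, 8 registers leave some values with nowhere to live but memory, while 16 registers hold everything.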

    So there’s a natural sweet spot for register bank size, which maxes out at about 32 architected registers for most classes of CPU. x86-32 has 8 registers, and that’s definitely too few. ARM went with 16 registers, and it’s a good compromise. 32 registers is slightly too many if anything; you end up not needing the last 10 or so.

    None of this touches on the extra registers you get for SSE and other vector and floating-point coprocessors. Those make sense as an extra set because they run independently of the integer core, and don’t grow the CPU’s complexity exponentially.


© 2021 The Archive Base. All Rights Reserved