Is there one book that covers it all? The sad part is I can hold a superficial conversation about all these things. I’ve gone to Uni and got A’s in all these subjects, yet I frigging don’t understand what a stack or memory really looks like.
I don’t “get” what a thread really is, how a CPU cache line works and how it gets invalidated with read/write barriers, or things like the TLB.
Any book or maybe a small collection of books to read will really help.
I’m assuming you’ve read “Computer Architecture” by Hennessy & Patterson. But that may not answer your questions. Personally, although I’m an expert in computer architecture, I didn’t learn it from any one place. In fact, I learned a lot from reading Ars Technica and Phoronix articles about every new architecture that came out in the past decade or so.
As for what they REALLY look like, you’ll need to learn chip design. There are two viewpoints you’ll want to explore. One is a CAD-like perspective, where you do schematic capture. You lay out and connect logic blocks together to form digital circuits. The physical layout you make will correspond roughly to the layout you get in hardware. The other angle is to learn to code in a hardware description language like Verilog, although that’s rather abstract, and it requires a lot of intuition about the hardware to relate what you’re coding to how it’s going to turn into hardware.
I googled for images of “static ram structure” and found lots of interesting pages that demonstrate how memories work. There are some good images at “http://www.iis.ee.ethz.ch/~kgf/aries/5.html”, for instance. See also “http://lwn.net/Articles/250967/”, specifically “http://lwn.net/images/cpumemory/cpumemory.7.png”, and “http://www.freepatentsonline.com/7095663-0-large.jpg”. You can get into dynamic RAMs later. A static RAM is a large rectangular array of 6-transistor (6T) cells. In each cell, four of the transistors form two back-to-back inverters, holding a bit value. The other two give access to the signal lines between the inverters, allowing you to coerce them into a different state. To read a row, a decoder circuit translates an address into a single signal and asserts that row’s word line, which activates the access transistors, connecting each cell in that row to its bit lines. The two bit lines for each column hold opposite values, which are interpreted by differential sense amplifiers, and you’ve read out a row. To write, you do the same but force the bit lines to the correct values.
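If it helps to see the read and write sequence as code, here is a rough behavioral sketch in Python. It is not a circuit simulation: the class name SramArray, the method names, and the 4x8 array size are all made up for illustration, and each cell is reduced to a single stored bit standing in for the pair of inverters.

```python
class SramArray:
    """Toy behavioral model of an SRAM array (hypothetical, for illustration)."""

    def __init__(self, rows, cols):
        self.rows = rows
        self.cols = cols
        # One stored bit per cell, standing in for two back-to-back inverters.
        self.cells = [[0] * cols for _ in range(rows)]

    def decode(self, address):
        # Row decoder: turn a binary address into one-hot word lines.
        if not 0 <= address < self.rows:
            raise ValueError("address out of range")
        return [1 if r == address else 0 for r in range(self.rows)]

    def read_row(self, address):
        row = self.decode(address).index(1)  # the one asserted word line
        # Each selected cell drives a bit line and its complement.
        bit_lines = [(b, 1 - b) for b in self.cells[row]]
        # Differential sense amplifier: compare the two lines of each column.
        return [1 if bl > bl_bar else 0 for bl, bl_bar in bit_lines]

    def write_row(self, address, bits):
        if len(bits) != self.cols:
            raise ValueError("need exactly one bit per column")
        row = self.decode(address).index(1)
        # Forcing the bit lines overpowers the inverters, so the cells flip.
        self.cells[row] = [1 if b else 0 for b in bits]


sram = SramArray(rows=4, cols=8)
sram.write_row(2, [1, 0, 1, 1, 0, 0, 1, 0])
print(sram.read_row(2))
```

Notice that the whole row is always read or written at once. Picking a single byte or word out of the row is the job of extra column-select logic that this sketch leaves out.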
A stack is just memory addressed in a particular way. Even in specially-dedicated stack structures in chips, they’re just memory blocks, along with a logic block that increments and decrements an address appropriately.
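To make that concrete, here is a minimal sketch in Python of a stack as a plain memory array plus a pointer that gets decremented and incremented. The class name Stack and the choice to grow downward from the top of memory are my own assumptions for illustration; real hardware differs in the details.

```python
class Stack:
    """A stack is ordinary memory plus a stack pointer (hypothetical model)."""

    def __init__(self, size):
        self.memory = [0] * size  # just a block of memory words
        self.sp = size            # assumption: empty stack, grows downward

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                  # decrement the address...
        self.memory[self.sp] = value  # ...then store at it

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]  # load from the current address...
        self.sp += 1                  # ...then increment it
        return value


s = Stack(4)
s.push(10)
s.push(20)
print(s.pop())   # 20
print(s.memory)  # [0, 0, 20, 10]
```

Popping does not erase anything. The 20 is still sitting in memory after the pop, and only the pointer has moved, which is exactly why the stack is "just memory addressed in a particular way".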
A cache is another generic memory array, associated with a tag array, which is a particular kind of content-addressable memory. A TLB is a special kind of cache. Spending some time googling, you can learn all about these things. The hurdle you have to get over is knowing what search terms to use. I’m happy to help with that.
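As a starting point, here is a small Python sketch of the data-array-plus-tag-array idea. I am assuming the simplest organization, a direct-mapped cache, with made-up parameters (64-byte lines, 8 lines) and a hypothetical class name DirectMappedCache. In a direct-mapped cache only one tag is compared per lookup. A fully associative cache, or a typical TLB keyed on virtual page numbers, compares the address against every stored tag at once, which is where the content-addressable memory comes in.

```python
LINE_BYTES = 64  # assumed cache line size
NUM_LINES = 8    # assumed number of lines in the cache


class DirectMappedCache:
    """Toy read-only direct-mapped cache (hypothetical, for illustration)."""

    def __init__(self, backing):
        self.backing = backing             # bytes object playing main memory
        self.valid = [False] * NUM_LINES   # valid bit per line
        self.tags = [0] * NUM_LINES        # the tag array
        self.data = [bytes(LINE_BYTES)] * NUM_LINES  # the data array
        self.hits = 0
        self.misses = 0

    def split(self, address):
        # Address = tag | index | offset, from high bits to low bits.
        offset = address % LINE_BYTES
        index = (address // LINE_BYTES) % NUM_LINES
        tag = address // (LINE_BYTES * NUM_LINES)
        return tag, index, offset

    def read_byte(self, address):
        tag, index, offset = self.split(address)
        if self.valid[index] and self.tags[index] == tag:
            self.hits += 1
        else:
            # Miss: fetch the whole line from memory, replacing the old one.
            self.misses += 1
            base = address - offset
            self.data[index] = bytes(self.backing[base:base + LINE_BYTES])
            self.tags[index] = tag
            self.valid[index] = True
        return self.data[index][offset]


memory = bytes(range(256)) * 8   # 2048 bytes of fake main memory
cache = DirectMappedCache(memory)
cache.read_byte(5)     # miss: line 0 is loaded
cache.read_byte(6)     # hit: same line
cache.read_byte(517)   # miss: same index, different tag, evicts line 0
cache.read_byte(5)     # miss again: the line was evicted
print(cache.hits, cache.misses)  # 1 3
```

Good search terms from here would be "set associative cache", "cache tag index offset", and "TLB miss page walk".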