I’m doing something where I have to interpret the values at raw memory addresses, and I’m trying to find out how integral types larger than int are stored.
For the sake of this argument, let’s assume a 32-bit architecture, where the register size is 32 bits, an int is 32 bits, and a long is 64 bits. Is there some rule of thumb or guideline about how the two register-sized halves are laid out in memory?
Let’s just get the “it depends on the architecture / OS / language” arguments out of the way. I’m aware that nothing in this area is remotely portable or standardized (indeed, it is intentionally left unspecified by the standards). I’m just curious whether there’s a general pattern here. I won’t shoot anyone; any answer may erase my hard drive; standard caveats apply.
I’m familiar enough with, e.g., struct packing and how the compiler will (usually) ensure that types smaller than the alignment boundary are aligned correctly, but what happens when a type is larger than that boundary?
I can see a few reasonable layouts:
Assume we’re using addresses 0x10 and 0x11 to store a long with a value of, I dunno, 99 (pretending, for simplicity, that each address holds one 32-bit int). I’ll make up some terminology here: lowint is the least significant half of the long, and highint is the most significant half.
If the layout is 0x10 => lowint, 0x11 => highint, then a read of 0x10 yields a consistent value whether the type is an int or a long (as long as the value doesn’t exceed INT_MAX), which is a nice feature: 0x10 is “99” in either case. We also take up two addresses growing upward from our base address. On the other hand, the low and high parts seem “reversed”.
If the layout is 0x10 => highint, 0x11 => lowint, then the high and low parts are in order, but a read of 0x10 yields “99” if it were an int and “0” if it were a long, so we would need to examine the following address 0x11 to tell whether we’re dealing with one long rather than, say, two ints. We still grow upward.
If the layout is 0x10 => lowint, 0x0F => highint, then we’ve grown backward (i.e. taken up the preceding address to get our second int of storage), but our ints are in order and reads of 0x10 are consistent across data types. This seems the least likely to me, as it would play havoc with caches and cache-line reads, but it’s still another option.
So which is most likely? My gut tells me that option 2 (in-order high and low parts, using two ints of storage space) would be the most convenient for the hardware, but I’m not positive that this is true. Does anyone out there have anecdotal evidence for any of these layouts?
What you are asking about is endianness. It turns out that big-endian vs. little-endian vs. mixed-endian byte order is a major issue in computer architecture. The first layout you mention (i.e. lowint at 0x10) is known as little-endian, and the second one is big-endian.
We do not need to confuse ourselves about the direction in which things are loaded. If we load 16 bits from 0x10, we get the two bytes at 0x10 and 0x11 regardless of the endianness scheme; likewise, a 64-bit long at 0x10 occupies 0x10 through 0x17. An object’s address is the address of its lowest-addressed byte, and the object extends upward from there, so your last example, in my experience, never occurs.
Registers matter because the way the processor interprets the high and low bytes (its endianness) determines how a value is laid out when it is stored to memory. In portable software that moves specific data structures across architectures, it is necessary to reverse the byte order whenever the processor’s endianness differs from that of the data structure it is manipulating.