Possible Duplicate:
When does Endianness become a factor?
Reading this tutorial on endianness, I came across this example where endianness does matter. It is about writing an unsigned char array filled with 1 and 0. The array can then be converted to a short, and the result depends on the endianness, little or big. Here is the example, quoted.
unsigned char endian[2] = {1, 0};
short x;
x = *(short *) endian;
What would be the value of x? Let’s look at what this code is doing.
You’re creating an array of two bytes, and then casting that array of
two bytes into a single short. By using an array, you basically forced
a certain byte order, and you’re going to see how the system treats
those two bytes. If this is a little-endian system, the 0 and 1 is
interpreted backwards and seen as if it is 0,1. Since the high byte is
0, it doesn’t matter and the low byte is 1, so x is equal to 1. On the
other hand, if it’s a big-endian system, the high byte is 1 and the
value of x is 256.
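The quoted experiment can also be written without the pointer cast. The following is only a minimal sketch, not the tutorial's code: `read_native_u16` and `is_little_endian` are illustrative names, `std::uint16_t` stands in for `short` on the assumption that `short` is 16 bits wide, and `std::memcpy` replaces `*(short *) endian` to avoid alignment and aliasing problems.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy two bytes, in memory order, into a 16-bit integer.
// The CPU's native byte order decides what value comes out.
std::uint16_t read_native_u16(const unsigned char* bytes) {
    assert(bytes != nullptr);
    std::uint16_t value = 0;
    std::memcpy(&value, bytes, sizeof value);
    return value;
}

// Same byte pattern as in the quoted example: {1, 0}.
// A little-endian machine reads it as 1, a big-endian machine as 256.
bool is_little_endian() {
    const unsigned char endian[2] = {1, 0};
    return read_native_u16(endian) == 1;
}
```

Calling `read_native_u16` on `{1, 0}` should give 1 on a little-endian machine and 256 on a big-endian one, which matches the two outcomes described in the quote.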
I wonder: when you declare an array with a given number of bytes (here, two), how can it be converted to any type (short, int…) as long as the array has been allocated the number of bytes corresponding to this type? If not enough memory has been allocated to ‘contain this type’, will the next memory addresses still be read? For instance, if I cast endian to a long, will this be performed, reading four bytes from the beginning of endian, or will it fail?
Then, a question on endianness: it is a characteristic of the processor, namely its convention for writing bytes in memory with the most significant byte at the lowest memory location (big endian) or at the highest memory location (little endian). In this case, an array with two one-byte elements has been allocated. Why is 1 said to be the most significant byte?
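One way to look at the "most significant byte" question is to write out the arithmetic each convention performs on the two bytes. This is a hedged sketch with hypothetical helper names (`from_big_endian`, `from_little_endian`); `b0` is the byte at the lowest address (`endian[0]`) and `b1` the byte at the next address (`endian[1]`).

```cpp
#include <cstdint>

// Big endian: the byte at the lowest address is the most significant one,
// so it is weighted by 256.
constexpr std::uint16_t from_big_endian(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>((b0 << 8) | b1);
}

// Little endian: the byte at the lowest address is the least significant one,
// so the second byte is weighted by 256 instead.
constexpr std::uint16_t from_little_endian(unsigned char b0, unsigned char b1) {
    return static_cast<std::uint16_t>(b0 | (b1 << 8));
}
```

With the array `{1, 0}`, the big-endian reading gives 1 * 256 + 0 = 256 and the little-endian reading gives 1 + 0 * 256 = 1. The byte 1 is "most significant" only under the big-endian convention, because that convention assigns the high weight to the first address.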
Don’t forget that the compiler will only write assembly code. If you ignore all the warnings that the compiler emits, you can examine the assembly code it produces and figure out what really happens.
I took this simple program:
and I extracted this code using objdump. Let’s decipher it.
These lines are just the prologue of the function; ignore them.
From those 2 lines, you can see that 0x14(%esp) is initialized with 0. So you know that the array endian is on the stack, at the address in the register %ESP (stack pointer) + 0x14.
LEA is just an arithmetic operation. EAX now contains %ESP+0x14, which is the address of the array on the stack.
And at the address ESP + 0x1c (which is the location of the variable casted_endian) we put EAX, that is, the address of the first byte of endian.
Then we prepare the call to operator<< with the relevant argument. So that’s it, the program won’t make any more checks. The type of the variable is completely irrelevant to the machine.
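The point that the cast only copies an address can also be checked from C++ itself. This is a small sketch under stated assumptions: `casted_endian` is taken to be a `long*` (the original declaration is not shown here), `cast_keeps_address` is a hypothetical helper name, and `alignas(long)` is added so that the cast is well defined. The pointer is never dereferenced.

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the casted pointer holds exactly the address of the array.
bool cast_keeps_address() {
    // alignas is an addition for this sketch: it guarantees that the array
    // starts at an address suitable for a long.
    alignas(long) unsigned char endian[2] = {1, 0};
    assert(reinterpret_cast<std::uintptr_t>(endian) % alignof(long) == 0);

    // The cast produces no conversion code; it only relabels the address.
    long* casted_endian = reinterpret_cast<long*>(endian);

    return static_cast<void*>(casted_endian) == static_cast<void*>(endian);
}
```

Nothing in this function records that only two bytes sit behind the pointer, which is why a later read of a whole `long` goes unchecked.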
Now two things can happen when operator<< reads the part of *casted_endian that is not in the array.
Either its address is in a memory page that is currently mapped, or it is not. In the first case, operator<< will read whatever is at that address without complaining. This will probably print something weird on screen. In the second case, your OS will complain that the program is trying to read something it is not allowed to read, and will interrupt it. This is the famous segmentation fault.
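Since neither the compiler nor the machine checks the size, and reading past the end of the array is undefined behavior as far as the language is concerned, the check has to be written by hand. The following is a minimal sketch of one way to do that; `read_checked` is a hypothetical helper, not a standard library function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copy sizeof(T) bytes out of a buffer only if the buffer really holds that
// many bytes. Returns false, leaving `out` untouched, instead of reading
// whatever happens to follow the buffer in memory.
template <typename T>
bool read_checked(const unsigned char* buffer, std::size_t buffer_size, T& out) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "T must be safe to fill with memcpy");
    assert(buffer != nullptr);
    if (buffer_size < sizeof(T)) {
        return false;  // not enough bytes for a T: refuse the read
    }
    std::memcpy(&out, buffer, sizeof(T));
    return true;
}
```

With the two-byte array from the question, and assuming a 16-bit `short`, `read_checked(endian, sizeof endian, s)` succeeds for a `short s`, while the same call with a `long` returns false instead of reading the neighbouring bytes.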