How do you compute the execution time of instructions? Is it just done by checking what the chip manufacturers say about how many clock cycles an operation takes to complete? Is there anything else I should know about this? Feels like I’m missing something…
The RDTSC instruction is extremely accurate as far as I know.
I think if you are seeking the exact cycle counts, then in the case of short boostable sections you may run into the issues of simultaneity that Mysticial mentioned…
But if ultra-ultra-ultra-ultra-precision is not an obstacle… that is to say, if you can survive knowing that for certain scenarios your result is off by… I dunno… say 9 to 80 cycles… then I’m pretty sure you can still get very accurate results with RDTSC… especially when one considers that 9 to 80 divided by 3.2 billion is a very tiny number
The numbers 9 and 80 were chosen a bit arbitrarily (and maybe you aren’t on a 3.2 GHz CPU either) since I dunno exactly what the error amount is… but I’m pretty sure it’s in that ballpark
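To put that ballpark into time units, here is a minimal sketch in C. The helper name `cycles_to_seconds` is made up for illustration, and it assumes the clock rate is fixed, which real CPUs do not guarantee once frequency scaling or boost kicks in:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a cycle count to seconds.
   Assumes a constant clock rate in Hz, which is only an approximation
   on CPUs that change frequency under load. */
static inline double cycles_to_seconds(uint64_t cycles, double clock_hz)
{
    assert(clock_hz > 0.0);  /* a zero or negative clock rate makes no sense */
    return (double)cycles / clock_hz;
}
```

At 3.2 GHz, `cycles_to_seconds(9, 3.2e9)` comes out to roughly 2.8 nanoseconds and `cycles_to_seconds(80, 3.2e9)` to 25 nanoseconds, so the guessed error band is a few nanoseconds to a few tens of nanoseconds per measurement.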
Here’s the RDTSC excerpt of a timer function I use:
actually I’ll go ahead and post the whole thing… this code assumes that the type “double” is a 64-bit floating point number… which might not be a universal compiler / architecture assumption:
You wanna call Set_AbsoluteTime once, somewhere before using the Get functions… without that initial call to Set, the Gets will return erroneous results… but once that one-time call is made you are good to go…
here’s an example:
if for some reason you wanted time measurements to flow backwards at half-speed (maybe handy for game-programming), the initial call would be:
Set_AbsoluteTime (0.000, -0.500);
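the first parameter to Set is the base time that gets added to all results; the second is the rate that elapsed time is multiplied by (so -0.500 means backwards at half speed)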
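As a rough illustration of what those two parameters do, here is a sketch in C. It is not the timer code from this answer: `time_scale`, `make_time_scale` and `scaled_time` are hypothetical names, and the elapsed time is passed in directly instead of being read from RDTSC:

```c
#include <assert.h>

/* Hypothetical stand-in for the state a Set call would store. */
typedef struct {
    double base_time;  /* added to every result */
    double rate;       /* multiplier applied to elapsed real time */
} time_scale;

static inline time_scale make_time_scale(double base_time, double rate)
{
    time_scale s = { base_time, rate };
    return s;
}

/* Map real elapsed seconds (since the Set call) to the reported time. */
static inline double scaled_time(time_scale s, double elapsed_seconds)
{
    assert(elapsed_seconds >= 0.0);  /* time since the Set call cannot be negative */
    return s.base_time + s.rate * elapsed_seconds;
}
```

With a base of 0.000 and a rate of -0.500, two real seconds after the Set call the reported time would be -1.0, which is the "backwards at half-speed" behaviour described above.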
I’m pretty sure these functions are more accurate than the highest-resolution Windows API timers that are currently publicly available… I think on fast processors they have an error smaller than 1 nanosecond but I’m not 100% sure on that
they are accurate enough for my purposes, but do note that the standard initialization of the 40 pre-amble bytes (composed of ‘current’, ‘constant’, ‘lower’, ‘upper’, ‘timelow’, ‘timehigh’) that most C compilers would set to 0xCC or 0xCD will eat some cycles… as will the math performed at the bottom of every Get_AbsoluteTime call…
so for really pristine accuracy you would be best framing whatever it is you want to profile in RDTSC “inlines”… I would make use of the extended x64 registers to store the answer for later subtraction operations instead of messing around with slower memory access…
like for example something like this… this is mainly the concept by the way, because technically VC2010 doesn’t allow you to emit x64 assembly via the __asm keyword
…but I think it will give you the conceptual road to travel:
with that code I think the final answer of the number of cycles that elapsed will be in r10, a 64bit register… or in Cycles, a 64bit unsigned integer… with just a handful of cycles of error caused by the bit shifting and stack operations… provided that the code being profiled doesn’t shred r9 and r10 hehe… I forget what the most stable extended-x64 registers are…
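also the “and rax, 0xFFFFFFFF” is probably extraneous: I couldn’t remember whether RDTSC zeroes out the upper 32 bits of RAX, so I included the AND just in case. For what it’s worth, Intel’s manual says that in 64-bit mode RDTSC clears the upper 32 bits of both RAX and RDX, so the AND is harmless but not needed.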
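Since inline x64 assembly isn’t available there, one way to get the same effect is the `__rdtsc()` compiler intrinsic, which MSVC exposes through `<intrin.h>` and GCC/Clang through `<x86intrin.h>`. The following is only a sketch of the idea, not the code from this answer: `combine_tsc` and `cycles_for` are hypothetical names, and the RDTSC part is compiled only on x86 targets:

```c
#include <assert.h>
#include <stdint.h>

#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>      /* MSVC: __rdtsc() */
#define HAVE_RDTSC 1
#elif defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>   /* GCC/Clang: __rdtsc() */
#define HAVE_RDTSC 1
#endif

/* RDTSC returns the counter split across EDX (high) and EAX (low).
   This is the shift-and-OR step that joins the two halves. */
static inline uint64_t combine_tsc(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}

#ifdef HAVE_RDTSC
/* Hypothetical helper: TSC ticks elapsed while work() runs.
   The result still includes the overhead of the two RDTSC reads
   and of the function call itself. */
static inline uint64_t cycles_for(void (*work)(void))
{
    assert(work != 0);
    uint64_t start = __rdtsc();  /* intrinsic already returns the full 64-bit value */
    work();
    uint64_t end = __rdtsc();
    return end - start;
}
#endif
```

The intrinsic hands back the combined 64-bit value, so `combine_tsc` is only needed if you read EDX and EAX yourself; it is included to show what the shifting in the assembly version accomplishes. Note that on many modern CPUs the TSC ticks at a fixed reference rate, so the difference is in TSC ticks and not necessarily in core clock cycles.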