I saw this post on SO which contains C code to get the latest CPU Cycle count:
CPU Cycle count based profiling in C/C++ Linux x86_64
Is there a way I can use this code in C++ (Windows and Linux solutions welcome)? Although it is written in C (and C is largely a subset of C++), I am not certain whether this code would work in a C++ project and, if not, how to translate it.
I am using x86-64
EDIT2:
Found this function but cannot get VS2010 to recognise the assembler. Do I need to include anything? (I believe I have to swap uint64_t for long long on Windows?)
static inline uint64_t get_cycles()
{
    uint64_t t;
    __asm volatile ("rdtsc" : "=A"(t));
    return t;
}
EDIT3:
From above code I get the error:
"error C2400: inline assembler syntax error in 'opcode'; found 'data type'"
Could someone please help?
As of GCC 4.5, the __rdtsc() intrinsic is supported by both MSVC and GCC, but the header you need to include is different:
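A minimal sketch of that conditional include, assuming the usual headers (<intrin.h> for MSVC, <x86intrin.h> for GCC and Clang). The names read_tsc and cycles_for are hypothetical helpers made up for this example:

```cpp
#include <cstdint>

#ifdef _MSC_VER
#include <intrin.h>      // MSVC: declares __rdtsc()
#else
#include <x86intrin.h>   // GCC 4.5+ and Clang: declares __rdtsc()
#endif

// read_tsc is a hypothetical wrapper name chosen for this sketch.
inline std::uint64_t read_tsc() {
    return __rdtsc();    // reads the 64-bit time-stamp counter
}

// cycles_for is also hypothetical: TSC ticks elapsed while running f().
template <typename F>
std::uint64_t cycles_for(F f) {
    const std::uint64_t start = read_tsc();
    f();
    return read_tsc() - start;
}
```

Usage would be something like cycles_for([]{ work(); }), where work() stands in for whatever you want to profile. On recent CPUs the counter ticks at a constant reference rate, so the difference is not necessarily a count of core clock cycles.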
Here’s the original answer, from before GCC 4.5.
Pulled directly out of one of my projects:
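A sketch of the kind of function being described, pieced together from the explanation that follows (the "=a"(lo) and "=d"(hi) constraints and the ((uint64_t)hi << 32) | lo expression). The function name rdtsc_asm and the MSVC fallback branch are assumptions for this example:

```cpp
#include <cstdint>

#if defined(_MSC_VER)
#include <intrin.h>
// MSVC does not accept GNU-style asm, so use the intrinsic there.
inline std::uint64_t rdtsc_asm() { return __rdtsc(); }
#else
// GNU C extended asm (GCC / Clang), x86 or x86-64 only.
inline std::uint64_t rdtsc_asm() {
    unsigned int lo, hi;
    // rdtsc writes the low half to EAX and the high half to EDX.
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
#endif
```

The C2400 error in the question is consistent with feeding GNU-style asm to MSVC, which uses a different inline assembler syntax and has no inline assembler at all for x64 targets, so the intrinsic is the practical route there.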
This GNU C Extended asm tells the compiler:
volatile: the outputs aren’t a pure function of the inputs (so it has to re-run every time, not reuse an old result).

"=a"(lo) and "=d"(hi): the output operands are fixed registers: EAX and EDX (x86 machine constraints). The x86 rdtsc instruction puts its 64-bit result in EDX:EAX, so letting the compiler pick an output with "=r" wouldn’t work: there’s no way to ask the CPU for the result to go anywhere else.

((uint64_t)hi << 32) | lo: zero-extend both 32-bit halves to 64-bit (because lo and hi are unsigned), and logically shift + OR them together into a single 64-bit C variable. In 32-bit code, this is just a reinterpretation; the values still just stay in a pair of 32-bit registers. In 64-bit code you typically get actual shift + OR asm instructions, unless the high half optimizes away.

(Editor’s note: this could probably be more efficient if you used unsigned long instead of unsigned int. Then the compiler would know that lo was already zero-extended into RAX. It wouldn’t know that the upper half was zero, so | and + are equivalent if it wanted to merge a different way. The intrinsic should in theory give you the best of both worlds as far as letting the optimizer do a good job.)

https://gcc.gnu.org/wiki/DontUseInlineAsm if you can avoid it. But hopefully this section is useful if you need to understand old code that uses inline asm so you can rewrite it with intrinsics. See also https://stackoverflow.com/tags/inline-assembly/info
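The editor’s note about wider operands could be sketched as below, assuming x86-64 and a GNU-compatible compiler. The name rdtsc_wide is hypothetical, and std::uint64_t stands in for unsigned long (which is 64 bits on x86-64 Linux but not on Windows):

```cpp
#include <cstdint>

// x86-64 only, GNU C extended asm (GCC / Clang).
// With 64-bit operands, "=a" and "=d" bind to RAX and RDX. In 64-bit mode
// rdtsc clears the upper 32 bits of both, so no separate zero-extension of
// the low half is needed.
inline std::uint64_t rdtsc_wide() {
    std::uint64_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return (hi << 32) | lo;
}
```

Whether this actually produces better code than the unsigned int version depends on the compiler and the surrounding code, so it is worth checking the generated asm before preferring it to the intrinsic.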