Intel CPUs from at least the last decade include a set of performance monitors that count a variety of events. Do the latest Intel CPUs (Core i3, i5 and i7, aka Nehalem) provide a mechanism to count Instructions Per Clock (IPC)? If so, how are they used?
If this is possible, I’ll probably be writing the code for this in Assembly, but Windows or Linux system calls may also come in useful.
Yes, Intel VTune (Linux and Windows) can measure IPC.
If you want to measure it yourself with precise counters for a specific part of your code, you need a performance API such as PAPI or perfctr (both for Linux).
They use the hardware performance counters described in the Intel manuals: http://www.intel.com/products/processor/manuals/
Volume 3B, Chapter 30 and Appendix A.
http://www.intel.com/Assets/PDF/manual/253669.pdf
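As an illustration of counting a region of code yourself, here is a minimal sketch that does not use PAPI or perfctr. It swaps in the Linux perf_event_open system call (available since kernel 2.6.31), which exposes the same hardware counters without an extra library. The helper names `open_counter` and `measure_ipc` are hypothetical. It assumes the kernel allows user-mode counting (see /proc/sys/kernel/perf_event_paranoid) and that a hardware PMU is visible, which is often not the case inside VMs; in that case the sketch simply returns -1.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: glibc has no perf_event_open wrapper, so use syscall(). */
static int open_counter(uint64_t config, int group_fd)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof attr);
    attr.size = sizeof attr;
    attr.type = PERF_TYPE_HARDWARE;
    attr.config = config;
    attr.disabled = (group_fd == -1); /* leader starts disabled; members follow it */
    attr.exclude_kernel = 1;          /* count user-mode work only */
    attr.exclude_hv = 1;
    attr.read_format = PERF_FORMAT_GROUP;
    /* pid = 0, cpu = -1: measure this thread on whichever CPU it runs. */
    return (int)syscall(SYS_perf_event_open, &attr, 0, -1, group_fd, 0);
}

/* Hypothetical helper: runs work(arg) and reports retired instructions and
 * core cycles. Returns 0 on success, -1 if the counters are unavailable. */
static int measure_ipc(void (*work)(void *), void *arg,
                       uint64_t *instructions, uint64_t *cycles)
{
    int leader = open_counter(PERF_COUNT_HW_CPU_CYCLES, -1);
    if (leader < 0)
        return -1;
    int member = open_counter(PERF_COUNT_HW_INSTRUCTIONS, leader);
    if (member < 0) {
        close(leader);
        return -1;
    }

    /* Reset, enable and disable the whole group so both counts cover the same span. */
    ioctl(leader, PERF_EVENT_IOC_RESET, PERF_IOC_FLAG_GROUP);
    ioctl(leader, PERF_EVENT_IOC_ENABLE, PERF_IOC_FLAG_GROUP);
    work(arg);
    ioctl(leader, PERF_EVENT_IOC_DISABLE, PERF_IOC_FLAG_GROUP);

    /* PERF_FORMAT_GROUP layout: { nr, value of leader, value of member } */
    uint64_t buf[3] = {0};
    ssize_t n = read(leader, buf, sizeof buf);
    close(member);
    close(leader);
    if (n != (ssize_t)sizeof buf || buf[0] != 2 || buf[1] == 0 || buf[2] == 0)
        return -1; /* read failed or the group was never scheduled */
    *cycles = buf[1];
    *instructions = buf[2];
    return 0;
}
```

Dividing `*instructions` by `*cycles` after a successful call gives the IPC of the measured region. Grouping the two events is a deliberate choice: the kernel schedules them together, so both counts refer to exactly the same interval.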
VTune computes CPI (“Cycles Per Instruction retired”) as the ratio of “Non-sleep clockticks” to “Instructions Retired”; IPC is simply the reciprocal. For Core 2, the performance counters used are “CPU_CLK_UNHALTED.CORE” and “INST_RETIRED.ANY”.
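The arithmetic is a single division in each direction. This sketch uses hypothetical function names and assumes the two raw counts have already been read from CPU_CLK_UNHALTED.CORE and INST_RETIRED.ANY (for example with a tool or API like the ones mentioned above).

```c
#include <stdint.h>

/* Hypothetical helper: CPI = unhalted core cycles / retired instructions.
 * cycles       ~ CPU_CLK_UNHALTED.CORE
 * instructions ~ INST_RETIRED.ANY
 * Returns 0.0 when no instructions retired, to avoid dividing by zero. */
static double cycles_per_instruction(uint64_t cycles, uint64_t instructions)
{
    return instructions ? (double)cycles / (double)instructions : 0.0;
}

/* Hypothetical helper: IPC is the reciprocal, retired instructions per cycle. */
static double instructions_per_cycle(uint64_t cycles, uint64_t instructions)
{
    return cycles ? (double)instructions / (double)cycles : 0.0;
}
```

For instance, 3,000,000 unhalted cycles and 6,000,000 retired instructions give a CPI of 0.5, which is an IPC of 2.0.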
These counters are the same for all Core* CPUs; see Appendix A.1 of Volume 3B, page 384.
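Because these are architectural events, a program can check at run time whether the CPU supports them by querying CPUID leaf 0AH. The following is a minimal sketch assuming GCC or Clang on x86 (it relies on their `<cpuid.h>` and `__get_cpuid`); the struct and function names are hypothetical. It reports the architectural performance monitoring version, the counter counts and widths, and whether the “UnHalted Core Cycles” and “Instructions Retired” events are available (a set bit in EBX means the event is *not* available).

```c
#include <cpuid.h>
#include <stdbool.h>

/* Hypothetical struct: selected fields of CPUID leaf 0AH. */
struct arch_perfmon {
    unsigned version;        /* EAX[7:0]; 0 means not supported */
    unsigned gp_counters;    /* EAX[15:8]: general-purpose counters per logical CPU */
    unsigned gp_width;       /* EAX[23:16]: width of those counters in bits */
    unsigned fixed_counters; /* EDX[4:0], only meaningful when version > 1 */
    bool core_cycles_event;  /* EBX bit 0 clear: UnHalted Core Cycles available */
    bool inst_retired_event; /* EBX bit 1 clear: Instructions Retired available */
};

/* Hypothetical helper: returns true if architectural perfmon is reported. */
static bool query_arch_perfmon(struct arch_perfmon *p)
{
    unsigned eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 0AH is above the maximum supported leaf. */
    if (!__get_cpuid(0x0A, &eax, &ebx, &ecx, &edx))
        return false;
    unsigned ebx_len = (eax >> 24) & 0xFF; /* number of valid bits in EBX */
    p->version = eax & 0xFF;
    p->gp_counters = (eax >> 8) & 0xFF;
    p->gp_width = (eax >> 16) & 0xFF;
    p->fixed_counters = p->version > 1 ? (edx & 0x1F) : 0;
    p->core_cycles_event = ebx_len > 0 && !(ebx & 1u);
    p->inst_retired_event = ebx_len > 1 && !(ebx & 2u);
    return p->version > 0;
}
```

Checking CPUID first is cheaper and safer than attempting to program the counters and handling a fault; under virtualization the leaf may report version 0 even on a Core CPU.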