I am writing a C++ benchmarking program that times a number of function calls. The functions are called repeatedly, and each time is recorded for later statistical analysis. The functions must run simultaneously on multiple threads, so to ensure the accuracy and fairness of the benchmark it is run on a real-time OS with the scheduling behavior controlled. The following are my concerns:
Are there deterministic ways of collecting the timing data? I have looked at printf and stringstream but neither seems to have deterministic behavior due to memory & buffer operations. They also do not perform in O(1) for the same reason, am I right? Currently I am using a large char array and a custom strcat function so that each time value can be collected in O(1). This array is then printed at the end of the test, when all data has been collected.
I am using clock_gettime for timings and clock_getres gives me a resolution of 1ns. Can this value be trusted?
Am I doing things right so far, and are there any other issues that I should be aware of when writing the benchmark?
Calling high-frequency timers and writing samples into an output stream is a perfectly sensible way to get performance data. But there are a few tricky gotchas to be careful of.
CLOCK_PROCESS_CPUTIME_ID) should be reliable if the person who wrote your kernel wasn’t a dunce. You can look into the Performance Application Programming Interface library if you want to query the CPU timers directly, but that shouldn’t be necessary.
Or, if you truly need to have 100% determinism, you’ll need to ensure that your threads schedule in the same order, run for the same quanta, and put their data in the same memory addresses for each run.
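As a rough illustration of the sampling approach discussed here, the sketch below stores raw nanosecond deltas in a buffer that is allocated before timing starts, so each sample is recorded with a single array store and all formatting is left until after the run. The names `elapsed_ns`, `work_under_test`, and `collect_samples` are hypothetical and are not from the original program. The workload is only a placeholder, and a per-thread buffer is assumed for the multi-threaded case.

```cpp
#include <cstddef>
#include <cstdint>
#include <ctime>
#include <vector>

// Hypothetical helper: nanoseconds elapsed from a to b.
static std::int64_t elapsed_ns(const timespec& a, const timespec& b) {
    return static_cast<std::int64_t>(b.tv_sec - a.tv_sec) * 1000000000LL
         + (b.tv_nsec - a.tv_nsec);
}

// Placeholder for the function being benchmarked (an assumption).
static void work_under_test() {
    volatile int sink = 0;
    for (int i = 0; i < 1000; ++i) {
        sink = sink + i;
    }
}

// Hypothetical collector: times n calls against the given POSIX clock.
// The vector is sized (and zero-filled) up front, so the timed loop
// performs no allocation or I/O; each sample costs one array store.
static std::vector<std::int64_t> collect_samples(std::size_t n, clockid_t clock) {
    std::vector<std::int64_t> samples(n);
    for (std::size_t i = 0; i < n; ++i) {
        timespec t0{};
        timespec t1{};
        clock_gettime(clock, &t0);
        work_under_test();
        clock_gettime(clock, &t1);
        samples[i] = elapsed_ns(t0, t1);
    }
    return samples;
}
```

Calling `collect_samples(100000, CLOCK_MONOTONIC)` once per thread and printing the results only after all threads have joined keeps the output cost out of the measurements. Timing an empty loop body the same way gives a rough estimate of the overhead of the two `clock_gettime` calls themselves, which is a more practical check than the value reported by `clock_getres`.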