I am attempting to obtain an average MFLOPS/S rate over many iterations for the

Asked: June 16, 2026 (2026-06-16T05:23:54+00:00)

I am attempting to obtain an average MFLOPS/S rate over many iterations of the cblas_dgemm function from the Accelerate framework on Mac OS X. This is the code I am using (it calls cblas_dgemm through the function pointer afp):

double benchmark_cblas_matmul(dgemm_fp afp,
   const CBLAS_ORDER Order,
   const CBLAS_TRANSPOSE TransA,
   const CBLAS_TRANSPOSE TransB,
   const int M,
   const int N,
   const int K,
   const double alpha,
   const double *A,
   const int lda,
   const double *B,
   const int ldb,
   const double beta,
   double *C,
   const int ldc)
{
    double mflops_s,seconds = -1.0;
    for(int n_iterations = 1; seconds < 0.1;  n_iterations *= 2)
    {
        seconds = read_timer(); 
        for(int i = 0; i < n_iterations; ++i) 
        {
            (*afp)(Order,TransA,TransB,M,N,K,alpha,A,lda,B,ldb,beta,C,ldc); 
        }
        seconds = read_timer() - seconds;
        mflops_s = (2e-6*n_iterations*N*N*N)/seconds;
    }
    return mflops_s;
}

The timer routine is:

double read_timer( )
{
    static bool initialized = false;
    static struct timeval start;
    struct timeval end;
    if( !initialized )
    {
        gettimeofday( &start, NULL );
        initialized = true;
    }

    gettimeofday( &end, NULL );

    return (end.tv_sec - start.tv_sec) + 1.0e-6 * (end.tv_usec - start.tv_usec);
}

The code typically multiplies two 1000×1000 matrices. My problem is that consecutive timings of this code are extremely unreliable: even when the time limit in the outer loop is raised to five seconds, the final rate varies between 20000 and 30000 mflops/s. I am on a 2011 MacBook Pro running OS X 10.8.2, with a quad-core i5 and hyperthreading turned off via a kernel extension, and no applications running except Terminal while I benchmark. Does anyone have suggestions for obtaining more stable timings?

1 Answer

Editorial Team · Answer 1 · 2026-06-16T05:23:56+00:00

There are some confounds that you haven’t controlled.

The processor in question has turbo modes that allow it to run faster than nominal clock rate so long as it is not thermally constrained. However, running a sustained GEMM benchmark keeps the cores pinned at nearly peak arithmetic throughput, which will eventually result in the cores reaching the limit of their thermal envelope, and the clock will be throttled down to the nominal rate, then to even slower frequencies.

If the measured performance trends downward over the course of a run, thermal throttling may be responsible.
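One way to check for such a trend is to keep one rate per trial instead of only the final number. The following is a minimal sketch, not a definitive implementation: `timer_fn`, `kernel_fn`, `collect_rates` and `late_to_early_ratio` are hypothetical names that do not appear in the question, the timer argument is assumed to be something like the question's `read_timer`, and the kernel is assumed to be a small wrapper that makes one `cblas_dgemm` call with fixed arguments.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper types; neither appears in the original code. */
typedef double (*timer_fn)(void);      /* e.g. read_timer from the question */
typedef void (*kernel_fn)(void *ctx);  /* wrapper that makes one cblas_dgemm call */

/* Time each trial separately and store one MFLOP/s sample per trial.
   flops_per_call is assumed to be supplied by the caller; for a GEMM
   the usual count is 2.0*M*N*K. */
void collect_rates(timer_fn now, kernel_fn kernel, void *ctx,
                   double flops_per_call, int calls_per_trial,
                   double *rates, size_t n_trials)
{
    for (size_t t = 0; t < n_trials; ++t) {
        double start = now();
        for (int i = 0; i < calls_per_trial; ++i)
            kernel(ctx);
        double seconds = now() - start;
        /* Guard against a zero or negative interval from a coarse timer. */
        rates[t] = seconds > 0.0
                 ? 1e-6 * flops_per_call * calls_per_trial / seconds
                 : 0.0;
    }
}

/* Mean of the last n/2 samples divided by the mean of the first n/2.
   For odd n the middle sample is ignored. */
double late_to_early_ratio(const double *rates, size_t n)
{
    assert(n >= 2);
    size_t half = n / 2;
    double early = 0.0, late = 0.0;
    for (size_t i = 0; i < half; ++i) {
        early += rates[i];
        late  += rates[n - half + i];
    }
    assert(early > 0.0);
    return late / early;
}
```

A ratio well below 1 would be consistent with the clock being throttled as the run goes on. A ratio near 1 with widely scattered samples would suggest that some other confound is at work.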
