The Archive Base

Editorial Team
Asked: June 16, 2026 (2026-06-16T05:23:54+00:00)


I am attempting to obtain an average MFLOP/s rate over many iterations of the cblas_dgemm function from the Accelerate framework on Mac OS X. This is the code I am using (it calls cblas_dgemm via the function pointer afp):

double benchmark_cblas_matmul(dgemm_fp afp,
   const CBLAS_ORDER Order,
   const CBLAS_TRANSPOSE TransA,
   const CBLAS_TRANSPOSE TransB,
   const int M,
   const int N,
   const int K,
   const double alpha,
   const double *A,
   const int lda,
   const double *B,
   const int ldb,
   const double beta,
   double *C,
   const int ldc)
{
    double mflops_s = 0.0, seconds = -1.0;
    /* Double the iteration count until one timed batch lasts at least 0.1 s. */
    for(int n_iterations = 1; seconds < 0.1;  n_iterations *= 2)
    {
        seconds = read_timer();
        for(int i = 0; i < n_iterations; ++i)
        {
            (*afp)(Order,TransA,TransB,M,N,K,alpha,A,lda,B,ldb,beta,C,ldc);
        }
        seconds = read_timer() - seconds;
        /* An (M x K) by (K x N) multiply performs 2*M*N*K flops. */
        mflops_s = (2e-6*n_iterations*M*N*K)/seconds;
    }
    return mflops_s;
}

The timer routine is:

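#include <stdbool.h>
#include <stddef.h>
#include <sys/time.h>

/* Returns wall-clock seconds elapsed since the first call. */
double read_timer( )
{
    static bool initialized = false;
    static struct timeval start;
    struct timeval end;
    if( !initialized )
    {
        /* Record the reference point on the first call only. */
        gettimeofday( &start, NULL );
        initialized = true;
    }

    gettimeofday( &end, NULL );

    return (end.tv_sec - start.tv_sec) + 1.0e-6 * (end.tv_usec - start.tv_usec);
}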

The code typically multiplies two 1000×1000 matrices. My problem is that consecutive timings of this code are extremely unreliable: even when the time limit in the outer loop is increased to five seconds, the final rate varies between 20000 and 30000 MFLOP/s. I am on a 2011 MacBook Pro running OS X 10.8.2, with a quad-core i5 whose hyperthreading is turned off with this kernel extension, and no applications running except Terminal when I benchmark. Does anyone have a suggestion for how to obtain more stable timings?

1 Answer

  1. Editorial Team
    Added an answer on June 16, 2026 at 5:23 am

    There is at least one confound that you haven’t controlled for.

    The processor in question has turbo modes that allow it to run faster than its nominal clock rate as long as it is not thermally constrained. However, a sustained GEMM benchmark keeps the cores pinned at nearly peak arithmetic throughput. Eventually the cores reach the limit of their thermal envelope, and the clock is throttled down to the nominal rate, then to even lower frequencies.
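One way to see whether this is happening is to record a rate for each timed batch separately instead of keeping only the last one. The following is a minimal sketch, not the asker's code: `record_trial_rates`, `wall_seconds`, `struct gemm_args` and `naive_gemm` are hypothetical names, and the tiny naive multiply is only a stand-in for the real `cblas_dgemm` call so that the example is self-contained.

```c
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, using the same clock as the question's timer. */
static double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + 1.0e-6 * (double)tv.tv_usec;
}

/* Hypothetical helper: runs kernel(ctx) calls_per_trial times per trial and
   stores one MFLOP/s value per trial in rates[0..n_trials-1], in time order.
   A trial whose elapsed time is not measurable is recorded as 0.0. */
void record_trial_rates(void (*kernel)(void *), void *ctx,
                        double flops_per_call, int calls_per_trial,
                        double *rates, size_t n_trials)
{
    for (size_t t = 0; t < n_trials; ++t) {
        double start = wall_seconds();
        for (int i = 0; i < calls_per_trial; ++i)
            kernel(ctx);
        double elapsed = wall_seconds() - start;
        rates[t] = (elapsed > 0.0)
            ? 1.0e-6 * flops_per_call * calls_per_trial / elapsed
            : 0.0;
    }
}

/* Stand-in kernel for illustration only: a naive n x n row-major multiply.
   In the real benchmark this would wrap the cblas_dgemm call. */
struct gemm_args { int n; const double *a; const double *b; double *c; };

void naive_gemm(void *ctx)
{
    struct gemm_args *g = ctx;
    for (int i = 0; i < g->n; ++i)
        for (int j = 0; j < g->n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < g->n; ++k)
                sum += g->a[i * g->n + k] * g->b[k * g->n + j];
            g->c[i * g->n + j] = sum;
        }
}
```

Printing the recorded rates in order makes it easy to see whether the first trials are fast and the later ones slower, which is the pattern throttling would produce.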

    If the measured rate trends downward over the course of a run, thermal throttling is a likely cause.
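A downward trend can be checked numerically rather than by eye. The sketch below assumes you already have an array of per-trial MFLOP/s values in the order they were measured; `rate_slope` and `median_rate` are hypothetical helper names. The slope is an ordinary least-squares fit of rate against trial index, and the median is offered as a summary that is less sensitive to a few unusually fast or slow trials than the mean.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparison callback: ascending order of doubles. */
static int cmp_double(const void *a, const void *b)
{
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of n samples. Sorts a copy so the caller's time ordering is kept.
   Returns 0.0 if n is 0 or the copy cannot be allocated. */
double median_rate(const double *rates, size_t n)
{
    if (n == 0) return 0.0;
    double *tmp = malloc(n * sizeof *tmp);
    if (!tmp) return 0.0;
    memcpy(tmp, rates, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_double);
    double med = (n % 2) ? tmp[n / 2]
                         : 0.5 * (tmp[n / 2 - 1] + tmp[n / 2]);
    free(tmp);
    return med;
}

/* Least-squares slope of rate versus trial index (MFLOP/s per trial).
   A clearly negative value means later trials ran slower than earlier ones.
   Returns 0.0 when fewer than two samples are given. */
double rate_slope(const double *rates, size_t n)
{
    if (n < 2) return 0.0;
    double mean_x = 0.5 * (double)(n - 1), mean_y = 0.0;
    for (size_t i = 0; i < n; ++i) mean_y += rates[i];
    mean_y /= (double)n;
    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double dx = (double)i - mean_x;
        sxy += dx * (rates[i] - mean_y);
        sxx += dx * dx;
    }
    return sxy / sxx;
}
```

For example, rates of 30000, 28000, 26000, 24000 and 22000 MFLOP/s over five consecutive trials give a slope of -2000 per trial and a median of 26000. A slope near zero with a large spread would instead point to some other source of noise.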
