The Archive Base

Editorial Team
Asked: June 14, 2026

I benchmarked Eigen's SGEMM operation using one thread and using 8 threads. Performance peaked at 512×512 but then dropped when exceeding that size. I was wondering if there is a specific reason for this, perhaps something to do with the complexity of the larger matrices? I looked at the benchmarks on the Eigen website for matrix-matrix operations but didn't see anything similar.

At 512×512 the parallel run was about 4x faster, but at 4096×4096 it was barely 2x faster. I am using OpenMP for parallelism, and to bring it down to one thread I set num_of_threads to two.

1 Answer

  1. Editorial Team added an answer on June 14, 2026 at 2:03 am

    Your results suggest that this algorithm is primarily memory-bandwidth bound at large matrix sizes. A 4096×4096 matrix (assuming float, about 64 MiB per matrix) exceeds the cache size of nearly any CPU available to mere mortals, while a 512×512 matrix (about 1 MiB) will fit comfortably into L3 cache on most modern CPUs.
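The size argument above can be checked with a little arithmetic. The following is a minimal sketch, not part of Eigen: the helper names (`matrix_bytes`, `gemm_working_set_bytes`, `fits_in_cache`) are hypothetical, the 8 MiB L3 size is an assumed example figure rather than a measurement of the asker's machine, and it assumes a 4-byte `float`. It counts the bytes needed by the three n×n operands of C = A * B and compares that against the cache size.

```cpp
#include <cstdint>

// Bytes occupied by one n x n matrix of single-precision floats.
constexpr std::uint64_t matrix_bytes(std::uint64_t n) {
    return n * n * sizeof(float);
}

// Bytes touched by C = A * B when A, B and C are all n x n float matrices.
constexpr std::uint64_t gemm_working_set_bytes(std::uint64_t n) {
    return 3 * matrix_bytes(n);
}

// True if the whole working set fits in a cache of cache_bytes.
// cache_bytes is supplied by the caller; it is an assumption, not queried
// from the hardware.
constexpr bool fits_in_cache(std::uint64_t n, std::uint64_t cache_bytes) {
    return gemm_working_set_bytes(n) <= cache_bytes;
}

// Assumed example L3 size: 8 MiB.
constexpr std::uint64_t kAssumedL3Bytes = 8ull << 20;

// Compile-time checks, assuming sizeof(float) == 4.
static_assert(matrix_bytes(512) == (1ull << 20), "512x512 floats = 1 MiB");
static_assert(matrix_bytes(4096) == (64ull << 20), "4096x4096 floats = 64 MiB");
static_assert(fits_in_cache(512, kAssumedL3Bytes), "3 MiB fits in 8 MiB");
static_assert(!fits_in_cache(4096, kAssumedL3Bytes), "192 MiB does not fit");
```

Under these assumptions the 512×512 case needs about 3 MiB in total and stays cache-resident, while the 4096×4096 case needs about 192 MiB and has to stream from main memory. That is consistent with the weaker scaling reported in the question, although it does not prove that memory bandwidth is the only cause.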
