I benchmarked Eigen SGEMM operation using one thread and using 8 threads and what

Question


Asked: June 14, 2026


I benchmarked Eigen's SGEMM operation using one thread and using 8 threads. Performance peaked at 512×512 and then dropped once the matrices exceeded that size. Is there a specific reason for this, perhaps something to do with the complexity of the larger matrices? I looked at the matrix-matrix benchmarks on the Eigen website but didn't see anything similar.

At 512×512 the parallel run was about 4x faster, but at 4096×4096 it was barely 2x faster. I am using OpenMP for parallelism, and to drop down to one thread I set num_of_threads to one.


1 Answer

Editorial Team · Answer 1 · 2026-06-14T02:03:44+00:00


Your results suggest that the multiplication is primarily memory-bandwidth bound at large matrix sizes. A single 4096×4096 matrix of floats (which SGEMM implies) takes 64 MiB, which exceeds the cache size of nearly any CPU available to mere mortals, while a 512×512 float matrix takes only 1 MiB and will comfortably fit into the L3 cache of most modern CPUs.
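To put rough numbers on that, here is a minimal sketch of the footprint arithmetic. It assumes square single-precision matrices and counts all three operands of C = A * B. The function names are made up for illustration and are not part of Eigen, and the cache size you compare against is something you would have to look up for your own CPU.

```cpp
#include <cassert>
#include <cstddef>

// Bytes needed to hold one n x n single-precision matrix.
std::size_t matrix_bytes(std::size_t n) {
    return n * n * sizeof(float);
}

// Bytes touched by C = A * B when A, B and C are all n x n:
// two inputs plus one output.
std::size_t gemm_working_set_bytes(std::size_t n) {
    return 3 * matrix_bytes(n);
}

// True if the whole GEMM working set fits in a cache of cache_bytes.
// cache_bytes is an assumption supplied by the caller (e.g. your L3 size);
// nothing here queries the hardware.
bool fits_in_cache(std::size_t n, std::size_t cache_bytes) {
    assert(cache_bytes > 0);
    return gemm_working_set_bytes(n) <= cache_bytes;
}
```

With a hypothetical 16 MiB L3, `fits_in_cache(512, 16u << 20)` is true (3 MiB working set), while `fits_in_cache(4096, 16u << 20)` is false (192 MiB working set). Once the operands no longer fit, all threads compete for the same memory bandwidth, which would be consistent with the speedup shrinking from about 4x to about 2x.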
