When looking at the name of the performance counters in NVIDIA Fermi architecture (the

Question


Asked: May 24, 2026

When looking at the names of the performance counters for the NVIDIA Fermi architecture (in the file Compute_profiler.txt in the doc folder of the CUDA installation), I noticed that there are two performance counters for L2 cache misses: l2_subp0_read_sector_misses and l2_subp1_read_sector_misses. The documentation says that these correspond to two slices of L2.

Why does the L2 cache have two slices? Is there any relation to the streaming multiprocessor architecture? What effect does this division have on performance?

Thanks

1 Answer

Editorial Team · Answer 1 · 2026-05-24T13:32:49+00:00


I don’t think there is any direct relation to the streaming multiprocessors.

I think a slice is simply the equivalent of a memory bank, with each of the two counters reporting on one slice of the L2 cache.

To get the “total” number of L2 read misses, just sum the values of the two counters.
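
As a rough illustration of that summing step, here is a minimal Python sketch. It assumes the counter values have already been read from the profiler output into a dictionary; the helper name `total_across_slices`, the dictionary layout, and the sample numbers are all hypothetical and are not part of any NVIDIA tool. Only the two `l2_subp*_read_sector_misses` names come from the question.

```python
import re

# Matches per-slice L2 counter names such as l2_subp0_read_sector_misses.
# Group 1 is the slice index, group 2 is the event name.
SLICE_COUNTER = re.compile(r"^l2_subp(\d+)_(\w+)$")


def total_across_slices(counters, event):
    """Sum one per-slice L2 event (e.g. "read_sector_misses") over all slices."""
    total = 0
    for name, value in counters.items():
        match = SLICE_COUNTER.match(name)
        if match and match.group(2) == event:
            total += value
    return total


# Hypothetical values for illustration only, not measured on real hardware.
sample = {
    "l2_subp0_read_sector_misses": 1200,
    "l2_subp1_read_sector_misses": 1350,
    "some_other_counter": 42,  # placeholder name; ignored by the sum
}

total_misses = total_across_slices(sample, "read_sector_misses")
print(total_misses)
```

Matching on the `l2_subp<N>_` prefix rather than hard-coding two names means the same helper would still work if a profile happened to report more than two slices.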
