The Archive Base

Editorial Team
Asked: June 18, 2026


I’ve recently started using Mathematica’s CUDALink with a GT430 and am using CUDADot to multiply a 150000×1038 matrix (encs) by a 1038×1 matrix (probe). Both encs and probe are registered with the memory manager:

mmEncs = CUDAMemoryLoad[encs];
mmProbe = CUDAMemoryLoad[probe];

I figured that a dot product of these would max out the GT430, so I tested with the following:

For[i = 0, i < 10, i++,
 CUDADot[mmEncs, mmProbe];
]

While it runs, I use MSI’s “Afterburner” utility to monitor GPU usage. The following screenshot shows the result:

[Screenshot: MSI Afterburner graph of GPU usage while the loop runs]

There’s a distinct peak for each CUDADot operation and, overall, I’d say this picture indicates that I’m utilizing less than 1/4 of GPU capacity. Two questions:

Q1: Why do peaks max out at 50%? Seems low.

Q2: Why are there such significant periods of inactivity between peaks?

Thanks in advance for any hints! I have no clue w.r.t. Q1 but maybe Q2 is because of unintended memory transfers between host and device?

Additional info since original posting: CUDAInformation[] reports “Core Count -> 64” but NVIDIA Control Panel reports “CUDA Cores: 96”. Is there any chance that CUDALink will under-utilize the GT430 if it’s operating on the false assumption that it has 64 cores?

1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 7:16 am

    I am going to preface this answer by noting that I have no idea what “MSI Afterburner” is really measuring, or how often it samples whatever it measures, and I don’t believe you do either. That means we don’t know the units of either the x or the y axis in your screenshot, which makes any quantification of performance pretty much impossible.

    1. Why do peaks max out at 50%? Seems low.

    I don’t believe you can say it “seems low” if you don’t know what it is really measuring. If, for example, it measures instruction throughput, it could be that the Mathematica dot kernel is memory bandwidth limited on your device. That means the throughput bottleneck of the code would be memory bandwidth, rather than SM instruction throughput. If you were to plot memory throughput instead, you would likely see it at or near 100%. I would expect a gemv (matrix-vector multiply) operation to be memory bandwidth bound, so this result is probably not too surprising.
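To make the memory-bound argument concrete, here is a rough back-of-the-envelope sketch in Python. It only does the arithmetic and does not call CUDALink. The function names, the 4-byte element size, and the bandwidth figure are assumptions for illustration, so substitute the values that match your own data type and card.

```python
def gemv_flops(m, n):
    """Approximate operation count for an (m x n) matrix times an n-vector:
    one multiply and one add per matrix element."""
    return 2 * m * n


def gemv_bytes(m, n, bytes_per_element=4):
    """Minimum memory traffic: read the matrix and the vector once, write the result once.
    bytes_per_element=4 assumes single precision (an assumption; use 8 for double)."""
    return (m * n + n + m) * bytes_per_element


def arithmetic_intensity(m, n, bytes_per_element=4):
    """Floating-point operations performed per byte moved."""
    return gemv_flops(m, n) / gemv_bytes(m, n, bytes_per_element)


def bandwidth_bound_gflops(m, n, bandwidth_gb_per_s, bytes_per_element=4):
    """Ceiling on GFLOP/s if memory bandwidth (in GB/s) is the only bottleneck."""
    return arithmetic_intensity(m, n, bytes_per_element) * bandwidth_gb_per_s


if __name__ == "__main__":
    m, n = 150000, 1038        # dimensions from the question
    assumed_bandwidth = 28.8   # GB/s; placeholder value, check your card's datasheet
    print("FLOPs per call:", gemv_flops(m, n))
    print("FLOP per byte:", arithmetic_intensity(m, n))
    print("Bandwidth-bound ceiling (GFLOP/s):",
          bandwidth_bound_gflops(m, n, assumed_bandwidth))
```

With 4-byte elements this works out to roughly half a floating-point operation per byte moved, so the ceiling set by memory bandwidth can sit well below the card's arithmetic peak. A counter that tracks instruction throughput could then plausibly stay far from 100% even when the kernel is running as fast as memory allows.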

    2. Why are there such significant periods of inactivity between peaks?

    The CUDA API has device and host side latency. On a WDDM platform (so Windows Vista, 7, 8, and whatever server versions are derived from them), this host side latency is rather high and the CUDA driver batches operations to help amortise that latency. This batching can lead to “gaps” or “pauses” in GPU operations. I think that is what you are seeing here. NVIDIA have a dedicated computation driver (TCC) for Tesla cards on the Windows platform to overcome these limitations.

    A much better way to evaluate the performance of this operation would be to time the loop yourself, compute an average time per call, calculate the operation count (a dot product has a known lower bound you can work out from the dimensions of the matrix and vector), and compute a FLOP/s value. You can compare that to the specifications of your GPU to see how well or badly it is performing.
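The procedure in the previous paragraph can be sketched as follows. This is a minimal Python illustration of the arithmetic only: the timings list and the peak figure are hypothetical placeholders, not measurements, and the function names are illustrative. In Mathematica the per-call times could be collected with something like AbsoluteTiming around each CUDADot call.

```python
def average_seconds(timings):
    """Mean wall-clock time per call, given per-call timings in seconds."""
    return sum(timings) / len(timings)


def achieved_gflops(m, n, seconds_per_call):
    """GFLOP/s for an (m x n) matrix-vector product, counting 2*m*n operations."""
    return 2 * m * n / seconds_per_call / 1e9


def fraction_of_peak(gflops, peak_gflops):
    """Achieved rate as a fraction of a reference peak rate."""
    return gflops / peak_gflops


if __name__ == "__main__":
    m, n = 150000, 1038                    # dimensions from the question
    timings = [0.05, 0.05, 0.05, 0.05]     # hypothetical per-call times in seconds
    peak_gflops = 100.0                    # hypothetical peak; use your GPU's specification

    rate = achieved_gflops(m, n, average_seconds(timings))
    print("Achieved GFLOP/s:", rate)
    print("Fraction of peak:", fraction_of_peak(rate, peak_gflops))
```

Because a matrix-vector product is expected to be limited by memory bandwidth, comparing the achieved rate against a bandwidth-derived ceiling is likely to be more informative than comparing it against the peak arithmetic rate alone.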

