The Archive Base Latest Questions

Editorial Team
Asked: June 11, 2026


I am working with CUDA on the Windows platform, where we have access to both Parallel Nsight and the Visual Profiler. Both are pretty good, but they offer nearly the same features for profiling and tracing. Can someone tell me how they differ and which one is better on Windows? I basically need a tool for profiling.

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 2:49 pm

    Nsight Visual Studio Edition 2.2 offers the following advantages over the Visual Profiler:

    OVERALL

    1. Integration into Visual Studio 2008 SP1 and 2010 (requires Professional Edition as VS Express Edition does not support integration packages).

    2. Local and remote analysis sessions. Remote sessions can also be configured to copy the application and resources to the remote system.

    3. Collect information from a target application or from a process tree.

    4. Report views support more advanced grouping and filtering. Data tables can be exported to Excel.

    TRACE ACTIVITY

    1. Trace OS activity including process, thread, and module lifetime, thread context switching, thread wait reasons, CPU utilization, process CPU utilization, and thread utilization.

    2. Collect API and GPU work trace for CUDA, OpenGL 2.x-3.x, DirectX 9-11, and OpenCL 1.1 and show all information on the timeline.

    3. Collection of call stack traces on all traced API calls or only when traced API calls return errors.

    4. CUDA software counters to show allocated memory per context.

    5. Additional control over what information is traced. This is critical as tracing too much information can cause the application to become CPU bound.

    6. Timeline and tree display for user annotations from the NVIDIA Tools Extension Library and D3D Performance Markers.

    CUDA PROFILING ACTIVITY

    1. The CUDA profiler can capture your kernel and replay it many times, transparently to your application. This allows profiling data to be collected from non-deterministic applications, and with only a single launch of your application. The Visual Profiler (version 5 and earlier) requires the application to be deterministic so that it can relaunch the whole application many times.

    2. Supports collection of many useful metrics not yet supported by the Visual Profiler. These include warps eligible, which is the most critical metric for understanding whether you have sufficient occupancy, and warp stall reasons, which help you understand what is limiting the performance of the application.

    The Visual Profiler has the following advantages:

    1. Cross platform.

    2. Provides an expert system to review the collected information.

    3. Links in the results to the CUDA Best Practices Guide.

    4. Timeline can show correlation between CPU and GPU events when you click on an event.

    5. CUDA 5.0 adds a new command-line profiler (nvprof).

    6. CUDA 5.0 supports source correlation for branch divergence and memory access with bad access patterns.

    7. CUDA 5.0 profiler is integrated into Nsight Eclipse Edition.

    8. Better support for Tesla PM counters.

    The Visual Profiler in CUDA 5.0 adds a number of features already available in Nsight 1.5 and 2.x, including:

    • NVIDIA Tools Extension Library for annotating your application with ranges and markers that can be displayed in the timeline.

    • Concurrent kernel trace on Fermi and Kepler GPUs.

    Both tools provide very helpful information for analyzing your application. I recommend that you use the latest version of each tool.

    The upcoming version of Nsight VSE will have many new features for investigating the execution of your CUDA kernel. For more information see http://developer.download.nvidia.com/GTC/PDF/GTC2012/PresentationPDF/S0430-GTC2012-Developing-CUDA-Nsight.pdf.
