The Archive Base

Editorial Team
Asked: May 28, 2026

I am using a Tesla C2050, which has compute capability 2.0 and 48 KB of shared memory. But when I try to use this shared memory, the nvcc compiler gives me the following error:

Entry function '_Z4SAT3PhPdii' uses too much shared data (0x8020 bytes + 0x10 bytes system, 0x4000 max)

SAT1 is the naive implementation of a scan algorithm. Because I am operating on images of around 4096x2160 pixels, I have to use double to calculate the cumulative sum. Although the Tesla C2050 seemingly does not support double, the code still runs because double gets demoted to float. For an image width of 4096 the shared memory size comes out greater than 16 KB, but it is well within the 48 KB limit.

Can anybody help me understand what is happening here? I am using CUDA Toolkit 3.0.

1 Answer

  1. Editorial Team
    Added an answer on May 28, 2026 at 7:14 pm

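    Fermi cards split 64 KB of on-chip memory per multiprocessor between shared memory and L1 cache, either as 16 KB shared memory and 48 KB L1 cache or as 48 KB shared memory and 16 KB L1 cache. The API call cudaThreadSetCacheConfig can be used to request 48 KB shared memory and 16 KB L1 cache, if you require it. Either way, you must compile the code for compute capability 2.0 to avoid the code generation error you are seeing: the 0x4000 (16 KB) maximum in the message is the shared memory limit of the older architecture nvcc is targeting, while your kernel asks for 0x8020 (32800) bytes.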
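A rough sketch of requesting the larger shared memory split for a single kernel. It uses cudaFuncSetCacheConfig with cudaFuncCachePreferShared, the per-kernel counterpart of the call named above. The kernel rowScan, the ROW_WIDTH constant, and the launch shape are hypothetical stand-ins, not the asker's SAT code. It is written against a newer runtime: cudaDeviceSynchronize is cudaThreadSynchronize on older toolkits, and cudaThreadSetCacheConfig was later deprecated in favor of cudaDeviceSetCacheConfig.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define ROW_WIDTH 4096  // assumed image width, taken from the question

// Hypothetical kernel: one block scans one image row held in shared memory.
// 4096 doubles * 8 bytes = 32 KB, which is above the 16 KB pre-Fermi limit.
__global__ void rowScan(const unsigned char *in, double *out, int width)
{
    __shared__ double row[ROW_WIDTH];
    const unsigned char *src = in + (size_t)blockIdx.x * width;
    double *dst = out + (size_t)blockIdx.x * width;

    // Cooperative load of the row into shared memory.
    for (int i = threadIdx.x; i < width; i += blockDim.x)
        row[i] = src[i];
    __syncthreads();

    // Deliberately naive: a single thread accumulates the running sum.
    if (threadIdx.x == 0)
        for (int i = 1; i < width; ++i)
            row[i] += row[i - 1];
    __syncthreads();

    // Cooperative store of the scanned row.
    for (int i = threadIdx.x; i < width; i += blockDim.x)
        dst[i] = row[i];
}

int main()
{
    const int width = ROW_WIDTH, height = 4;  // small height, demo only
    unsigned char *dIn;
    double *dOut;
    cudaMalloc(&dIn, width * height);
    cudaMalloc(&dOut, width * height * sizeof(double));
    cudaMemset(dIn, 1, width * height);  // every pixel set to 1

    // Ask for the 48 KB shared / 16 KB L1 split for this kernel.
    cudaError_t err = cudaFuncSetCacheConfig(rowScan, cudaFuncCachePreferShared);
    if (err != cudaSuccess)
        printf("cache config: %s\n", cudaGetErrorString(err));

    rowScan<<<height, 256>>>(dIn, dOut, width);
    err = cudaDeviceSynchronize();
    printf("kernel: %s\n", cudaGetErrorString(err));

    // Copy back the last prefix sum of row 0 as a quick sanity check.
    double last = 0.0;
    cudaMemcpy(&last, dOut + width - 1, sizeof(double), cudaMemcpyDeviceToHost);
    printf("last prefix sum of row 0: %.0f\n", last);

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

The cache call is only a preference. The 32 KB static array still has to be accepted at compile time, which is why the target architecture matters.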

    Also, your Tesla C2050 does support double precision. If you are getting compiler warnings about demoting doubles, it means you are not compiling your code for the correct architecture. Add

    --arch=sm_20
    

    to your nvcc arguments. The GPU toolchain will then compile for your Fermi card, with double precision support and the other Fermi-specific hardware features, including the larger shared memory size.
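One way to check what the card itself reports is to query it at run time. This is a minimal sketch, assuming device 0 is the C2050. The values printed depend on your hardware and driver. The 4096-element row buffer comes from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // device 0 assumed
    if (err != cudaSuccess) {
        printf("cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("%s: compute capability %d.%d\n", prop.name, prop.major, prop.minor);
    printf("shared memory per block: %lu bytes\n",
           (unsigned long)prop.sharedMemPerBlock);

    // Size the question's buffer would need: one double per pixel in a row.
    const unsigned long needed = 4096UL * sizeof(double);  // 32768 bytes
    printf("row buffer of 4096 doubles: %lu bytes (%s)\n", needed,
           needed <= prop.sharedMemPerBlock ? "fits" : "does not fit");
    return 0;
}
```

If the reported compute capability is 2.0 but nvcc still rejects the kernel, the limit being enforced is that of the compile target, not of the card.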
