I’m building a toolkit that offers different algorithms in CUDA. However, many of these

Question

Asked: May 20, 2026

I’m building a toolkit that offers different algorithms in CUDA. However, many of these algorithms use static constant global data that will be used by all threads, declared this way for example:

static __device__ __constant__ real buf[MAX_NB];

My problem is that if I include all the .cuh files in the library, all of this memory will be allocated on the device when the library is instantiated, even though the user might want to use only one of these algorithms. Is there any way around this? Will I absolutely have to use ordinary dynamically allocated memory?

I want the fastest constant memory possible that can be used by all threads at runtime. Any ideas?

Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-05-20T00:23:48+00:00


All the constant memory declared in a .cu file is allocated when that file's module is loaded (each .cu file is compiled to its own .cubin, and each .cubin is a separate module). Therefore, to use many different kernels that rely on constant memory, you have to split them across separate .cu files so that no single module overflows its constant memory. The usual maximum is 64 KB. Source: http://forums.nvidia.com/index.php?showtopic=185993
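As a rough illustration of the one-algorithm-per-.cu layout, here is a minimal sketch of what a single translation unit might look like. The file name `algo_a.cu`, the functions `algo_a_kernel` and `algo_a_run`, the `float` typedef for `real`, and the value of `MAX_NB` are all assumptions made for the example; they do not come from the question. The kernel body is a placeholder, not a real algorithm.

```cuda
// algo_a.cu (hypothetical file name): one algorithm per translation unit.
#include <cstdio>
#include <cuda_runtime.h>

typedef float real;   // assumption: the question's `real` is a float
#define MAX_NB 256    // assumption: illustrative size, 1 KB of constant memory

// File-scope constant buffer. `static` keeps the symbol private to this
// .cu file, so other algorithms can declare their own buffers separately.
static __constant__ real buf[MAX_NB];

// Placeholder kernel: every thread reads from the shared constant buffer.
__global__ void algo_a_kernel(real *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = 2.0f * buf[i % MAX_NB];
    }
}

// Hypothetical host entry point: uploads the constants, runs the kernel,
// and copies the result back. Returns 0 on success, -1 on any failure.
int algo_a_run(const real *coeffs, int nb, real *host_out, int n)
{
    if (nb > MAX_NB) return -1;

    // Copy host data into the constant buffer owned by this file.
    if (cudaMemcpyToSymbol(buf, coeffs, nb * sizeof(real)) != cudaSuccess)
        return -1;

    real *dev_out = NULL;
    if (cudaMalloc((void **)&dev_out, n * sizeof(real)) != cudaSuccess)
        return -1;

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;
    algo_a_kernel<<<blocks, threads>>>(dev_out, n);

    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // This blocking copy also waits for the kernel to finish.
        err = cudaMemcpy(host_out, dev_out, n * sizeof(real),
                         cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_out);
    return err == cudaSuccess ? 0 : -1;
}

int main()
{
    const int n = 8;
    real coeffs[MAX_NB];
    for (int i = 0; i < MAX_NB; ++i) coeffs[i] = (real)i;

    real out[n];
    if (algo_a_run(coeffs, MAX_NB, out, n) != 0) {
        fprintf(stderr, "algo_a_run failed\n");
        return 1;
    }
    for (int i = 0; i < n; ++i) printf("%g\n", out[i]);
    return 0;
}
```

In a real toolkit, `main` would live in the user's code and each algorithm would get its own file of this shape, exposing only a host function such as `algo_a_run`. Keeping the `__constant__` declaration in the .cu file rather than in a shared .cuh header avoids every including file getting its own copy of the buffer.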
