Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8343023
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 9, 20262026-06-09T05:56:17+00:00 2026-06-09T05:56:17+00:00

My intended program flow would look like the following if it were possible: typedef

  • 0

My intended program flow would look like the following if it were possible:

typedef struct structure_t
{
  [...]
  /* device function pointer. */
  __device__ float (*function_pointer)(float, float, float[]);
  [...]
} structure;

[...]

/* function to be assigned. */
__device__ float
my_function (float a, float b, float c[])
{
  /* do some stuff on the device. */
  [...]
}

void
some_structure_initialization_function (structure *st)
{
  /* assign. */
  st->function_pointer = my_function;
  [...]
}

This is not possible, and ends in a familiar error during compilation regarding the placement of __device__ in the structure.

 error: attribute "device" does not apply here

There are some examples of similar types of problems here on stackoverflow, but they all involve the use of static pointers outside the structure. Examples are device function pointers as struct members and device function pointers. I’ve taken a similar approach with success previously in other codes where it’s easy for me to use static device pointers and define them outside of any structures. Currently though this is a problem. It’s written as an API of sorts and the user may define one or two or dozens of structures which need to include a device function pointer. So, defining static device pointers outside of the structure is a major problem.

I’m fairly certain the answer exists within the posts I have linked above, through use symbol copies, but I’ve not been able to put them to successful use.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-09T05:56:18+00:00Added an answer on June 9, 2026 at 5:56 am

    What you are trying to do is possible, but you have made a few mistakes in the way you are declaring and defining the structures that will hold and use the function pointer.

    This is not possible, and ends in a familiar error during compilation
    regarding the placement of __device__ in the structure.

     error: attribute "device" does not apply here
    

    This is only because you are attempting to assign a memory space to a structure or class data member, which is illegal in CUDA. The memory space of the all class or structure data members are implicitly set when you define or instantiate a class. So something only slighlty different (and more concrete):

    typedef float (* fp)(float, float, float4);
    
    struct functor
    {
        float c0, c1;
        fp f;
    
        __device__ __host__
        functor(float _c0, float _c1, fp _f) : c0(_c0), c1(_c1), f(_f) {};
    
        __device__ __host__
        float operator()(float4 x) { return f(c0, c1, x); };
    };
    
    __global__
    void kernel(float c0, float c1, fp f, const float4 * x, float * y, int N)
    {
        int tid = threadIdx.x + blockIdx.x * blockDim.x;
    
        struct functor op(c0, c1, f);
        for(int i = tid; i < N; i += blockDim.x * gridDim.x) {
            y[i] = op(x[i]);
        }
    }
    

    is perfectly valid. The function pointer fp in functor is implicitly a __device__ function when an instance of functor is instantiated in device code. If it were instantiated in host code, the function pointer would implicitly be a host function. In the kernel, a device function pointer passed as argument is used to instantiate a functor instance. All perfectly legal.

    I believe I am correct in saying that there is no direct way to get the address of a __device__ function in host code, so you still require some static declarations and symbol manipulation. This might be different in CUDA 5, but I have not tested it to see. If we flesh out the device code above with a couple of __device__ functions and some supporting host code:

    __device__ __host__ 
    float f1 (float a, float b, float4 c)
    {
        return a + (b * c.x) +  (b * c.y) + (b * c.z) + (b * c.w);
    }
    
    __device__ __host__
    float f2 (float a, float b, float4 c)
    {
        return a + b + c.x + c.y + c.z + c.w;
    }
    
    __constant__ fp function_table[] = {f1, f2};
    
    int main(void)
    {
        const float c1 = 1.0f, c2 = 2.0f;
        const int n = 20;
        float4 vin[n];
        float vout1[n], vout2[n];
        for(int i=0, j=0; i<n; i++) {
            vin[i].x = j++; vin[i].y = j++;
            vin[i].z = j++; vin[i].w = j++;
        }
    
        float4 * _vin;
        float * _vout1, * _vout2;
        size_t sz4 = sizeof(float4) * size_t(n);
        size_t sz1 = sizeof(float) * size_t(n);
        cudaMalloc((void **)&_vin, sz4);
        cudaMalloc((void **)&_vout1, sz1);
        cudaMalloc((void **)&_vout2, sz1);
        cudaMemcpy(_vin, &vin[0], sz4, cudaMemcpyHostToDevice);
    
        fp funcs[2];
        cudaMemcpyFromSymbol(&funcs, "function_table", 2 * sizeof(fp));
    
        kernel<<<1,32>>>(c1, c2, funcs[0], _vin, _vout1, n);
        cudaMemcpy(&vout1[0], _vout1, sz1, cudaMemcpyDeviceToHost); 
    
        kernel<<<1,32>>>(c1, c2, funcs[1], _vin, _vout2, n);
        cudaMemcpy(&vout2[0], _vout2, sz1, cudaMemcpyDeviceToHost); 
    
        struct functor func1(c1, c2, f1), func2(c1, c2, f2); 
        for(int i=0; i<n; i++) {
            printf("%2d %6.f %6.f (%6.f,%6.f,%6.f,%6.f ) %6.f %6.f %6.f %6.f\n", 
                    i, c1, c2, vin[i].x, vin[i].y, vin[i].z, vin[i].w,
                    vout1[i], func1(vin[i]), vout2[i], func2(vin[i]));
        }
    
        return 0;
    }
    

    you get a fully compilable and runnable example. Here two __device__ functions and a static function table provide a mechanism for the host code to retrieve __device__ function pointers at runtime. The kernel is called once with each __device__ function and the results displayed, along with the exact same functor and functions instantiated and called from host code (and thus running on the host) for comparison:

    $ nvcc -arch=sm_30 -Xptxas="-v" -o function_pointer function_pointer.cu 
    
    ptxas info    : Compiling entry function '_Z6kernelffPFfff6float4EPKS_Pfi' for 'sm_30'
    ptxas info    : Function properties for _Z6kernelffPFfff6float4EPKS_Pfi
        16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas info    : Function properties for _Z2f1ff6float4
        24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas info    : Function properties for _Z2f2ff6float4
        24 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas info    : Used 16 registers, 356 bytes cmem[0], 16 bytes cmem[3]
    
    $ ./function_pointer 
     0      1      2 (     0,     1,     2,     3 )     13     13      9      9
     1      1      2 (     4,     5,     6,     7 )     45     45     25     25
     2      1      2 (     8,     9,    10,    11 )     77     77     41     41
     3      1      2 (    12,    13,    14,    15 )    109    109     57     57
     4      1      2 (    16,    17,    18,    19 )    141    141     73     73
     5      1      2 (    20,    21,    22,    23 )    173    173     89     89
     6      1      2 (    24,    25,    26,    27 )    205    205    105    105
     7      1      2 (    28,    29,    30,    31 )    237    237    121    121
     8      1      2 (    32,    33,    34,    35 )    269    269    137    137
     9      1      2 (    36,    37,    38,    39 )    301    301    153    153
    10      1      2 (    40,    41,    42,    43 )    333    333    169    169
    11      1      2 (    44,    45,    46,    47 )    365    365    185    185
    12      1      2 (    48,    49,    50,    51 )    397    397    201    201
    13      1      2 (    52,    53,    54,    55 )    429    429    217    217
    14      1      2 (    56,    57,    58,    59 )    461    461    233    233
    15      1      2 (    60,    61,    62,    63 )    493    493    249    249
    16      1      2 (    64,    65,    66,    67 )    525    525    265    265
    17      1      2 (    68,    69,    70,    71 )    557    557    281    281
    18      1      2 (    72,    73,    74,    75 )    589    589    297    297
    19      1      2 (    76,    77,    78,    79 )    621    621    313    313
    

    If I have understood your question correctly, the above example should give you pretty much all the design patterns you need to implement your ideas in device code.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I would like to ask if there existed a program, which were intended to
The following program does not work as I intended. #include <string.h> #include <stdio.h> int
I have a problem where the fog works like intended on a desktop program
I realize this sounds like something a malware program would do, so I understand
I intended for this function to call my MVC action method that returns a
Is this an intended behavior or is it a bug? Consider the following trait
I'm having some trouble with a program that is intended to be a String
I'm writing a C++ program intended to manage the resources and duties in a
The folowing program is intended to store words alphabetically in the Binary Search Tree
The short program below is intended to iterate over argv passed from the command

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.