The Archive Base

Editorial Team
Asked: June 8, 2026 (2026-06-08T19:01:07+00:00)

Given a CUDA vector type int4, how can I load 128 bits of data from constant memory?

This doesn’t seem to work:

#include <stdio.h>
#include <cuda.h>

__constant__ int constant_mem[4];
__global__ void kernel(){
    int4 vec;
    vec = constant_mem[0];
}
int main(void){return 0;}

On the seventh line I’m trying to load all 4 integer values in the constant memory into the 128-bit vector type. This operation results in the following compilation error:

vectest.cu(7): error: no operator "=" matches these operands
            operand types are: int4 = int

Also, is it possible to access the vector type directly without having to cast it, like so:

int data = vec[0];

Switch statement in PTX assembly:

    @%p1 bra    BB1_55;

    setp.eq.s32     %p26, %r1, 1;
    @%p26 bra   BB1_54;

    setp.eq.s32     %p27, %r1, 2;
    @%p27 bra   BB1_53;

    setp.ne.s32     %p28, %r1, 3;
    @%p28 bra   BB1_55;

    mov.u32     %r961, %r61;
    bra.uni     BB1_56;

BB1_53:
    mov.u32     %r961, %r60;
    bra.uni     BB1_56;

BB1_54:
    mov.u32     %r961, %r59;
    bra.uni     BB1_56;

BB1_55:
    mov.u32     %r961, %r58;

BB1_56:

1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 7:01 pm

    In the first case, casting is probably the simplest solution, so something like this:

    // __align__(16) guarantees the 16-byte alignment that a 128-bit load needs
    __constant__ __align__(16) int constant_mem[4];
    __global__ void kernel(){
        int4 vec = * reinterpret_cast<const int4 *>(constant_mem);
    }
    

    (Disclaimer: written in browser, not compiled or tested; use at your own risk.)

    Using the C++ reinterpret_cast operator should make the compiler emit a 128-bit load instruction, provided the address being read is 16-byte aligned.
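The alignment requirement can be made explicit with a small checked helper. This is only a sketch, not something from the answer itself: the names `is_aligned_for_int4` and `load_int4` are hypothetical, and the `#ifndef __CUDACC__` fallback (empty qualifier macros plus a stand-in `int4`) is an assumption added so the same file also builds with an ordinary C++ compiler. The kernel writes its result to an output pointer because a load whose value is never used may be removed by the compiler.

```cpp
#include <cassert>
#include <cstdint>

#ifndef __CUDACC__
// Host-only fallback (assumption): lets this sketch build without nvcc.
#define __host__
#define __device__
struct alignas(16) int4 { int x, y, z, w; };
#endif

// Hypothetical helper: true if p is aligned strictly enough for an int4 load.
__host__ __device__ inline bool is_aligned_for_int4(const void* p)
{
    return reinterpret_cast<std::uintptr_t>(p) % alignof(int4) == 0;
}

// Hypothetical helper: reads four consecutive ints as one int4.
// The caller must pass a 16-byte-aligned address.
__host__ __device__ inline int4 load_int4(const int* p)
{
    assert(is_aligned_for_int4(p));
    return *reinterpret_cast<const int4*>(p);
}

#ifdef __CUDACC__
// Device-side usage: the array is explicitly aligned for the vector load.
__constant__ __align__(16) int constant_mem[4];

__global__ void kernel(int4* out)
{
    *out = load_int4(constant_mem);  // result is stored so the load is kept
}
#endif
```

Declaring the constant array as `int4` in the first place avoids the cast altogether, at the cost of changing the type that the host code copies into.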

    In the second case, it sounds like you want to directly address 32-bit words stored in an array of 128-bit vector types, using 128-bit memory transactions. The built-in vector types have no subscript operator, so that requires some helper functions, perhaps something like:

    __inline__ __device__ int fetch4(const int4 val, const int n)
    {
         (void) val.x; (void) val.y; (void) val.z; (void) val.w;
         switch(n) {
             case 3:
                return val.w;
             case 2: 
                return val.z;
             case 1:
                return val.y;
             case 0:
             default:
                return val.x;
        }
    }
    
    __device__ int index4(const int4 * array, const int n)
    {
        int div = n / 4;
        int mod = n - (div * 4);
    
        int4 val = array[div]; // 128 bit load here
    
        return fetch4(val, mod);
    }
    
    // Declared as int4 so that it matches index4() and is 16-byte aligned
    __constant__ int4 constant_mem[32]; // 32 x int4 = 128 ints
    __global__ void kernel(){
        int val = index4(constant_mem, threadIdx.x);
    }
    

    (Disclaimer: written in browser, not compiled or tested; use at your own risk.)

    Here we force a 128-bit transaction by reading whole int4 values and then picking out the requested component. (The casts to void are an incantation needed for older versions of the Open64 compiler, which was prone to optimizing away vector loads if it thought members were unused.) The indexing costs a few integer operations, but they are potentially worth it if the resulting transaction has higher load bandwidth. The switch statement is probably compiled using conditional execution, so there shouldn't be a branch divergence penalty. Be aware that very random access to an array of int4 values can waste a lot of bandwidth and cause warp serialization, which can have a big negative performance impact.
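The index arithmetic can also be written with a shift and a mask, which is equivalent to the divide and remainder for non-negative indices. The following is a minimal sketch under that assumption: `component` and `index4_shift` are hypothetical names, and the `#ifndef __CUDACC__` fallback is added only so the snippet can be built and checked on the host with a plain C++ compiler.

```cpp
#ifndef __CUDACC__
// Host-only fallback (assumption): lets this sketch build without nvcc.
#define __host__
#define __device__
struct alignas(16) int4 { int x, y, z, w; };
#endif

// Hypothetical helper: returns component `lane` (0..3) of v.
// Stands in for the vec[0] syntax, which the built-in int4 does not provide.
__host__ __device__ inline int component(const int4& v, int lane)
{
    switch (lane & 3) {
        case 0:  return v.x;
        case 1:  return v.y;
        case 2:  return v.z;
        default: return v.w;
    }
}

// Hypothetical helper: treats `array` as a flat array of ints.
// Assumes n >= 0, so n >> 2 equals n / 4 and n & 3 equals n % 4.
__host__ __device__ inline int index4_shift(const int4* array, int n)
{
    const int4 v = array[n >> 2];  // one whole int4 is read here
    return component(v, n & 3);
}
```

For example, with two vectors holding {10, 11, 12, 13} and {20, 21, 22, 23}, index 5 selects the second vector (5 >> 2 is 1) and its second component (5 & 3 is 1), giving 21.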
