The Archive Base

Editorial Team
Asked: June 13, 2026

I have this low level for loop I’ve written in C that a friend

I have this low-level for loop that I wrote in C, and a friend suggested I rewrite it in CUDA. I've set up my CUDA environment and have been reading the docs, but I've been struggling with the syntax for well over two weeks now. Can anyone help me out? What would this look like in CUDA?

float* red = new float [N];
float* green = new float [N];
float* blue = new float [N];

for (int y = 0; y < h; y++)
{
    // Get row ptr from the color image
    const unsigned char* src = rowptr<unsigned char>(color, 0, y, w);

    // Get row ptrs for the destination channel features
    float* rptr = rowptr<float>(red, 0, y, w);
    float* gptr = rowptr<float>(green, 0, y, w);
    float* bptr = rowptr<float>(blue, 0, y, w);

    for (int x = 0; x < w; x++)
    {
        *rptr++ = (float)*src++;
        *gptr++ = (float)*src++;
        *bptr++ = (float)*src++;
    }
}
1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 10:02 am

    Here is some sample code. I don't know whether it fully answers your question, and you will probably need to learn more about CUDA. If you can spare the time, taking these two webinars from the NVIDIA webinar page would be two hours well spent. The CUDA C Programming Guide is also a good, readable reference.

    #include <stdio.h>
    #include <stdlib.h>   // malloc, exit
    
    #define N      256
    #define NUMROW   N
    #define NUMCOL   N
    #define PIXSIZE  3
    #define REDOFF   0
    #define GREENOFF 1
    #define BLUEOFF  2
    #define nTPB    16
    #define GRNVAL   5
    #define REDVAL   7
    #define BLUVAL   9
    
    #define cudaCheckErrors(msg) \
        do { \
            cudaError_t __err = cudaGetLastError(); \
            if (__err != cudaSuccess) { \
                fprintf(stderr, "Fatal error: %s (%s at %s:%d)\n", \
                    msg, cudaGetErrorString(__err), \
                    __FILE__, __LINE__); \
                fprintf(stderr, "*** FAILED - ABORTING\n"); \
                exit(1); \
            } \
        } while (0)
    
    __global__ void kern(const unsigned numrow, const unsigned numcol, const unsigned char* src, float* rptr, float* gptr, float* bptr){
    
      unsigned idx = threadIdx.x + (blockDim.x*blockIdx.x);
      unsigned idy = threadIdx.y + (blockDim.y*blockIdx.y);
      if ((idx < numcol) && (idy < numrow)){
    
        rptr[(idy*numcol)+idx] = (float)src[(((idy*numcol)+idx)*PIXSIZE)+REDOFF];
        gptr[(idy*numcol)+idx] = (float)src[(((idy*numcol)+idx)*PIXSIZE)+GREENOFF];
        bptr[(idy*numcol)+idx] = (float)src[(((idy*numcol)+idx)*PIXSIZE)+BLUEOFF];
        }
    }
    
    int main (){
    
      float *h_red, *h_green, *h_blue;
      float *d_red, *d_green, *d_blue;
      unsigned char *h_img, *d_img;
    
      if ((h_img =(unsigned char*)malloc(NUMROW*NUMCOL*PIXSIZE*sizeof(unsigned char))) == 0) {printf("malloc fail\n"); return 1;}
      if ((h_red =(float*)malloc(NUMROW*NUMCOL*sizeof(float))) == 0) {printf("malloc fail\n"); return 1;}
      if ((h_green =(float*)malloc(NUMROW*NUMCOL*sizeof(float))) == 0) {printf("malloc fail\n"); return 1;}
      if ((h_blue =(float*)malloc(NUMROW*NUMCOL*sizeof(float))) == 0) {printf("malloc fail\n"); return 1;}
    
      cudaMalloc((void **)&d_img, (NUMROW*NUMCOL*PIXSIZE)*sizeof(unsigned char));
      cudaCheckErrors("cudaMalloc1 fail");
      cudaMalloc((void **)&d_red, (NUMROW*NUMCOL)*sizeof(float));
      cudaCheckErrors("cudaMalloc2 fail");
      cudaMalloc((void **)&d_green, (NUMROW*NUMCOL)*sizeof(float));
      cudaCheckErrors("cudaMalloc3 fail");
      cudaMalloc((void **)&d_blue, (NUMROW*NUMCOL)*sizeof(float));
      cudaCheckErrors("cudaMalloc4 fail");
    
      for (int i=0; i<NUMROW*NUMCOL; i++){
        h_img[(i*PIXSIZE)+ REDOFF]   = REDVAL;
        h_img[(i*PIXSIZE)+ GREENOFF] = GRNVAL;
        h_img[(i*PIXSIZE)+ BLUEOFF]  = BLUVAL;
        }
    
      cudaMemcpy(d_img, h_img, (NUMROW*NUMCOL*PIXSIZE)*sizeof(unsigned char), cudaMemcpyHostToDevice);
      cudaCheckErrors("cudaMemcpy1 fail");
    
      dim3 block(nTPB, nTPB);
      dim3 grid(((NUMCOL+nTPB-1)/nTPB),((NUMROW+nTPB-1)/nTPB));
      kern<<<grid,block>>>(NUMROW, NUMCOL, d_img, d_red, d_green, d_blue);
      cudaCheckErrors("kernel launch fail");
      cudaMemcpy(h_red, d_red, (NUMROW*NUMCOL)*sizeof(float), cudaMemcpyDeviceToHost);
      cudaCheckErrors("cudaMemcpy2 fail");
      cudaMemcpy(h_green, d_green, (NUMROW*NUMCOL)*sizeof(float), cudaMemcpyDeviceToHost);
      cudaCheckErrors("cudaMemcpy3 fail");
      cudaMemcpy(h_blue, d_blue, (NUMROW*NUMCOL)*sizeof(float), cudaMemcpyDeviceToHost);
      cudaCheckErrors("cudaMemcpy4 fail");
    
      for (int i=0; i<(NUMROW*NUMCOL); i++){
        if (h_red[i] != REDVAL) {printf("Red mismatch at offset %d\n", i); return 1;}
        if (h_green[i] != GRNVAL) {printf("Green mismatch at offset %d\n", i); return 1;}
        if (h_blue[i] != BLUVAL) {printf("Blue mismatch at offset %d\n", i); return 1;}
        }
      printf("Success!\n");
      return 0;
    }
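
The grid in the program above is sized by ceiling division, so it can contain more threads than pixels whenever the image dimensions are not a multiple of the block size. That is why the kernel guards its writes with a bounds check. As a rough illustration, the sketch below replays the kernel's index arithmetic on the CPU and counts how many simulated threads would write each pixel. `blocks_needed` and `coverage` are hypothetical helper names introduced here for illustration; they are not part of the answer or of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Ceiling division: blocks needed to cover n elements with tpb threads per block.
// Mirrors the (NUMCOL+nTPB-1)/nTPB expression used to build the grid.
unsigned blocks_needed(unsigned n, unsigned tpb) {
    assert(tpb > 0);
    return (n + tpb - 1) / tpb;
}

// Hypothetical helper (not from the answer): loops over every simulated
// (block, thread) pair and counts the writes that pass the bounds check.
std::vector<int> coverage(unsigned numrow, unsigned numcol, unsigned tpb) {
    const unsigned gridx = blocks_needed(numcol, tpb);
    const unsigned gridy = blocks_needed(numrow, tpb);
    std::vector<int> hits(static_cast<std::size_t>(numrow) * numcol, 0);
    for (unsigned by = 0; by < gridy; by++)
        for (unsigned bx = 0; bx < gridx; bx++)
            for (unsigned ty = 0; ty < tpb; ty++)
                for (unsigned tx = 0; tx < tpb; tx++) {
                    // Same formulas as idx and idy in the kernel.
                    unsigned idx = tx + tpb * bx;
                    unsigned idy = ty + tpb * by;
                    if (idx < numcol && idy < numrow)
                        hits[static_cast<std::size_t>(idy) * numcol + idx]++;
                }
    return hits;
}
```

For a 100 x 70 image with 16 x 16 blocks, this corresponds to a 7 x 5 grid: 8960 threads for 7000 pixels. Every pixel is written exactly once, and the bounds check keeps the remaining 1960 threads from writing outside the buffers.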
    

    In response to a question posed in the comments, here is a modified kernel that shows how to use the rowptr<> template as defined in the comments. Just replace the kernel code above with this:

    template <typename T> __host__ __device__ T* rowptr(T* start, int x, int y, int w) { return start + y*w + x; }
    
    __global__ void kern(const unsigned numrow, const unsigned numcol, unsigned char* isrc, float* rptr, float* gptr, float* bptr){
    
    
      unsigned idx = threadIdx.x + (blockDim.x*blockIdx.x);
      unsigned idy = threadIdx.y + (blockDim.y*blockIdx.y);
      if ((idx < numcol) && (idy < numrow)){
        unsigned char *src = rowptr<unsigned char>(isrc, (idx*PIXSIZE), idy, (numcol*PIXSIZE));
    
        rptr[(idy*numcol)+idx] = (float)*src++;
        gptr[(idy*numcol)+idx] = (float)*src++;
        bptr[(idy*numcol)+idx] = (float)*src;
        }
    }
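
When checking the kernel's output on a real image rather than constant test values, a CPU reference can be useful for comparison. The sketch below is one possible version, assuming the `rowptr` definition quoted above (with the CUDA qualifiers dropped so it builds with a plain C++ compiler) and tightly packed 3-byte RGB pixels. The function name `split_channels_cpu` is hypothetical. Note that, under this `rowptr` definition, the source row width has to be `w * 3` bytes, which matches the `numcol*PIXSIZE` argument in the kernel.

```cpp
#include <cassert>
#include <vector>

// Same arithmetic as the rowptr helper in the answer: w is the row width in
// elements of T, not in pixels.
template <typename T>
T* rowptr(T* start, int x, int y, int w) { return start + y * w + x; }

// Hypothetical CPU reference: splits an interleaved RGB byte image into three
// planar float channels, one row at a time.
void split_channels_cpu(const unsigned char* color, int w, int h,
                        float* red, float* green, float* blue) {
    assert(w >= 0 && h >= 0);
    const int pixsize = 3;  // assumed bytes per pixel: R, G, B
    for (int y = 0; y < h; y++) {
        // A source row holds w pixels, i.e. w * pixsize bytes.
        const unsigned char* src =
            rowptr<const unsigned char>(color, 0, y, w * pixsize);
        float* rptr = rowptr<float>(red, 0, y, w);
        float* gptr = rowptr<float>(green, 0, y, w);
        float* bptr = rowptr<float>(blue, 0, y, w);
        for (int x = 0; x < w; x++) {
            *rptr++ = static_cast<float>(*src++);
            *gptr++ = static_cast<float>(*src++);
            *bptr++ = static_cast<float>(*src++);
        }
    }
}
```

Running this on the same host image and comparing its three output arrays element by element with `h_red`, `h_green`, and `h_blue` after the device-to-host copies gives a simple correctness check for either kernel.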
    
© 2021 The Archive Base. All Rights Reserved