Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8302749
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 8, 20262026-06-08T17:21:41+00:00 2026-06-08T17:21:41+00:00

I am running some image processing operations on GPU and I need the histogram

  • 0

I am running some image processing operations on GPU and I need the histogram of the output.
I have written and tested the processing kernels. Also I have tested the histogram kernel for samples of the output pictures separately. They both work fine but when I put all of them in one loop I get nothing.

This is my histogram kernel:

__global__ void histogram(int n, uchar* color, uchar* mask, int* bucket, int ch, int W, int bin)
{
    unsigned int X = blockIdx.x*blockDim.x+threadIdx.x;
    unsigned int Y = blockIdx.y*blockDim.y+threadIdx.y;

    int l = (256%bin==0)?256/bin: 256/bin+1;
    int c;

    if (X+Y*W < n && mask[X+Y*W])
    {
        c = color[(X+Y*W)*3]/bin;
        atomicAdd(&bucket[c], 1);

        c = color[(X+Y*W)*3+1]/bin;
        atomicAdd(&bucket[c+l], 1);

        c = color[(X+Y*W)*3+2]/bin;
        atomicAdd(&bucket[c+l*2], 1);
    }
}

It is updating histogram vectors for red, green, and blue.(‘l’ is the length of the vectors)
When I comment out atomicAdds it again produces the output but of course not the histogram.
Why don’t they work together?

Edit:

This is the loop:

    cudaMemcpy(frame_in_gpu,frame_in.data, W*H*3*sizeof(uchar),cudaMemcpyHostToDevice);
    cuda_process(frame_in_gpu, frame_out_gpu, W, H, dimGrid,dimBlock);
    cuda_histogram(W*H, frame_in_gpu, mask_gpu, hist, 3, W, bin, dimg_histogram, dimb_histogram);

Then I copy the output to host memory and write it to a video.
These are c codes that only call their kernels with dimGrid and dimBlock that are given as inputs. Also:

dim3 dimBlock(32,32);
dim3 dimGrid(W/32,H/32);
dim3 dimb_Histogram(16,16);
dim3 dimg_Histogram(W/16,H/16);

I changed this for histogram because it worked better with it. Does it matter?

Edit2:
I am using -arch=sm_11 option for compilation. I just read it somewhere. Could anyone tell me how I should choose it?

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-08T17:21:45+00:00Added an answer on June 8, 2026 at 5:21 pm

    perhaps you should try to compile without -arch=sm_11 flag.
    sm 1.1 is the first architecture which supported atomic operations on global memory while your GPU supports SM 2.0. Hence there is no reason to compile for SM 1.1 unless for backward compatibility.

    One possible issue could be that SM 1.1 does not support atomic operations on 64-bit ints in global memory. So I would suggest you recompile the code without -arch option, or use
    -arch=sm_20 if you like

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have some image processing code that runs on a background thread and updates
I'm running a QTConcurrent::Map on a list of items to perform some image processing
I have an application that does some image processing on each new frame, recently
I was running into the following issue: // $data contains some image data $gm
When using GetModuleFileNameEx to query the image path of a running process, some processes
I'm currently implementing some image processing code for Android. I'm aware of the memory
I am running a stack of image filters and seem to be hitting some
I have a strange situation. I have a fairly memory intense process (image processing)
I've got some image processing code in C++ which calculates gradients and finds straight
My problem is the following: I have an image in which I detect some

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.