The Archive Base

Editorial Team
Asked: June 4, 2026

I have tried to implement the Haar wavelet transform in CUDA for a 1D array.

ALGORITHM

My input array has 8 elements (indices 0 to 7).

With the condition if(x_index>=o_width/2 || y_index>=o_height/2), I expect 4 active threads, which should start at input indices 0, 2, 4 and 6, and I plan to handle two input elements with each of them.

Each thread calculates an average and a difference. For example, if my thread id is 0, the average is (input[0]+input[1])/2, and at the same time I get the difference, which is input[0]-avg. The other threads do the same for their pairs.
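To know what the GPU should produce, it helps to have a CPU reference to compare against. The sketch below is a minimal one-level version of the scheme just described (average of each pair, then input[2k]-avg); the function name haar_level_cpu is hypothetical, and it assumes an even number of elements. Unlike the kernel in the question, it divides by 2.0f, so the averages keep their fractional part.

```cuda
#include <vector>

// Hypothetical CPU reference for one level of the pairwise scheme above.
//   out[k]        = (in[2k] + in[2k+1]) / 2   (average of pair k)
//   out[k + n/2]  = in[2k] - out[k]           (difference of pair k)
// Assumes in.size() is even.
std::vector<float> haar_level_cpu(const std::vector<int>& in)
{
    const int n = static_cast<int>(in.size());
    const int half = n / 2;
    std::vector<float> out(n);
    for (int k = 0; k < half; ++k) {
        // Divide by 2.0f, not 2, so the average is not truncated to an int.
        float avg = (in[2 * k] + in[2 * k + 1]) / 2.0f;
        out[k] = avg;                      // averages fill the first half
        out[k + half] = in[2 * k] - avg;   // differences fill the second half
    }
    return out;
}
```

For the input {1, 2, 3, 4, 5, 6, 7, 8} this gives 1.5 3.5 5.5 7.5 -0.5 -0.5 -0.5 -0.5, which is what the GPU version should print.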

The important part is the placement of the output. I created a separate index for the output (o_thread_id), because using the input indices 0, 2, 4, 6 made it hard to place the results at the correct output positions.

My averages should go into the first 4 output indices (0, 1, 2, 3), so o_thread_id should be 0, 1, 2, 3. Similarly, to place the differences at indices 4, 5, 6, 7, I add 4 to o_thread_id, as shown in the code.
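Adding a fixed 4 only works when a row has exactly 8 elements. A minimal sketch, assuming the same layout (averages in the first half of the row, differences in the second), derives the offset from the width instead; the struct OutIdx and the function haar_out_indices are hypothetical names. Marking the helper __host__ __device__ lets the same code run inside a kernel and be checked on the host.

```cuda
// Hypothetical helper: where the results of pair k of one row end up.
// Averages go to [row_start, row_start + width/2),
// differences go to [row_start + width/2, row_start + width).
struct OutIdx {
    int avg;   // element index of the average
    int diff;  // element index of the difference
};

__host__ __device__ inline OutIdx haar_out_indices(int row_start, int k, int width)
{
    const int half = width / 2;   // replaces the hard-coded 4
    OutIdx idx;
    idx.avg  = row_start + k;
    idx.diff = row_start + k + half;
    return idx;
}
```

For width 8 and row_start 0 this reproduces the intended mapping (pair 0 goes to 0 and 4, pair 3 to 3 and 7), and it keeps working for wider rows.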

PROBLEM

My output comes out as all zeros, no matter what I change.

CODE

__global__ void cal_haar(int input[],float output [],int i_widthstep,int o_widthstep,int o_width,int o_height)
{

    int x_index=blockIdx.x*blockDim.x+threadIdx.x;
    int y_index=blockIdx.y*blockDim.y+threadIdx.y;

    if(x_index>=o_width/2 || y_index>=o_height/2) return;

    int i_thread_id=y_index*i_widthstep+(2*x_index);
    int o_thread_id=y_index*o_widthstep+x_index;

    float avg=(input[i_thread_id]+input[i_thread_id+1])/2;
    float diff=input[i_thread_id]-avg;
    output[o_thread_id]=avg;
    output[o_thread_id+4]=diff;

}

void haar(int input[],float output [],int i_widthstep,int o_widthstep,int o_width,int o_height)
{

    int * d_input;
    float * d_output;

    cudaMalloc(&d_input,i_widthstep*o_height);
    cudaMalloc(&d_output,o_widthstep*o_height);

    cudaMemcpy(d_input,input,i_widthstep*o_height,cudaMemcpyHostToDevice);

    dim3 blocksize(16,16);
    dim3 gridsize;
    gridsize.x=(o_width+blocksize.x-1)/blocksize.x;
    gridsize.y=(o_height+blocksize.y-1)/blocksize.y;

    cal_haar<<<gridsize,blocksize>>>(d_input,d_output,i_widthstep,o_widthstep,o_width,o_height);


    cudaMemcpy(output,d_output,o_widthstep*o_height,cudaMemcpyDeviceToHost);

    cudaFree(d_input);
    cudaFree(d_output);

}

This is my main function:

void main()
{
    int in_arr[8]={1,2,3,4,5,6,7,8};
    float out_arr[8];
    int i_widthstep=8*sizeof(int);
    int o_widthstep=8*sizeof(float);
    haar(in_arr,out_arr,i_widthstep,o_widthstep,8,1);

    for(int c=0;c<=7;c++)
    {cout<<out_arr[c]<<endl;}
    cvWaitKey();

}

Can you tell me what I am doing wrong that gives me all zeros as output?
Thank you.

1 Answer

  1. Editorial Team
    Added an answer on June 4, 2026 at 11:58 am

    The problem with your code is the following condition:

    if(x_index>=o_width/2 || y_index>=o_height/2) return;
    

    Given o_height = 1, you get o_height/2 = 0 (o_height is an int, so this is integer division, which rounds down). The test y_index >= 0 is then true for every thread, so every thread returns before doing any work and nothing is ever written to d_output. To get what you want, you can either use floating-point arithmetic here, or use (o_height+1)/2 and (o_width+1)/2, which round the division up instead of down. With your values the condition becomes ( x_index >= (8+1)/2 /*= 4*/ || y_index >= (1+1)/2 /*= 1*/ ), so the four threads with y_index = 0 and x_index = 0..3 do the work.
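The integer arithmetic behind this can be checked on the host. The sketch below evaluates the original guard and the rounded-up version for every (x, y) in a w-by-h index range and counts how many threads would do work; the function names returns_early_original, returns_early_fixed and count_working are illustrative, not from the question.

```cuda
// Original guard from the question: true means "this thread returns early".
__host__ __device__ inline bool returns_early_original(int x, int y, int w, int h)
{
    return x >= w / 2 || y >= h / 2;               // h == 1 makes h / 2 == 0
}

// Guard suggested above: (n + 1) / 2 rounds up for non-negative n.
__host__ __device__ inline bool returns_early_fixed(int x, int y, int w, int h)
{
    return x >= (w + 1) / 2 || y >= (h + 1) / 2;   // h == 1 makes (h + 1) / 2 == 1
}

// Host-side count of the threads in a w-by-h index range that would do work.
int count_working(bool use_fixed, int w, int h)
{
    int n = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            bool early = use_fixed ? returns_early_fixed(x, y, w, h)
                                   : returns_early_original(x, y, w, h);
            if (!early) ++n;
        }
    return n;
}
```

With w = 8 and h = 1, the original guard lets 0 threads through and the rounded-up guard lets 4 through. One caveat for more than one row: if every row is an independent 1D signal, halving the height skips rows (with h = 2, (2+1)/2 = 1, so only row 0 is processed), and y_index >= o_height is then the natural bound.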

    Besides, addressing goes wrong once you have more than one thread in the Y dimension: i_widthstep and o_widthstep are sizes in bytes, yet you use them as array indices in the i_thread_id and o_thread_id calculations, so they need to be divided by sizeof(int) and sizeof(float) respectively.
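Putting these points together, here is a minimal corrected sketch, not the only way to do it. Assumptions beyond the question: the byte strides are converted to element strides on the host; each row is treated as an independent 1D signal, so only the width is halved; the average divides by 2.0f, because with int inputs (input[a]+input[b])/2 is integer division and gives 1 instead of 1.5 for the pair (1, 2); and error handling is reduced to returning the first cudaError_t. The names cal_haar_fixed and haar_fixed are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

// One Haar level per row. Strides are in ELEMENTS, not bytes.
__global__ void cal_haar_fixed(const int* input, float* output,
                               int i_stride, int o_stride,
                               int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // pair index within the row
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row index
    int half = width / 2;
    if (x >= half || y >= height) return;           // rows are not halved

    int in  = y * i_stride + 2 * x;                 // first element of the pair
    int out = y * o_stride + x;                     // slot for the average

    float avg = (input[in] + input[in + 1]) / 2.0f; // float division
    output[out]        = avg;
    output[out + half] = input[in] - avg;           // no hard-coded 4
}

// Host wrapper; i_widthstep / o_widthstep are row sizes in BYTES, as in the question.
// Returns cudaSuccess or the first CUDA error encountered.
cudaError_t haar_fixed(const int* input, float* output,
                       int i_widthstep, int o_widthstep, int width, int height)
{
    int* d_input = nullptr;
    float* d_output = nullptr;
    size_t in_bytes  = static_cast<size_t>(i_widthstep) * height;
    size_t out_bytes = static_cast<size_t>(o_widthstep) * height;

    cudaError_t err = cudaMalloc(&d_input, in_bytes);
    if (err == cudaSuccess) err = cudaMalloc(&d_output, out_bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_input, input, in_bytes, cudaMemcpyHostToDevice);

    if (err == cudaSuccess) {
        dim3 block(16, 16);
        // One thread per PAIR in x, one per row in y.
        dim3 grid((width / 2 + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        cal_haar_fixed<<<grid, block>>>(d_input, d_output,
                                        i_widthstep / (int)sizeof(int),
                                        o_widthstep / (int)sizeof(float),
                                        width, height);
        err = cudaGetLastError();                   // reports launch errors
    }
    // The blocking copy also surfaces errors raised while the kernel ran.
    if (err == cudaSuccess)
        err = cudaMemcpy(output, d_output, out_bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_input);                              // cudaFree(nullptr) is a no-op
    cudaFree(d_output);
    return err;
}
```

Called with the arguments from the question's main (in_arr = {1, ..., 8}, 8*sizeof(int), 8*sizeof(float), 8, 1), haar_fixed should return cudaSuccess and fill out_arr with 1.5 3.5 5.5 7.5 -0.5 -0.5 -0.5 -0.5.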

Related Questions

I have tried to implement binary insertion sort algorithm. Here is my code: public
I have tried to implement this jQuery active menu code: http://docs.jquery.com/Tutorials:Auto-Selecting_Navigation $(function(){ var path
I have tried to implement this scenario . I have created Code First model,
I have tried to implement Stack Overflow question C++ Data Member Alignment and Array Packing
I have tried to implement css sticky footer on my page but it doesn't
I have been looking around for solutions, and tried to implement what is often
Have tried to find solutions for this and can't really come up with anything.
I have tried this but it does not work (even if I specify .wav
I have tried to implement two methods, recursive and dynamic, and both took 0
I have tried to implement the Head First Duck problem with Strategy. I am

© 2021 The Archive Base. All Rights Reserved