Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 9218421
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 18, 20262026-06-18T02:51:38+00:00 2026-06-18T02:51:38+00:00

I am trying to compare performance in CPU and GPU. I have CPU :

  • 0

I am trying to compare performance in CPU and GPU. I have

  • CPU : Intel® Core™ i5 CPU M 480 @ 2.67GHz × 4
  • GPU : NVidia GeForce GT 420M

I can confirm that GPU is configured and works correctly with CUDA.

I am implementing Julia set computation. http://en.wikipedia.org/wiki/Julia_set
Basically for every pixel, if the co-ordinate is in the set it will paint it red
else paint it white.

Although, I get identical answer with both CPU and GPU but instead of getting a
performance improvement, I get a performance penalty by using GPU.

Running times

  • CPU : 0.052s
  • GPU : 0.784s

I am aware that transferring data from device to host can take up some time.
But still, how do I know if use of GPU is actually beneficial?

Here is the relevant GPU code

    #include <stdio.h>
    #include <cuda.h>

    __device__ bool isJulia( float x, float y, float maxX_2, float maxY_2 )
    {
        float z_r = 0.8 * (float) (maxX_2 - x) / maxX_2;
        float z_i = 0.8 * (float) (maxY_2 - y) / maxY_2;

        float c_r = -0.8;
        float c_i = 0.156;
        for( int i=1 ; i<100 ; i++ )
        {
        float tmp_r = z_r*z_r - z_i*z_i + c_r;
        float tmp_i = 2*z_r*z_i + c_i;

        z_r = tmp_r;
        z_i = tmp_i;

        if( sqrt( z_r*z_r + z_i*z_i ) > 1000 )
            return false;
        }
        return true;
    }

    __global__ void kernel( unsigned char * im, int dimx, int dimy )
    {
        //int tid = blockIdx.y*gridDim.x + blockIdx.x;
        int tid = blockIdx.x*blockDim.x + threadIdx.x;
        tid *= 3;
        if( isJulia((float)blockIdx.x, (float)threadIdx.x, (float)dimx/2, (float)dimy/2)==true )
        {
        im[tid] = 255;
        im[tid+1] = 0;
        im[tid+2] = 0;
        }
        else
        {
        im[tid] = 255;
        im[tid+1] = 255;
        im[tid+2] = 255;
        }

    }

    int main()
    {
        int dimx=768, dimy=768;

        //on cpu
        unsigned char * im = (unsigned char*) malloc( 3*dimx*dimy );

        //on GPU
        unsigned char * im_dev;

        //allocate mem on GPU
        cudaMalloc( (void**)&im_dev, 3*dimx*dimy ); 

        //launch kernel. 
**for( int z=0 ; z<10000 ; z++ ) // loop for multiple times computation**
{
        kernel<<<dimx,dimy>>>(im_dev, dimx, dimy);
}

        cudaMemcpy( im, im_dev, 3*dimx*dimy, cudaMemcpyDeviceToHost );

        writePPMImage( im, dimx, dimy, 3, "out_gpu.ppm" ); //assume this writes a ppm file

        free( im );
        cudaFree( im_dev );
    }

Here is the CPU code

    bool isJulia( float x, float y, float maxX_2, float maxY_2 )
    {
        float z_r = 0.8 * (float) (maxX_2 - x) / maxX_2;
        float z_i = 0.8 * (float) (maxY_2 - y) / maxY_2;

        float c_r = -0.8;
        float c_i = 0.156;
        for( int i=1 ; i<100 ; i++ )
        {
        float tmp_r = z_r*z_r - z_i*z_i + c_r;
        float tmp_i = 2*z_r*z_i + c_i;

        z_r = tmp_r;
        z_i = tmp_i;

        if( sqrt( z_r*z_r + z_i*z_i ) > 1000 )
            return false;
        }
        return true;
    }


    #include <stdlib.h>
    #include <stdio.h>

    int main(void)
    {
      const int dimx = 768, dimy = 768;
      int i, j;

      unsigned char * data = new unsigned char[dimx*dimy*3];

**for( int z=0 ; z<10000 ; z++ ) // loop for multiple times computation**
{
      for (j = 0; j < dimy; ++j)
      {
        for (i = 0; i < dimx; ++i)
        {
          if( isJulia(i,j,dimx/2,dimy/2) == true )
          {
          data[3*j*dimx + 3*i + 0] = (unsigned char)255;  /* red */
          data[3*j*dimx + 3*i + 1] = (unsigned char)0;  /* green */
          data[3*j*dimx + 3*i + 2] = (unsigned char)0;  /* blue */
          }
          else
          {
          data[3*j*dimx + 3*i + 0] = (unsigned char)255;  /* red */
          data[3*j*dimx + 3*i + 1] = (unsigned char)255;  /* green */
          data[3*j*dimx + 3*i + 2] = (unsigned char)255;  /* blue */
          }
        }
      }
}

      writePPMImage( data, dimx, dimy, 3, "out_cpu.ppm" ); //assume this writes a ppm file
      delete [] data


      return 0;
    }

Further, following suggestions from @hyde I have looped the computation-only part to generate 10,000 images. I am not bothering to write all those images though. Computation only is what I am doing.

Here are the running times

  • CPU : more than 10min and code still running
  • GPU : 1m 14.765s
  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-18T02:51:39+00:00Added an answer on June 18, 2026 at 2:51 am

    Turning comments to answer:

    To get relevant figures, you needs to calculate more than one image, so that execution time is seconds or tens of seconds at least. Also, including file saving time in results is going to add noise and hide the actual CPU vs GPU difference.

    Another way to get real results is to select a Julia set which has lot points belonging to the set, then upping the iteration count so high it takes many seconds to calculate just one image. Then there is only one single calculation setup, so this is likely to be the most advantageous scenario for GPU/CUDA.

    To measure how much overhead there is, change image size to 1×1 and iteration limit 1, and then calculate enough images that it takes at least a few seconds. In this scenario, GPU is likely significantly slower.

    To get most relevant timings for your use case, select image size and iteration count you are really going to use, and then measure the image count, where both versions are equally fast. That will give you a rough rule-of-thumb to decide which you should use when.

    Alternative approach for practical results, if you are going to get just one image: find the iteration limit for single worst-case image, where CPU and GPU are equally fast. If that many or more iterations would be advantageous, choose GPU, otherwise choose CPU.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I'm trying to compare GPU to CPU performance. For the NVIDIA GPU I've been
I am trying to compare the performance of several Javascript libraries. Although measuring transaction
I am trying to see if the performance of both can be compared based
I am trying to compare two tools execution time, which I have installed in
I am trying to compare the performance of a specific F# benchmark running on
I am trying to compare the performance of Haskell lists and arrays and run
I am trying to compare how an i7 dual core 2.7Ghz would perform vs.
I am trying to compare performance between raw pointers, boost shared_ptr and boost weak_ptr.
I am trying to compare an actual portfolio's performance to the performances of hypothetical
I am trying to compare the hg remote operation performance (pull, push, clone) using

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.