Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8154999
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 6, 20262026-06-06T16:32:16+00:00 2026-06-06T16:32:16+00:00

I am to learn OpenCL and wanted to start out easy. I found and

  • 0

I am to learn OpenCL and wanted to start out easy.

I found and modified this hello world example abit but nothing essential.
http://developer.apple.com/library/mac/#samplecode/OpenCL_Hello_World_Example/Introduction/Intro.html

//
// File:       hello.c
//
// Abstract:   A simple "Hello World" compute example showing basic usage of OpenCL which
//             calculates the mathematical square (X[i] = pow(X[i],2)) for a buffer of
//             floating point values.
//             
//
// Version:    <1.0>
//
// Disclaimer: IMPORTANT:  This Apple software is supplied to you by Apple Inc. ("Apple")
//             in consideration of your agreement to the following terms, and your use,
//             installation, modification or redistribution of this Apple software
//             constitutes acceptance of these terms.  If you do not agree with these
//             terms, please do not use, install, modify or redistribute this Apple
//             software.
//
//             In consideration of your agreement to abide by the following terms, and
//             subject to these terms, Apple grants you a personal, non - exclusive
//             license, under Apple's copyrights in this original Apple software ( the
//             "Apple Software" ), to use, reproduce, modify and redistribute the Apple
//             Software, with or without modifications, in source and / or binary forms;
//             provided that if you redistribute the Apple Software in its entirety and
//             without modifications, you must retain this notice and the following text
//             and disclaimers in all such redistributions of the Apple Software. Neither
//             the name, trademarks, service marks or logos of Apple Inc. may be used to
//             endorse or promote products derived from the Apple Software without specific
//             prior written permission from Apple.  Except as expressly stated in this
//             notice, no other rights or licenses, express or implied, are granted by
//             Apple herein, including but not limited to any patent rights that may be
//             infringed by your derivative works or by other works in which the Apple
//             Software may be incorporated.
//
//             The Apple Software is provided by Apple on an "AS IS" basis.  APPLE MAKES NO
//             WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION THE IMPLIED
//             WARRANTIES OF NON - INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A
//             PARTICULAR PURPOSE, REGARDING THE APPLE SOFTWARE OR ITS USE AND OPERATION
//             ALONE OR IN COMBINATION WITH YOUR PRODUCTS.
//
//             IN NO EVENT SHALL APPLE BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL OR
//             CONSEQUENTIAL DAMAGES ( INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
//             SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
//             INTERRUPTION ) ARISING IN ANY WAY OUT OF THE USE, REPRODUCTION, MODIFICATION
//             AND / OR DISTRIBUTION OF THE APPLE SOFTWARE, HOWEVER CAUSED AND WHETHER
//             UNDER THEORY OF CONTRACT, TORT ( INCLUDING NEGLIGENCE ), STRICT LIABILITY OR
//             OTHERWISE, EVEN IF APPLE HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
// Copyright ( C ) 2008 Apple Inc. All Rights Reserved.
//

////////////////////////////////////////////////////////////////////////////////

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
//#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <CL/opencl.h>

#define Warning(...)    fprintf(stderr, __VA_ARGS__)
////////////////////////////////////////////////////////////////////////////////

// Use a static data size for simplicity
//
#define DATA_SIZE (1024)

////////////////////////////////////////////////////////////////////////////////

// Simple compute kernel which computes the square of an input array 
//
const char *KernelSource = "\n" \
"__kernel void square(                                                  \n" \
"   __global float* input,                                              \n" \
"   __global float* output,                                             \n" \
"   const unsigned int count)                                           \n" \
"{                                                                      \n" \
"   int i = get_global_id(0);                                           \n" \
"   if(i < count)                                                       \n" \
"       output[i] = input[i] * input[i];                                \n" \
"}                                                                      \n" \
"\n";

////////////////////////////////////////////////////////////////////////////////


/*
 * CLErrString --
 *
 *      Utility function that converts an OpenCL status into a human
 *      readable string.
 *
 * Results:
 *      const char * pointer to a static string.
 */

static const char *
CLErrString(cl_int status) {
   static struct { cl_int code; const char *msg; } error_table[] = {
      { CL_SUCCESS, "success" },
      { CL_DEVICE_NOT_FOUND, "device not found", },
      { CL_DEVICE_NOT_AVAILABLE, "device not available", },
      { CL_COMPILER_NOT_AVAILABLE, "compiler not available", },
      { CL_MEM_OBJECT_ALLOCATION_FAILURE, "mem object allocation failure", },
      { CL_OUT_OF_RESOURCES, "out of resources", },
      { CL_OUT_OF_HOST_MEMORY, "out of host memory", },
      { CL_PROFILING_INFO_NOT_AVAILABLE, "profiling not available", },
      { CL_MEM_COPY_OVERLAP, "memcopy overlaps", },
      { CL_IMAGE_FORMAT_MISMATCH, "image format mismatch", },
      { CL_IMAGE_FORMAT_NOT_SUPPORTED, "image format not supported", },
      { CL_BUILD_PROGRAM_FAILURE, "build program failed", },
      { CL_MAP_FAILURE, "map failed", },
      { CL_INVALID_VALUE, "invalid value", },
      { CL_INVALID_DEVICE_TYPE, "invalid device type", },
      { 0, NULL },
   };
   static char unknown[25];
   int ii;

   for (ii = 0; error_table[ii].msg != NULL; ii++) {
      if (error_table[ii].code == status) {
         return error_table[ii].msg;
      }
   }

   _snprintf(unknown, sizeof unknown, "unknown error %d", status);
   return unknown;
}
/*
 * PrintDevice --
 *
 *      Dumps everything about the given device ID.
 *
 * Results:
 *      void.
 */

static void
PrintDevice(cl_device_id device) {
#define LONG_PROPS \
  defn(VENDOR_ID), \
  defn(MAX_COMPUTE_UNITS), \
  defn(MAX_WORK_ITEM_DIMENSIONS), \
  defn(MAX_WORK_GROUP_SIZE), \
  defn(PREFERRED_VECTOR_WIDTH_CHAR), \
  defn(PREFERRED_VECTOR_WIDTH_SHORT), \
  defn(PREFERRED_VECTOR_WIDTH_INT), \
  defn(PREFERRED_VECTOR_WIDTH_LONG), \
  defn(PREFERRED_VECTOR_WIDTH_FLOAT), \
  defn(PREFERRED_VECTOR_WIDTH_DOUBLE), \
  defn(MAX_CLOCK_FREQUENCY), \
  defn(ADDRESS_BITS), \
  defn(MAX_MEM_ALLOC_SIZE), \
  defn(IMAGE_SUPPORT), \
  defn(MAX_READ_IMAGE_ARGS), \
  defn(MAX_WRITE_IMAGE_ARGS), \
  defn(IMAGE2D_MAX_WIDTH), \
  defn(IMAGE2D_MAX_HEIGHT), \
  defn(IMAGE3D_MAX_WIDTH), \
  defn(IMAGE3D_MAX_HEIGHT), \
  defn(IMAGE3D_MAX_DEPTH), \
  defn(MAX_SAMPLERS), \
  defn(MAX_PARAMETER_SIZE), \
  defn(MEM_BASE_ADDR_ALIGN), \
  defn(MIN_DATA_TYPE_ALIGN_SIZE), \
  defn(GLOBAL_MEM_CACHELINE_SIZE), \
  defn(GLOBAL_MEM_CACHE_SIZE), \
  defn(GLOBAL_MEM_SIZE), \
  defn(MAX_CONSTANT_BUFFER_SIZE), \
  defn(MAX_CONSTANT_ARGS), \
  defn(LOCAL_MEM_SIZE), \
  defn(ERROR_CORRECTION_SUPPORT), \
  defn(PROFILING_TIMER_RESOLUTION), \
  defn(ENDIAN_LITTLE), \
  defn(AVAILABLE), \
  defn(COMPILER_AVAILABLE),

#define STR_PROPS \
  defn(NAME), \
  defn(VENDOR), \
  defn(PROFILE), \
  defn(VERSION), \
  defn(EXTENSIONS),

#define HEX_PROPS \
   defn(SINGLE_FP_CONFIG), \
   defn(QUEUE_PROPERTIES),


/* XXX For completeness, it'd be nice to dump this one, too. */
#define WEIRD_PROPS \
   CL_DEVICE_MAX_WORK_ITEM_SIZES,

   static struct { cl_device_info param; const char *name; } longProps[] = {
#define defn(X) { CL_DEVICE_##X, #X }
      LONG_PROPS
#undef defn
      { 0, NULL },
   };
   static struct { cl_device_info param; const char *name; } hexProps[] = {
#define defn(X) { CL_DEVICE_##X, #X }
      HEX_PROPS
#undef defn
      { 0, NULL },
   };
   static struct { cl_device_info param; const char *name; } strProps[] = {
#define defn(X) { CL_DEVICE_##X, #X }
      STR_PROPS
#undef defn
      { CL_DRIVER_VERSION, "DRIVER_VERSION" },
      { 0, NULL },
   };
   cl_int status;
   size_t size;
   char buf[65536];
   long long val; /* Avoids unpleasant surprises for some params */
   int ii;


   for (ii = 0; strProps[ii].name != NULL; ii++) {
      status = clGetDeviceInfo(device, strProps[ii].param, sizeof buf, buf, &size);
      if (status != CL_SUCCESS) {
         Warning("\tdevice[%p]: Unable to get %s: %s!\n",
                 device, strProps[ii].name, CLErrString(status));
         continue;
      }
      if (size > sizeof buf) {
         Warning("\tdevice[%p]: Large %s (%d bytes)!  Truncating to %d!\n",
                 device, strProps[ii].name, size, sizeof buf);
      }
      printf("\tdevice[%p]: %s: %s\n",
             device, strProps[ii].name, buf);
   }
   printf("\n");

   status = clGetDeviceInfo(device, CL_DEVICE_TYPE, sizeof val, &val, NULL);
   if (status == CL_SUCCESS) {
      printf("\tdevice[%p]: Type: ", device);
      if (val & CL_DEVICE_TYPE_DEFAULT) {
         val &= ~CL_DEVICE_TYPE_DEFAULT;
         printf("Default ");
      }
      if (val & CL_DEVICE_TYPE_CPU) {
         val &= ~CL_DEVICE_TYPE_CPU;
         printf("CPU ");
      }
      if (val & CL_DEVICE_TYPE_GPU) {
         val &= ~CL_DEVICE_TYPE_GPU;
         printf("GPU ");
      }
      if (val & CL_DEVICE_TYPE_ACCELERATOR) {
         val &= ~CL_DEVICE_TYPE_ACCELERATOR;
         printf("Accelerator ");
      }
      if (val != 0) {
         printf("Unknown (0x%llx) ", val);
      }
      printf("\n");
   } else {
      Warning("\tdevice[%p]: Unable to get TYPE: %s!\n",
              device, CLErrString(status));
   }

   status = clGetDeviceInfo(device, CL_DEVICE_EXECUTION_CAPABILITIES,
                            sizeof val, &val, NULL);
   if (status == CL_SUCCESS) {
      printf("\tdevice[%p]: EXECUTION_CAPABILITIES: ", device);
      if (val & CL_EXEC_KERNEL) {
         val &= ~CL_EXEC_KERNEL;
         printf("Kernel ");
      }
      if (val & CL_EXEC_NATIVE_KERNEL) {
         val &= ~CL_EXEC_NATIVE_KERNEL;
         printf("Native ");
      }
      if (val) {
         printf("Unknown (0x%llx) ", val);
      }
      printf("\n");
   } else {
      Warning("\tdevice[%p]: Unable to get EXECUTION_CAPABILITIES: %s!\n",
              device, CLErrString(status));
   }

   status = clGetDeviceInfo(device, CL_DEVICE_GLOBAL_MEM_CACHE_TYPE,
                            sizeof val, &val, NULL);
   if (status == CL_SUCCESS) {
      static const char *cacheTypes[] = { "None", "Read-Only", "Read-Write" };
      static int numTypes = sizeof cacheTypes / sizeof cacheTypes[0];

      printf("\tdevice[%p]: GLOBAL_MEM_CACHE_TYPE: %s (%lld)\n",
             device, val < numTypes ? cacheTypes[val] : "???", val);
   } else {
      Warning("\tdevice[%p]: Unable to get GLOBAL_MEM_CACHE_TYPE: %s!\n",
              device, CLErrString(status));
   }
   status = clGetDeviceInfo(device,
                            CL_DEVICE_LOCAL_MEM_TYPE, sizeof val, &val, NULL);
   if (status == CL_SUCCESS) {
      static const char *lmemTypes[] = { "???", "Local", "Global" };
      static int numTypes = sizeof lmemTypes / sizeof lmemTypes[0];

      printf("\tdevice[%p]: CL_DEVICE_LOCAL_MEM_TYPE: %s (%lld)\n",
             device, val < numTypes ? lmemTypes[val] : "???", val);
   } else {
      Warning("\tdevice[%p]: Unable to get CL_DEVICE_LOCAL_MEM_TYPE: %s!\n",
              device, CLErrString(status));
   }

   for (ii = 0; hexProps[ii].name != NULL; ii++) {
      status = clGetDeviceInfo(device, hexProps[ii].param, sizeof val, &val, &size);
      if (status != CL_SUCCESS) {
         Warning("\tdevice[%p]: Unable to get %s: %s!\n",
                 device, hexProps[ii].name, CLErrString(status));
         continue;
      }
      if (size > sizeof val) {
         Warning("\tdevice[%p]: Large %s (%d bytes)!  Truncating to %d!\n",
                 device, hexProps[ii].name, size, sizeof val);
      }
      printf("\tdevice[%p]: %s: 0x%llx\n",
             device, hexProps[ii].name, val);
   }
   printf("\n");

   for (ii = 0; longProps[ii].name != NULL; ii++) {
      status = clGetDeviceInfo(device, longProps[ii].param, sizeof val, &val, &size);
      if (status != CL_SUCCESS) {
         Warning("\tdevice[%p]: Unable to get %s: %s!\n",
                 device, longProps[ii].name, CLErrString(status));
         continue;
      }
      if (size > sizeof val) {
         Warning("\tdevice[%p]: Large %s (%d bytes)!  Truncating to %d!\n",
                 device, longProps[ii].name, size, sizeof val);
      }
      printf("\tdevice[%p]: %s: %lld\n",
             device, longProps[ii].name, val);
   }
}
/*
 * PrintPlatform --
 *
 *      Dumps everything about the given platform ID.
 *
 * Results:
 *      void.
 */

static void
PrintPlatform(cl_platform_id platform) {
   static struct { cl_platform_info param; const char *name; } props[] = {
      { CL_PLATFORM_PROFILE, "profile" },
      { CL_PLATFORM_VERSION, "version" },
      { CL_PLATFORM_NAME, "name" },
      { CL_PLATFORM_VENDOR, "vendor" },
      { CL_PLATFORM_EXTENSIONS, "extensions" },
      { 0, NULL },
   };
   cl_device_id *deviceList;
   cl_uint numDevices;
   cl_int status;
   char buf[65536];
   size_t size;
   int ii;

   for (ii = 0; props[ii].name != NULL; ii++) {
      status = clGetPlatformInfo(platform, props[ii].param, sizeof buf, buf, &size);
      if (status != CL_SUCCESS) {
         Warning("platform[%p]: Unable to get %s: %s\n",
                 platform, props[ii].name, CLErrString(status));
         continue;
      }
      if (size > sizeof buf) {
         Warning("platform[%p]: Huge %s (%d bytes)!  Truncating to %d\n",
                 platform, props[ii].name, size, sizeof buf);
      }
      printf("platform[%p]: %s: %s\n", platform, props[ii].name, buf);
   }

   if ((status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL,
                                0, NULL, &numDevices)) != CL_SUCCESS) {
      Warning("platform[%p]: Unable to query the number of devices: %s\n",
              platform, CLErrString(status));
      return;
   }
   printf("platform[%p]: Found %d device(s).\n", platform, numDevices);

   deviceList = (cl_device_id *)malloc(numDevices * sizeof(cl_device_id));
   if ((status = clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL,
                                numDevices, deviceList, NULL)) != CL_SUCCESS) {
      Warning("platform[%p]: Unable to enumerate the devices: %s\n",
              platform, CLErrString(status));
      free(deviceList);
      return;
   }

   for (ii = 0; ii < numDevices; ii++) {
      PrintDevice(deviceList[ii]);
   }

   free(deviceList);
}


int main(int argc, char** argv)
{
    int err;                            // error code returned from api calls

    float data[DATA_SIZE];              // original data set given to device
    float results[DATA_SIZE];           // results returned from device
    unsigned int correct;               // number of correct results returned

    size_t global;                      // global domain size for our calculation
    size_t local;                       // local domain size for our calculation

    cl_platform_id platform;            // compute platform id
    cl_uint numPlatforms;
    cl_platform_id *platformList;
    cl_int status;

    cl_device_id device_id;             // compute device id 
    cl_context context;                 // compute context
    cl_command_queue commands;          // compute command queue
    cl_program program;                 // compute program
    cl_kernel kernel;                   // compute kernel

    cl_mem input;                       // device memory used for the input array
    cl_mem output;                      // device memory used for the output array

    // Fill our data set with random float values
    //
    int i = 0;
    unsigned int count = DATA_SIZE;
    for(i = 0; i < count; i++)
    {
        data[i] = rand() / (float)RAND_MAX;
    //  printf("%f\n",data[i]);
    }


    err = clGetPlatformIDs(1, &platform, &numPlatforms);            
    if(err < 0) {          
        perror("Couldn't find any platforms");
          return EXIT_FAILURE;
    }
    printf("Found %d platform(s).\n", numPlatforms);

    platformList = (cl_platform_id *) malloc(sizeof(cl_platform_id) * numPlatforms);
    if ((status = clGetPlatformIDs(numPlatforms, platformList, NULL)) != CL_SUCCESS) {
       Warning("Unable to enumerate the platforms: %s\n",
               CLErrString(status));
       exit(1);
    }

    int ii;
    for (ii = 0; ii < numPlatforms; ii++) {
       PrintPlatform(platformList[ii]);
    }


    // Connect to a compute device
    //
    int gpu = 1;
    err = clGetDeviceIDs(platformList[0], gpu ? CL_DEVICE_TYPE_GPU : CL_DEVICE_TYPE_CPU, 1, &device_id, NULL);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to create a device group!\n");
        return EXIT_FAILURE;
    }



    // Create a compute context 
    //
    context = clCreateContext(0, 1, &device_id, NULL, NULL, &err);
    if (!context)
    {
        printf("Error: Failed to create a compute context!\n");
        return EXIT_FAILURE;
    }

    // Create a command commands
    //
    commands = clCreateCommandQueue(context, device_id, 0, &err);
    if (!commands)
    {
        printf("Error: Failed to create a command commands!\n");
        return EXIT_FAILURE;
    }

    // Create the compute program from the source buffer
    //
    program = clCreateProgramWithSource(context, 1, (const char **) & KernelSource, NULL, &err);
    if (!program)
    {
        printf("Error: Failed to create compute program!\n");
        return EXIT_FAILURE;
    }

    // Build the program executable
    //
    err = clBuildProgram(program, 0, NULL, NULL, NULL, NULL);
    if (err != CL_SUCCESS)
    {
        size_t len;
        char buffer[2048];

        printf("Error: Failed to build program executable!\n");
        clGetProgramBuildInfo(program, device_id, CL_PROGRAM_BUILD_LOG, sizeof(buffer), buffer, &len);
        printf("%s\n", buffer);
        exit(1);
    }

    // Create the compute kernel in the program we wish to run
    //
    kernel = clCreateKernel(program, "square", &err);
    if (!kernel || err != CL_SUCCESS)
    {
        printf("Error: Failed to create compute kernel!\n");
        exit(1);
    }

    // Create the input and output arrays in device memory for our calculation
    //
    input = clCreateBuffer(context,  CL_MEM_READ_ONLY,  sizeof(float) * count, NULL, NULL);
    output = clCreateBuffer(context, CL_MEM_WRITE_ONLY, sizeof(float) * count, NULL, NULL);
    if (!input || !output)
    {
        printf("Error: Failed to allocate device memory!\n");
        exit(1);
    }    

    // Write our data set into the input array in device memory 
    //
    err = clEnqueueWriteBuffer(commands, input, CL_TRUE, 0, sizeof(float) * count, data, 0, NULL, NULL);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to write to source array!\n");
        exit(1);
    }

    // Set the arguments to our compute kernel
    //
    err = 0;
    err  = clSetKernelArg(kernel, 0, sizeof(cl_mem), &input);
    err |= clSetKernelArg(kernel, 1, sizeof(cl_mem), &output);
    err |= clSetKernelArg(kernel, 2, sizeof(unsigned int), &count);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to set kernel arguments! %d\n", err);
        exit(1);
    }

    // Get the maximum work group size for executing the kernel on the device
    //
    err = clGetKernelWorkGroupInfo(kernel, device_id, CL_KERNEL_WORK_GROUP_SIZE, sizeof(local), &local, NULL);
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to retrieve kernel work group info! %d\n", err);
        exit(1);
    }

    // Execute the kernel over the entire range of our 1d input data set
    // using the maximum number of work group items for this device
    //
    global = count;
    err = clEnqueueNDRangeKernel(commands, kernel, 1, NULL, &global, &local, 0, NULL, NULL);
    if (err)
    {
        printf("Error: Failed to execute kernel!\n");
        return EXIT_FAILURE;
    }

    // Wait for the command commands to get serviced before reading back results
    //
    clFinish(commands);

    // Read back the results from the device to verify the output
    //
    err = clEnqueueReadBuffer( commands, output, CL_TRUE, 0, sizeof(float) * count, results, 0, NULL, NULL );  
    if (err != CL_SUCCESS)
    {
        printf("Error: Failed to read output array! %d\n", err);
        exit(1);
    }

    // Validate our results
    //
    correct = 0;
    for(i = 0; i < count; i++)
    {
        //printf("%4f %4.10f %4.10f \n", data[i],data[i] * data[i], results[i]);
        if((results[i] - data[i] * data[i]) < 10e-8)
            correct++;
    }

    // Print a brief summary detailing the results
    //
    printf("Computed '%d/%d' correct values!\n", correct, count);

    // Shutdown and cleanup
    //
    free(platformList);
    clReleaseMemObject(input);
    clReleaseMemObject(output);
    clReleaseProgram(program);
    clReleaseKernel(kernel);
    clReleaseCommandQueue(commands);
    clReleaseContext(context);

    return 0;
}

Im running on a Geforce 8800 GTS .

F:\Documents\Dropbox\05030 - OpenCL\Samples\Debug>HelloWorld.exe
Found 1 platform(s).
platform[012C6F88]: profile: FULL_PROFILE
platform[012C6F88]: version: OpenCL 1.1 CUDA 4.2.1
platform[012C6F88]: name: NVIDIA CUDA
platform[012C6F88]: vendor: NVIDIA Corporation
platform[012C6F88]: extensions: cl_khr_byte_addressable_store cl_khr_icd cl_khr_
gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_nv_d3d
11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unro
ll
platform[012C6F88]: Found 1 device(s).
        device[012C6FF0]: NAME: GeForce 8800 GTS
        device[012C6FF0]: VENDOR: NVIDIA Corporation
        device[012C6FF0]: PROFILE: FULL_PROFILE
        device[012C6FF0]: VERSION: OpenCL 1.0 CUDA
        device[012C6FF0]: EXTENSIONS: cl_khr_byte_addressable_store cl_khr_icd c
l_khr_gl_sharing cl_nv_d3d9_sharing cl_nv_d3d10_sharing cl_khr_d3d10_sharing cl_
nv_d3d11_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragm
a_unroll
        device[012C6FF0]: DRIVER_VERSION: 301.42

        device[012C6FF0]: Type: GPU
        device[012C6FF0]: EXECUTION_CAPABILITIES: Kernel
        device[012C6FF0]: GLOBAL_MEM_CACHE_TYPE: None (0)
        device[012C6FF0]: CL_DEVICE_LOCAL_MEM_TYPE: Local (1)
        device[012C6FF0]: SINGLE_FP_CONFIG: 0x3e
        device[012C6FF0]: QUEUE_PROPERTIES: 0x3

        device[012C6FF0]: VENDOR_ID: 4318
        device[012C6FF0]: MAX_COMPUTE_UNITS: 12
        device[012C6FF0]: MAX_WORK_ITEM_DIMENSIONS: 3
        device[012C6FF0]: MAX_WORK_GROUP_SIZE: 512
        device[012C6FF0]: PREFERRED_VECTOR_WIDTH_CHAR: 1
        device[012C6FF0]: PREFERRED_VECTOR_WIDTH_SHORT: 1
        device[012C6FF0]: PREFERRED_VECTOR_WIDTH_INT: 1
        device[012C6FF0]: PREFERRED_VECTOR_WIDTH_LONG: 1
        device[012C6FF0]: PREFERRED_VECTOR_WIDTH_FLOAT: 1
        device[012C6FF0]: PREFERRED_VECTOR_WIDTH_DOUBLE: 0
        device[012C6FF0]: MAX_CLOCK_FREQUENCY: 1188
        device[012C6FF0]: ADDRESS_BITS: 32
        device[012C6FF0]: MAX_MEM_ALLOC_SIZE: 134217728
        device[012C6FF0]: IMAGE_SUPPORT: 1
        device[012C6FF0]: MAX_READ_IMAGE_ARGS: 128
        device[012C6FF0]: MAX_WRITE_IMAGE_ARGS: 8
        device[012C6FF0]: IMAGE2D_MAX_WIDTH: 4096
        device[012C6FF0]: IMAGE2D_MAX_HEIGHT: 16383
        device[012C6FF0]: IMAGE3D_MAX_WIDTH: 2048
        device[012C6FF0]: IMAGE3D_MAX_HEIGHT: 2048
        device[012C6FF0]: IMAGE3D_MAX_DEPTH: 2048
        device[012C6FF0]: MAX_SAMPLERS: 16
        device[012C6FF0]: MAX_PARAMETER_SIZE: 4352
        device[012C6FF0]: MEM_BASE_ADDR_ALIGN: 2048
        device[012C6FF0]: MIN_DATA_TYPE_ALIGN_SIZE: 128
        device[012C6FF0]: GLOBAL_MEM_CACHELINE_SIZE: 0
        device[012C6FF0]: GLOBAL_MEM_CACHE_SIZE: 0
        device[012C6FF0]: GLOBAL_MEM_SIZE: 335544320
        device[012C6FF0]: MAX_CONSTANT_BUFFER_SIZE: 65536
        device[012C6FF0]: MAX_CONSTANT_ARGS: 9
        device[012C6FF0]: LOCAL_MEM_SIZE: 16384
        device[012C6FF0]: ERROR_CORRECTION_SUPPORT: 0
        device[012C6FF0]: PROFILING_TIMER_RESOLUTION: 1000
        device[012C6FF0]: ENDIAN_LITTLE: 1
        device[012C6FF0]: AVAILABLE: 1
        device[012C6FF0]: COMPILER_AVAILABLE: 1

The results of the hello world example was : Computed ‘0/1024’ correct values!
Changing abit around, such it doest compare the results as

result[i] == data[i]*data[i] 

but

(result[i] - data[i]*data[i] ) < 10e-6

i get all values correct. I now wonder for the reasons. All data are stored on the host as:

float data[DATA_SIZE];              // original data set given to device
float results[DATA_SIZE];           // results returned from device

I am compiling using visual studio 2010 and wonder how two floats a == b are evaluated, what precision do it need to have to be equal.

Should i always assume that the calculations on my GPU is correct for difference < 10e-8 ? Will this be different with a newer card? I have ordered a HD 7970.

Last, if anyone have some really nice get started guides, i am interested.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-06T16:32:18+00:00Added an answer on June 6, 2026 at 4:32 pm

    Heres a greate answer for the question.

    Summary: The cpu stores the values with higher bit(80bit) precision than float (32bit) when doing intermediate operations as result[i] < in[i]*in[i].

    OpenCL Floating point precision

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

i would like to start stating that i know nothing of OpenCL/GPU programming but
I learn openCL and I try to testdimension, but they doesn't work for me.
I recently started to learn how to use openCL to speed up some part
I have learn OpenGL programming for some time.And I found a very strange phenomenon
Okay so I'm kind of new with OpenGL, but I thought I should learn
To learn the basics of OCaml, I'm solving one of the easy facebook engineering
I learn best by taking apart something that already does something and figuring out
I'm trying to learn OpenGL ES quickly (I know, I know, but these are
I want to start learning OpenGL but I don't really want to have to
I am trying to learn OpenGL on the iPhone using the Super Bible but

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.