The Archive Base

Editorial Team
Asked: June 11, 2026


I am running a benchmark like the one shown below:

CHECK( context = clCreateContext(props, 1, &device, NULL, NULL, &_err); );
CHECK( queue = clCreateCommandQueue(context, device, 0, &_err); );
#define SYNC() clFinish(queue)
#define LAUNCH(glob, loc, kernel) OCL(clEnqueueNDRangeKernel(queue, kernel, 2,\
                                                             NULL, glob, loc,\
                                                             0, NULL, NULL))

/* Build program, set arguments over here */


START;
for (int i = 0; i < iter; i++) {
    LAUNCH(global, local, plus_kernel);
}
SYNC();
STOP;
printf("Time taken (plus) : %lf\n", uSec / iter);

START;
for (int i = 0; i < iter; i++) {
    LAUNCH(global, local, minus_kernel);
}
SYNC();
STOP;
printf("Time taken (minus): %lf\n", uSec / iter);

START;
for (int i = 0; i < iter; i++) {
    LAUNCH(global, local, plus_kernel);
    LAUNCH(global, local, minus_kernel);
}
SYNC();
STOP;
printf("Time taken (both) : %lf\n", uSec / iter);

The results look weird:

Time taken (plus) : 31.450000
Time taken (minus): 28.120000
Time taken (both) : 2256.380000

START and STOP are just macros that start and stop a timer.
Here are the relevant macros.

I am not sure why queuing up the kernels is slowing them down (and only on AMD GPUs)!

EDIT: I am using a Radeon 7970.

EDIT: Both kernels operate on independent memory. Here is the system information as well.

OS: Ubuntu 11.10

fglrxinfo:

display: :0  screen: 0
OpenGL vendor string: Advanced Micro Devices, Inc.
OpenGL renderer string: AMD Radeon HD 7900 Series 
OpenGL version string: 4.2.11762 Compatibility Profile Context

1 Answer

  1. Editorial Team
    Added an answer on June 11, 2026 at 4:00 pm

    I think the answer has to do with data caching on newer GPUs, specifically the Radeon 7970, which uses the Graphics Core Next (GCN) architecture.

    One of the advantages of this architecture is its caching capabilities (somewhat close to CPU caching at this point). If you perform calls like this:

    PLUS
    PLUS 
    PLUS
    ....
    

    then the memory stays resident in the inner caches of the GPU. On the other hand, if you make calls like this:

    PLUS
    MINUS
    PLUS 
    MINUS
    ...
    

    where the two kernels have different memory objects associated with them, the data is evicted from the caches on each compute unit (CU) at every switch, so it has to be brought back in from the much slower global memory.

    Two easy ways to test if this is the case:

    1. Run only Pluses with varying numbers of iterations. As the number of iterations increases, the average time should go down because the cost of the first run (which brings the data in) is amortized. You should also see that all calls after the first take roughly the same time.

    2. Make the Plus and Minus kernels run on the same memory objects. If the slowdown is caused by the caching of memory objects, then the time per kernel launch in the mixed loop should be about the average of the individual running times of PLUS and MINUS (depending perhaps on experiment 1).
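To make the expected numbers for both experiments concrete, here is a minimal sketch in plain C (no OpenCL required). It is only a back-of-the-envelope model, not a measurement tool: the function names `amortized_avg_us`, `expected_both_us` and `slowdown_factor` are hypothetical, and the model assumes one expensive "cold" launch followed by identical "warm" launches, which is the caching hypothesis above rather than a confirmed fact.

```c
#include <assert.h>

/* Hypothetical model for experiment 1: the first launch costs cold_us
 * (data is fetched from global memory), every later launch costs warm_us
 * (data is assumed to be cached). Returns the average time per launch
 * over n launches. */
double amortized_avg_us(double cold_us, double warm_us, int n)
{
    assert(n > 0);
    return (cold_us + (n - 1) * warm_us) / n;
}

/* Baseline for experiment 2: each iteration of the combined loop launches
 * both kernels, so if they do not interfere, the per-iteration time is
 * simply the sum of the two individual per-launch times. */
double expected_both_us(double plus_us, double minus_us)
{
    return plus_us + minus_us;
}

/* How many times slower the measured combined loop is than that baseline. */
double slowdown_factor(double measured_both_us, double plus_us, double minus_us)
{
    return measured_both_us / expected_both_us(plus_us, minus_us);
}
```

With the timings from the question, `expected_both_us(31.45, 28.12)` gives 59.57 µs per iteration, so the measured 2256.38 µs is roughly 38 times slower than the no-interference baseline. For experiment 1, made-up values such as a 2000 µs cold launch and 30 µs warm launches would show the average falling toward 30 µs as `n` grows; if the real measurements do not follow that shape, the caching explanation becomes less likely.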

    Let me know if you find out if this is actually the case!

© 2021 The Archive Base. All Rights Reserved