Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8556351
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 11, 20262026-06-11T15:19:44+00:00 2026-06-11T15:19:44+00:00

I’m planning on diving into OpenCL and have been reading (only surface knowledge) on

  • 0

I’m planning on diving into OpenCL and have been reading (only surface knowledge) on what OpenCL can do, but have a few questions.

Let’s say I have an AMD Radeon 7750 and I have another computer that has an AMD Radeon 5870 and no plans on using a computer with an Nvidia card. I heard that optimizing the code for a particular device brings performance benefits. What exactly does optimizing mean? From what I’ve read and a little bit of guessing, it sounds like it means writing the code in a way that a GPU likes (in general without concern that it’s an AMD or Nvidia card) as well as in a way that matches how the graphics card handles memory (I’m guessing this is compute device specific? Or is this only brand specific?).

So if I write code and optimized it for the Radeon 7750, would I be able to bring that code to the other computer with the Radeon 5870 and, without changing any part of the code, still retain a reasonable amount of performance benefits from the optimization? In the event that the code doesn’t work, would changing parts of the code be a minor issue or would it involve rewriting enough code that it would have been a better idea to have written an optimized code for the Radeon 5870 in the first place.

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-11T15:19:45+00:00Added an answer on June 11, 2026 at 3:19 pm

    Without more information about the algorithms and applications you intend to write, the question is a little vague. But I think I can give you some high-level strategies to keep in mind as you develop your code for these two different platforms.

    The Radeon 7750’s design is of the new Graphics Core Next architecture, while your HD5780 is based on the older VLIW5 (RV770) Architecture.

    For your code to perform well on the HD5780 hardware you must make as heavy use of the packed primitive datatypes as possible, especially the int4, float4 types. This is because the OpenCL compiler has a difficult time automatically discovering parallelism and packing data into the wide vectors for you. If you can structure your code so that you already have taken this into account, then you will be able to fill more of the VLIW-5 slots and thus use more of your stream processors.

    GCN is more like NVidia’s Fermi architecture, where the code’s path to the functional units (ALUs, etc.) of the stream processors does not go through explicitly scheduled VLIW instructions. So more parallelism can be automatically detected at runtime and keep your functional units busy doing useful work without you having to think as hard about how to make that happen.

    Here’s an over-simplified example to illustrate my point:

    // multiply four factors
    // A[0] = B[0] * C[0]
    // ...
    // A[3] = B[3] * C[3];
    
    float *A, *B, *C;
    
    for (i = 0; i < 4; i ++) {
      A[i] = B[i] * C[i];
    }
    

    That code will probably run ok on a GCN architecture (except for suboptimal memory access performance–an advanced topic). But on your HD5870 it would be a disaster, because those four multiplies would take up 4 VLIW5 instructions instead of 1! So you would write that above code using the float4 type:

    float4 A, B, C;
    
    A = B * C;
    

    And it would run really well on both of your cards. Plus it would kick ass on a CPU OpenCL context and make great use of MMX/SSE wide registers, a bonus. It’s also a much better use of the memory system.

    An a tiny nutshell, using the packed primitives is the one thing I can recommend to keep in mind as you start deploying code on these two systems at the same time.

    Here’s one more example that more clearly illustrates what you need to be careful doing on your HD5870. Say we had implemented the previous example using separate work-units:

    // multiply four factors
    // as separate work units
    // A = B * C
    
    float A, B, C;
    
    A = B * C;
    

    And we had four separate work units instead of one. That would be an absolute disaster on the VLIW device and would show tremendously better performance on the GCN device. That is something you will want to also look for when you are writing your code–can you use float4 types to reduce the number of work units doing the same work? If so, then you will see good performance on both platforms.

    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I have a jquery bug and I've been looking for hours now, I can't
I have a French site that I want to parse, but am running into
link Im having trouble converting the html entites into html characters, (&# 8217;) i
I have a string like this: La Torre Eiffel paragonata all&#8217;Everest What PHP function
this is what i have right now Drawing an RSS feed into the php,
This could be a duplicate question, but I have no idea what search terms
I don't have much knowledge about the IPv6 protocol, so sorry if the question
I have a .ini file as follows: [playlist] numberofentries=2 File1=http://87.230.82.17:80 Title1=(#1 - 365/1400) Example
I have just tried to save a simple *.rtf file with some websites and
I want to count how many characters a certain string has in PHP, but

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.