I encountered a very slow if statement response using CUDA/Jacket in MATLAB (5 sec vs 0.02 sec for the same code, which finds local maxima using a simple for loop and an if condition).
Being new to GPU programming, I went reading, and when I saw a previous SO discussion on MATLAB if statements with CUDA, I felt something was missing.
You don’t need to use CUDA to know that it is better to vectorize your code. However, there are cases where you will need to use an if statement anyway.
For example, I’d like to find whether a pixel of a 2D image (say m(a,b)) is the local maximum of its 8 nearest neighbors. In MATLAB, an easy way to do that is by using 8 logical conditions in an if statement:
if m(a,b)>m(a-1,b-1) & m(a,b)>m(a,b-1) & m(a,b)>m(a+1,b-1) & … etc on all nearest neighbors
I’d appreciate any ideas on how to resolve (or vectorize) this…
The problem with using multiple “if” statements (or any other conditional statement) is that, for each of the statements, the result is copied from the GPU to the host, and this can be costly.
The simplest way is to vectorize in the following manner.
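As a rough sketch of what that looks like (not the code from the question, which had not been posted at this point): the arrays `A` and `B` and the "keep the larger value" rule below are hypothetical stand-ins for whatever the if/else branches actually do. It is written for ordinary MATLAB arrays; I am assuming Jacket's GPU arrays accept the same element-wise expressions.

```matlab
% Hypothetical inputs standing in for the real data.
A = [1 5; 7 2];
B = [4 3; 6 8];

% Loop version being replaced (one scalar comparison per element):
%   for i = 1:numel(A)
%       if A(i) > B(i), out(i) = A(i); else, out(i) = B(i); end
%   end

% Vectorized version: evaluate the condition for every element at once,
% then use the logical mask to select between the two branches.
mask = A > B;                      % logical array, same size as A
out  = mask .* A + (~mask) .* B;   % branch-free select, no scalar if
```

The condition is computed once for the whole array, so there is no per-element `if` whose result has to be inspected on the host.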
This can be further optimized if you can show what the if/else conditions are doing, so please post the if/else code to see whether other optimizations are available (for example, possible ways to remove the if condition entirely).
EDIT
With new information, here is what can be done.
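For the 8-neighbor local maximum test, one way to drop the loop and the if entirely is to compare the interior of the image against its eight shifted copies. This is only a sketch under my own assumptions: the small test image, and the names `c`, `isMax` and `peaks`, are hypothetical, border pixels are skipped because they do not have 8 neighbors, and it is plain MATLAB indexing that I am assuming carries over to Jacket GPU arrays.

```matlab
% Hypothetical test image; the only strict local maximum is at (2,3).
m = [1 2 3 2 1;
     2 3 9 3 2;
     1 2 3 2 1;
     0 1 2 1 0];

% Interior pixels only, so that all 8 neighbours exist.
c = m(2:end-1, 2:end-1);

% Compare the centre block against each of the 8 shifted blocks at once.
isMax = c > m(1:end-2, 1:end-2) & c > m(1:end-2, 2:end-1) & c > m(1:end-2, 3:end) ...
      & c > m(2:end-1, 1:end-2)                           & c > m(2:end-1, 3:end) ...
      & c > m(3:end,   1:end-2) & c > m(3:end,   2:end-1) & c > m(3:end,   3:end);

% Map mask positions back to coordinates in the full image
% (offset by the 1-pixel border that was skipped).
[r, col] = find(isMax);
peaks = [r + 1, col + 1];
```

Each of the 8 comparisons runs over the whole interior in one operation, so the result is a single logical mask rather than one if per pixel.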
You can use a gfor loop to make it even faster.