Say I have a 1D array converted from an M×N 2D matrix, and I want to process each column in parallel and do some operations on it. How do I assign a thread to each column?
For example, if I have a 3×3 matrix:
1 2 3
4 5 6
7 8 9
I want to add a number to each element that depends on its column number (the 1st column adds 1, the 2nd adds 2, and so on), so it becomes:
1+1 2+1 3+1
4+2 5+2 6+2
7+3 8+3 9+3
How do I do this in CUDA? I know how to assign a thread to every element in the array, but I don't know how to assign a thread to each column. What I want is to send each column (1, 2, 3), (4, 5, 6), (7, 8, 9) to its own thread and do the operation there.
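Taking the question literally (one thread per column, adding the column number), a minimal sketch might look like the following. It assumes the 1D array is stored row-major, so element (row, col) sits at index row * N + col, and that the elements are int; the names addColumnNumber, addColumnNumberOnDevice and CUDA_CHECK are hypothetical. Thread col starts at index col and steps down its column in strides of N.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// One thread per column. Assumes a row-major M x N matrix, so element
// (row, col) is at index row * N + col. Thread `col` walks down its
// column and adds the 1-based column number to every element.
__global__ void addColumnNumber(int* data, int M, int N)
{
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (col >= N) return;  // surplus threads in the last block do nothing
    for (int row = 0; row < M; ++row) {
        data[row * N + col] += col + 1;
    }
}

// Hypothetical host wrapper: copies h_data to the device, launches one
// thread per column, and copies the result back into h_data.
void addColumnNumberOnDevice(int* h_data, int M, int N)
{
    assert(M > 0 && N > 0);
    size_t bytes = (size_t)M * N * sizeof(int);
    int* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, bytes));
    CUDA_CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (N + threads - 1) / threads;  // enough threads to cover N columns
    addColumnNumber<<<blocks, threads>>>(d_data, M, N);
    CUDA_CHECK(cudaGetLastError());

    // cudaMemcpy on the default stream waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
}
```

For the 3×3 matrix above this yields 2 4 6 / 5 7 9 / 8 10 12. Because neighbouring threads touch neighbouring addresses at every step of the loop, this column-per-thread layout should also give coalesced global memory accesses on a row-major array.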
In your example you are actually adding numbers based on the row, not the column. Still, you know the row and column lengths of the matrix (it's M×N). What you could do is something like:
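A minimal sketch of one thread per row, assuming a row-major int array (element (row, col) at row * N + col); the names addRowNumber, addRowNumberOnDevice and CUDA_CHECK are hypothetical. Thread row finds the start of its row at offset row * N and adds row + 1 to each of its N elements, which reproduces the result shown in the question.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// One thread per row of a row-major M x N matrix. Thread `row` adds the
// 1-based row number to each of the N elements in its row.
__global__ void addRowNumber(int* data, int M, int N)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M) return;             // surplus threads in the last block do nothing
    int* rowStart = data + row * N;   // first element of this thread's row
    for (int col = 0; col < N; ++col) {
        rowStart[col] += row + 1;
    }
}

// Hypothetical host wrapper: copies h_data to the device, launches one
// thread per row, and copies the result back into h_data.
void addRowNumberOnDevice(int* h_data, int M, int N)
{
    assert(M > 0 && N > 0);
    size_t bytes = (size_t)M * N * sizeof(int);
    int* d_data = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, bytes));
    CUDA_CHECK(cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (M + threads - 1) / threads;  // enough threads to cover M rows
    addRowNumber<<<blocks, threads>>>(d_data, M, N);
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_data));
}
```

One trade-off worth knowing: with a row-major array, adjacent threads in this kernel access addresses N elements apart, so the accesses are not coalesced. For large matrices, one thread per element (computing row = idx / N) is usually the faster choice.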
If you wanted to add a different number, you could do something like:
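One reading of "a different number" is an arbitrary value per row. A minimal sketch under the same assumptions (row-major int data, hypothetical names addRowOffsets, addRowOffsetsOnDevice and CUDA_CHECK) copies an offsets array of length M to the device and has thread row add offsets[row] to its row.

```cuda
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "CUDA error %s at %s:%d\n",             \
                    cudaGetErrorString(err_), __FILE__, __LINE__);  \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// One thread per row of a row-major M x N matrix. Thread `row` adds
// offsets[row] (a caller-supplied value) to each element of its row.
__global__ void addRowOffsets(int* data, const int* offsets, int M, int N)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M) return;
    int add = offsets[row];           // read this row's value once
    int* rowStart = data + row * N;
    for (int col = 0; col < N; ++col) {
        rowStart[col] += add;
    }
}

// Hypothetical host wrapper: h_offsets must hold M values.
void addRowOffsetsOnDevice(int* h_data, const int* h_offsets, int M, int N)
{
    assert(M > 0 && N > 0);
    size_t dataBytes = (size_t)M * N * sizeof(int);
    size_t offBytes = (size_t)M * sizeof(int);
    int* d_data = nullptr;
    int* d_offsets = nullptr;
    CUDA_CHECK(cudaMalloc(&d_data, dataBytes));
    CUDA_CHECK(cudaMalloc(&d_offsets, offBytes));
    CUDA_CHECK(cudaMemcpy(d_data, h_data, dataBytes, cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(d_offsets, h_offsets, offBytes, cudaMemcpyHostToDevice));

    int threads = 128;
    int blocks = (M + threads - 1) / threads;
    addRowOffsets<<<blocks, threads>>>(d_data, d_offsets, M, N);
    CUDA_CHECK(cudaGetLastError());

    CUDA_CHECK(cudaMemcpy(h_data, d_data, dataBytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(d_offsets));
    CUDA_CHECK(cudaFree(d_data));
}
```

With offsets {10, 20, 30}, the 3×3 example becomes 11 12 13 / 24 25 26 / 37 38 39. If the same number should be added everywhere, passing a plain int kernel argument instead of an array is simpler.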