I’m writing a path tracer in C++ and I’d like to port the most resource-intensive code to CUDA or OpenCL (I’m not sure which one to pick).
I’ve heard that my graphics card’s version of CUDA doesn’t support recursion, which is something my path tracer utilizes heavily.
As I have it coded both in Python and C++, I’ll post some simplified Python code for readability:
    def Trace(ray):
        hit = what_object_is_hit(ray)
        if not hit:
            return Color(0, 0, 0)
        newRay = hit.bouceChildRayOffSurface(ray)
        return hit.diffuse * (Trace(newRay) + hit.emittance)
I tried manually unrolling the function, and there is a definite pattern (d is diffuse and e is emittance):
    Level 1:  d1 * e1
    Level 2:  d1 * d2 * e2
            + e1
    Level 3:  d1 * d2 * d3 * e3
            + d1 * d2 * e2
            + e1
    Level 4:  d1 * d2 * d3 * d4 * e4
            + d1 * d2 * d3 * e3
            + d1 * d2 * e2
            + e1
I might be wrong, though…
My question is: how would I go about implementing this code in a while loop?
I was thinking of using something like this:
    total = Color(0, 0, 0)
    n = 1
    while n < 10:  # Maximum recursion depth
        result = magical_function()
        if not result:
            break
        total += result
        n += 1
I’ve never really dealt with the task of unraveling a recursive function before, so any help would be greatly appreciated. Thanks!
In a recursive function, each time a recursive call occurs, the state of the caller is saved to a stack, then restored when the recursive call is complete. To convert a recursive function to an iterative one, you need to turn the state of the suspended function into an explicit data structure. Of course, you can create your own stack in software, but there are often tricks you can use to make your code more efficient.
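To make the "own stack in software" option concrete, here is a minimal sketch on a toy model. It assumes each bounce can be reduced to a (diffuse, emittance) pair of floats and that a ray is just an index into a list; `BOUNCES`, `trace_with_stack`, and this simplified `what_object_is_hit` are hypothetical stand-ins, not the real tracer.

```python
# Toy scene (assumption): bounce N is a (diffuse, emittance) pair of floats,
# and a "ray" is simply an index into this list.
BOUNCES = [(0.5, 1.0), (0.25, 2.0), (0.5, 4.0)]

def what_object_is_hit(ray):
    """Return the (diffuse, emittance) pair hit by `ray`, or None on a miss."""
    return BOUNCES[ray] if ray < len(BOUNCES) else None

def trace_with_stack(ray):
    """Emulate the recursion with an explicit stack of saved caller state."""
    stack = []
    # "Descend": each suspended caller only needs its own hit data later.
    while True:
        hit = what_object_is_hit(ray)
        if hit is None:
            break
        stack.append(hit)
        ray += 1  # stand-in for bouncing a child ray off the surface
    # "Unwind": finish each caller's pending work, innermost first.
    result = 0.0  # what the deepest call (the miss) returns
    while stack:
        diffuse, emittance = stack.pop()
        result = diffuse * (result + emittance)
    return result
```

This works, but the stack grows with the path length, which is why it is worth looking for a transformation that avoids saving any state at all.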
This answer works through the transformation steps for this example. You can apply the same methods to other recursive functions.
Tail Recursion Transformation
Let’s take another look at the Trace function from your question.
In general, a recursive call has to go back to the calling function, so the caller can finish what it’s doing. In this case, the caller “finishes” by performing an addition and a multiplication. This produces a computation like
d1 * (d2 * (d3 * (... + e3) + e2) + e1). We can take advantage of the distributive law and the associative laws of multiplication and addition to transform the calculation into [d1 * e1] + [(d1 * d2) * e2] + [((d1 * d2) * d3) * e3] + .... Note that the first term in this series only refers to iteration 1, the second only refers to iterations 1 and 2, and so forth. That tells us that we can compute this series on the fly. Moreover, this series contains the series (d1, d1*d2, d1*d2*d3, ...), which we can also compute on the fly. Putting that back into the code means passing both running values, the product of the diffuse terms and the partial sum, as extra parameters to Trace, so that the recursive call becomes the last thing the function does.
Tail Recursion Elimination
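As a rough sketch of the accumulator-passing form that the transformation step arrives at, the following compares it with the original recursion on a toy model. The assumptions are loud ones: each bounce is a (diffuse, emittance) pair of floats, a ray is an index into a list, and `BOUNCES`, `trace_recursive`, `trace_tail`, `throughput`, and `total` are hypothetical names chosen for illustration.

```python
# Toy scene (assumption): bounce N is a (diffuse, emittance) pair of floats,
# and a "ray" is simply an index into this list.
BOUNCES = [(0.5, 1.0), (0.25, 2.0), (0.5, 4.0)]

def what_object_is_hit(ray):
    """Return the (diffuse, emittance) pair hit by `ray`, or None on a miss."""
    return BOUNCES[ray] if ray < len(BOUNCES) else None

def trace_recursive(ray):
    """Original shape: a multiply and an add remain after the recursive call."""
    hit = what_object_is_hit(ray)
    if hit is None:
        return 0.0
    diffuse, emittance = hit
    return diffuse * (trace_recursive(ray + 1) + emittance)

def trace_tail(ray, throughput=1.0, total=0.0):
    """Accumulator shape: the recursive call is the very last operation."""
    hit = what_object_is_hit(ray)
    if hit is None:
        return total                          # the full sum is already computed
    diffuse, emittance = hit
    throughput = throughput * diffuse         # d1 * d2 * ... * dN
    total = total + throughput * emittance    # ... + (d1 * ... * dN) * eN
    return trace_tail(ray + 1, throughput, total)
```

Both functions return the same value for this scene; the difference is only in when the arithmetic happens.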
In the new version of the function, the caller has no work to do after the callee finishes; it simply returns the callee’s result. The caller has no work to finish, so it doesn’t have to save any of its state! Instead of making a call, we can overwrite the old parameters with the new values and jump back to the beginning of the function. Python has no such jump, but a loop achieves the same thing.
Finally, we have transformed the recursive function into an equivalent loop. All that’s left is to express it in Python syntax.
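A minimal sketch of what the resulting loop might look like, again on a toy model rather than the real tracer. It assumes each bounce is a (diffuse, emittance) pair of floats and a ray is an index into a list; `BOUNCES`, `trace_loop`, `throughput`, and `max_depth` are hypothetical names, and the depth cap mirrors the limit in the question's loop.

```python
# Toy scene (assumption): bounce N is a (diffuse, emittance) pair of floats,
# and a "ray" is simply an index into this list.
BOUNCES = [(0.5, 1.0), (0.25, 2.0), (0.5, 4.0)]

def what_object_is_hit(ray):
    """Return the (diffuse, emittance) pair hit by `ray`, or None on a miss."""
    return BOUNCES[ray] if ray < len(BOUNCES) else None

def trace_loop(ray, max_depth=10):
    """Iterative trace: no recursion and no stack, only two running values."""
    throughput = 1.0  # running product d1 * d2 * ... * dN
    total = 0.0       # running sum of throughput * emittance
    for _ in range(max_depth):
        hit = what_object_is_hit(ray)
        if hit is None:
            break
        diffuse, emittance = hit
        throughput *= diffuse
        total += throughput * emittance
        ray += 1  # stand-in for hit.bouceChildRayOffSurface(ray)
    return total
```

Because the loop keeps only a fixed amount of state per ray, the same shape should carry over to a C++ or GPU kernel where recursion is unavailable, with `Color` values in place of the floats.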