I have been tinkering with BSP trees for a while now and am also playing with threads. When adding a triangle to a BSP tree, an opportunity arises to create a new thread for the purposes of processing data in parallel.
insert(triangle, bspnode) { .... else if(triangle spans bspnode) { (frontpiece, backpiece) = plane_split(triangle, bspnode) insert(frontpiece, bspnode.front) insert(backpiece, bspnode.back) } .... }
The two insert operations above could be executed by two threads, and since they do not modify the same data, cheap synchronization can be used.
insert(triangle, bspnode) { .... else if(triangle spans bspnode) { (frontpiece, backpiece) = split(triangle, bspnode) handle = beginthread(insert(backpiece, bspnode.front)) insert(frontpiece, bspnode.back) if(handle) { waitforthread(handle) } else { insert(backpiece, bspnode.front) } } .... }
This new method attempts to create a thread to complete the operation in parallel, but should not fail if the thread cannot be created (it will simply revert to the original algorithm).
Is this a sound programming practice, or am I using threads improperly? I have not been able to find any literature on this technique. I like that it tends to use my CPU to its fullest (2 cores), and would theoretically scale to any number of processors available. I don’t like that it might be horribly wasteful on CPU and memory.
Threads are great if some part of the processing is waiting on something external (user input, I/O, some other processing) – the thread that’s waiting can continue to wait, while a thread that isn’t waiting forges on ahead.
However, for processing-intensive tasks, more threads than processors actually creates overhead. It seems like your threads are doing all ‘CPU work’, so I’d stick to one thread per core – test to find the optimal number, though.
The biggest overhead created is from context switching (freezing one thread and loading the execution context of the next one), as well as cache misses when threads are doing tasks with different memory (if your thread can use the CPU cache effectively).