What are some general tips or pointers for vectorizing tree operations, in terms of memory layout, algorithms, etc.?
Some domain-specific details:
- Each parent node will have quite a few (20 – 200) child nodes.
- Each node has a low probability of having child nodes.
- Operations on the tree are mostly conditional walks.
- The performance of walking over the tree is more important than insertion/deletion/search speeds.
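One layout that fits these constraints is sketched below in C. It rests on assumptions: the `FlatTree` struct, its field names, the `int32_t` keys and the `key >= threshold` walk condition are hypothetical and not part of the question. The idea is to store all nodes in flat parallel arrays (struct of arrays) and keep each node's children contiguous. A parent with 20–200 children then becomes one sequential run of keys, and a childless node costs nothing beyond a zero count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flattened tree in struct-of-arrays form. Children of node i
 * occupy indices [first_child[i], first_child[i] + child_count[i]), so the
 * children of a parent sit next to each other in memory. Leaves just have
 * child_count == 0; no per-node child pointer array is allocated. */
typedef struct {
    const int32_t  *key;         /* value tested by the walk condition */
    const uint32_t *first_child; /* index of the first child (unused for leaves) */
    const uint32_t *child_count; /* number of children, usually 0 */
    size_t          n_nodes;
} FlatTree;

/* Count the children of `parent` whose key is >= threshold. Sequential reads
 * and a branch-free body make this a loop compilers can typically
 * auto-vectorize (for example with -O3). */
size_t count_matching_children(const FlatTree *t, uint32_t parent, int32_t threshold)
{
    const int32_t *k = t->key + t->first_child[parent];
    uint32_t n = t->child_count[parent];
    size_t hits = 0;
    for (uint32_t j = 0; j < n; ++j)
        hits += (size_t)(k[j] >= threshold);
    return hits;
}

/* Conditional depth-first walk from root 0 that only descends into children
 * whose key is >= threshold. `stack` must have room for t->n_nodes entries.
 * Returns the number of visited nodes, including the root. */
size_t conditional_walk(const FlatTree *t, int32_t threshold, uint32_t *stack)
{
    assert(t->n_nodes > 0);
    size_t top = 0, visited = 0;
    stack[top++] = 0;
    while (top > 0) {
        uint32_t node = stack[--top];
        ++visited;
        uint32_t first = t->first_child[node];
        uint32_t n = t->child_count[node];
        const int32_t *k = t->key + first;
        /* Branch-free stream compaction: always write the candidate child,
         * but only advance `top` when it passes the condition. */
        for (uint32_t j = 0; j < n; ++j) {
            stack[top] = first + j;
            top += (size_t)(k[j] >= threshold);
        }
    }
    return visited;
}
```

The counting loop is a plain reduction, which compilers can usually vectorize on their own. The compaction loop in `conditional_walk` avoids branch mispredictions but carries a dependency on `top`, so it typically stays scalar unless you reach for something like the AVX-512 compress instructions.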
Beware: this is very hard to implement. Last year a team from Intel, Oracle and UCSC presented an amazing solution, “FAST: Fast Architecture Sensitive Tree Search on Modern CPUs and GPUs”. It won the ACM SIGMOD 2010 Best Paper Award.
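The basic trick that FAST builds on can be illustrated in a few lines of C. This is a simplified sketch, not the paper's algorithm. FAST also blocks the tree by SIMD width, cache-line size and page size, and it uses SIMD instructions directly. Here, `NODE_KEYS`, `FANOUT`, `child_index` and `implicit_search` are hypothetical names, and a plain loop stands in for the SIMD compare. The sketch shows two ideas: comparing a query against a whole node of keys without branching, and a pointer-free layout where child positions are computed rather than loaded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node width: 4 x int32 keys, i.e. one 128-bit SIMD register. */
#define NODE_KEYS 4
#define FANOUT (NODE_KEYS + 1)

/* Child to follow in a node with ascending separator keys: the number of
 * keys strictly less than `query`. Every key is compared unconditionally and
 * the 0/1 results are summed, so there is no data-dependent branch; this is
 * the scalar form of "SIMD compare, then count the set lanes". */
unsigned child_index(const int32_t keys[NODE_KEYS], int32_t query)
{
    unsigned idx = 0;
    for (unsigned i = 0; i < NODE_KEYS; ++i)
        idx += (unsigned)(keys[i] < query);
    return idx;
}

/* Descend a complete, pointer-free tree stored level by level (breadth-first):
 * the children of node n are nodes FANOUT*n + 1 ... FANOUT*n + FANOUT, so no
 * child pointers are loaded. Returns the slot, in [0, FANOUT^depth), that
 * `query` falls into after `depth` levels. */
size_t implicit_search(const int32_t (*nodes)[NODE_KEYS], unsigned depth, int32_t query)
{
    assert(nodes != NULL);
    size_t node = 0, slot = 0;
    for (unsigned level = 0; level < depth; ++level) {
        unsigned c = child_index(nodes[node], query);
        slot = slot * FANOUT + c;     /* position among this level's subtrees */
        node = node * FANOUT + 1 + c; /* breadth-first index of that child */
    }
    return slot;
}
```

With 20–200 children per node, the same branch-free counting applies to a sorted run of child keys. The implicit layout, however, assumes a complete tree, so it is likely a poor fit when most nodes have no children.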