I’m using this type of spline in my code and I’m wondering whether the algorithm can benefit from SIMD instructions (NEON on ARM). The code is a C translation of the following Fortran sources:
- http://pages.cs.wisc.edu/~deboor/pgs/chol1d.f (the most CPU-intensive procedure)
- http://pages.cs.wisc.edu/~deboor/pgs/setupq.f (the setup procedure)
- http://pages.cs.wisc.edu/~deboor/pgs/smooth.f (the main function that calls the above procedures)
Can you tell, from your experience, if this code has a chance of being optimized by using SIMD instructions?
Is there a guideline for converting code from ‘normal’ code to code using SIMD instructions?
Thanks
It looks like the loops have serial dependencies (each iteration uses a result from the previous one), so probably the only way this will lend itself easily to SIMD vectorization is if you have multiple data sets (e.g. 4) that you can process in parallel, one per SIMD lane. These data sets would need to be the same size.
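To illustrate the idea, here is a minimal sketch in portable C. It is not a translation of chol1d.f: it uses a generic first-order recurrence, u[i] = b[i] - l[i] * u[i-1], as a stand-in for that kind of serially dependent loop. The names recur_scalar, recur_batched and LANES are made up for this example, and it assumes single-precision data with the 4 data sets stored interleaved (element i of set k at index i * LANES + k).

```c
#include <stddef.h>

#define LANES 4  /* hypothetical: number of equally sized data sets per batch */

/* One data set: step i needs u[i-1], so the loop cannot be
   vectorized along i. */
void recur_scalar(const float *l, const float *b, float *u, size_t n)
{
    if (n == 0)
        return;
    u[0] = b[0];
    for (size_t i = 1; i < n; ++i)
        u[i] = b[i] - l[i] * u[i - 1];
}

/* LANES data sets at once, interleaved: element i of set k lives at
   index i * LANES + k. The dependency along i is still serial, but the
   inner loop over k has no dependencies between its iterations. */
void recur_batched(const float *l, const float *b, float *u, size_t n)
{
    if (n == 0)
        return;
    for (size_t k = 0; k < LANES; ++k)
        u[k] = b[k];
    for (size_t i = 1; i < n; ++i) {
        const float *li = l + i * LANES;
        const float *bi = b + i * LANES;
        const float *up = u + (i - 1) * LANES;  /* previous step, all sets */
        float *ui = u + i * LANES;
        for (size_t k = 0; k < LANES; ++k)
            ui[k] = bi[k] - li[k] * up[k];
    }
}
```

With this layout the inner loop over k is a candidate for a single 4-lane NEON operation per step (for example vld1q_f32 loads, a vmlsq_f32 multiply-subtract, and a vst1q_f32 store), or the compiler may auto-vectorize it as written. Whether that actually pays off for the spline code would have to be measured.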