On page three of this OpenCL reference sheet (broken link) there are two built-in vector length functions with identical parameters: length() and fast_length().
What is the difference between these functions? I gather from the name that one is ‘faster’ than the other, but in what circumstances? Does it sacrifice accuracy for this speed increase? If not, why would one ever use length() over fast_length()?
According to the OpenCL spec (version 1.1, page 215):
float length(floatn p): Return the length of vector p, i.e. sqrt(p.x² + p.y² + ...)
float fast_length(floatn p): Return the length of vector p computed as half_sqrt(p.x² + p.y² + ...)
So fast_length uses half_sqrt, while length uses sqrt. As you can guess, sqrt has better guarantees on accuracy, but might be slower. More to the point:
Min Accuracy of sqrt: 3 ulp (unit of least precision)
Min Accuracy of half_sqrt: 8192 ulp
So half_sqrt can be about 11 bits less accurate than sqrt: 8192 ulp is 2^13 ulp, i.e. 13 bits of error, against roughly 2 bits for 3 ulp. (Actually it can be 13 bits less accurate, since nothing stops an implementation's sqrt from being better than strictly necessary.) Since float has a mantissa of 23 bits (plus one implicit bit), half_sqrt only promises about 10 bits of precision (11 bits including the implicit 1).
It might, however, be faster if the hardware has such a function. In hardware it's not unusual to have a sqrt or rsqrt instruction that provides only a small number of bits (like 10-14), with Newton-Raphson iterations applied after the instruction to reach the necessary precision. In such a case using half_sqrt is obviously faster, because it can skip those refinement iterations.