I’ve got a fixed point class (10.22) and I have a need of a pow, a sqrt, an exp and a log function.
Alas I have no idea where to even start on this. Can anyone provide me with some links to useful articles or, better yet, provide me with some code?
I’m assuming that once I have an exp function then it becomes relatively easy to implement pow and sqrt as they just become.
pow( x, y ) => exp( y * log( x ) )
sqrt( x ) => pow( x, 0.5 )
Its just those exp and log functions that I’m finding difficult (as though I remember a few of my log rules, I can’t remember much else about them).
Presumably, there would also be a faster method for sqrt and pow so any pointers on that front would be appreciated even if its just to say use the methods i outline above.
Please note: This HAS to be cross platform and in pure C/C++ code so I cannot use any assembler optimisations.
A very simple solution is to use a decent table-driven approximation. You don’t actually need a lot of data if you reduce your inputs correctly.
exp(a)==exp(a/2)*exp(a/2), which means you really only need to calculateexp(x)for1 < x < 2. Over that range, a runga-kutta approximation would give reasonable results with ~16 entries IIRC.Similarly,
sqrt(a) == 2 * sqrt(a/4) == sqrt(4*a) / 2which means you need only table entries for1 < a < 4. Log(a) is a bit harder:log(a) == 1 + log(a/e). This is a rather slow iteration, but log(1024) is only 6.9 so you won’t have many iterations.You’d use a similar "integer-first" algorithm for pow:
pow(x,y)==pow(x, floor(y)) * pow(x, frac(y)). This works becausepow(double, int)is trivial (divide and conquer).[edit] For the integral component of
log(a), it may be useful to store a table1, e, e^2, e^3, e^4, e^5, e^6, e^7so you can reducelog(a) == n + log(a/e^n)by a simple hardcoded binary search of a in that table. The improvement from 7 to 3 steps isn’t so big, but it means you only have to divide once bye^ninstead ofntimes bye.[edit 2]
And for that last
log(a/e^n)term, you can uselog(a/e^n) = log((a/e^n)^8)/8– each iteration produces 3 more bitsby table lookup. That keeps your code and table size small. This is typically code for embedded systems, and they don’t have large caches.[edit 3]
That’s still not to smart on my side.
log(a) = log(2) + log(a/2). You can just store the fixed-point valuelog2=0.6931471805599, count the number of leading zeroes, shiftainto the range used for your lookup table, and multiply that shift (integer) by the fixed-point constantlog2. Can be as low as 3 instructions.Using
efor the reduction step just gives you a "nice"log(e)=1.0constant but that’s false optimization. 0.6931471805599 is just as good a constant as 1.0; both are 32 bits constants in 10.22 fixed point. Using 2 as the constant for range reduction allows you to use a bit shift for a division.[edit 5]
And since you’re storing it in Q10.22, you can better store log(65536)=11.09035488. (16 x log(2)). The "x16" means that we’ve got 4 more bits of precision available.
You still get the trick from edit 2,
log(a/2^n) = log((a/2^n)^8)/8. Basically, this gets you a result(a + b/8 + c/64 + d/512) * 0.6931471805599– with b,c,d in the range [0,7].a.bcdreally is an octal number. Not a surprise since we used 8 as the power. (The trick works equally well with power 2, 4 or 16.)[edit 4]
Still had an open end.
pow(x, frac(y)is justpow(sqrt(x), 2 * frac(y))and we have a decent1/sqrt(x). That gives us the far more efficient approach. Sayfrac(y)=0.101binary, i.e. 1/2 plus 1/8. Then that meansx^0.101is(x^1/2 * x^1/8). Butx^1/2is justsqrt(x)andx^1/8is(sqrt(sqrt(sqrt(x))). Saving one more operation, Newton-RaphsonNR(x)gives us1/sqrt(x)so we calculate1.0/(NR(x)*NR((NR(NR(x))). We only invert the end result, don’t use the sqrt function directly.