I’m looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):
- I’ve done it the easy way, by using the built-in
Math.sqrt()function, but I’m wondering if there is a way to do it faster by restricting yourself to integer-only domain. - Maintaining a lookup table is impractical (since there are about 231.5 integers whose square is less than 263).
Here is the very simple and straightforward way I’m doing it now:
public final static boolean isPerfectSquare(long n) { if (n < 0) return false; long tst = (long)(Math.sqrt(n) + 0.5); return tst*tst == n; }
Note: I’m using this function in many Project Euler problems. So no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to do every algorithm in less than a minute, and this function will need to be called millions of times in some problems.
I’ve tried the different solutions to the problem:
- After exhaustive testing, I found that adding
0.5to the result of Math.sqrt() is not necessary, at least not on my machine. - The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
- Newton’s method was a good bit slower than
Math.sqrt(). This is probably becauseMath.sqrt()uses something similar to Newton’s Method, but implemented in the hardware so it’s much faster than in Java. Also, Newton’s Method still required use of doubles. - A modified Newton’s method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than
Math.sqrt(). - Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
- According to John’s tests, using
orstatements is faster in C++ than using aswitch, but in Java and C# there appears to be no difference betweenorandswitch. - I also tried making a lookup table (as a private static array of 64 boolean values). Then instead of either switch or
orstatement, I would just sayif(lookup[(int)(n&0x3F)]) { test } else return false;. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.
I figured out a method that works ~35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don’t know how the Java factor will play out.
My approach is threefold:
int64 x.)To actually check if the residue is a square, I look up the answer in a precomputed table.
At this point, for our number to be a square, it must be 1 mod 8.
The basic structure of Hensel’s lemma is the following. (Note: untested code; if it doesn’t work, try t=2 or 8.)
The idea is that at each iteration, you add one bit onto r, the "current" square root of x; each square root is accurate modulo a larger and larger power of 2, namely t/2. At the end, r and t/2-r will be square roots of x modulo t/2. (Note that if r is a square root of x, then so is -r. This is true even modulo numbers, but beware, modulo some numbers, things can have even more than 2 square roots; notably, this includes powers of 2.) Because our actual square root is less than 2^32, at that point we can actually just check if r or t/2-r are real square roots. In my actual code, I use the following modified loop:
The speedup here is obtained in three ways: precomputed start value (equivalent to ~10 iterations of the loop), earlier exit of the loop, and skipping some t values. For the last part, I look at
z = r - x * x, and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn’t have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.Even if this code doesn’t work faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.