What is the best algorithm to take a long sequence of integers (say 100,000 of them) and return a measurement of how random the sequence is?
The function should return a single result, say 0 if the sequence is not at all random, up to, say 1 if perfectly random. It can give something in between if the sequence is somewhat random, e.g. 0.95 might be a reasonably random sequence, whereas 0.50 might have some non-random parts and some random parts.
If I were to pass the first 100,000 digits of Pi to the function, it should give a number very close to 1. If I passed the sequence 1, 2, … 100,000 to it, it should return 0.
This way I can easily take 30 sequences of numbers, identify how random each one is, and return information about their relative randomness.
Is there such an animal?
…..
Update 24-Sep-2019: The article “Google may have just ushered in an era of quantum supremacy” says:
“Google’s quantum computer was reportedly able to solve a calculation — proving the randomness of numbers produced by a random number generator — in 3 minutes and 20 seconds that would take the world’s fastest traditional supercomputer, Summit, around 10,000 years. This effectively means that the calculation cannot be performed by a traditional computer, making Google the first to demonstrate quantum supremacy.”
So obviously there is an algorithm to “prove” randomness. Does anyone know what it is? Could this algorithm also provide a measure of randomness?
It can be done this way:
CAcert Research Lab does a Random Number Generator Analysis.
Their results page evaluates each random sequence using 7 tests (Entropy, Birthday Spacing, Matrix Ranks, 6×8 Matrix Ranks, Minimum Distance, Random Spheres, and the Squeeze). Each test result is then color coded as one of “No Problems”, “Potentially deterministic” and “Not Random”.
So a function can be written that accepts a random sequence and does the 7 tests.
If any of the 7 tests comes back “Not Random”, the function returns 0. If all 7 tests come back “No Problems”, it returns 1. Otherwise, it can return some number in between, based on how many tests come back “Potentially deterministic”.
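The combining step could be sketched roughly like this in Python. This is only an illustration: the function name `randomness_score` and the formula for the in-between values are my own assumptions, not anything CAcert publishes. The one thing the formula is chosen to guarantee is that a sequence with only “Potentially deterministic” flags never scores exactly 0 or 1.

```python
# Verdict labels as they appear on the CAcert results page.
NO_PROBLEMS = "No Problems"
POTENTIALLY_DETERMINISTIC = "Potentially deterministic"
NOT_RANDOM = "Not Random"
VALID_VERDICTS = {NO_PROBLEMS, POTENTIALLY_DETERMINISTIC, NOT_RANDOM}


def randomness_score(verdicts):
    """Combine per-test verdicts into one score between 0 and 1.

    Hypothetical helper: the scoring rule below is an assumption.
    """
    verdicts = list(verdicts)
    if not verdicts:
        raise ValueError("at least one test verdict is required")
    unknown = set(verdicts) - VALID_VERDICTS
    if unknown:
        raise ValueError(f"unknown verdicts: {sorted(unknown)}")

    # A single outright failure makes the whole sequence "not random".
    if NOT_RANDOM in verdicts:
        return 0.0

    # Each "Potentially deterministic" flag lowers the score. Dividing by
    # len + 1 keeps the result above 0 even if every test is flagged.
    flagged = verdicts.count(POTENTIALLY_DETERMINISTIC)
    return 1.0 - flagged / (len(verdicts) + 1)
```

With 7 tests, one flag gives 1 - 1/8 and seven flags give 1 - 7/8, so the 30 sequences from the question could be ranked by this value. How much weight each flag really deserves is a judgment call.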
The only thing missing from this solution is the code for the 7 tests.
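As a starting point for the first of those tests, here is a minimal sketch of a normalized Shannon entropy measure in Python. It is not CAcert's implementation, which I have not seen; the function name `normalized_entropy` and the `alphabet_size` parameter are assumptions for illustration. It also only looks at how often each symbol occurs, so a sequence like 0, 1, 2, …, 9 repeated over and over would score as perfectly random, which is why the other six tests are needed.

```python
import math
from collections import Counter


def normalized_entropy(symbols, alphabet_size):
    """Shannon entropy of the symbol frequencies, scaled to the range 0..1.

    Hypothetical helper: alphabet_size is the number of distinct values
    the source could produce (e.g. 10 for decimal digits).
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    counts = Counter(symbols)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence is empty")
    if len(counts) > alphabet_size:
        raise ValueError("more distinct symbols than alphabet_size")

    # H = sum over symbols of p * log2(1 / p), with p = count / total.
    entropy = sum((c / total) * math.log2(total / c) for c in counts.values())

    # The maximum possible entropy is log2(alphabet_size).
    return entropy / math.log2(alphabet_size)


# Usage: a constant sequence versus evenly spread decimal digits.
constant = [7] * 1000
spread = list(range(10)) * 100
print(normalized_entropy(constant, 10), normalized_entropy(spread, 10))
```

The result would then still have to be mapped to one of the three verdicts, and the thresholds for that mapping are another choice this sketch does not make.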