I’m not great with statistical mathematics, etc. I’ve been wondering, if I use the following:
import uuid
unique_str = str(uuid.uuid4())
double_str = ''.join([str(uuid.uuid4()), str(uuid.uuid4())])
Is double_str squared as unique as unique_str, or just some amount more unique? Also, is there any negative implication in doing something like this (some birthday problem situation, for example)? This may sound ignorant, but I simply would not know, as my math spans algebra 2 at best.
The `uuid4` function returns a UUID created from 16 random bytes, and it is extremely unlikely to produce a collision, to the point that you probably shouldn’t even worry about it.

If for some reason `uuid4` does produce a duplicate, it is far more likely to be a programming error, such as a failure to correctly initialize the random number generator, than genuine bad luck. In that case your approach will not make it any better: an incorrectly initialized random number generator can still produce duplicates even when you concatenate two UUIDs.

If you use the default implementation, `random.seed(None)`, you can see in the source that only 16 bytes of randomness are used to initialize the random number generator, so this is an issue you would have to solve first. Also, if the OS doesn’t provide a source of randomness, the system time will be used, which is not very random at all.

But ignoring these practical issues, you are basically along the right lines. To use a mathematical approach we first have to define what you mean by “uniqueness”. I think a reasonable definition is the number of ids `n` you need to generate before the probability of generating a duplicate exceeds some probability `p`. An approximate formula for this is:

$$n \approx \sqrt{2d \ln\left(\frac{1}{1-p}\right)}$$

where `d` is `2**(16*8)` for a single randomly generated uuid and `2**(16*2*8)` with your suggested approach. The square root in the formula is indeed due to the Birthday Paradox. But if you work it out you can see that if you square the range of values `d` while keeping `p` constant, then you also roughly square `n` (up to a constant factor that depends only on `p`).
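To put numbers on this, here is a minimal sketch that evaluates the usual birthday-bound approximation, n ≈ sqrt(2 · d · ln(1 / (1 − p))), for both id sizes. The helper name `ids_before_collision` and the choice of p = 0.5 are mine, purely for illustration. It also assumes every bit of the id is random; strictly speaking a version-4 UUID fixes 6 of its 128 bits for the version and variant fields, so the 122-bit figure is the more realistic one for a single `uuid4`.

```python
import math


def ids_before_collision(bits: int, p: float) -> float:
    """Approximate count of random ids at which the chance of a duplicate reaches p.

    Assumes ids are drawn uniformly from 2**bits possible values
    (illustrative helper, not part of the uuid module).
    """
    d = 2.0 ** bits
    # -log1p(-p) equals ln(1 / (1 - p)) and stays accurate for small p.
    return math.sqrt(2.0 * d * -math.log1p(-p))


p = 0.5
single = ids_before_collision(128, p)     # one uuid, treated as 128 random bits
realistic = ids_before_collision(122, p)  # one uuid4, counting only its random bits
double = ids_before_collision(256, p)     # two uuids concatenated

print(f"single uuid:      about {single:.3e} ids")
print(f"single uuid4:     about {realistic:.3e} ids (122 random bits)")
print(f"two concatenated: about {double:.3e} ids")

# Squaring d squares n only up to a constant factor that depends on p.
factor = math.sqrt(2.0 * -math.log1p(-p))
print(f"double / single**2 = {double / single**2:.4f} (expected 1/{factor:.4f})")
```

The last line shows why "squared" is only approximately right: doubling the number of random bits squares `d`, but the resulting `n` is the old `n` squared divided by a constant that depends on `p`.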