I wrote the following module which encodes a UUID to an arbitrary base:
http://pypi.python.org/pypi/shortuuid/
Now, this gets it down to 22 symbols with the default alphabet while preserving uniqueness, but I was wondering how many (/which) digits I could cut off while maximising the retained uniqueness.
Are all the digits of a UUID equally random/unique, or are some digits more random than others? For example, if the first few digits are a machine/application-specific identifier, then obviously they would be less random than the last few. I haven’t noticed anything like this in my experiments, but I want to be sure before I advise people on it.
Will truncating it to, say, 8 digits give a 1/57^8 probability of a clash, or is the probability not uniform across the digits?
Because of the way UUIDs are constructed, it very much depends on the version. And yes, some digits will be more random than others.
http://en.wikipedia.org/wiki/Uuid#Version_1_.28MAC_address.29
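To see the difference between versions, here is a small sketch using Python's standard `uuid` module. The fixed `node` value is a made-up stand-in for a MAC address, passed in only so the result is reproducible; it is not something from the question.

```python
import uuid

# Hypothetical 48-bit node value standing in for a machine's MAC address.
FAKE_NODE = 0x123456789ABC

# Version 1: timestamp + clock sequence + node identifier.
a = uuid.uuid1(node=FAKE_NODE)
b = uuid.uuid1(node=FAKE_NODE)

# Version 4: random bits, apart from the fixed version/variant bits.
c = uuid.uuid4()
d = uuid.uuid4()

# The last 12 hex digits of a version 1 UUID are the node, so they are
# identical for every UUID generated with the same node.
v1_tails = {str(a)[-12:], str(b)[-12:]}

# The version digit sits at string index 14 in every UUID.
version_digits = (str(a)[14], str(c)[14], str(d)[14])

print(a, b, sep="\n")
print(c, d, sep="\n")
```

So for version 1, truncating to the trailing digits keeps almost no uniqueness, and even in version 4 a few bits are fixed.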
One way to hack around this is to take a hash (e.g. SHA-256) of the UUID. The hash output should be uniformly distributed.
Do note that I haven’t done a really thorough analysis here. My answer should be in the ballpark, but I give no guarantee that it’s completely correct.
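A rough sketch of the hashing idea, not a vetted implementation. The 57-symbol `ALPHABET` is assumed to match shortuuid's default, and `short_id` and `clash_probability` are names made up for this example. The clash estimate uses the standard birthday approximation and assumes the truncated hash is uniform.

```python
import hashlib
import math
import uuid

# Assumed to be shortuuid's default alphabet (57 symbols, no 0/1/I/O/l).
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def short_id(u: uuid.UUID, length: int = 8) -> str:
    """Hash the UUID with SHA-256 and keep `length` base-57 digits."""
    digest = hashlib.sha256(u.bytes).digest()
    n = int.from_bytes(digest, "big")
    base = len(ALPHABET)
    chars = []
    for _ in range(length):
        # Peel off the low-order base-57 digits of the 256-bit hash.
        n, rem = divmod(n, base)
        chars.append(ALPHABET[rem])
    return "".join(chars)


def clash_probability(count: int, length: int = 8) -> float:
    """Birthday estimate of at least one clash among `count` uniform IDs."""
    space = len(ALPHABET) ** length
    return -math.expm1(-count * (count - 1) / (2 * space))


u = uuid.uuid4()
print(short_id(u))
print(clash_probability(1_000_000))
```

Under that uniformity assumption, two given IDs clash with probability 1/57^8, but the chance of any clash in a larger set grows roughly with the square of the number of IDs, which is usually the figure that matters.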