I need to generate string that meets the following requirements:
- it should be a unique string;
- string length should be 8 characters;
- it should contain 2 digits;
- all symbols (non-digital characters) should be upper case.
I will store them in a data base after generation (they will be assigned to other entities).
My intention is to do something like this:
- Generate 2 random values from 0 to 9—they will be used for digits in the string;
- generate 6 random values from 0 to 25 and add them to 64—they will be used as 6 symbols;
- concatenate everything into one string;
- check if the string already exists in the data base; if not—repeat.
My concern with regard to that algorithm is that it doesn’t guarantee a result in finite time (if there are already A LOT of values in the data base).
Question: could you please give advice on how to improve this algorithm to be more deterministic?
Thanks.
Assuming:
Then your proposed method has two issues. One is that the letters A – Z are ASCII 65 – 90, not 64 – 89. The other is that it doesn’t distribute the numbers evenly within the possible string space. That can be remedied by doing the following:
There are 28 possibilities for the two different integers ((8*8 – 8 duplicates) / 2 orderings), 266 possibilities for the letters, and 100 possibilities for the numbers, the total # of valid combinations being Ncomb = 864964172800 = 8.64 x 1011.
edit: If you want to avoid the database for storage, but still guarantee both uniqueness of strings and have them be cryptographically secure, your best bet is a cryptographically random bijection from a counter between 0 and Nmax <= Ncomb to a subset of the space of possible output strings. (Bijection meaning there is a one-to-one correspondence between the output string and the input counter.)
This is possible with Feistel networks, which are commonly used in hash functions and symmetric cryptography (including AES). You’d probably want to choose Nmax = 239 which is the largest power of 2 <= Ncomb, and use a 39-bit Feistel network, using a constant key you keep secret. You then plug in your counter to the Feistel network, and out comes another 39-bit number X, which you then transform into the corresponding string as follows:
Alternatively, use 40-bit numbers, and if the output of your Feistel network is > Ncomb, then increment the counter and try again. This covers the entire string space at the cost of rejecting invalid numbers and having to re-execute the algorithm. (But you don’t need a database to do this.)
But this isn’t something to get into unless you know what you’re doing.