I’m working with a data set that deals with personal data (i.e. data about people, not [necessarily] private data)… This data changes over time, and the format is imposed by the client. I need something to use as a primary key, and unfortunately the only field that uniquely identifies a person and doesn’t change unpredictably is SSN. The ID number (primary key) is going to be public facing, so I can’t publish the SSN itself, but I’m hoping to obscure it.
- The result must be numeric.
- The result may be up to 25 digits long.
- The result must be unique.
- The result should be as difficult as possible to reverse without a key, given the constraints above.
Is there a numeric cipher that would fit this?
Am I crazy for trying this?
Format-preserving encryption sounds like a solution to your problem. Apply it to the SSN and you get a random-looking number with the same number of digits (9 for a US SSN), which you can pad out to the 25-digit ID if you need to. Because encryption under a fixed key is a permutation, distinct SSNs always map to distinct IDs. If you do the padding right, you can even invert it (if you have the key). The point is that after running it through format-preserving encryption, your data is no longer sensitive, provided the key stays secret.
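To make the idea concrete, here is a minimal sketch of a format-preserving permutation on 9-digit numbers. It is a toy construction (a 30-bit Feistel network keyed with HMAC-SHA256, plus cycle-walking to stay below 10^9), not the standardized FF1 or FF3-1 modes, and every name in it (`encrypt`, `decrypt`, `to_public_id`, `from_public_id`, the round count) is an assumption made for illustration. For real data you would probably want a vetted FF1 implementation instead.

```python
import hashlib
import hmac

DIGITS = 9                      # assumed: US SSN length
DOMAIN = 10 ** DIGITS           # valid inputs are 0 .. 999,999,999
HALF_BITS = 15                  # 2 * 15 = 30 bits, and 2**30 > 10**9
HALF_MASK = (1 << HALF_BITS) - 1
ROUNDS = 10                     # illustrative choice, not a vetted parameter


def _round(key: bytes, rnd: int, half: int) -> int:
    """Keyed round function: HMAC-SHA256 truncated to 15 bits."""
    msg = bytes([rnd]) + half.to_bytes(2, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") & HALF_MASK


def _feistel_forward(key: bytes, x: int) -> int:
    left, right = x >> HALF_BITS, x & HALF_MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def _feistel_backward(key: bytes, x: int) -> int:
    left, right = x >> HALF_BITS, x & HALF_MASK
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return (left << HALF_BITS) | right


def encrypt(key: bytes, n: int) -> int:
    """Map n in [0, 10**9) to another number in [0, 10**9)."""
    if not 0 <= n < DOMAIN:
        raise ValueError("input out of range")
    # Cycle-walking: re-apply the 30-bit permutation until the
    # result falls back inside the 9-digit domain.
    while True:
        n = _feistel_forward(key, n)
        if n < DOMAIN:
            return n


def decrypt(key: bytes, n: int) -> int:
    """Inverse of encrypt under the same key."""
    if not 0 <= n < DOMAIN:
        raise ValueError("input out of range")
    while True:
        n = _feistel_backward(key, n)
        if n < DOMAIN:
            return n


def to_public_id(key: bytes, ssn: str) -> str:
    """Turn an SSN such as '123-45-6789' into a 9-digit public ID."""
    digits = ssn.replace("-", "")
    if len(digits) != DIGITS or not digits.isdigit():
        raise ValueError("expected a 9-digit SSN")
    return f"{encrypt(key, int(digits)):0{DIGITS}d}"


def from_public_id(key: bytes, public_id: str) -> str:
    """Recover the 9-digit SSN (without dashes) from a public ID."""
    return f"{decrypt(key, int(public_id)):0{DIGITS}d}"
```

Usage would look like `to_public_id(key, "123-45-6789")`, with `key` being secret random bytes kept out of the database. Note that there are only 10^9 possible SSNs, so anyone who obtains the key (or any way to query the mapping) can enumerate all of them; the obscurity rests entirely on the key.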