Assume you have a data set as something like a CSV file that contains mildly sensitive information, like who passed a note to whom in a 12 Grade English class. While it’s not a crisis if this data got out, it would be nice to strip out the identifying information so the data could be made public, shared with collaborators, etc. The data looks something like this:
Giver, Recipient:
Anna,Joe
Anna,Mark
Mark,Mindy
Mindy,Joe
How would you process through this list, assign each name a unique but arbitrary identifier, then strip out the names and replace them with said identifier in Python such that you end up with something like:
1,2
1,3
3,4
4,2
you can use
hash()to generate a unique arbitrary identifier, it will return always return same integer for a particular string:or you can also use
iterools.count:or improving my previous answer using the
unique_everseenrecipe from itertools, you can get the exact answer :