Assume you have a data set as something like a CSV file that contains

Question

0

Asked: June 14, 20262026-06-14T21:47:01+00:00 2026-06-14T21:47:01+00:00

Assume you have a data set as something like a CSV file that contains

0

Assume you have a data set as something like a CSV file that contains mildly sensitive information, like who passed a note to whom in a 12 Grade English class. While it’s not a crisis if this data got out, it would be nice to strip out the identifying information so the data could be made public, shared with collaborators, etc. The data looks something like this:

Giver, Recipient:

Anna,Joe
Anna,Mark
Mark,Mindy
Mindy,Joe

How would you process through this list, assign each name a unique but arbitrary identifier, then strip out the names and replace them with said identifier in Python such that you end up with something like:

1,2
1,3
3,4
4,2

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-06-14T21:47:02+00:00

you can use hash() to generate a unique arbitrary identifier, it will return always return same integer for a particular string:

 with open("data1.txt") as f:
    lis=[x.split(",") for x in f]
    items=[map(lambda y:hash(y.strip()),x) for x in lis]
    for x in items:
        print ",".join(map(str,x))
   ....:         


-1319295970,1155173045
-1319295970,-1963774321
-1963774321,-1499251772
-1499251772,1155173045

or you can also use iterools.count:

In [80]: c=count(1)

In [81]: with open("data1.txt") as f:
    lis=[map(str.strip,x.split(",")) for x in f]
    dic={}
    for x in set(chain(*lis)):
        dic.setdefault(x.strip(),next(c))
    for x in lis:    
        print ",".join(str(dic[y.strip()]) for y in x)
   ....:         
3,2
3,4
4,1
1,2

or improving my previous answer using the unique_everseen recipe from itertools, you can get the exact answer :

In [84]: c=count(1)

In [85]: def unique_everseen(iterable, key=None):
        seen = set()
        seen_add = seen.add
        if key is None:
                for element in ifilterfalse(seen.__contains__, iterable):
                        seen_add(element)
                        yield element
                else:
                        for element in iterable:
                                k = key(element)
                                if k not in seen:
                                        seen_add(k)
                                        yield element
   ....:                         

In [86]: with open("data1.txt") as f:
    lis=[map(str.strip,x.split(",")) for x in f]
    dic={}
    for x in unique_everseen(chain(*lis)):
        dic.setdefault(x.strip(),next(c))
    for x in lis:    
        print ",".join(str(dic[y.strip()]) for y in x)
   ....:         
1,2
1,3
3,4
4,2

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

Assume you have a data set as something like a CSV file that contains

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply