I am new to Python and am learning how to do things the right way.
I have a list of dictionaries, d. Each dictionary represents a user and contains information like user_id, age, etc. This list d can contain several dictionaries that represent the same user (but with slightly different information that does not matter for my purposes). I want to create a histogram that shows how many users in d have each age. How can I do this efficiently?
Edit:
I want to emphasise that I need to eliminate duplicates in the list.
Well, the classic approach to this problem would be to create a defaultdict(int), which maps each age to a count and starts every new key at 0.
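A minimal sketch of that counter together with the counting loop, assuming each record stores the age under an 'age' key (the sample records are hypothetical). Note that it counts records, not distinct users, so a user who appears twice is counted twice:

```python
from collections import defaultdict

# Hypothetical sample data: each record is a dict with at least an 'age' key.
d_list = [
    {'user_id': 1, 'age': 25},
    {'user_id': 2, 'age': 30},
    {'user_id': 3, 'age': 25},
]

age_counts = defaultdict(int)  # a missing age starts at 0
for record in d_list:
    age_counts[record['age']] += 1

print(dict(age_counts))  # {25: 2, 30: 1}
```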
Then iterate over the dictionaries in the list (using d_list instead of d as the name of the list of dictionaries) and increment the count for each record's age.
But you included additional information that confuses me. You said multiple dicts could represent the same user. Do you want to eliminate those duplicates from the histogram? If that's your question, one approach would be to store the users in a dict called user_records, using (firstname, lastname) tuples as keys. Then successive dictionaries representing the same user would overwrite one another, and only one record per user would be preserved. Then iterate over the values in that dictionary (perhaps using user_records.itervalues() in Python 2, or user_records.values() in Python 3).
This general approach can be modified to use whatever values in each record best identify unique users. If the user_id value is unique per user, then use that as the key instead of (firstname, lastname). But your question suggested (to me) that the user_id wouldn't necessarily be the same for two records that represent the same user.
Once you have eliminated the duplicates, though, there's also a shortcut if you're using Python 2.7 or later: collections.Counter, which counts the items of any iterable and so builds the histogram directly.
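A sketch of deduplicating by (firstname, lastname) and then counting with Counter, written for Python 3 (hence .values() rather than .itervalues()). The 'firstname', 'lastname' and 'age' keys and the sample records are assumptions for illustration:

```python
from collections import Counter

# Hypothetical records; the first and third describe the same person.
d_list = [
    {'firstname': 'Ann', 'lastname': 'Lee', 'age': 31},
    {'firstname': 'Bob', 'lastname': 'Ray', 'age': 27},
    {'firstname': 'Ann', 'lastname': 'Lee', 'age': 31, 'city': 'Oslo'},
]

# Key each record by (firstname, lastname); a later duplicate overwrites the earlier one.
user_records = {}
for record in d_list:
    user_records[(record['firstname'], record['lastname'])] = record

# Count ages over the deduplicated records only.
age_histogram = Counter(record['age'] for record in user_records.values())
print(sorted(age_histogram.items()))  # [(27, 1), (31, 1)]
```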
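For example, say we have a record_list in which the same user appears twice, and we build a user_ages dict that maps each user_id to that user's age. The record_list has a duplicate, but the user_ages dict doesn't, because the second record for that user overwrites the first. Now getting a count of ages is as simple as running the values of user_ages through a Counter.
The same thing can be done with any string or other immutable (hashable) object that can serve as a unique identifier of a particular user.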
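A sketch of the record_list example, assuming user_id is reliable and that user_ages maps user_id to age (the records themselves are hypothetical):

```python
from collections import Counter

# Hypothetical record_list: user 2 appears twice with slightly different data.
record_list = [
    {'user_id': 1, 'age': 25, 'city': 'Paris'},
    {'user_id': 2, 'age': 30, 'city': 'Rome'},
    {'user_id': 2, 'age': 30, 'city': 'Milan'},
    {'user_id': 3, 'age': 25, 'city': 'Lyon'},
]

# Map user_id -> age; a repeated user_id overwrites the earlier entry.
user_ages = {record['user_id']: record['age'] for record in record_list}
print(len(record_list), len(user_ages))  # 4 3

# Histogram of ages over distinct users only.
age_histogram = Counter(user_ages.values())
print(sorted(age_histogram.items()))  # [(25, 2), (30, 1)]
```

Swapping the key for a tuple such as (record['firstname'], record['lastname']) works the same way, since a tuple of strings is hashable.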