I frequently use sorted and groupby to find duplicate items in an iterable. Now I see that this is unreliable:
from itertools import groupby
data = 3 * ('x ', (1,), u'x')
duplicates = [k for k, g in groupby(sorted(data)) if len(list(g)) > 1]
print duplicates
# prints [] - no duplicates found, as if all 9 values were unique
The reason why the code above fails in Python 2.x is explained here.
What is a reliable pythonic way of finding duplicates?
I looked for similar questions and answers on SO. The best of them is “In Python, how do I take a list and reduce it to a list of duplicates?”, but the accepted solution is not pythonic (it is a procedural multiline for … if … add … else … add … return result), and the other solutions are either unreliable (they depend on transitivity of the “<” operator, which does not hold here) or slow (O(n*n)).
[EDIT] Closed. The accepted answer helped me summarize more general conclusions in my answer below.
I like to use builtin types to represent e.g. tree structures. This is why I am now wary of mixing types.
Note: Assumes entries are hashable
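A minimal sketch of a hash-based approach, under that same assumption that every entry is hashable. It counts occurrences with collections.Counter instead of sorting, so it never compares items with "<" and does not depend on the ordering being transitive. The name find_duplicates is a hypothetical helper chosen for illustration, not something from the linked answers.

```python
from collections import Counter


def find_duplicates(iterable):
    """Return the items that occur more than once (hypothetical helper).

    Assumes every item is hashable. Counting uses hashing and equality
    only, so no ordering comparison between mixed types is needed.
    """
    counts = Counter(iterable)
    return [item for item, count in counts.items() if count > 1]


data = 3 * ('x ', (1,), u'x')
duplicates = find_duplicates(data)
```

This runs in O(n) time on average. Items that compare equal and hash equal are counted as the same item, and the order of the result follows the dictionary's iteration order.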