I have a data structure like this:
[
[('A', '1'), ('B', '2')],
[('A', '1'), ('B', '2')],
[('A', '4'), ('C', '5')]
]
And I want to obtain this:
[
[('A', '1'), ('B', '2')],
[('A', '4'), ('C', '5')]
]
Is there a good way of doing this while preserving order as shown?
Commands for copy-pasting:
sample = []
sample.append([('A', '1'), ('B', '2')])
sample.append([('A', '1'), ('B', '2')])
sample.append([('A', '4'), ('C', '5')])
This is a somewhat famous question which was well answered by a famous Pythonista long ago: http://code.activestate.com/recipes/52560-remove-duplicates-from-a-sequence/
If you can assume equal records are adjacent, the itertools docs include a recipe for this, unique_justseen().
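A sketch of that recipe, following the form it takes in the itertools documentation (the exact wording in the docs may differ between Python versions), applied to the sample from the question:

```python
from itertools import groupby
from operator import itemgetter

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # groupby collapses runs of equal adjacent items; keep the first of each run.
    return map(next, map(itemgetter(1), groupby(iterable, key)))

sample = [
    [('A', '1'), ('B', '2')],
    [('A', '1'), ('B', '2')],
    [('A', '4'), ('C', '5')],
]

deduped = list(unique_justseen(sample))
print(deduped)
# [[('A', '1'), ('B', '2')], [('A', '4'), ('C', '5')]]
```

This only removes duplicates that sit next to each other, so a repeated record that reappears later in the list would be kept.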
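If you can only assume orderable elements, a variant using the bisect module works: keep a sorted list of the values seen so far, binary-search it for each input, and insert the value when it is new. Given n inputs with r unique values, the search step costs O(n log r) in total. Each new unique value is inserted into the sorted seen list at a cost of O(r), so the insertions cost O(r * r) overall.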
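A minimal sketch of the bisect approach, assuming the elements can be compared with < and == (lists of tuples of strings, as in the question, qualify). The function name dedup is chosen here for illustration:

```python
from bisect import bisect_left

def dedup(seq):
    "Remove duplicates, preserving first-seen order. Elements must be orderable, not necessarily hashable."
    result = []
    seen = []  # kept sorted so it can be binary-searched
    for x in seq:
        i = bisect_left(seen, x)  # O(log r) search
        if i == len(seen) or seen[i] != x:
            seen.insert(i, x)     # O(r) insertion keeps seen sorted
            result.append(x)
    return result

sample = [
    [('A', '1'), ('B', '2')],
    [('A', '1'), ('B', '2')],
    [('A', '4'), ('C', '5')],
]

print(dedup(sample))
# [[('A', '1'), ('B', '2')], [('A', '4'), ('C', '5')]]
```

Unlike the adjacent-duplicates recipe, this also catches duplicates that are separated by other records.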