I’m trying to merge logs from several servers. Each log is a list of tuples (date, count). A date may appear more than once, and I want the resulting dictionary to hold the sum of all counts from all servers.
Here’s my attempt, with some sample data:
from collections import defaultdict
a=[("13.5",100)]
b=[("14.5",100), ("15.5", 100)]
c=[("15.5",100), ("16.5", 100)]
input=[a,b,c]
output=defaultdict(int)
for d in input:
    for item in d:
        output[item[0]]+=item[1]
print(dict(output))
Which gives:
{'14.5': 100, '16.5': 100, '13.5': 100, '15.5': 200}
As expected.
I’m about to go bananas because of a colleague who saw the code. She insists that there must be a more Pythonic and elegant way to do it, without these nested for loops. Any ideas?
Doesn’t get simpler than this, I think:
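A minimal sketch of the idea, assuming the sample lists from the question: turn each server's log into a Counter and add them all up with sum. The name logs is my own choice (it avoids shadowing the built-in input), not something from the original post.

```python
from collections import Counter

# Sample data from the question
a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
logs = [a, b, c]  # assumed name; the question calls this list `input`

# Convert each log to a Counter and add them together.
# sum() starts from 0 by default, so an empty Counter is passed as the start value.
total = sum((Counter(dict(log)) for log in logs), Counter())

print(dict(total))
```

One caveat: dict(log) keeps only the last count if the same date shows up twice within a single server's log, so this sketch assumes dates repeat only across servers.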
Note that Counter (also known as a multiset) is the most natural data structure for your data (a type of set to which elements can belong more than once, or equivalently, a map with semantics Element -> OccurrenceCount). You could have used it in the first place, instead of lists of tuples.
Also possible:
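One way to write this variant, again only as a sketch over the question's sample data (logs is an assumed name). In Python 3, reduce has to be imported from functools; in Python 2 it is a built-in.

```python
from collections import Counter
from functools import reduce  # built-in in Python 2
from operator import add

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
logs = [a, b, c]  # assumed name; the question calls this list `input`

# reduce applies Counter + Counter pairwise, left to right.
# No start value is needed as long as there is at least one log.
total = reduce(add, (Counter(dict(log)) for log in logs))

print(dict(total))
```

Without a start value, reduce raises a TypeError on an empty sequence, so pass Counter() as a third argument if logs might be empty.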
Using reduce(add, seq) instead of sum(seq, initialValue) is generally more flexible and allows you to skip passing the redundant initial value.
Note that you could also use operator.and_ to find the intersection of the multisets instead of the sum.
The above variant is terribly slow, because a new Counter is created on every step. Let’s fix that.
We know that Counter+Counter returns a new Counter with merged data. This is OK, but we want to avoid extra creation. Let’s use Counter.update instead, which adds the counts from another mapping to the existing Counter in place. That’s what we want.
Let’s wrap it with a function compatible with reduce and see what happens.
This is only marginally slower than the OP’s solution.
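The in-place version described above could look roughly like this. updateInPlace is the name used in this answer, but its body here is a reconstruction, and logs is an assumed name for the question's sample data.

```python
from collections import Counter
from functools import reduce  # built-in in Python 2

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
logs = [a, b, c]  # assumed name; the question calls this list `input`

def updateInPlace(acc, other):
    # Counter.update adds counts (it does not replace them like dict.update)
    # and returns None, so the accumulator is returned explicitly for reduce.
    acc.update(other)
    return acc

# Only the first Counter is mutated; no new Counter is built per step.
total = reduce(updateInPlace, (Counter(dict(log)) for log in logs))

print(dict(total))
```

The accumulator is the Counter built from the first log, which is a fresh object, so the original lists are left untouched.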
Benchmark: http://ideone.com/7IzSx (Updated with yet another solution, thanks to astynax)
(Also: If you desperately want a one-liner, you can replace updateInPlace by lambda x,y: x.update(y) or x, which works the same way and even proves to be a split second faster, but fails at readability. Don’t :-))
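Purely to show why that expression works, here is a sketch on the question's sample data (logs is an assumed name). x.update(y) returns None, so the or falls through to x, which by then has been updated.

```python
from collections import Counter
from functools import reduce  # built-in in Python 2

a = [("13.5", 100)]
b = [("14.5", 100), ("15.5", 100)]
c = [("15.5", 100), ("16.5", 100)]
logs = [a, b, c]  # assumed name; the question calls this list `input`

# update() mutates x and returns None, so `None or x` evaluates to x.
total = reduce(lambda x, y: x.update(y) or x,
               (Counter(dict(log)) for log in logs))

print(dict(total))
```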