There are many ways to write a Python program that computes a histogram.
By histogram, I mean a function that counts the occurrence of objects in an iterable and outputs the counts in a dictionary. For example:
>>> L = 'abracadabra'
>>> histogram(L)
{'a': 5, 'b': 2, 'c': 1, 'd': 1, 'r': 2}
One way to write this function is:
def histogram(L):
d = {}
for x in L:
if x in d:
d[x] += 1
else:
d[x] = 1
return d
Are there more concise ways of writing this function?
If we had dictionary comprehensions in Python, we could write:
>>> { x: L.count(x) for x in set(L) }
but since Python 2.6 doesn’t have them, we have to write:
>>> dict([(x, L.count(x)) for x in set(L)])
Although this approach may be readable, it is not efficient: L is walked-through multiple times. Furthermore, this won’t work for single-life generators; the function should work equally well for iterator generators such as:
def gen(L):
for x in L:
yield x
We might try to use the reduce function (R.I.P.):
>>> reduce(lambda d,x: dict(d, x=d.get(x,0)+1), L, {}) # wrong!
Oops, this does not work: the key name is 'x', not x. 🙁
I ended with:
>>> reduce(lambda d,x: dict(d.items() + [(x, d.get(x, 0)+1)]), L, {})
(In Python 3, we would have to write list(d.items()) instead of d.items(), but it’s hypothethical, since there is no reduce there.)
Please beat me with a better, more readable one-liner! 😉
Python 3.x does have
reduce, you just have to do afrom functools import reduce. It also has “dict comprehensions”, which have exactly the syntax in your example.Python 2.7 and 3.x also have a Counter class which does exactly what you want:
In Python 2.6 or earlier, I’d personally use a defaultdict and do it in 2 lines:
That’s clean, efficient, Pythonic, and much easier for most people to understand than anything involving
reduce.