I’m referring to this question, and especially the comments to the first answer from @David Robinson and @mgilson:
Sum the second value of each tuple in a list
The original question was to sum the second value of each tuple:
structure = [('a', 1), ('b', 3), ('c', 2)]
First Answer:
sum(n for _, n in structure)
Second Answer:
sum(x[1] for x in structure)
According to the discussion, the first answer is 50% faster.
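A rough way to reproduce that comparison is with timeit. This is only a sketch: the enlarged list and the helper names sum_unpack and sum_index are assumptions added for illustration, and the measured ratio will vary with the interpreter version and the machine.

```python
import timeit

# Hypothetical larger input (not from the original question) so that the
# per-element cost dominates the measurement.
structure = [('a', 1), ('b', 3), ('c', 2)] * 1000

def sum_unpack():
    # First answer: unpack each tuple, discard the first element.
    return sum(n for _, n in structure)

def sum_index():
    # Second answer: index into each tuple.
    return sum(x[1] for x in structure)

# Time the same number of runs for each variant.
t_unpack = timeit.timeit(sum_unpack, number=200)
t_index = timeit.timeit(sum_index, number=200)
print(f"unpacking: {t_unpack:.4f}s  indexing: {t_index:.4f}s")
```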
Once I figured out what the first answer does (coming from Perl, I Googled what the special _ variable means in Python), I started wondering: how come what appears to be a pure subset task (getting only the second element of each tuple vs. getting both elements and binding them to variables) is actually slower? Is it a missed opportunity to optimize index access in Python? Am I missing something the second answer does that takes time?
If you take a look at the Python bytecode, it quickly becomes obvious why unpacking is faster.
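The bytecode can be inspected with the standard dis module. The sketch below uses two small helper functions, by_unpacking and by_indexing, which are assumptions for illustration rather than code from the original answers. The exact opcode names printed depend on the interpreter version.

```python
import dis

def by_unpacking(pair):
    _, n = pair          # compiled to an UNPACK_SEQUENCE instruction
    return n

def by_indexing(pair):
    return pair[1]       # compiled to a subscript instruction (name varies by version)

def opnames(func):
    """Return the opcode names that make up func's bytecode."""
    return [ins.opname for ins in dis.get_instructions(func)]

unpack_ops = opnames(by_unpacking)
index_ops = opnames(by_indexing)
print(unpack_ops)
print(index_ops)
```

Calling dis.dis(by_unpacking) instead prints the full annotated listing, which is easier to read interactively.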
The tuple unpacking operation is a simple bytecode (UNPACK_SEQUENCE), while the indexing operation has to call a method on the tuple (BINARY_SUBSCR). The unpack operation can take place inline, in the Python evaluation loop, while the subscription call requires looking up the function on the tuple object to retrieve the value, using PyObject_GetItem.
The UNPACK_SEQUENCE opcode source code special-cases a Python tuple or list unpack where the sequence length matches the argument length exactly. That special case reaches into the native structure of the tuple and retrieves the values directly; there is no need for heavy calls such as PyObject_GetItem, which have to take into account that the object could be a custom Python class.
The BINARY_SUBSCR opcode is only optimized for Python lists; anything that isn't a native Python list requires a PyObject_GetItem call.
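The generic path is needed because the same subscript expression has to work for any object, not just built-in sequences. A minimal sketch of that dispatch, using a hypothetical LoggedPair class that is not part of the original discussion:

```python
class LoggedPair:
    """Hypothetical custom container, used only to show subscript dispatch."""

    def __init__(self, first, second):
        self._items = (first, second)
        self.calls = []          # records every index passed to __getitem__

    def __getitem__(self, index):
        self.calls.append(index)
        return self._items[index]

pair = LoggedPair('a', 1)
value = pair[1]                  # the interpreter must find and call __getitem__
print(value, pair.calls)         # prints: 1 [1]
```

Because pair[1] may end up running arbitrary Python code like this, the interpreter cannot simply read a slot from a known memory layout unless it first checks the object's exact type.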