While profiling my Python application, I discovered that len() seems to be very expensive when used on sets. See the code below:
import cProfile

def lenA(s):
    for i in range(1000000):
        len(s)

def lenB(s):
    for i in range(1000000):
        s.__len__()

def main():
    s = set()
    lenA(s)
    lenB(s)

if __name__ == "__main__":
    cProfile.run("main()", "stats")
According to the profiler's stats below, lenA() seems to be 14 times slower than lenB():
   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    1.986    1.986    3.830    3.830 .../lentest.py:5(lenA)
  1000000    1.845    0.000    1.845    0.000 {built-in method len}
        1    0.273    0.273    0.273    0.273 .../lentest.py:9(lenB)
Am I missing something? Currently I use __len__() instead of len(), but the code looks dirty 🙁
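For reference, here is a minimal sketch of how a table like the one above can be printed from a saved stats file using the standard pstats module. The function name work and the temporary file location are assumptions made for illustration; they are not part of the original script.

```python
import cProfile
import os
import pstats
import tempfile

def work():
    # Hypothetical workload: call len() on an empty set many times.
    s = set()
    for _ in range(100_000):
        len(s)

# Write the profile to a file, as cProfile.run("main()", "stats") does,
# but into a temporary directory so nothing is left in the working directory.
stats_path = os.path.join(tempfile.mkdtemp(), "stats")
cProfile.runctx("work()", globals(), locals(), stats_path)

# Load the saved profile, shorten file paths, sort by cumulative time,
# and print at most five entries.
stats = pstats.Stats(stats_path)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)
```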
Obviously, len has some overhead, since it does a function call and translates AttributeError to TypeError. Also, set.__len__ is such a simple operation that it's bound to be very fast in comparison to just about anything, but I still don't find anything like the 14x difference when using timeit.

You should always just call len, not __len__. If the call to len is the bottleneck in your program, you should rethink its design, e.g. cache sizes somewhere or calculate them without calling len.
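A rough sketch of the kind of timeit comparison mentioned above, in case you want to reproduce it outside the profiler. The variable names and the NoLen class are made up for illustration, and the timings will vary by machine and Python version, so none are shown here.

```python
import timeit

s = set()

# Time one million calls of each spelling; `globals` makes s visible
# to the timed statement.
t_len = timeit.timeit("len(s)", globals={"s": s}, number=1_000_000)
t_dunder = timeit.timeit("s.__len__()", globals={"s": s}, number=1_000_000)

print(f"len(s):      {t_len:.3f} s")
print(f"s.__len__(): {t_dunder:.3f} s")
print(f"ratio:       {t_len / t_dunder:.2f}x")

# Hypothetical class without __len__, to show the error len() raises.
class NoLen:
    pass

try:
    len(NoLen())
except TypeError as exc:
    print("TypeError:", exc)
```

Running this without cProfile attached should give a better picture of the real cost of each call than the profiler output does.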