I’m processing a large amount of ':'-separated data with Python, and I’ve run into a strange situation.
This is the original code written by my colleague:
tag = word[i].split(":")[0].decode('utf8')
value = int(word[i].split(":")[1])
I think it is inefficient because it calls split twice where a single call should suffice, so I changed it to this:
tokens = word[i].split(":")
tag = tokens[0].decode('utf8')
value = int(tokens[1])
Then something very strange happened:
I use a log to record the performance of the code. The original version takes about 10 seconds to process 1000 lines of data, but after my modification it takes about 50 seconds to process the same 1000 lines.
Why does this happen? Isn’t ONE call supposed to be faster than TWO calls?
Thanks for your advice.
Logs are not a good way to benchmark a short code section – there are quite a few other things happening in your system. Using the timeit module will give more accurate results:
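As a rough sketch of such a benchmark, the two variants from the question can be wrapped in functions and timed in isolation. The sample token WORD and the function names split_twice and split_once are made up for illustration; the real data is only assumed to look like "tag:value". A bytes literal is used so that the decode call works on both Python 2 and Python 3.

```python
import timeit

# Hypothetical sample token; the real input is assumed to look like b"tag:value".
WORD = b"some_tag:12345"


def split_twice(word=WORD):
    # Original version: calls split() once per field.
    tag = word.split(b":")[0].decode("utf8")
    value = int(word.split(b":")[1])
    return tag, value


def split_once(word=WORD):
    # Modified version: calls split() once and reuses the result.
    tokens = word.split(b":")
    tag = tokens[0].decode("utf8")
    value = int(tokens[1])
    return tag, value


if __name__ == "__main__":
    n = 100000
    for fn in (split_twice, split_once):
        # Take the best of several runs to reduce noise from other processes.
        best = min(timeit.repeat(fn, number=n, repeat=3))
        print("%s: %.4f s per %d calls" % (fn.__name__, best, n))
```

Taking the minimum over several repeats is the usual way to read timeit results, since the slower runs mostly reflect interference from the rest of the system. If the two functions come out close to each other here, the 10 s versus 50 s difference seen in the logs is probably caused by something outside these three lines.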