Recently, with the help of Jon Clements in this thread, I discovered that the following two pieces of code have very different execution times.
Do you have any idea why this is happening?
Comment: self.stream_data is a tuple (a flat vector) of int16 values, many of them zero, and the create_ZS_data method performs so-called zero suppression.
Environment
Input: Many (3.5k) small files (~120kb each)
OS: Linux64
Python ver 2.6.8
Solution based on a generator:
    def create_ZS_data(self):
        self.ZS_data = ( [column, row, self.stream_data[column + row * self.rows ]]
                         for row, column in itertools.product(xrange(self.rows), xrange(self.columns))
                         if self.stream_data[column + row * self.rows ] )
Profiler info:
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3257    1.117    0.000   71.598    0.022 decode_from_merlin.py:302(create_ZS_file)
    463419   67.705    0.000   67.705    0.000 decode_from_merlin.py:86(<genexpr>)
Jon’s Solution:
    def create_ZS_data(self):
        self.ZS_data = list()
        for rowno, cols in enumerate(self.stream_data[i:i+self.columns] for i in xrange(0, len(self.stream_data), self.columns)):
            for colno, col in enumerate(cols):
                # col == value, (rowno, colno) = index
                if col:
                    self.ZS_data.append([colno, rowno, col])
Profiler info:
    ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      3257   18.616    0.006   19.919    0.006 decode_from_merlin.py:83(create_ZS_data)
I looked at the prior discussion; you seem to be troubled that your clever comprehension isn’t as efficient in cycles as it is in characters of source code. What I didn’t point out then was that this would be my preferred implementation to read:
I’ve not tested it, but I can make sense of it. A couple of things jump out at me as potential inefficiencies. Recomputing the Cartesian product of two constant, monotonically “boring” indices has got to be expensive. You then use the results, [(0, 0), (0, 1), ...], to do single-element indexing from your source, which is also more costly than handling larger slices, as “Jon’s” implementation does.
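To make the comparison concrete, here is a minimal sketch of the two access patterns side by side. It is not the code from the question: the function names and the tiny 3x4 frame are made up for illustration, and I am assuming the data is stored row-major with `columns` values per row (the question's `column + row * self.rows` gives the same index only when rows == columns). It uses `range` so it also runs on Python 3; on Python 2.6, `xrange` would be the closer match.

```python
import itertools


def zs_by_index(stream_data, rows, columns):
    """Per-element indexing driven by itertools.product (the question's pattern)."""
    return [[column, row, stream_data[row * columns + column]]
            for row, column in itertools.product(range(rows), range(columns))
            if stream_data[row * columns + column]]


def zs_by_row_slices(stream_data, columns):
    """One slice per row, then enumerate its values (the pattern in Jon's version)."""
    result = []
    for rowno, start in enumerate(range(0, len(stream_data), columns)):
        for colno, value in enumerate(stream_data[start:start + columns]):
            if value:  # keep only non-zero entries
                result.append([colno, rowno, value])
    return result


# Hypothetical 3x4 frame, mostly zeros (assumed row-major layout).
frame = (0, 0, 7, 0,
         0, 0, 0, 0,
         5, 0, 0, 9)

by_index = zs_by_index(frame, 3, 4)
by_slices = zs_by_row_slices(frame, 4)
print(by_index == by_slices)
```

Both functions should return the same [column, row, value] triples in the same order, so the only difference left is how the tuple gets walked. If you want numbers for your own data, wrapping each call in `timeit.timeit` on a realistically sized frame would be the way to check, rather than trusting my guess about where the time goes.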
Generators are not some secret sauce that guarantees efficiency. In this particular case, with 135kb of data that has already been read into core, a poorly constructed generator does seem to be costing you. If you want concise matrix operations, use APL; if you want readable code, don’t strive for rabid minimization in Python.
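One thing worth keeping in mind when reading the profile: a generator expression does no filtering when it is created, only when something iterates over it. That would explain why the time in the question shows up under `<genexpr>` and in the cumulative time of the consumer rather than in the method that builds the generator. A small sketch of that behaviour, with a hypothetical `is_nonzero` helper that records every call:

```python
calls = []


def is_nonzero(value):
    calls.append(value)  # record each time the filter actually runs
    return value != 0


data = (0, 3, 0, 0, 8)

lazy = (v for v in data if is_nonzero(v))  # builds the generator, filters nothing yet
work_at_creation = len(calls)

consumed = list(lazy)  # iterating is what triggers the filtering
work_after_consumption = len(calls)

print(work_at_creation, consumed, work_after_consumption)
```

So deferring the work does not make it cheaper. It only moves the cost to whoever consumes the generator.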