Recently, I’ve discovered with the help of Jon Clements in this thread that the

Question

Asked: June 8, 2026

Recently, with the help of Jon Clements in this thread, I discovered that the following two pieces of code have very different execution times.

Do you have any idea why this is happening?

Comment: self.stream_data is a flat tuple of int16 values, many of them zeros, and the create_ZS_data method performs so-called zero suppression (keeping only the non-zero entries together with their indices).

Environment
Input: Many (3.5k) small files (~120kb each)
OS: Linux64
Python ver 2.6.8

Solution based on a generator:

def create_ZS_data(self):
    self.ZS_data = ( [column, row, self.stream_data[column + row * self.rows ]]
                     for row, column in itertools.product(xrange(self.rows), xrange(self.columns))
                     if self.stream_data[column + row * self.rows ] )

Profiler info:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3257    1.117    0.000   71.598    0.022 decode_from_merlin.py:302(create_ZS_file)
   463419   67.705    0.000   67.705    0.000 decode_from_merlin.py:86(<genexpr>)

Jon’s Solution:

def create_ZS_data(self):
    self.ZS_data = list()
    for rowno, cols in enumerate(self.stream_data[i:i+self.columns] for i in xrange(0, len(self.stream_data), self.columns)):
        for colno, col in enumerate(cols):
            # col == value, (rowno, colno) = index
            if col:
                self.ZS_data.append([colno, rowno, col])

Profiler info:

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     3257   18.616    0.006   19.919    0.006 decode_from_merlin.py:83(create_ZS_data)

1 Answer

Editorial Team · Answer 1 · 2026-06-08T11:10:10+00:00

I looked at the prior discussion; you seem to be troubled that your clever comprehension isn’t as efficient in cycles as it is in characters of source code. What I didn’t point out then was that this would be my preferred implementation to read:

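def sparse_table_elements(cells, columns, rows):
    # cells is the flat, row-major data; rows is kept for the caller's
    # convenience but is not needed to walk the data.
    ncells = len(cells)
    non_zeros = list()
    for start in range(0, ncells, columns):
        nrow = start // columns  # row number, not the flat offset
        row = cells[start:start + columns]
        for ncol, cell in enumerate(row):
            if cell:
                non_zeros.append([ncol, nrow, cell])
    return non_zeros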

I’ve not tested it, but I can make sense of it. There are a couple of things that jump out at me as being potential inefficiencies. Recomputing the Cartesian product of two constant monotonically “boring” indices has got to be expensive:

itertools.product(xrange(self.rows), xrange(self.columns))

you then use the results [(0, 0), (0, 1), ...] to do single element indexing from your source:

stream_data[column + row * self.rows]

which is also more costly than handling larger slices, as Jon’s implementation does.
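A rough way to check this on your own machine is to run both approaches on synthetic data and time them. The sketch below is a stand-in, not the original decoder: the names `zs_by_index` and `zs_by_slices`, the 256 x 256 grid, and the data are all made up for illustration, and it is written for Python 3 (`range` rather than `xrange`). It also indexes with `row * columns`, assuming a row-major layout; that matches the original `row * self.rows` only when the grid is square.

```python
import itertools
import timeit


def zs_by_index(cells, rows, columns):
    """Cartesian product of indices plus one lookup per cell."""
    return [
        [column, row, cells[column + row * columns]]
        for row, column in itertools.product(range(rows), range(columns))
        if cells[column + row * columns]
    ]


def zs_by_slices(cells, rows, columns):
    """Slice one row at a time and enumerate it."""
    non_zeros = []
    for row in range(rows):
        start = row * columns
        for column, cell in enumerate(cells[start:start + columns]):
            if cell:
                non_zeros.append([column, row, cell])
    return non_zeros


# Hypothetical data: a 256 x 256 grid, mostly zeros, stored row-major in a tuple.
ROWS, COLUMNS = 256, 256
cells = tuple(7 if i % 50 == 0 else 0 for i in range(ROWS * COLUMNS))

# Both approaches must agree before their speed is worth comparing.
assert zs_by_index(cells, ROWS, COLUMNS) == zs_by_slices(cells, ROWS, COLUMNS)

if __name__ == "__main__":
    for func in (zs_by_index, zs_by_slices):
        seconds = timeit.timeit(lambda: func(cells, ROWS, COLUMNS), number=20)
        print("%-14s %.3f s for 20 runs" % (func.__name__, seconds))
```

The absolute numbers will depend on the interpreter and hardware, so treat the output as a relative comparison only.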

Generators are not some secret sauce that guarantees efficiency. In this particular case, with 135kb of data that has already been read into core, a poorly constructed generator does seem to be costing you. If you want concise matrix operations, use APL; if you want readable code, don’t strive for rabid minimization in Python.
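One thing to keep in mind when reading profiler output for a generator: a generator expression does none of its filtering when it is created, only when something iterates over it. That is consistent with the question's profile charging almost all the time to `<genexpr>` underneath `create_ZS_file` rather than to the method that builds the generator. A minimal sketch, assuming Python 3 and a made-up `noisy_nonzero` helper that exists only to show when the work happens:

```python
calls = []


def noisy_nonzero(value):
    """Record each call so we can see when the filter actually runs."""
    calls.append(value)
    return value != 0


data = (0, 3, 0, 0, 8)

# Building the generator expression runs none of the filter calls yet.
lazy = (v for v in data if noisy_nonzero(v))
calls_after_creation = len(calls)

# Consuming it (here with list) is where the work happens.
result = list(lazy)
calls_after_consumption = len(calls)

print(calls_after_creation, calls_after_consumption, result)
# prints: 0 5 [3, 8]
```

So the generator version does not avoid the per-cell work; it only defers it to whichever function consumes the result.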
