Profiling some computational work I’m doing showed me that one bottleneck in my program

Question

Asked: May 24, 2026 (2026-05-24T01:29:12+00:00)

Profiling some computational work I’m doing showed me that one bottleneck in my program was a function that basically did this (np is numpy, sp is scipy):

def mix1(signal1, signal2):
    spec1 = np.fft.fft(signal1, axis=1)
    spec2 = np.fft.fft(signal2, axis=1)
    return np.fft.ifft(spec1*spec2, axis=1)

Both signals have shape (C, N) where C is the number of sets of data (usually less than 20) and N is the number of samples in each set (around 5000). The computation for each set (row) is completely independent of any other set.

I figured that this was just a simple convolution, so I tried to replace it with:

def mix2(signal1, signal2):
    outputs = np.empty_like(signal1)

    for idx, row in enumerate(outputs):
        outputs[idx] = sp.signal.convolve(signal1[idx], signal2[idx], mode='same')

    return outputs

…just to see if I got the same results. But I didn’t, and my questions are:

Why not?
Is there a better way to compute the equivalent of mix1()?

(I realise that mix2 probably wouldn’t have been faster as-is, but it might have been a good starting point for parallelisation.)

Here’s the full script I used to quickly check this:

import numpy as np
import scipy as sp
import scipy.signal

N = 4680
C = 6

def mix1(signal1, signal2):
    spec1 = np.fft.fft(signal1, axis=1)
    spec2 = np.fft.fft(signal2, axis=1)
    return np.fft.ifft(spec1*spec2, axis=1)

def mix2(signal1, signal2):
    outputs = np.empty_like(signal1)

    for idx, row in enumerate(outputs):
        outputs[idx] = sp.signal.convolve(signal1[idx], signal2[idx], mode='same')

    return outputs

def test(num, chans):
    sig1 = np.random.randn(chans, num)
    sig2 = np.random.randn(chans, num)
    res1 = mix1(sig1, sig2)
    res2 = mix2(sig1, sig2)

    np.testing.assert_almost_equal(res1, res2)

if __name__ == "__main__":
    np.random.seed(0x1234ABCD)
    test(N, C)

1 Answer

Editorial Team · Answer 1 · 2026-05-24T01:29:13+00:00

So I tested this out and can now confirm a few things:

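1) numpy.convolve (and likewise scipy.signal.convolve, which mix2 uses) computes a linear convolution, whereas the FFT code gives you a circular one. With mode='same' you get the central N samples of the linear result, with no wrap-around, so the two outputs differ.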

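2) The FFT does not internally pad to a power of 2. Compare the vastly different speeds of the following operations:

import numpy as np

x1 = np.random.uniform(size=2**17-1)  # length 131071, a prime
x2 = np.random.uniform(size=2**17)    # length 131072, a power of 2

np.fft.fft(x1)  # awkward length: noticeably slower
np.fft.fft(x2)  # power-of-2 length: fast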

3) Normalization is not the difference: if you compute a naive circular convolution by summing a(k)*b(i-k) over k, with the index i-k taken modulo N, you get the same result as the FFT code.
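Points 1 and 3 can be checked on a small example. The following is only a sketch: the helper names (circular_convolve_naive, circular_convolve_fft, wrap_linear) are made up for illustration and are not from the question, and the inputs are assumed to be real, 1-D, and of equal length.

```python
import numpy as np

def circular_convolve_naive(a, b):
    """Direct O(N^2) circular convolution: out[i] = sum_k a[k] * b[(i - k) mod N]."""
    n = len(a)
    out = np.zeros(n)
    for i in range(n):
        for k in range(n):
            out[i] += a[k] * b[(i - k) % n]
    return out

def circular_convolve_fft(a, b):
    """Same operation as one row of mix1; real part taken because inputs are real."""
    return np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real

def wrap_linear(a, b):
    """Fold the full linear convolution (length 2N-1) back onto N samples."""
    n = len(a)
    full = np.convolve(a, b, mode="full")
    out = full[:n].copy()
    out[: n - 1] += full[n:]  # sample n+j wraps around onto sample j
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal(64)
b = rng.standard_normal(64)

naive = circular_convolve_naive(a, b)
print(np.allclose(naive, circular_convolve_fft(a, b)))      # FFT route, no extra scaling
print(np.allclose(naive, wrap_linear(a, b)))                # linear result folded back
print(np.allclose(naive, np.convolve(a, b, mode="same")))   # what mix2 computes per row
```

The first two comparisons should agree, while the mode='same' comparison should not, which is the mismatch the assertion in the question runs into.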

The catch is that padding to a power of 2 changes the answer, because the circular wrap-around then happens at the padded length rather than at N. I've heard that there are ways to deal with this by cleverly using the prime factors of the length (mentioned but not coded in Numerical Recipes), but I've never seen anyone actually do that.
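One way to pad without changing the answer, offered here as a sketch rather than a tested recommendation, is to zero-pad to at least 2N-1 samples (so the product of spectra gives a true linear convolution) and then fold the tail back onto the first N samples. The function name mix_padded is hypothetical, and the sketch assumes real-valued inputs of shape (C, N); unlike mix1 it returns a real array. It uses scipy.fft.next_fast_len to pick a convenient transform length.

```python
import numpy as np
from scipy import fft as sp_fft

def mix1(signal1, signal2):
    """Reference from the question: circular convolution along axis 1."""
    spec1 = np.fft.fft(signal1, axis=1)
    spec2 = np.fft.fft(signal2, axis=1)
    return np.fft.ifft(spec1 * spec2, axis=1)

def mix_padded(signal1, signal2):
    """Circular convolution along axis 1 via a zero-padded linear convolution.

    Assumes real inputs of identical shape (C, N).
    """
    n = signal1.shape[1]
    # A linear convolution needs at least 2N-1 points to avoid wrap-around;
    # round up to a length the FFT handles efficiently (second argument: real input).
    nfft = sp_fft.next_fast_len(2 * n - 1, True)
    spec1 = sp_fft.rfft(signal1, n=nfft, axis=1)
    spec2 = sp_fft.rfft(signal2, n=nfft, axis=1)
    full = sp_fft.irfft(spec1 * spec2, n=nfft, axis=1)[:, : 2 * n - 1]
    # Fold the tail back onto the start to recover the length-N circular result.
    out = full[:, :n].copy()
    out[:, : n - 1] += full[:, n:]
    return out

rng = np.random.default_rng(0x1234ABCD)
sig1 = rng.standard_normal((6, 4680))
sig2 = rng.standard_normal((6, 4680))
print(np.allclose(mix1(sig1, sig2).real, mix_padded(sig1, sig2)))
```

Whether this beats mix1 depends on N. The padded transforms are roughly twice as long, so it is only likely to help when N itself has large prime factors; for a length such as 4680 = 2^3 * 3^2 * 5 * 13 the direct FFT may well be faster, so it is worth timing both.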
