The Archive Base

Editorial Team
Asked: May 24, 2026

Profiling some computational work I’m doing showed me that one bottleneck in my program

Profiling some computational work I’m doing showed me that one bottleneck in my program was a function that basically did this (np is numpy, sp is scipy):

def mix1(signal1, signal2):
    spec1 = np.fft.fft(signal1, axis=1)
    spec2 = np.fft.fft(signal2, axis=1)
    return np.fft.ifft(spec1*spec2, axis=1)

Both signals have shape (C, N), where C is the number of data sets (usually fewer than 20) and N is the number of samples in each set (around 5000). The computation for each set (row) is completely independent of every other set.

I figured that this was just a simple convolution, so I tried to replace it with:

def mix2(signal1, signal2):
    outputs = np.empty_like(signal1)

    for idx, row in enumerate(outputs):
        outputs[idx] = sp.signal.convolve(signal1[idx], signal2[idx], mode='same')

    return outputs

…just to see if I got the same results. But I didn’t, and my questions are:

  1. Why not?
  2. Is there a better way to compute the equivalent of mix1()?

(I realise that mix2 probably wouldn’t have been faster as-is, but it might have been a good starting point for parallelisation.)

Here’s the full script I used to quickly check this:

import numpy as np
import scipy as sp
import scipy.signal

N = 4680
C = 6

def mix1(signal1, signal2):
    spec1 = np.fft.fft(signal1, axis=1)
    spec2 = np.fft.fft(signal2, axis=1)
    return np.fft.ifft(spec1*spec2, axis=1)

def mix2(signal1, signal2):
    outputs = np.empty_like(signal1)

    for idx, row in enumerate(outputs):
        outputs[idx] = sp.signal.convolve(signal1[idx], signal2[idx], mode='same')

    return outputs

def test(num, chans):
    sig1 = np.random.randn(chans, num)
    sig2 = np.random.randn(chans, num)
    res1 = mix1(sig1, sig2)
    res2 = mix2(sig1, sig2)

    np.testing.assert_almost_equal(res1, res2)

if __name__ == "__main__":
    np.random.seed(0x1234ABCD)
    test(N, C)

1 Answer

  1. Editorial Team
    Added an answer on May 24, 2026 at 1:29 am

    So I tested this out and can now confirm a few things:

    1) numpy.convolve is not circular: like scipy.signal.convolve, it computes a linear convolution, whereas the FFT code is giving you a circular convolution.

    2) The FFT does not internally pad to a power of 2. Compare the speeds of the following operations (2**17-1 is prime, so it is an unfavourable length; how large the gap is will depend on your NumPy version):

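    import timeit
    import numpy as np

    x1 = np.random.uniform(size=2**17-1)  # prime length
    x2 = np.random.uniform(size=2**17)    # power-of-2 length

    # Time 10 transforms of each length and print the elapsed seconds.
    print(timeit.timeit(lambda: np.fft.fft(x1), number=10))
    print(timeit.timeit(lambda: np.fft.fft(x2), number=10))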

    3) Normalization is not the difference: if you do a naive circular convolution by adding up a(k)*b(i-k), with the index i-k taken modulo N, you will get the result of the FFT code.
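    Points 1 and 3 can be checked directly. The following is a minimal sketch, not code from the original test: the helper name circular_convolve_naive and the length of 64 are arbitrary choices made for illustration. It compares the direct circular sum against both the FFT product and numpy.convolve with mode="same".

```python
import numpy as np

def circular_convolve_naive(a, b):
    """Direct O(N^2) circular convolution: out[i] = sum_k a[k] * b[(i - k) mod N]."""
    n = len(a)
    out = np.zeros(n)
    for i in range(n):
        for k in range(n):
            out[i] += a[k] * b[(i - k) % n]
    return out

rng = np.random.default_rng(0)
a = rng.standard_normal(64)  # short length keeps the O(N^2) loop cheap
b = rng.standard_normal(64)

# What mix1 computes for a single row (the imaginary part is rounding noise).
via_fft = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)).real
naive = circular_convolve_naive(a, b)
# Linear convolution cropped to the centre N samples.
linear_same = np.convolve(a, b, mode="same")

matches_fft = np.allclose(naive, via_fft)         # circular sum vs FFT product
matches_linear = np.allclose(naive, linear_same)  # circular sum vs linear 'same'
print(matches_fft, matches_linear)
```

    The first comparison agrees without any rescaling, and the second does not, which is consistent with the mismatch coming from circular versus linear convolution rather than from normalization.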

    The catch is that padding to a power of 2 is going to change the answer, because the wrap-around then happens at the padded length rather than at N. I’ve heard that there are ways to deal with this by cleverly using the prime factors of the length (mentioned but not coded in Numerical Recipes), but I’ve never seen anyone actually do that.
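    One possible workaround, offered here as an assumption rather than something tested above: zero-pad both rows to a power of 2 of at least 2N-1 samples so that the FFT yields the full linear convolution, then fold the tail back onto the start to recover the length-N circular result. The function name circular_via_padded_fft is hypothetical, the sketch assumes real-valued input (hence rfft/irfft), and whether it is actually faster than mix1 for N around 5000 would need to be measured.

```python
import numpy as np

def circular_via_padded_fft(a, b):
    """Row-wise circular convolution of real arrays using power-of-2 FFTs."""
    n = a.shape[-1]
    full_len = 2 * n - 1                     # length of the linear convolution
    nfft = 1 << (full_len - 1).bit_length()  # next power of 2 >= full_len
    spec = np.fft.rfft(a, n=nfft, axis=-1) * np.fft.rfft(b, n=nfft, axis=-1)
    linear = np.fft.irfft(spec, n=nfft, axis=-1)[..., :full_len]
    # Wrap the tail around: circ[i] = lin[i] + lin[i + n] for i < n - 1.
    out = linear[..., :n].copy()
    out[..., : n - 1] += linear[..., n:]
    return out

rng = np.random.default_rng(0)
sig1 = rng.standard_normal((6, 4680))  # same C and N as the test script
sig2 = rng.standard_normal((6, 4680))

# Reference: the FFT product from mix1, taking the real part.
reference = np.fft.ifft(
    np.fft.fft(sig1, axis=1) * np.fft.fft(sig2, axis=1), axis=1
).real
result = circular_via_padded_fft(sig1, sig2)
print(np.allclose(result, reference))
```

    Note that the result is real-valued, whereas mix1 returns a complex array whose imaginary part is only rounding error for real inputs.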
