Several users have asked about the speed or memory consumption of image convolutions in numpy or scipy [1, 2, 3, 4]. From the responses and my experience using numpy, I believe this may be a major shortcoming of numpy compared to Matlab or IDL.
None of the answers so far have addressed the overall question, so here it is: “What is the fastest method for computing a 2D convolution in Python?” Common Python modules are fair game: numpy, scipy, and PIL (others?). For the sake of a challenging comparison, I’d like to propose the following rules:
- Input matrices are 2048×2048 and 32×32, respectively.
- Single or double precision floating point are both acceptable.
- Time spent converting your input matrix to the appropriate format doesn’t count; only the convolution step is timed.
- Replacing the input matrix with your output is acceptable (does any Python library support that?).
- Direct DLL calls to common C libraries such as LAPACK or ScaLAPACK are all right.
- PyCUDA is right out. It’s not fair to use your custom GPU hardware.
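Since only the convolution step counts under these rules, a timing harness should do any dtype conversion before the clock starts. Below is a minimal sketch, assuming float32 inputs and `scipy.ndimage.convolve` as the routine under test; the helper name `time_convolution` is hypothetical. It writes into a preallocated buffer through the `output` argument that `scipy.ndimage.convolve` accepts. That is not truly in-place, but it avoids allocating a new result array on every call.

```python
import time

import numpy as np
import scipy.ndimage


def time_convolution(image, kernel, repeats=3):
    """Hypothetical helper: return (best_seconds, result) for scipy.ndimage.convolve.

    Input conversion happens before the timer starts, so only the
    convolution call itself is measured.
    """
    # Convert once, outside the timed region (conversion time doesn't count).
    image = np.ascontiguousarray(image, dtype=np.float32)
    kernel = np.ascontiguousarray(kernel, dtype=np.float32)
    # Preallocate the output so each call writes into the same buffer.
    out = np.empty_like(image)

    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        # Zero padding outside the image (mode="constant", cval defaults to 0).
        scipy.ndimage.convolve(image, kernel, output=out, mode="constant")
        best = min(best, time.perf_counter() - start)
    return best, out
```

With the sizes from the rules, you would call it as `time_convolution(rng.random((2048, 2048)), rng.random((32, 32)))` with `rng = np.random.default_rng()`. A direct convolution at those sizes does about 2048² × 32² ≈ 4.3 billion multiply-adds, so don't expect it to be instant.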
It really depends on what you want to do… A lot of the time, you don’t need a fully generic (read: slower) 2D convolution. For example, if the filter is separable, you can use two 1D convolutions instead. This is why the various scipy.ndimage filters, such as scipy.ndimage.gaussian_filter and scipy.ndimage.uniform_filter, are much faster than the same thing implemented as a generic n-D convolution.
At any rate, as a point of comparison:
This takes 6.9 sec on my machine…
Compare this with scipy.signal.fftconvolve, which takes about 10.8 sec. However, with different input sizes, using FFTs to do a convolution can be considerably faster (though I can’t seem to come up with a good example at the moment…).
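To make the separability point concrete: if a 2D kernel is the outer product of two 1D kernels, it can be applied as two 1D passes. For an m×n kernel, that cuts the work per pixel from roughly m·n to m + n multiplies. The sketch below uses two hypothetical helpers. `split_separable` tests whether a kernel is rank 1 via an SVD, and `separable_convolve` applies the two passes with `scipy.ndimage.convolve1d`. It is meant to produce the same result as `scipy.ndimage.convolve` with the full outer-product kernel.

```python
import numpy as np
import scipy.ndimage


def split_separable(kernel, rtol=1e-10):
    """Hypothetical helper: return (k_rows, k_cols) such that
    np.outer(k_rows, k_cols) reproduces `kernel` if the kernel is
    (numerically) rank 1, otherwise None."""
    u, s, vt = np.linalg.svd(kernel)
    # A separable kernel has exactly one non-negligible singular value.
    if s.size > 1 and s[1] > rtol * s[0]:
        return None
    # Split the single singular value evenly between the two factors.
    scale = np.sqrt(s[0])
    return u[:, 0] * scale, vt[0] * scale


def separable_convolve(image, k_rows, k_cols, mode="reflect"):
    """Hypothetical helper: convolve `image` with np.outer(k_rows, k_cols)
    as two 1D passes instead of one 2D pass."""
    # First pass: 1D convolution along axis 0 with k_rows.
    tmp = scipy.ndimage.convolve1d(image, k_rows, axis=0, mode=mode)
    # Second pass: 1D convolution along axis 1 with k_cols.
    return scipy.ndimage.convolve1d(tmp, k_cols, axis=1, mode=mode)
```

As far as I know, this is the same idea `scipy.ndimage.gaussian_filter` and `scipy.ndimage.uniform_filter` use internally: they filter one axis at a time. Note that an arbitrary 32×32 kernel like the one in the question is generally not separable, so this only helps when the filter happens to be.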
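Whether FFT or direct convolution wins depends on the input sizes. SciPy (0.19 and later, if I remember correctly) can force either approach via `scipy.signal.convolve(..., method="direct")` or `method="fft"`. It also provides `scipy.signal.choose_conv_method`, which guesses the faster one for a given pair of inputs. The sketch below times both methods for one pair of shapes and checks that their results agree. The helper name `compare_methods` and any shapes you pass to it are illustrative assumptions, and timings will vary by machine.

```python
import time

import numpy as np
import scipy.signal


def compare_methods(image_shape, kernel_shape, mode="same", seed=0):
    """Hypothetical helper: time direct vs FFT convolution for one pair of
    input shapes using random float64 data.

    Returns the seconds taken by each method, the method SciPy's
    heuristic would pick, and the max absolute difference between results.
    """
    rng = np.random.default_rng(seed)
    image = rng.random(image_shape)
    kernel = rng.random(kernel_shape)

    timings = {}
    results = {}
    for method in ("direct", "fft"):
        start = time.perf_counter()
        results[method] = scipy.signal.convolve(image, kernel, mode=mode, method=method)
        timings[method] = time.perf_counter() - start

    return {
        "direct_s": timings["direct"],
        "fft_s": timings["fft"],
        # SciPy's size-based estimate of which method is faster ("direct" or "fft").
        "heuristic": scipy.signal.choose_conv_method(image, kernel, mode=mode),
        # Both methods should agree up to floating-point round-off.
        "max_abs_diff": float(np.max(np.abs(results["direct"] - results["fft"]))),
    }
```

For instance, you could compare `compare_methods((512, 512), (3, 3))` with `compare_methods((512, 512), (128, 128))`. Generally the direct method favors tiny kernels and the FFT favors large ones, but measure on your own machine. SciPy 1.4 and later also include `scipy.signal.oaconvolve`, an overlap-add FFT convolution aimed at a large input with a much smaller kernel. That is the same shape as the 2048×2048 / 32×32 case in the question, so it may be worth adding to such a comparison.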