I’m plotting several images at once, sharing axes, because I use it for exploratory purposes. Each image is the same satellite image at different dates. I’m experiencing a slow response from matplotlib when zooming and panning, and I would like to ask for any tips that could speed up the process.
What I am doing now is:
- Load data from several netCDF files.
- Calculate the maximum value of all the data, for normalization.
- Create a grid of subplots using ImageGrid. As each subplot is generated, I delete the array to free some memory (each array is stored in a list, the “deletion” is just a list.pop()). See the code below.
It’s 15 images, single-channel, of 4600×3840 pixels each. I’ve noticed that the bottleneck is not the RAM (I have 8 GB), but the processor. Python spikes to 100% usage on one of the cores when zooming or panning (it’s an Intel(R) Core(TM) i5-2500 CPU @ 3.30GHz, 4 cores, 64 bit).
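To put the RAM remark in numbers, here is a rough back-of-the-envelope sketch. The dtype is an assumption (the post does not say whether the variable is stored as 32-bit or 64-bit floats), and `footprint_gib` is a hypothetical helper written only for this estimate.

```python
import numpy as np

N_IMAGES = 15
# Image dimensions as stated above; which one is rows and which is
# columns does not matter for the total pixel count.
DIM_A, DIM_B = 4600, 3840


def footprint_gib(dtype):
    """Approximate size in GiB of all images held in memory with `dtype`."""
    n_pixels = N_IMAGES * DIM_A * DIM_B
    return n_pixels * np.dtype(dtype).itemsize / 2**30


if __name__ == "__main__":
    # The dtypes below are assumptions, not taken from the netCDF files.
    for dtype in ("float32", "float64"):
        print(f"{dtype}: {footprint_gib(dtype):.2f} GiB")
```

Under these assumptions the raw data comes to roughly 1 GiB as float32 and roughly 2 GiB as float64, before any temporary copies made during normalization or rendering.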
The code is:
import os
import sys
import numpy as np
import netCDF4 as ncdf
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import ImageGrid
from matplotlib.colors import LogNorm
MIN = 0.001 # Hardcoded minimum data value used in normalization
variable = 'conc_chl'
units = r'$mg/m^3$'
data = []
dates = []
# Get a list of only netCDF files
filelist = os.listdir(sys.argv[1])
filelist = [f for f in filelist if os.path.splitext(f)[1] == '.nc']
filelist.sort()
filelist.reverse()
# Load data and extract dates from filenames
for f in filelist:
    dataset = ncdf.Dataset(os.path.join(sys.argv[1], f), 'r')
    data.append(dataset.variables[variable][:])
    dataset.close()
    dates.append((f.split('_')[2][:-3], f.split('_')[1]))
# Get the maximum value of all data. Will be used for normalization
maxc = np.array(data).max()
# Plot the grid of images + dates
fig = plt.figure()
grid = ImageGrid(fig, 111,
                 nrows_ncols=(3, 5),
                 axes_pad=0.0,
                 share_all=True,
                 aspect=False,
                 cbar_location="right",
                 cbar_mode="single",
                 cbar_size='2.5%',
                 )
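for g in grid:
    v = data.pop()
    d = dates.pop()
    # The limits go inside LogNorm: recent Matplotlib releases reject
    # vmin/vmax passed to imshow together with a norm instance.
    im = g.imshow(v, interpolation='none', norm=LogNorm(vmin=MIN, vmax=maxc))
    g.text(0.01, 0.01, '-'.join(d), transform=g.transAxes)  # Date on a corner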
cticks = np.logspace(np.log10(MIN), np.log10(maxc), 5)
cbar = grid.cbar_axes[0].colorbar(im)
cbar.ax.set_yticks(cticks)
cbar.ax.set_yticklabels([str(np.round(t, 2)) for t in cticks])
cbar.set_label_text(units)
# Fine-tune figure; make subplots close to each other and hide x ticks for
# all
fig.subplots_adjust(left=0.02, bottom=0.02, right=0.95, top=0.98, hspace=0, wspace=0)
grid.axes_llc.set_yticklabels([], visible=False)
grid.axes_llc.set_xticklabels([], visible=False)
plt.show()
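A side note on the normalization step: `np.array(data)` stacks all the images into one new array just to take its maximum, which temporarily duplicates the data. The following is a minimal sketch of an alternative, assuming each array supports `np.max` (plain or masked NumPy arrays do). `shared_log_norm` is a hypothetical helper, not part of Matplotlib.

```python
import numpy as np
from matplotlib.colors import LogNorm

MIN = 0.001  # same hardcoded lower bound as in the listing above


def shared_log_norm(arrays, vmin=MIN):
    """Build one LogNorm covering every array, without stacking them."""
    # Take the maximum of each array separately, then the maximum of those.
    vmax = max(float(np.max(a)) for a in arrays)
    return LogNorm(vmin=vmin, vmax=vmax)
```

The resulting object could then be passed to each call as `g.imshow(v, norm=norm)`, so all panels share the same color scale.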
Any clue about what could be improved to make it more responsive?
It seems that setting interpolation='none' is significantly slower than setting it to ‘nearest’ (or even ‘bilinear’). On supported backends (e.g. any Agg backend) the code paths for ‘none’ and ‘nearest’ are different: ‘nearest’ gets passed to Agg’s interpolation routine, whereas ‘none’ does an unsampled rescale of the image (I’m just reading the code comments here).
These different approaches give different qualitative results; for example, the code snippet below gives a slight moiré pattern, which doesn’t appear when interpolation='none'.
I think that ‘none’ is roughly the same as ‘nearest’ when zooming in (image pixels are larger than screen pixels) but gives a higher-order interpolation result when zooming out (image pixels smaller than screen pixels). I think the delay comes from some extra Matplotlib/Python calculations needed for the rescaling.
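One way to check the speed difference on a given machine is to time a redraw for each interpolation mode. This is a separate timing sketch, not the moiré example mentioned above. It assumes the non-interactive Agg backend and a random test image in place of the satellite data, and `time_draw` is a hypothetical helper. The numbers will depend on the Matplotlib version, the backend and the image size.

```python
import time

import matplotlib
matplotlib.use("Agg")  # non-interactive Agg backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np


def time_draw(interpolation, shape=(1000, 1000), repeats=3):
    """Return the best wall-clock time (seconds) to render one image."""
    rng = np.random.default_rng(0)
    image = rng.random(shape)  # stand-in for one satellite image
    fig, ax = plt.subplots()
    ax.imshow(image, interpolation=interpolation)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fig.canvas.draw()  # ask the canvas to render the whole figure
        timings.append(time.perf_counter() - start)
    plt.close(fig)
    return min(timings)


if __name__ == "__main__":
    for mode in ("none", "nearest", "bilinear"):
        print(f"{mode:>8}: {time_draw(mode):.3f} s")
```

If ‘nearest’ turns out to be clearly faster in such a test, switching the imshow calls in the original script to interpolation='nearest' would be the first thing to try.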