I would like to draw a histogram using matplotlib. However, because of the large amount of data (a list of about 100,000 numbers) that I pass to the hist() function, an error is raised when I draw two figures. Drawing either of the two plots on its own works fine. Could anyone help me deal with this? Thanks in advance.
Here is simplified code that reproduces the error:
import matplotlib.pyplot as plt
f_120 = plt.figure(1)
plt.hist(taccept_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'b', label = 'accepted answer')
plt.hist(tfirst_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'g',label = 'first answer')
plt.axvline(x = 30, ymin = 0, ymax = 1, color = 'r', linestyle = '--', label = '30 min')
plt.axvline(x = 60, ymin = 0, ymax = 1, color = 'c', linestyle = '--', label = '1 hour')
plt.legend()
plt.ylabel('Percentage of answered questions')
plt.xlabel('Minutes elapsed after questions are posted')
plt.title('Cumulative histogram: time elapsed \n before questions receive answer (first 2 hrs)')
plt.ylim(0,1)
plt.xlim(0,120)
f_120.show()
f_120.savefig('7month_0_120.png', format = 'png' )
plt.close()
f_2640 = plt.figure(2)
plt.hist(taccept_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'b', label = 'accepted answer')
plt.hist(tfirst_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'g',label = 'first answer')
plt.axvline(x = 240, ymin = 0, ymax = 1, color = 'r', linestyle = '--', label = '4 hours')
plt.axvline(x = 1440, ymin = 0, ymax = 1, color = 'c', linestyle = '--', label = '1 day')
plt.legend(loc= 4)
plt.ylabel('Percentage of answered questions')
plt.xlabel('Minutes elapsed after questions are posted')
plt.title('Cumulative histogram: time elapsed \n before questions receive answer (first 48)')
plt.ylim(0,1)
plt.xlim(0,2640)
f_2640.show()
f_2640.savefig('7month_0_2640.png', format = 'png' )
The following is the error detail:
plt.hist(tfirst_list, bins=6000000, normed = True, histtype ="step", cumulative = True, color = 'g',label = 'first answer')
File "C:\software\Python26\lib\site-packages\matplotlib\pyplot.py", line 2160, in hist
ret = ax.hist(x, bins, range, normed, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, **kwargs)
File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 7775, in hist
closed=False, edgecolor=c, fill=False) )
File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 6384, in fill
for poly in self._get_patches_for_fill(*args, **kwargs):
File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 317, in _grab_next_args
for seg in self._plot_args(remaining, kwargs):
File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 304, in _plot_args
seg = func(x[:,j%ncx], y[:,j%ncy], kw, kwargs)
File "C:\software\Python26\lib\site-packages\matplotlib\axes.py", line 263, in _makefill
(x[:,np.newaxis],y[:,np.newaxis])),
File "C:\software\Python26\lib\site-packages\numpy\core\shape_base.py", line 270, in hstack
return _nx.concatenate(map(atleast_1d,tup),1)
MemoryError
As others have noted, six million bins doesn’t sound very useful. But a simple thing would be to reuse the same figure: since the only plot elements that change are things other than the histograms, try something like this:
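A minimal sketch of that first step, drawing the two histograms once and saving the two-hour view. The random stand-in data, the one-minute bin edges, and the `marks` list are my assumptions, not part of the question; `density=True` is the current spelling of the older `normed=True` argument.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data (assumption): the real taccept_list / tfirst_list are not shown.
rng = np.random.default_rng(0)
taccept_list = rng.exponential(scale=300.0, size=100_000)
tfirst_list = rng.exponential(scale=120.0, size=100_000)

# One-minute-wide bins covering all of the data (assumption), far fewer than 6,000,000.
longest = max(taccept_list.max(), tfirst_list.max())
edges = np.arange(0.0, np.ceil(longest) + 1.0)

fig, ax = plt.subplots()

# Draw the two expensive histograms once; they are shared by both views.
n_accept, _, _ = ax.hist(taccept_list, bins=edges, density=True, histtype="step",
                         cumulative=True, color="b", label="accepted answer")
n_first, _, _ = ax.hist(tfirst_list, bins=edges, density=True, histtype="step",
                        cumulative=True, color="g", label="first answer")

# Keep handles to the vertical markers so they can be removed later.
marks = [
    ax.axvline(x=30, color="r", linestyle="--", label="30 min"),
    ax.axvline(x=60, color="c", linestyle="--", label="1 hour"),
]

ax.legend()
ax.set_ylabel("Percentage of answered questions")
ax.set_xlabel("Minutes elapsed after questions are posted")
ax.set_title("Cumulative histogram: time elapsed \n before questions receive answer (first 2 hrs)")
ax.set_ylim(0, 1)
ax.set_xlim(0, 120)
fig.savefig("7month_0_120.png", format="png")
```

Only the x-limits, the title, the legend, and the two dashed lines are specific to this view; the histogram outlines themselves can stay on the axes.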
After the savefig, don’t close the figure and plot new histograms; instead reuse it, changing only what needs to be changed:
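One way the reuse step could look, as a sketch. `switch_view` is a hypothetical helper name of mine, and the demonstration uses a placeholder line in place of the real histograms so that the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt


def switch_view(ax, old_marks, new_marks, xmax, title, legend_loc="best"):
    """Re-style an existing axes for a new time window (hypothetical helper).

    old_marks: Line2D objects returned by earlier ax.axvline calls
    new_marks: list of (x, color, label) tuples for the new vertical lines
    """
    for line in old_marks:
        line.remove()  # drop the old markers; the histograms stay untouched
    marks = [ax.axvline(x=x, color=color, linestyle="--", label=label)
             for x, color, label in new_marks]
    ax.set_xlim(0, xmax)
    ax.set_title(title)
    ax.legend(loc=legend_loc)  # rebuild the legend so it lists the new markers
    return marks


# Small demonstration on a placeholder curve instead of the real histograms.
fig, ax = plt.subplots()
ax.plot([0, 2640], [0, 1], color="b", label="accepted answer")
marks = [
    ax.axvline(x=30, color="r", linestyle="--", label="30 min"),
    ax.axvline(x=60, color="c", linestyle="--", label="1 hour"),
]
ax.set_xlim(0, 120)
ax.legend()

# Switch the same axes to the longer window without re-plotting the data.
marks = switch_view(
    ax, marks,
    [(240, "r", "4 hours"), (1440, "c", "1 day")],
    xmax=2640,
    title="Cumulative histogram: time elapsed \n before questions receive answer (first 48)",
    legend_loc=4,
)
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
```

Because the histogram artists are never recreated, the memory cost of building them is paid only once.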
Finally call savefig again with the new filename.
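On the bin count: if the aim of six million bins was a very smooth cumulative curve, an alternative (my suggestion, not something from the question) is to skip binning and compute the empirical cumulative distribution directly from the sorted data. The function name `ecdf` is hypothetical.

```python
import numpy as np


def ecdf(values):
    """Return the sorted values and, for each one, the fraction of samples <= it."""
    x = np.sort(np.asarray(values, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


# Tiny worked example: four samples, so the curve rises in steps of 0.25.
x, y = ecdf([5, 1, 3, 3])
print(x)  # sorted samples
print(y)  # cumulative fractions
```

The result can be drawn with something like `ax.step(x, y, where="post")`, which needs one point per sample (about 100,000 here) rather than millions of bin edges.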