I have a 2-d array containing pairs of values and I’d like to make a boxplot of the y-values by different bins of the x-values. I.e. if the array is:
my_array = array([[1, 40.5], [4.5, 60], ...])
then I’d like to bin my_array[:, 0] and then, for each of the bins, produce a boxplot of the corresponding my_array[:, 1] values that fall into that bin. So in the end I want the plot to contain num_bins-many box plots.
I tried the following:
min_x = min(my_array[:, 0])
max_x = max(my_array[:, 1])
num_bins = 3
bins = linspace(min_x, max_x, num_bins)
elts_to_bins = digitize(my_array[:, 0], bins)
However, this gives me values in elts_to_bins that range from 1 to 3. I thought I should get 0-based indices for the bins, and I only wanted 3 bins. I’m assuming this is due to some trickiness with how bins are represented in linspace vs. digitize.
What is the easiest way to achieve this? I want num_bins-many equally spaced bins, with the first bin containing the lower half of the data and the upper bin containing the upper half… i.e., I want each data point to fall into some bin, so that I can make a boxplot.
thanks.
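A minimal sketch of the binning described above, assuming equal-width bins over the range of the x-values. The helper name group_y_by_x_bins and the sample values are made up for illustration. It builds num_bins + 1 edges with linspace and digitizes against the interior edges only, which gives 0-based indices and keeps the largest x in the last bin:

```python
import numpy as np

def group_y_by_x_bins(xy, num_bins):
    """Split the y-values of an (n, 2) array into num_bins equal-width x-bins."""
    x, y = xy[:, 0], xy[:, 1]
    # num_bins bins are delimited by num_bins + 1 edges.
    edges = np.linspace(x.min(), x.max(), num_bins + 1)
    # Digitizing against the interior edges only yields indices 0..num_bins-1,
    # so the maximum x falls into the last bin rather than an extra one.
    idx = np.digitize(x, edges[1:-1])
    return edges, [y[idx == i] for i in range(num_bins)]

# Hypothetical sample data: column 0 is x, column 1 is y.
my_array = np.array([[1, 40.5], [4.5, 60], [2, 45.0], [8, 70.0], [10, 90.0]])
edges, groups = group_y_by_x_bins(my_array, num_bins=3)
```

The resulting list of arrays (one per bin) is the shape of input that matplotlib's boxplot() accepts, so it could be passed to it directly to draw one box per bin.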
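NumPy has a dedicated function for creating histograms the way you need: numpy.histogram().
It takes the data, plus optional bins and weights arguments, and returns the histogram values together with the bin edges.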
The key point here is to use the weights argument: each value a[i] will contribute weights[i] to the histogram. Example: a = [0, 1] with weights = [10, 2] describes 10 points at x = 0 and 2 points at x = 1.
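As a small illustration of the weights argument, assuming the two-point example above and a choice of two bins (the variable names are only for this sketch):

```python
import numpy as np

# Two distinct x-values; the weights say how much each one contributes.
a = np.array([0, 1])
weights = np.array([10, 2])

# With two bins over [0, 1], x = 0 falls in the first bin and x = 1 in the last.
hist_data, bin_edges = np.histogram(a, bins=2, weights=weights)
```

Here hist_data holds the summed weights per bin and bin_edges holds the three edges that delimit the two bins.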
You can set the number of bins, or the bin limits, with the bins argument (see the official documentation for more details).
The histogram can then be plotted with something like a bar chart of the returned values against the bin edges.
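A sketch of both forms of the bins argument, followed by a bar plot of the result. The sample x and y values are invented for illustration, and the bar-chart styling is just one reasonable choice:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: y is used as the weight of each x.
x = np.array([1.0, 4.5, 2.0, 8.0, 10.0])
y = np.array([40.5, 60.0, 45.0, 70.0, 90.0])

# bins as an integer: that many equal-width bins over the range of x.
hist_a, edges_a = np.histogram(x, bins=3, weights=y)
# bins as a sequence: explicit bin limits (here the same three bins).
hist_b, edges_b = np.histogram(x, bins=[1, 4, 7, 10], weights=y)

# One bar per bin, starting at its left edge and spanning the bin width.
fig, ax = plt.subplots()
ax.bar(edges_a[:-1], hist_a, width=np.diff(edges_a), align="edge")
```

Calling plt.show() afterwards displays the figure in an interactive session.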
If you only need to do a histogram plot, the similar hist() function from matplotlib can plot the histogram directly, and it takes the same bins and weights arguments.
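A minimal sketch of the direct route with matplotlib's hist(), again with invented sample data:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical sample data: y is used as the weight of each x.
x = np.array([1.0, 4.5, 2.0, 8.0, 10.0])
y = np.array([40.5, 60.0, 45.0, 70.0, 90.0])

# hist() computes the weighted histogram and draws it in one call;
# it returns the bin values, the bin edges, and the drawn patches.
n, bin_edges, patches = plt.hist(x, bins=3, weights=y)
```

The returned n and bin_edges match what numpy.histogram() would give for the same arguments, so nothing is lost by plotting directly.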