I have several data sets (distribution) as follows:
set1 = [1,2,3,4,5]
set2 = [3,4,5,6,7]
set3 = [1,3,4,5,8]
How do I plot a scatter plot with the data sets above with the y-axis being the probability (i.e. the percentile of the distribution in set: 0%-100% ) and the x-axis being the data set names?
in JMP, it is called ‘Quantile Plot’.
Something like image attached:

Please educate. Thanks.
[EDIT]
My data is in csv as such:

Using JMP analysis tool, I’m able to plot the probability distribution plot (QQ-plot/Normal Quantile Plot as figure far below):

I believe Joe Kington almost has my problem solved but, I’m wondering how to process the raw csv data into arrays of probalility or percentiles.
I doing this to automate some stats analysis in Python rather than depending on JMP for plotting.
I’m not entirely clear on what you want, so I’m going to guess, here…
You want the “Probability/Percentile” values to be a cumulative histogram?
So for a single plot, you’d have something like this? (Plotting it with markers as you’ve shown above, instead of the more traditional step plot…)
If that’s roughly what you want for a single plot, there are multiple ways of making multiple plots on a figure. The easiest is just to use subplots.
Here, we’ll generate some datasets and plot them on different subplots with different symbols…
If we want this to look like one continuous plot, we can just squeeze the subplots together and turn off some of the boundaries. Just add the following in before calling
plt.show()Hopefully that helps a bit, at any rate!
Edit: If you want percentile values, instead a cumulative histogram (I really shouldn’t have used 100 as the sample size!), it’s easy to do.
Just do something like this (using
numpy.percentileinstead of normalizing things by hand):