I am trying to figure out what the right parameter in the hist function in R does. The documentation is unfortunately unclear to someone without a deep understanding of statistics such as myself.
The documentation as stated online is:
right logical; if TRUE, the histogram cells are right-closed (left open) intervals.
What does it mean to be right-closed (or left open) intervals?
When creating histograms of non-categorical data (things like pH, temperature, etc.), you need to specify things called “bins”. Each bin has something called an interval specified for it. For example, if I have the data:
I can create 5 bins with right-open, left-closed intervals like this:
What this means is that the first bin will “hold” values between 10 and 12, including 10 but not including 12. The interval notation used above is shorthand for this:
So that means the value 11 will go into the 1st bin, but the value 12 will go into the second bin, and so on. R does this binning for you, then draws the histogram based on how many items fall in each bin. For the above data, you’ll get a rather uninteresting (or interesting, depending on your expectations) histogram that is mostly flat except at the first bin.
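To check this in R, here is a minimal sketch. The data listing is not shown in this post, so the integers 11 through 19 are an assumption, chosen because they reproduce the counts described here. The breaks 10, 12, ..., 20 follow from five bins with the first one running from 10 to 12.

```r
# Assumed data: the integers 11..19 (not confirmed by the original post)
x <- 11:19

# Five bins of width 2, starting at 10
breaks <- seq(10, 20, by = 2)

# right = FALSE asks for left-closed, right-open bins [a, b)
h <- hist(x, breaks = breaks, right = FALSE, plot = FALSE)

# Number of values that landed in each bin
h$counts
```

With these assumed values the counts are 1, 2, 2, 2, 2: only 11 lands in the first bin, because 12 belongs to the second. Drop plot = FALSE to draw the histogram. Note that hist() also has an include.lowest argument (TRUE by default), which with right = FALSE closes the last bin on the right as well, so a value of exactly 20 would still be counted.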
The following examples illustrate what the different combinations of brackets and parentheses mean when using interval notation (assume x is an element of the real number line):
Note that you can’t use brackets for infinities, assuming you’re not using the extended real number line.
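As a rough illustration of the notation, the four finite interval types can be written as plain comparisons in R. The endpoints a and b and the test value x are hypothetical, picked only so that x sits exactly on the right endpoint.

```r
# Hypothetical endpoints and test value, chosen only for illustration
a <- 10
b <- 12
x <- 12

# A bracket means the endpoint is included; a parenthesis means it is excluded
in_interval <- c(
  "[a, b]" = a <= x && x <= b,  # both endpoints included
  "[a, b)" = a <= x && x <  b,  # a included, b excluded
  "(a, b]" = a <  x && x <= b,  # a excluded, b included
  "(a, b)" = a <  x && x <  b   # both endpoints excluded
)
in_interval
```

Because x equals b here, only the two intervals that are closed on the right report TRUE.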
If I want left-open, right-closed intervals, then the bins would look like this:
See the difference? In this case, the values 11 and 12 both go into the first bin. This may change the appearance of the histogram, depending on how you bin the data. This time the histogram is still almost flat, but now the 5th bin differs from the rest (only 1 data point instead of 2 in each of the others).
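The same check for right-closed bins, again as a sketch: the integers 11 through 19 are an assumed data set (the original listing is not shown), chosen because they match the counts described here.

```r
# Assumed data: the integers 11..19 (not confirmed by the original post)
x <- 11:19

# Five bins of width 2, starting at 10
breaks <- seq(10, 20, by = 2)

# right = TRUE asks for left-open, right-closed bins (a, b]
h_right <- hist(x, breaks = breaks, right = TRUE, plot = FALSE)

# Number of values that landed in each bin
h_right$counts
```

With these assumed values the counts are 2, 2, 2, 2, 1: 12 now stays in the first bin, every boundary value shifts one bin to the left, and only 19 is left for the fifth bin.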
Fortunately, in R you don’t have to specify the bins yourself, but R does let you choose whether the bins are left-closed, right-open ([a, b)) or left-open, right-closed ((a, b]). That is what the “right” parameter of the hist() function controls: right = TRUE (the default) gives (a, b] bins, and right = FALSE gives [a, b) bins.
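If you want to see which bin a boundary value ends up in, one option is cut(), which takes the same right argument and labels each value with its interval. This is only a sketch with made-up values (11 and 12) and made-up breaks; hist() itself does not print these labels.

```r
# Hypothetical values: 12 sits exactly on a bin boundary
x <- c(11, 12)
breaks <- c(10, 12, 14)

# Left-closed, right-open: 12 belongs to the second bin
left_closed <- as.character(cut(x, breaks = breaks, right = FALSE))
left_closed   # "[10,12)" "[12,14)"

# Left-open, right-closed: 12 belongs to the first bin
right_closed <- as.character(cut(x, breaks = breaks, right = TRUE))
right_closed  # "(10,12]" "(10,12]"
```

The value 11 stays in the first bin either way; only the boundary value 12 moves.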