I’m trying to use the R function quantcut() to recode a numeric variable as a factor with levels corresponding to quantiles. For example:
> X
[1] 6 4 9 6 1 2 5 3 5 7 10 7 2 7 7 5 6 6 3 4 6 4 2 7 6 7
[27] 4 3 5 3 7 6 8 12 4 4 0 1 7 6 7 4 7 1 1 1 2 3 3 1 1 6
[53] 5 3 1 1 1 3 3 3 1 1 3 1 1 1 3 3 0 1 3 1 8 5 3 0 0 2
[79] 1 3 8 0 1 4 1 1 1 1 1 1 3 2 1 4 1 5 5 12 7 2 6 6 2 6
[105] 0 1 4 1 4 0 7 3 2 1 1 8 5 5 3 0 5 6 2 4 2 2 2 6 4 2
[131] 2 2 2 6 8 5 1 2 8 3 2 7 4 6 6 6 7 5 1 5 5 6 1 4 4 5
[157] 6 2 4 7 2 4 10 6 3 5 2 2 6 6 2 4 5 7 4 5 11 6 6 8 2 4
[183] 4 6 12 16 9 7 14 13 11 5 5 2 2 7 7 6 4 3 4 3 5 4 5 7 9 4
[209] 3 12 4 4 4 8 7 6 1 3 6 7 5 5 6 9 6 4 7 8 5 6 3 6 4 7
[235] 3 3 4 7 5 7 5 9 5 8 3 4 3 2 5 2 4 3 8 4 2 2 1 5 3 5
[261] 8 5 6 4 5 1 1 2 6 2 7 2 4 4 3 3 4 10 5 6 10 2 5 5 0 1
[287] 6 2 5 4 6 6 9 5 5 6 3 8 1 5 1 8 5 2 5 2 4 2 4 4
bins=10
labels = 1:bins
library(gtools)
x2 = quantcut(X, q = seq(0, 1, by=1/bins), labels=labels)
I get the error: “Error in cut.default(x[!flag], breaks = newquant, include.lowest = TRUE, :
‘breaks’ are not unique”. I thought this was because there are ties in the quantiles, but the documentation for quantcut specifically shows an example of how the function can handle ties by using fewer intervals. The error occurs regardless of whether I specify the labels argument.
Any advice would be greatly appreciated.
EDIT: Here is code to enter the variable X:
X = c(6L, 4L, 9L, 6L, 1L, 2L, 5L, 3L, 5L, 7L, 10L, 7L, 2L, 7L, 7L,
5L, 6L, 6L, 3L, 4L, 6L, 4L, 2L, 7L, 6L, 7L, 4L, 3L, 5L, 3L, 7L,
6L, 8L, 12L, 4L, 4L, 0L, 1L, 7L, 6L, 7L, 4L, 7L, 1L, 1L, 1L,
2L, 3L, 3L, 1L, 1L, 6L, 5L, 3L, 1L, 1L, 1L, 3L, 3L, 3L, 1L, 1L,
3L, 1L, 1L, 1L, 3L, 3L, 0L, 1L, 3L, 1L, 8L, 5L, 3L, 0L, 0L, 2L,
1L, 3L, 8L, 0L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 2L, 1L, 4L,
1L, 5L, 5L, 12L, 7L, 2L, 6L, 6L, 2L, 6L, 0L, 1L, 4L, 1L, 4L,
0L, 7L, 3L, 2L, 1L, 1L, 8L, 5L, 5L, 3L, 0L, 5L, 6L, 2L, 4L, 2L,
2L, 2L, 6L, 4L, 2L, 2L, 2L, 2L, 6L, 8L, 5L, 1L, 2L, 8L, 3L, 2L,
7L, 4L, 6L, 6L, 6L, 7L, 5L, 1L, 5L, 5L, 6L, 1L, 4L, 4L, 5L, 6L,
2L, 4L, 7L, 2L, 4L, 10L, 6L, 3L, 5L, 2L, 2L, 6L, 6L, 2L, 4L,
5L, 7L, 4L, 5L, 11L, 6L, 6L, 8L, 2L, 4L, 4L, 6L, 12L, 16L, 9L,
7L, 14L, 13L, 11L, 5L, 5L, 2L, 2L, 7L, 7L, 6L, 4L, 3L, 4L, 3L,
5L, 4L, 5L, 7L, 9L, 4L, 3L, 12L, 4L, 4L, 4L, 8L, 7L, 6L, 1L,
3L, 6L, 7L, 5L, 5L, 6L, 9L, 6L, 4L, 7L, 8L, 5L, 6L, 3L, 6L, 4L,
7L, 3L, 3L, 4L, 7L, 5L, 7L, 5L, 9L, 5L, 8L, 3L, 4L, 3L, 2L, 5L,
2L, 4L, 3L, 8L, 4L, 2L, 2L, 1L, 5L, 3L, 5L, 8L, 5L, 6L, 4L, 5L,
1L, 1L, 2L, 6L, 2L, 7L, 2L, 4L, 4L, 3L, 3L, 4L, 10L, 5L, 6L,
10L, 2L, 5L, 5L, 0L, 1L, 6L, 2L, 5L, 4L, 6L, 6L, 9L, 5L, 5L,
6L, 3L, 8L, 1L, 5L, 1L, 8L, 5L, 2L, 5L, 2L, 4L, 2L, 4L, 4L)
Okay, the issue can be traced to here, where as you say, the 70% and 80% quantiles are the same.
quantileis used internally byquantcutI can’t see how to address this issue using
quantcutitself, but you could always just usecutandquantileanduniquein combination to sort it out. From what I can tell, this is whatquantcutdoes internally when there are ties anyway.