Here’s a problem I’m encountering:
Example Data
df <- data.frame(1,2,3,4,5,6,7,8)
df <- rbind(df,df,df,df)
What I would like to do is find the p.value for the chisq.test of 1,2,3 vs. 4,5,6 in the first row of the data.frame defined above.
Let’s try it flat out:
chisq.test(c(1,2,3),c(4,5,6))$p.value ## this works.
But when I try to do it by calling the columns/rows…
chisq.test(df[1,1:3],df[1,4:6])$p.value
Gives: Error in complete.cases(x, y) : not all arguments have the same length
Interesting, because that doesn’t seem to be true:
length(df[1,1:3])
length(df[1,4:6])
Any thoughts on how to change the notation to get the desired result?
?chisq.test tells us that x should be a numeric vector or matrix and y a numeric vector (x and y can also both be factors).
If we look at df as per your question, the subsets you define, df[1,1:3] and df[1,4:6], are not vectors: each is still a data frame with one row and three columns. What happens then is in the lap of the gods.
What happens internally is that, because df[1,1:3] is a data frame, it is converted first to a one-row matrix and thence to a vector, whilst df[1,4:6] (y in the chisq.test function) is left untouched as a data frame. When the code then calls complete.cases(x,y), we get the error you report. complete.cases calls internal code, so we can't see what is going on, but essentially R thinks x and y are not of the same length, and this is because they are of different types.
@Prasad provides a workaround, namely unlisting the two data frames you supply to chisq.test into vectors.
However, the way you are using the function doesn't make much sense, to me at least. One would normally store the data in columns, rather than rows, of a data frame. It might not appear like there is a difference, but the columns of the data frame are its components, like the components of a list. Each individual component (column) is a discrete entity, a vector of data on the n observations in the data frame.
If we transpose your df (and cast back to a data frame), calling the result df2, to reflect a more natural data set-up, then we can use the approach you did, but index the separate rows of the first column of df2 (rather than the separate columns of the first row of df) in the chisq.test call.
This works because R is able to drop the empty dimension in both subsets, so both inputs are vectors of the appropriate length.
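As a minimal sketch of the diagnosis and both fixes, assuming the df from the question: the names x, y, p_rows, p_cols and p_direct are only illustrative, and data.frame(t(df)) is one assumed way of doing the transpose-and-cast step described above. The small-count warning from chisq.test is suppressed here purely to keep the output quiet.

```r
# Rebuild the example data from the question.
df <- data.frame(1, 2, 3, 4, 5, 6, 7, 8)
df <- rbind(df, df, df, df)

x <- df[1, 1:3]
y <- df[1, 4:6]

# Both subsets are one-row data frames; length() counts columns, not values.
class(x)   # "data.frame"
dim(x)     # 1 3
length(y)  # 3

# Workaround: flatten each subset to a plain numeric vector first.
p_rows <- suppressWarnings(chisq.test(unlist(x), unlist(y))$p.value)

# Alternative: keep the values in a column, so [rows, 1] drops to a vector.
df2 <- data.frame(t(df))   # assumed form of "transpose and cast back"
is.vector(df2[1:3, 1])
p_cols <- suppressWarnings(chisq.test(df2[1:3, 1], df2[4:6, 1])$p.value)

# Both routes should agree with the direct call from the question.
p_direct <- suppressWarnings(chisq.test(c(1, 2, 3), c(4, 5, 6))$p.value)
all.equal(p_rows, p_direct)
all.equal(p_cols, p_direct)
```

Either route hands chisq.test two plain vectors of equal length, which is what the call on c(1,2,3) and c(4,5,6) was doing all along.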