I am running summary(lm(…)) in R. When I print the coefficients, I get estimates for every variable except the last one, which shows “NA”.
I tried swapping the last column of data with another column, and again whatever was in the last column got “NA” while everything else got estimates.
A little about the data: I have about 5 variables with data in every row, plus 12 seasonal dummy variables. For example, the January variable is 1 for every day in January and 0 otherwise, the February variable is 1 if the month is February and 0 otherwise, and so on. The first time I ran it, the “NA” coefficient was the one for the December dummy variable. Does anyone know what would produce “NA” for the last coefficient estimate? Is it because of my monthly dummy variables? Thanks
This is my reproducible example.
# Columns are named with '=' (using '<-' inside data.frame() does not name them)
dat <- data.frame(
  one = sample(1000:1239),
  two = sample(200:439),
  three = sample(600:839),
  Jan = c(rep(1, 20), rep(0, 220)),
  Feb = c(rep(0, 20), rep(1, 20), rep(0, 200)),
  Mar = c(rep(0, 40), rep(1, 20), rep(0, 180)),
  Apr = c(rep(0, 60), rep(1, 20), rep(0, 160)),
  May = c(rep(0, 80), rep(1, 20), rep(0, 140)),
  Jun = c(rep(0, 100), rep(1, 20), rep(0, 120)),
  Jul = c(rep(0, 120), rep(1, 20), rep(0, 100)),
  Aug = c(rep(0, 140), rep(1, 20), rep(0, 80)),
  Sep = c(rep(0, 160), rep(1, 20), rep(0, 60)),
  Oct = c(rep(0, 180), rep(1, 20), rep(0, 40)),
  Nov = c(rep(0, 200), rep(1, 20), rep(0, 20)),
  Dec = c(rep(0, 220), rep(1, 20))
)
# Pass the data frame explicitly instead of using attach()
summary(lm(one ~ two + three + Jan + Feb +
             Mar + Apr + May + Jun + Jul + Aug + Sep + Oct + Nov + Dec,
           data = dat))
You have to think a bit more about how your model is defined.
Here’s your approach (edited for readability):
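A minimal sketch of what such a cleaned-up version might look like. The seed value 101 and the diag()-based construction of the month dummies are assumptions made here for compactness; the data are the same shape as in the question (240 rows, 20 per month).

```r
set.seed(101)  # arbitrary seed, assumed here only for reproducibility

# Twelve 0/1 month indicators, 20 rows per month, 240 rows in total
month_id <- rep(1:12, each = 20)
dummies <- as.data.frame(diag(12)[month_id, ])
names(dummies) <- month.abb  # "Jan", "Feb", ..., "Dec"

dat <- cbind(
  data.frame(one = sample(1000:1239),
             two = sample(200:439),
             three = sample(600:839)),
  dummies
)

# Same model as in the question: intercept + two + three + all 12 dummies
fit <- lm(one ~ two + three + Jan + Feb + Mar + Apr + May + Jun +
            Jul + Aug + Sep + Oct + Nov + Dec, data = dat)
summary(fit)

# Which coefficients could not be estimated?
names(coef(fit))[is.na(coef(fit))]
```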
In the summary output, note the line “Coefficients: (1 not defined because of singularities)”. It indicates that R (like any other statistical package you might use) can’t estimate all the parameters, because the predictor variables are not all linearly independent.
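The linear dependence can be checked directly on the design matrix. The sketch below builds only the intercept column and the twelve month dummies (the names D and X are made up for this illustration) and compares the number of columns with the numerical rank.

```r
# Twelve 0/1 month indicators, 20 rows per month
month_id <- rep(1:12, each = 20)
D <- diag(12)[month_id, ]
colnames(D) <- month.abb

# Design matrix: intercept column plus the 12 dummies
X <- cbind(Intercept = 1, D)

# Every row belongs to exactly one month, so the dummies add up
# to the intercept column
dummies_sum_to_one <- all(rowSums(D) == 1)
dummies_sum_to_one

n_columns <- ncol(X)      # 13 columns
x_rank <- qr(X)$rank      # but only 12 are linearly independent
c(columns = n_columns, rank = x_rank)
```

Because the rank is one less than the number of columns, one coefficient has to be left undefined, which is what the “NA” reports.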
The intercept here represents the predicted value when all predictor variables are zero. In any particular case, the interpretation of the intercept depends on how you have parameterized your model. The dummy variables you have defined for month are not all linearly independent of the intercept (in every row exactly one of them is 1, so together they sum to the intercept column); lm is smart enough to detect this and drop some of the unidentifiable (linearly dependent) predictor variables. The details of which particular predictor(s) are discarded are obscure and technical (you would have to look inside the lm.fit function, and you probably don’t want to do that). In this case, R decides to throw away the December predictor. Therefore, if we set all the predictors (two, three, and all month dummies Jan-Nov) to zero, we end up with the expected value when two=0 and three=0 and when the month is not equal to any of Jan-Nov, i.e., the expected value for December.
Now do it again, this time setting up a model formula that uses -1 to discard the intercept term (resetting the random seed for reproducibility).
The estimates for two and three are the same as before. The estimate for December is the same as the intercept estimate above. The other months’ parameter estimates are equal to (intercept + previous value). The p values are different, because their meaning has changed. Previously, they were a test of the difference of each month from December; now they test whether each month’s expected value (at two=0 and three=0) differs from a baseline value of zero.
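The no-intercept refit and the relationships described above can be sketched as follows. The seed value and the diag()-based data construction are assumptions for illustration; the formula one ~ . is used as shorthand for "all other columns as predictors", and - 1 removes the intercept.

```r
set.seed(101)  # arbitrary seed, assumed here only for reproducibility

month_id <- rep(1:12, each = 20)
dummies <- as.data.frame(diag(12)[month_id, ])
names(dummies) <- month.abb
dat <- cbind(
  data.frame(one = sample(1000:1239),
             two = sample(200:439),
             three = sample(600:839)),
  dummies
)

fit_int   <- lm(one ~ ., data = dat)      # with intercept: Dec comes out NA
fit_noint <- lm(one ~ . - 1, data = dat)  # without intercept: all 12 months estimated

b_int <- coef(fit_int)
b_no  <- coef(fit_noint)
first11 <- month.abb[1:11]

# Slopes for two and three are unchanged
same_slopes <- isTRUE(all.equal(b_no[c("two", "three")],
                                b_int[c("two", "three")]))

# December without intercept equals the intercept from the first fit
dec_is_intercept <- isTRUE(all.equal(unname(b_no["Dec"]),
                                     unname(b_int["(Intercept)"])))

# Each other month equals intercept + that month's earlier coefficient
months_shifted <- isTRUE(all.equal(
  unname(b_no[first11]),
  unname(b_int["(Intercept)"] + b_int[first11])
))

c(same_slopes = same_slopes,
  dec_is_intercept = dec_is_intercept,
  months_shifted = months_shifted)

# Coefficient table (estimates, standard errors, t and p values) of the refit
summary(fit_noint)$coefficients
```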