I’m trying to fit a negative binomial model with a sqrt link. Unfortunately, it seems that I have to specify starting values. Is anybody familiar with setting starting values when running the glm.nb command (package MASS)?
When I don’t use starting values, I get an error message:
no valid set of coefficients has been found: please supply starting values
Looking at ?glm.nb, it seems to be possible to set starting values, but I have no idea how to do this. Some further information: 1. With the standard log link, the model can be estimated. 2. It is not possible to set the start value for the algorithm to an arbitrary value, so for example
glm.nb(<model>,link=sqrt, start=1)
does not work!
Finding suitable starting values can be difficult for sufficiently complex problems. Setting them, however, is straightforward once you read the error messages (the documentation on this is not great, but it exists). Replicating your unsuccessful attempt with start=1 on a built-in data set produces an error that tells you exactly what glm.nb expects: a vector with one value for each coefficient to be estimated.
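As a rough sketch of that replication: the code below assumes the quine data set from MASS and a main-effects formula with 7 coefficients. Both are picked here for illustration, and the names form, err_msg and fit_ok are made up. It first captures the error from the scalar start, then retries with one starting value per coefficient.

```r
library(MASS)  # provides glm.nb() and the quine data set

# Assumed example model: main effects only, which gives 7 coefficients
# (intercept, Sex: 1, Age: 3, Eth: 1, Lrn: 1).
form <- Days ~ Sex + Age + Eth + Lrn

# A scalar start fails; capture the error message instead of stopping.
err_msg <- tryCatch(
  {
    glm.nb(form, data = quine, link = sqrt, start = 1)
    NA_character_
  },
  error = function(e) conditionMessage(e)
)
print(err_msg)  # the message states the required length of 'start'

# Retry with one starting value per coefficient.
fit_ok <- glm.nb(form, data = quine, link = sqrt, start = rep(1, 7))
coef(fit_ok)
```

Starting every coefficient at 1 keeps the initial linear predictor positive for every row here, because the design matrix contains only 0/1 columns. That is what the sqrt link needs.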
With a vector of length 7 the call works, because that gives one starting value per coefficient in my example. You might have to play around with the actual values in it to get a model that always predicts positive values. It is likely that the default algorithm for generating starting values in glm.nb gives a negative prediction somewhere, and the sqrt link cannot tolerate that (unlike the log link). If you are having trouble finding valid starting values by hand, you can try fitting a simpler model first and padding its estimates with 0’s for the other parameters to get a good starting location.

EDIT: building up a model
Suppose you can’t find valid starting values for your complicated model. Then start with a simple one, for example a model with a single predictor.
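A minimal sketch of such a simple fit, again assuming the quine data from MASS and taking Sex as the single predictor (an illustrative choice; nb0 is a made-up name):

```r
library(MASS)  # provides glm.nb() and the quine data set

# Simple model: intercept plus one two-level factor, so 2 coefficients.
nb0 <- glm.nb(Days ~ Sex, data = quine, link = sqrt)

# These estimates become the starting values for the next, larger model.
coef(nb0)
```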
Now let’s add the next variable, reusing the previous estimates as starting values and appending 0 estimates for the effect of the new variable (in this case Age has four levels, so it needs 3 coefficients). You usually want to keep adding 0’s and not, say, 100’s, because a coefficient of 0 means that the new variable has no effect, which is exactly what the simpler model that you just fitted assumes.
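The build-up could look roughly like this. It is a sketch that assumes the quine data from MASS and an illustrative order of adding terms (Sex, then Age, then Eth); the object names nb0, nb1 and nb2 are made up.

```r
library(MASS)  # provides glm.nb() and the quine data set

# Step 0: simple model with 2 coefficients (intercept, SexM).
nb0 <- glm.nb(Days ~ Sex, data = quine, link = sqrt)

# Step 1: add Age (four levels, so 3 new coefficients).
# Start from the previous estimates and pad with three 0's.
nb1 <- glm.nb(Days ~ Sex + Age, data = quine, link = sqrt,
              start = c(coef(nb0), 0, 0, 0))

# Step 2: add Eth (two levels, so 1 new coefficient), padded with one 0.
nb2 <- glm.nb(Days ~ Sex + Age + Eth, data = quine, link = sqrt,
              start = c(coef(nb1), 0))

coef(nb2)
```

Padding at the end only lines up if each new term is appended at the end of the formula, so that the new coefficients come last in the coefficient vector. Comparing names(coef(nb1)) with the first names of coef(nb2) is a quick way to check this.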