The Archive Base

Editorial Team
Asked: June 15, 2026 (2026-06-15T05:22:09+00:00)

I am using the book Generalized Linear Models and Extensions by Hardin and Hilbe

I am using the book “Generalized Linear Models and Extensions” by Hardin and Hilbe (second edition, 2007) at the moment. The authors suggest that instead of OLS models, “the log link is generally used for response data that take only positive values on the continuous scale”. Of course they also suggest residual plots to check whether a “normal” linear model using an identity link can still be used.

I am trying to replicate in R what they do in the book in Stata. I have no problems with the log link in Stata. However, when I fit the same model with R’s glm function, specifying family=gaussian(link="log"), I am asked to provide starting values. When I set them all to zero, I always get the message that the algorithm did not converge. With other values the message is sometimes the same, but more often I get:

Error in glm.fit(x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,  :
     NA/NaN/Inf in 'x'

As I said, in Stata I can run these models without setting starting values and without errors. I tried many different models and different datasets, but the problem is always the same (unless I include only a single independent variable). Could anyone tell me why this is the case, what I am doing wrong, or why the suggested models from the book might not be appropriate? I’d appreciate any help, thanks!

Edit: As an example which reproduces the error consider the dataset which can be downloaded here. With this dataset loaded, I run the following model:

mod <- glm(betaplasma ~ age + vituse, family=gaussian(link="log"), data=data2, start=c(0,0,0))

This produces the warning message that the algorithm did not converge.

Edit2: I was asked to also provide the Stata output for that model. Here it is:

. glm betaplasma age vituse, link(log)

Iteration 0:   log likelihood = -2162.1385  
Iteration 1:   log likelihood = -2096.4765  
Iteration 2:   log likelihood = -2076.2465  
Iteration 3:   log likelihood = -2076.2244  
Iteration 4:   log likelihood = -2076.2244  

Generalized linear models                          No. of obs      =       315
Optimization     : ML                              Residual df     =       312
                                                   Scale parameter =  31384.51
Deviance         =  9791967.359                    (1/df) Deviance =  31384.51
Pearson          =  9791967.359                    (1/df) Pearson  =  31384.51

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = ln(u)                    [Log]

                                                   AIC             =  13.20142
Log likelihood   = -2076.224437                    BIC             =   9790173

------------------------------------------------------------------------------
             |                 OIM
  betaplasma |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0056809   .0032737     1.74   0.083    -.0007354    .0120972
      vituse |   -.273027   .0650773    -4.20   0.000    -.4005762   -.1454779
       _cons |   5.467577   .2131874    25.65   0.000     5.049738    5.885417
------------------------------------------------------------------------------
1 Answer

  1. Editorial Team
    Added an answer on June 15, 2026 at 5:22 am (2026-06-15T05:22:10+00:00)

    As I said in my comment, it’s probably true that Stata has more robust (in the numerical, not the statistical sense) GLM fitting than R. That said, fitting this particular dataset doesn’t seem too hard.

    Read data:

    data2 <- read.table("http://lib.stat.cmu.edu/datasets/Plasma_Retinol",
             skip=30,nrows=315)
    dnames <- c("age","sex","smokstat","quetelet","vituse","calories","fat","fiber",
               "alcohol","cholesterol","betadiet","retdiet","betaplasma","retplasma")
    names(data2) <- dnames
    

    Plot the data:

    par(mfrow=c(1,2),las=1,bty="l")
    with(data2,plot(betaplasma~age))
    with(data2,boxplot(betaplasma~vituse))
    

    [Figure: scatterplot of betaplasma against age, and boxplots of betaplasma by vituse]

    It’s fairly easy to get this model to fit by setting the starting value of the intercept parameter to something reasonable, i.e. something close to the mean of the data on the log scale. Either of these works:

    mod <- glm(betaplasma ~ age + vituse, family=gaussian(link="log"), data=data2,
               start=c(10,0,0))
    mod <- glm(betaplasma ~ age + vituse, family=gaussian(link="log"), data=data2,
               start=c(log(mean(data2$betaplasma)),0,0))
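
One likely reason R asks for starting values here: with a log link, the gaussian family cannot initialise the fit from the response when any value is zero or negative, and this dataset contains a zero response. With all coefficients at zero the initial fitted mean is exp(0) = 1 for every observation, far from responses in the hundreds, which plausibly explains the failure to converge. A minimal sketch on simulated data follows; the data frame sim, its coefficients, and the helper log_link_start are hypothetical stand-ins, not the plasma data or anything from the book.

```r
# Simulated stand-in for the plasma data (hypothetical values, not the real dataset)
set.seed(42)
n <- 300
sim <- data.frame(age = runif(n, 20, 80),
                  vituse = sample(1:3, n, replace = TRUE))
sim$y <- exp(5.4 + 0.005 * sim$age - 0.25 * sim$vituse) + rnorm(n, sd = 40)
sim$y[1] <- 0   # one zero response, as in the real data

# Without `start`, glm() stops because it cannot initialise a log link with y <= 0
msg <- tryCatch({
  glm(y ~ age + vituse, family = gaussian(link = "log"), data = sim)
  "fit succeeded"
}, error = function(e) conditionMessage(e))
print(msg)

# Hypothetical helper: intercept at log(mean(y)), every other coefficient at zero.
# Assumes the model has an intercept and that it is the first column of the design matrix.
log_link_start <- function(formula, data) {
  mf <- model.frame(formula, data)
  X <- model.matrix(formula, data = mf)
  y <- model.response(mf)
  c(log(mean(y)), rep(0, ncol(X) - 1))
}

start <- log_link_start(y ~ age + vituse, sim)
fit <- glm(y ~ age + vituse, family = gaussian(link = "log"),
           data = sim, start = start)
print(coef(fit))
```

The helper only needs the response mean to be positive, so a few zero or negative responses do not break it.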
    

    Starting the intercept at log(mean(y)), as in the second call, is probably a reasonable default strategy for log-link fits. The results (slightly abbreviated) match Stata’s very closely:

    summary(mod)
    ## 
    ## Coefficients:
    ##              Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)  5.467575   0.218360  25.039  < 2e-16 ***
    ## age          0.005681   0.003377   1.682   0.0935 .  
    ## vituse      -0.273027   0.065552  -4.165 4.03e-05 ***
    ## ---
    ## Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 
    ## 
    ## (Dispersion parameter for gaussian family taken to be 31385.26)
    ## 
    ##     Null deviance: 10515638  on 314  degrees of freedom
    ## Residual deviance:  9791967  on 312  degrees of freedom
    ## AIC: 4160.4
    ## 
    ## Number of Fisher Scoring iterations: 9
    
    confint(mod)
    ##                     2.5 %      97.5 %
    ## (Intercept)  5.0364648709  5.87600710
    ## age         -0.0007913795  0.01211007
    ## vituse      -0.4075213916 -0.14995759
    

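    (R’s summary() reports t rather than z statistics for the p-values because the Gaussian dispersion is estimated. confint() on a glm fit computes profile-likelihood intervals rather than Wald intervals, which is one reason the intervals differ slightly from Stata’s. Stata’s standard errors are also based on the observed information matrix (OIM), whereas R’s come from the expected information used in Fisher scoring; that probably accounts for the small differences in the standard errors.)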
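
To compare more directly with Stata's z-based table, Wald intervals can be computed from the estimates and standard errors, or with confint.default(). This is a sketch on simulated data; the variables x and y and their coefficients are hypothetical, and the standard errors are still R's, so the numbers would not necessarily match Stata's OIM-based ones.

```r
# Simulated data for illustration only (not the plasma dataset)
set.seed(1)
n <- 200
x <- runif(n, 20, 80)
y <- exp(5 + 0.005 * x) + rnorm(n, sd = 30)
fit <- glm(y ~ x, family = gaussian(link = "log"),
           start = c(log(mean(y)), 0))

est <- coef(fit)
se  <- sqrt(diag(vcov(fit)))

# Wald intervals based on the normal distribution (z), computed by hand
wald_z <- cbind(lower = est - qnorm(0.975) * se,
                upper = est + qnorm(0.975) * se)

# confint.default() computes the same Wald z intervals
wald_default <- confint.default(fit)

# confint() on a glm profiles the likelihood, so its intervals can be asymmetric
prof <- suppressMessages(confint(fit))

print(wald_z)
print(prof)
```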

    However, there are a few reasons I might not fit this model to these data. In particular, the assumption of constant variance (associated with the Gaussian model) is not very reasonable — these data seem better suited for a lognormal model (or equivalently, for just log-transforming and analyzing with a standard Gaussian model).
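
A sketch of the two alternatives side by side, on simulated data with multiplicative (lognormal) noise. All variable names and coefficient values are hypothetical. The simulated response is strictly positive, so log(y) is used directly; with a zero in the response, something like log(1+y) would be needed instead.

```r
# Simulated lognormal response (illustrative values, not the plasma data)
set.seed(7)
n <- 300
age <- runif(n, 20, 80)
vituse <- sample(1:3, n, replace = TRUE)
y <- exp(5.4 + 0.005 * age - 0.25 * vituse + rnorm(n, sd = 0.6))

# Lognormal model: ordinary least squares on the log-transformed response
fit_lognormal <- lm(log(y) ~ age + vituse)

# Gaussian GLM with log link: constant variance on the original scale
fit_loglink <- glm(y ~ age + vituse, family = gaussian(link = "log"),
                   start = c(log(mean(y)), 0, 0))

# Slopes estimate the same quantities; intercepts differ because the first
# model describes E[log y] and the second describes log E[y]
print(cbind(lognormal = coef(fit_lognormal), log_link = coef(fit_loglink)))
```

Under a lognormal model E[y] = exp(Xb + s^2/2), where s is the residual standard deviation on the log scale, so the two intercepts are expected to differ by roughly s^2/2 while the slopes are comparable.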

    Plotting on a log(1+x) scale (there is a zero entry in the data):

    with(data2,plot(log(1+betaplasma)~age))
    with(data2,boxplot(log(1+betaplasma)~vituse))
    

    [Figure: the same two plots with log(1+betaplasma) on the vertical axis]

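    Plotting with ggplot2 (this fits separate lines for each value of vituse rather than fitting an additive model):

    library(ggplot2)   # the package is called ggplot2, not ggplot
    theme_set(theme_bw())
    ## ggplot() + geom_point() replaces the deprecated qplot(); the plot is the same
    (g1 <- ggplot(data2, aes(age, 1+betaplasma, colour=factor(vituse)))+
        geom_point()+
        geom_smooth(method="lm")+
        scale_y_log10())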
    

    [Figure: 1+betaplasma against age on a log10 scale, coloured by vituse, with a separate linear fit for each vituse level]

    View without ‘outlier’:

    g1 %+% subset(data2,betaplasma>0)
    

    [Figure: the same plot with the zero-response observation excluded]

    Two other points: (1) it’s a bit odd that there’s a response with a value of 0 in this data set — not impossible, but odd; (2) it looks like vituse should be treated as a factor rather than as numeric (“1=Yes, fairly often, 2=Yes, not often, 3=No”) — possibly ordinal.
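
A sketch of point (2), treating vituse as a factor in the log-link fit. The data are simulated and the level labels and effect sizes are hypothetical. The start vector needs one entry per coefficient, so it grows from three to four values once vituse contributes two dummy variables.

```r
# Simulated data with a three-level vituse (illustrative values only)
set.seed(3)
n <- 300
sim <- data.frame(age = runif(n, 20, 80),
                  vituse = sample(1:3, n, replace = TRUE))
effect <- c(0, -0.2, -0.6)   # hypothetical, deliberately non-linear level effects
sim$y <- exp(5.4 + 0.005 * sim$age + effect[sim$vituse]) + rnorm(n, sd = 40)

# Recode the numeric codes as a factor (labels abbreviated from the codebook text)
sim$vituse_f <- factor(sim$vituse, levels = 1:3,
                       labels = c("often", "not_often", "no"))

# One starting value per column of the design matrix: intercept, age, two dummies
X <- model.matrix(~ age + vituse_f, data = sim)
start <- c(log(mean(sim$y)), rep(0, ncol(X) - 1))

fit_factor <- glm(y ~ age + vituse_f, family = gaussian(link = "log"),
                  data = sim, start = start)
print(coef(fit_factor))
```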
