The Archive Base

Editorial Team
Asked: June 8, 2026

I have an optimization problem that the Nelder-Mead method will solve, but that I


I have an optimization problem that the Nelder-Mead method will solve, but that I would also like to solve using BFGS or Newton-Raphson, or something that takes a gradient function, for more speed, and hopefully more precise estimates. I wrote such a gradient function following (I thought) the example in the optim / optimx documentation, but when I use it with BFGS my starting values either don’t move (optim()), or else the function outright doesn’t run (optimx(), which returns Error: Gradient function might be wrong - check it!). I’m sorry there’s a bit of code involved in reproducing this, but here goes:

This is the function that I want to get parameter estimates for (this is for smoothing old-age mortality rates, where x is age, starting at age 80):

    KannistoMu <- function(pars, x = .5:30.5){
      a <- pars["a"]
      b <- pars["b"]
      (a * exp(b * x)) / (1 + a * exp(b * x))
    }

And here’s a log-likelihood function for estimating it from observed rates (defined as deaths, .Dx, over exposure, .Exp):

    KannistoLik1 <- function(pars, .Dx, .Exp, .x. = .5:30.5){
      mu <- KannistoMu(exp(pars), x = .x.)
      # take negative and minimize it (default optimizer behavior)
      -sum(.Dx * log(mu) - .Exp * mu, na.rm = TRUE) 
    }

You see exp(pars) in there because I give log(pars) to the optimizer, in order to constrain the final a and b to be positive.
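For reference, the model and objective described above can be written out compactly. This is only a restatement of the code: the symbols $\alpha$ and $\beta$ (the log-parameters handed to the optimizer) and $D_x$, $E_x$ (deaths and exposure at age offset $x$) are notation assumed here, not taken from the post.

```latex
\mu(x) = \frac{a\,e^{bx}}{1 + a\,e^{bx}}, \qquad a = e^{\alpha}, \quad b = e^{\beta}

\ell(\alpha, \beta) = -\sum_{x} \Bigl[ D_x \log \mu(x) - E_x\,\mu(x) \Bigr]
```

Since the optimizer moves in $(\alpha, \beta)$, any gradient supplied to it has to be taken with respect to $(\alpha, \beta)$ as well, not with respect to $(a, b)$.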

Example data (1962 Japan females, if anyone is curious):

    .Dx <- structure(c(10036.12, 9629.12, 8810.11, 8556.1, 7593.1, 6975.08, 
      6045.08, 4980.06, 4246.06, 3334.04, 2416.03, 1676.02, 1327.02, 
      980.02, 709, 432, 350, 217, 134, 56, 24, 21, 10, 8, 3, 1, 2, 
      1, 0, 0, 0), .Names = c("80", "81", "82", "83", "84", "85", "86", 
      "87", "88", "89", "90", "91", "92", "93", "94", "95", "96", "97", 
      "98", "99", "100", "101", "102", "103", "104", "105", "106", 
      "107", "108", "109", "110"))
    .Exp <- structure(c(85476.0333333333, 74002.0866666667, 63027.5183333333, 
      53756.8983333333, 44270.9, 36749.85, 29024.9333333333, 21811.07, 
      16912.315, 11917.9583333333, 7899.33833333333, 5417.67, 3743.67833333333, 
      2722.435, 1758.95, 1043.985, 705.49, 443.818333333333, 223.828333333333, 
      93.8233333333333, 53.1566666666667, 27.3333333333333, 16.1666666666667, 
      10.5, 4.33333333333333, 3.16666666666667, 3, 2.16666666666667, 
      1.5, 0, 1), .Names = c("80", "81", "82", "83", "84", "85", "86", 
      "87", "88", "89", "90", "91", "92", "93", "94", "95", "96", "97", 
      "98", "99", "100", "101", "102", "103", "104", "105", "106", 
      "107", "108", "109", "110"))

The following works for the Nelder-Mead method:

    NMab <- optim(log(c(a = .1, b = .1)), 
      fn = KannistoLik1, method = "Nelder-Mead",
      .Dx = .Dx, .Exp = .Exp)
    exp(NMab$par) 
    # these are reasonable estimates
       a         b 
    0.1243144 0.1163926

This is the gradient function I came up with:

    Kannisto.gr <- function(pars, .Dx, .Exp, x = .5:30.5){
      a <- exp(pars["a"])
      b <- exp(pars["b"])
      d.a <- (a * exp(b * x) * .Exp + (-a * exp(b * x) - 1) * .Dx) /
        (a ^ 3 * exp(2 * b * x) + 2 * a ^ 2 * exp(b * x) + a)
      d.b <- (a * x * exp(b * x) * .Exp + (-a * x * exp(b * x) - x) * .Dx) /
        (a ^ 2 * exp(2 * b * x) + 2 * a * exp(b * x) + 1)
      -colSums(cbind(a = d.a, b = d.b), na.rm = TRUE)
    }

The output is a vector of length 2, the change with respect to the parameters a and b. I also have an uglier version, built from the output of deriv(), which returns the same answer. I don’t post it here; I only mention it to confirm that the derivatives agree.
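As an illustration of the deriv() route mentioned above, here is a minimal sketch that differentiates the per-age negative log-likelihood directly in the log-parameters. The names `la`, `lb`, `nll_terms`, `grad_deriv` and the toy data are hypothetical and chosen for this example; this is not the poster's unpublished version.

```r
# Hypothetical toy data (not the Japanese series from the question)
x <- 0.5:9.5
E <- c(900, 800, 700, 600, 500, 400, 300, 200, 100, 50)  # exposures
D <- c(100, 98, 95, 90, 83, 73, 60, 44, 24, 13)          # deaths

# Per-age negative log-likelihood, written in la = log(a) and lb = log(b).
# With eta = la + exp(lb) * x:  log(mu) = eta - log(1 + exp(eta)),
#                               mu      = exp(eta) / (1 + exp(eta)).
nll_terms <- deriv(
  ~ -(D * (la + exp(lb) * x - log(1 + exp(la + exp(lb) * x))) -
        E * exp(la + exp(lb) * x) / (1 + exp(la + exp(lb) * x))),
  namevec      = c("la", "lb"),
  function.arg = c("la", "lb", "D", "E", "x")
)

# Gradient of the summed objective: column sums of the per-age derivatives
grad_deriv <- function(theta, D, E, x) {
  out <- nll_terms(theta[1], theta[2], D, E, x)
  colSums(attr(out, "gradient"))
}

theta0 <- log(c(0.1, 0.1))
g <- grad_deriv(theta0, D, E, x)
g
```

Because the expression is already written in log(a) and log(b), the result is on the same scale as the values passed to optim(), so no further chain-rule factor is needed.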

If I supply it to optim() as follows, with BFGS as the method, the estimates do not move from the starting values:

    BFGSab <- optim(log(c(a = .1, b = .1)), 
      fn = KannistoLik1, gr = Kannisto.gr, method = "BFGS",
      .Dx = .Dx, .Exp = .Exp)
    # estimates do not change from starting values:
    exp(BFGSab$par) 
      a   b 
    0.1 0.1

When I look at the $counts element of the output, it says that KannistoLik1() was called 31 times and Kannisto.gr() just 1 time. $convergence is 0, so I guess it thinks it converged (if I give less reasonable starts they also stay put). I reduced the tolerance, etc., and nothing changes. When I try the same call in optimx() (not shown), I receive the warning I mentioned above, and no object is returned. I get the same results when specifying gr = Kannisto.gr with the "CG" method. With the "L-BFGS-B" method I get the same starting values back as estimates, but it is also reported that both function and gradient were called 21 times, and there is an error message:
"ERROR: ABNORMAL_TERMINATION_IN_LNSRCH"

I’m hoping that there is some minor detail in the way the gradient function is written that will solve this, as this latter warning and the optimx behavior are bluntly hinting that the function simply isn’t right (I think). I also tried the maxNR() maximizer from the maxLik package and observed similar behavior (starting values don’t move). Can anyone give me a pointer? Much obliged.

[Edit]
@Vincent suggested I compare with the output from a numerical approximation:

    library(numDeriv)
    grad( function(u) KannistoLik1( c(a=u[1], b=u[2]), .Dx, .Exp ), log(c(.1,.1)) )
    [1] -14477.40  -7458.34
    Kannisto.gr(log(c(a=.1,b=.1)), .Dx, .Exp)
     a        b 
    144774.0  74583.4 

So the sign is different, and the magnitude is off by a factor of 10? I changed the gradient function to follow suit:

    Kannisto.gr2 <- function(pars, .Dx, .Exp, x = .5:30.5){
      a <- exp(pars["a"])
      b <- exp(pars["b"])
      d.a <- (a * exp(b * x) * .Exp + (-a * exp(b * x) - 1) * .Dx) /
        (a ^ 3 * exp(2 * b * x) + 2 * a ^ 2 * exp(b * x) + a)
      d.b <- (a * x * exp(b * x) * .Exp + (-a * x * exp(b * x) - x) * .Dx) /
        (a ^ 2 * exp(2 * b * x) + 2 * a * exp(b * x) + 1)
      colSums(cbind(a=d.a,b=d.b), na.rm = TRUE) / 10
    }
    Kannisto.gr2(log(c(a=.1,b=.1)), .Dx, .Exp)
    # same as numerical:
      a         b 
    -14477.40  -7458.34 
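The factor of 10 seen above is consistent with a missing chain-rule term: d.a and d.b are derivatives with respect to a and b, while the optimizer works in log(a) and log(b), and d/d(log a) = a * d/da. At a = b = 0.1 that factor is exactly 1/10. The sketch below checks this on hypothetical toy data; `nll_log`, `grad_natural`, `grad_log` and `fd_grad` are names invented for this example.

```r
# Hypothetical toy data (not the Japanese series from the question)
x <- 0.5:9.5
E <- c(900, 800, 700, 600, 500, 400, 300, 200, 100, 50)  # exposures
D <- c(100, 98, 95, 90, 83, 73, 60, 44, 24, 13)          # deaths

# Negative log-likelihood in theta = (log a, log b)
nll_log <- function(theta, D, E, x) {
  u  <- exp(theta[1]) * exp(exp(theta[2]) * x)
  mu <- u / (1 + u)
  -sum(D * log(mu) - E * mu)
}

# Gradient of the same objective with respect to a and b (natural scale);
# algebraically the same as the summed d.a and d.b from the question
grad_natural <- function(a, b, D, E, x) {
  u    <- a * exp(b * x)
  core <- (E * u - D * (1 + u)) / (1 + u)^2
  c(a = sum(core / a), b = sum(x * core))
}

# Chain rule: d/d(log a) = a * d/da and d/d(log b) = b * d/db
grad_log <- function(theta, D, E, x) {
  a <- exp(theta[1]); b <- exp(theta[2])
  grad_natural(a, b, D, E, x) * c(a, b)
}

# Central finite differences as an independent check
fd_grad <- function(f, theta, h = 1e-6, ...) {
  sapply(seq_along(theta), function(i) {
    step <- replace(numeric(length(theta)), i, h)
    (f(theta + step, ...) - f(theta - step, ...)) / (2 * h)
  })
}

theta0     <- log(c(0.1, 0.1))
analytic   <- grad_log(theta0, D, E, x)
numeric_fd <- fd_grad(nll_log, theta0, D = D, E = E, x = x)
ratio      <- grad_natural(0.1, 0.1, D, E, x) / analytic

rbind(analytic = analytic, finite_diff = numeric_fd, ratio = ratio)
```

Dividing by a hard-coded 10 therefore only matches the numerical gradient at this particular starting point. Away from a = b = 0.1 the scaling is different, which may be part of why the optimizer then misbehaves.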

Try it in the optimizer:

    BFGSab <- optim(log(c(a = .1, b = .1)), 
      fn = KannistoLik1, gr = Kannisto.gr2, method = "BFGS",
      .Dx = .Dx, .Exp = .Exp)
    # not reasonable results:
    exp(BFGSab$par) 
      a   b 
    Inf Inf 
    # and in fact, when not exp()'d, they look oddly familiar:
    BFGSab$par
      a         b 
    -14477.40  -7458.34 

Following Vincent’s answer, I rescaled the gradient function, and used abs() instead of exp() to keep parameters positive. The most recent, and better performing objective and gradient functions:

    KannistoLik2 <- function(pars, .Dx, .Exp, .x. = .5:30.5){
      mu <- KannistoMu(abs(pars), x = .x.)
      # take negative and minimize it (default optimizer behavior)
      -sum(.Dx * log(mu) - .Exp * mu, na.rm = TRUE) 
    }

    # gradient, to be down-scaled in `optim()` call
    Kannisto.gr3 <- function(pars, .Dx, .Exp, x = .5:30.5){
      a <- abs(pars["a"])
      b <- abs(pars["b"])
      d.a <- (a * exp(b * x) * .Exp + (-a * exp(b * x) - 1) * .Dx) /
        (a ^ 3 * exp(2 * b * x) + 2 * a ^ 2 * exp(b * x) + a)
      d.b <- (a * x * exp(b * x) * .Exp + (-a * x * exp(b * x) - x) * .Dx) /
        (a ^ 2 * exp(2 * b * x) + 2 * a * exp(b * x) + 1)
      colSums(cbind(a = d.a, b = d.b), na.rm = TRUE) 
    }

    # try it out:
    BFGSab2 <- optim(
      c(a = .1, b = .1), 
      fn = KannistoLik2, 
      gr = function(...) Kannisto.gr3(...) * 1e-7, 
      method = "BFGS",
      .Dx = .Dx, .Exp = .Exp
    )
    # reasonable:
    BFGSab2$par
            a         b 
    0.1243249 0.1163924 

    # better: BFGS reaches a lower negative log-likelihood than Nelder-Mead
    KannistoLik2(exp(NMab$par), .Dx = .Dx, .Exp = .Exp) > KannistoLik2(BFGSab2$par, .Dx = .Dx, .Exp = .Exp)
    [1] TRUE

This was solved much faster than I was expecting, and I learned more than a couple tricks. Thanks Vincent!

1 Answer

  1. Editorial Team
    Added an answer on June 8, 2026 at 4:09 pm

    To check if the gradient is correct,
    you can compare it with a numeric approximation:

    library(numDeriv); 
    grad( function(u) KannistoLik1( c(a=u[1], b=u[2]), .Dx, .Exp ), c(1,1) ); 
    Kannisto.gr(c(a=1,b=1), .Dx, .Exp)
    

    The signs are wrong: the algorithm does not see any improvement
    when it moves in this direction, and therefore does not move.
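The effect of a flipped sign can be reproduced on a much smaller problem. The sketch below uses a hypothetical quadratic objective, unrelated to the mortality model, and hands optim() first a sign-flipped gradient and then the correct one.

```r
fn      <- function(p) sum(p^2)   # minimum at (0, 0)
gr_good <- function(p)  2 * p     # correct gradient
gr_bad  <- function(p) -2 * p     # same magnitude, wrong sign

start <- c(1, 1)
bad  <- optim(start, fn, gr_bad,  method = "BFGS")
good <- optim(start, fn, gr_good, method = "BFGS")

bad$par      # compare with `start`
bad$counts   # how often fn and gr were evaluated
good$par     # compare with the true minimum at (0, 0)
```

With the wrong sign, every trial step goes uphill, so the line search keeps shrinking the step until the parameters no longer change and then stops. That matches the pattern reported in the question, where the objective was called many times and the gradient only once.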

    You can use some computer algebra system (here, Maxima)
    to do the computations for you:

    display2d: false;
    f(a,b,x) := a * exp(b*x) / ( 1 + a * exp(b*x) );
    l(a,b,d,e,x) := - d * log(f(a,b,x)) + e * f(a,b,x);
    factor(diff(l(exp(a),exp(b),d,e,x),a));
    factor(diff(l(exp(a),exp(b),d,e,x),b));
    

    I just copy and paste the result into R:

    f_gradient <- function(u, .Dx, .Exp, .x.=.5:30.5) {
      a <- u[1]
      b <- u[2]
      x <- .x.
      d <- .Dx
      e <- .Exp
      c(
        sum( (e*exp(exp(b)*x+a)-d*exp(exp(b)*x+a)-d)/(exp(exp(b)*x+a)+1)^2 ),
        sum( exp(b)*x*(e*exp(exp(b)*x+a)-d*exp(exp(b)*x+a)-d)/(exp(exp(b)*x+a)+1)^2 )
      )  
    }
    
    library(numDeriv)
    grad( function(u) KannistoLik1( c(a=u[1], b=u[2]), .Dx, .Exp ), c(1,1) )
    f_gradient(c(a=1,b=1), .Dx, .Exp)  # Identical
    

    If you blindly put the gradient in the optimization,
    there is a numeric instability problem: the solution given is (Inf,Inf)…
    To prevent it, you can rescale the gradient
    (a better workaround would be to use a less explosive transformation than the exponential,
    to ensure that the parameters remain positive).
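One plausible mechanism behind the (Inf, Inf) result, offered as an assumption rather than a diagnosis: when exp() overflows, the hazard becomes Inf / Inf = NaN, and with na.rm = TRUE an objective whose terms are all NaN sums to 0, which looks like an improvement to the optimizer. The sketch below compares the original form with a logistic form based on plogis(). The function names and toy data are hypothetical.

```r
# Hypothetical toy data (not the Japanese series from the question)
x <- 0.5:9.5
E <- c(900, 800, 700, 600, 500, 400, 300, 200, 100, 50)  # exposures
D <- c(100, 98, 95, 90, 83, 73, 60, 44, 24, 13)          # deaths

# Form used in the question: overflow in exp() gives Inf / Inf = NaN
nll_naive <- function(theta, D, E, x, na.rm = FALSE) {
  a  <- exp(theta[1]); b <- exp(theta[2])
  mu <- (a * exp(b * x)) / (1 + a * exp(b * x))
  -sum(D * log(mu) - E * mu, na.rm = na.rm)
}

# Logistic form: mu = plogis(log(a) + b * x), which avoids forming Inf / Inf
nll_stable <- function(theta, D, E, x) {
  eta <- theta[1] + exp(theta[2]) * x
  -sum(D * plogis(eta, log.p = TRUE) - E * plogis(eta))
}

theta0    <- log(c(0.1, 0.1))   # a reasonable starting point
theta_far <- c(14475, 7456)     # roughly where one huge unscaled step would land

c(naive = nll_naive(theta0, D, E, x), stable = nll_stable(theta0, D, E, x))
nll_naive(theta_far, D, E, x)                # NaN terms propagate
nll_naive(theta_far, D, E, x, na.rm = TRUE)  # NaN terms are dropped
nll_stable(theta_far, D, E, x)               # stays finite
```

If this reading is right, dropping na.rm = TRUE or using the logistic form would be alternatives to rescaling the gradient, since the absurd point would no longer score better than the start.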

    BFGSab <- optim(
      log(c(a = .1, b = .1)), 
      fn = KannistoLik1, 
      gr = function(...) f_gradient(...) * 1e-3, 
      method = "BFGS",
      .Dx = .Dx, .Exp = .Exp
    )
    exp(BFGSab$par) # Less precise than Nelder-Mead
    
© 2021 The Archive Base. All Rights Reserved