In R, the optim command uses the Nelder-Mead method by default to optimize a function. An article states:
The parameters to be estimated are optimized over initial values. As a result, different initial values will lead to different estimates.
What does it mean to say that the parameters to be estimated are optimized over initial values?
@GavinSimpson’s request to cite your sources is well-founded.
That said, this is a basic optimization concept. In general, you have to pick starting values for your parameters (or the routine has to guess them for you). Because optimization generally finds local minima or maxima, if you start near a local minimum that is not also the global minimum, you are likely to find that local min, not the global min.
Here’s an example. First, create and plot a 6th-order polynomial (with multiple local minima). Then optimize it starting from three different points.
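A minimal sketch of such an example, assuming a made-up polynomial: the function f, the three starting values, and the names starts, fits and results are all illustrative choices, not anything prescribed by optim. The polynomial has six real roots when untilted, which gives it three local minima, and the 5 * x term tilts it so that only the leftmost one is global.

```r
# Hypothetical 6th-order polynomial with three local minima
# (roughly near x = -2.66, x = -0.05 and x = 2.63). The 5 * x term
# tilts the curve so that only the leftmost minimum is the global one.
f <- function(x) (x^2 - 1) * (x^2 - 4) * (x^2 - 9) + 5 * x

# Plot the function over a range that shows all three basins.
curve(f, from = -3.2, to = 3.2, ylab = "f(x)")

# Run optim() with its default method (Nelder-Mead) from three starts.
# suppressWarnings() hides R's warning that Nelder-Mead is unreliable
# for one-dimensional problems.
starts <- c(-2.5, 0.5, 2.5)
fits <- lapply(starts, function(s) suppressWarnings(optim(par = s, fn = f)))

# Collect the starting value, the estimate and the objective value per run.
results <- data.frame(
  start    = starts,
  estimate = sapply(fits, function(fit) fit$par),
  value    = sapply(fits, function(fit) fit$value)
)
print(results)
```

Each run should settle into the basin nearest its starting value, so the three rows report three different estimates and three different objective values, only one of which is the global minimum.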
See how the starting value matters?
N.B. I realize that Nelder-Mead is not ideal for a univariate function (R itself warns about this and suggests method "Brent" or optimize() for one-dimensional problems), but I used it here for simplicity since it illustrates the point.