I’m trying to find the min of a function in N parameters using gradient descent. However I want to do that while limiting the sum of absolute values of the parameters to be 1 (or <= 1, doesn’t matter). For this reason I am using the method of lagrange multipliers so if my function is f(x), I will be minimizing f(x) + lambda * (g(x)-1) where g(x) is a smooth approximation for the sum of absolute values of the parameters.
Now as I understand, the gradient of this function will only be 0 when g(x)=1, so that a method to find a local minimum should find the minimum of my function in which my condition is also satisfied. The problem is that this addition my function unbounded so that Gradient Descent simply finds larger and larger lambdas with larger and larger parameters (in absolute value) and never converges.
At the moment I’m using python’s (scipy) implementation of CG so I would really prefer suggestions that do not require me to re-write / tweak the CG code myself but use an existing method.
The problem is that when using Lagrange multipliers, the critical points don’t occur at local minima of the Lagrangian – they occur at saddle points instead. Since the gradient descent algorithm is designed to find local minima, it fails to converge when you give it a problem with constraints.
There are typically three solutions:
Personally, I would go with the third approach, and find the gradient of the square of the gradient of the Lagrangian numerically if it’s too difficult to get an analytic expression for it.
Also, you don’t quite make it clear in your question – are you using gradient descent, or CG (conjugate gradients)?