Nonlinear conjugate gradient method

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function $\displaystyle f(x)$ :

\displaystyle f(x)=\|Ax-b\|^{2}

The minimum of $f$ is obtained when the gradient is 0:

\nabla _{x}f=2A^{T}(Ax-b)=0

.

Whereas linear conjugate gradient seeks a solution to the linear equation $\displaystyle A^{T}Ax=A^{T}b$ , the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient $\nabla _{x}f$ alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.

Given a function $\displaystyle f(x)$ of $N$ variables to minimize, its gradient $\nabla _{x}f$ indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

\Delta x_{0}=-\nabla _{x}f(x_{0})

with an adjustable step length $\displaystyle \alpha$ and performs a line search in this direction until it reaches the minimum of $\displaystyle f$ :

\displaystyle \alpha _{0}:=\arg \min _{\alpha }f(x_{0}+\alpha \Delta x_{0})

,

\displaystyle x_{1}=x_{0}+\alpha _{0}\Delta x_{0}

After this first iteration in the steepest direction $\displaystyle \Delta x_{0}$ , the following steps constitute one iteration of moving along a subsequent conjugate direction $\displaystyle s_{n}$ , where $\displaystyle s_{0}=\Delta x_{0}$ :

Calculate the steepest direction: $\Delta x_{n}=-\nabla _{x}f(x_{n})$ ,
Compute $\displaystyle \beta _{n}$ according to one of the formulas below,
Update the conjugate direction: $\displaystyle s_{n}=\Delta x_{n}+\beta _{n}s_{n-1}$
Perform a line search: optimize $\displaystyle \alpha _{n}=\arg \min _{\alpha }f(x_{n}+\alpha s_{n})$ ,
Update the position: $\displaystyle x_{n+1}=x_{n}+\alpha _{n}s_{n}$ ,

With a pure quadratic function the minimum is reached within N iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.

Within a linear approximation, the parameters $\displaystyle \alpha$ and $\displaystyle \beta$ are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Four of the best known formulas for $\displaystyle \beta _{n}$ are named after their developers:

Fletcher–Reeves:^[1]

\beta _{n}^{FR}={\frac {\Delta x_{n}^{T}\Delta x_{n}}{\Delta x_{n-1}^{T}\Delta x_{n-1}}}.

Polak–Ribière:^[2]

\beta _{n}^{PR}={\frac {\Delta x_{n}^{T}(\Delta x_{n}-\Delta x_{n-1})}{\Delta x_{n-1}^{T}\Delta x_{n-1}}}.

Hestenes-Stiefel:^[3]

\beta _{n}^{HS}=-{\frac {\Delta x_{n}^{T}(\Delta x_{n}-\Delta x_{n-1})}{s_{n-1}^{T}(\Delta x_{n}-\Delta x_{n-1})}}.

Dai–Yuan:^[4]

\beta _{n}^{DY}=-{\frac {\Delta x_{n}^{T}\Delta x_{n}}{s_{n-1}^{T}(\Delta x_{n}-\Delta x_{n-1})}}.

.

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is $\displaystyle \beta =\max\{0,\beta ^{PR}\}$ , which provides a direction reset automatically.^[5]

Newton-based methods – Newton-Raphson Algorithm, Quasi-Newton methods (e.g., BFGS method) – tend to converge in fewer iterations, although each iteration typically requires more computation than a conjugate gradient iteration, as Newton-like methods require computing the Hessian (matrix of second derivatives) in addition to the gradient. Quasi-Newton methods also require more memory to operate (see also the limited-memory L-BFGS method).

References

↑ R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients", Comput. J. 7 (1964), 149–154.
↑ E. Polak and G. Ribière, "Note sur la convergence de directions conjugu´ee", Rev. Francaise Informat Recherche Operationelle, 3e Ann´ee 16 (1969), 35–43.
↑ M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems", J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953).
↑ Y.-H. Dai and Y. Yuan, "A nonlinear conjugate gradient method with a strong global convergence property", SIAM J. Optim. 10 (1999), no. 1, 177–182.
↑ J. R. Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain", August 1994.

External links

An Introduction to the Conjugate Gradient Method Without the Agonizing Pain by Jonathan Richard Shewchuk.
A NONLINEAR CONJUGATE GRADIENT METHOD WITH A STRONG GLOBAL CONVERGENCE PROPERTY by Y. H. DAI and Y. YUAN.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.

[1] R. Fletcher and C. M. Reeves, "Function minimization by conjugate gradients", Comput. J. 7 (1964), 149–154.

[2] E. Polak and G. Ribière, "Note sur la convergence de directions conjugu´ee", Rev. Francaise Informat Recherche Operationelle, 3e Ann´ee 16 (1969), 35–43.

[3] M. R. Hestenes and E. Stiefel, "Methods of conjugate gradients for solving linear systems", J. Research Nat. Bur. Standards 49 (1952), 409–436 (1953).

[4] Y.-H. Dai and Y. Yuan, "A nonlinear conjugate gradient method with a strong global convergence property", SIAM J. Optim. 10 (1999), no. 1, 177–182.

[5] J. R. Shewchuk, "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain", August 1994.

Nonlinear conjugate gradient method

See also

References

External links