LogSumExp

The LogSumExp (LSE) function (also called softmax[1]) is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms.[2] It is defined as the logarithm of the sum of the exponentials of the arguments:

    LSE(x_1, ..., x_n) = log(exp(x_1) + ... + exp(x_n)).

In tropical analysis, this is the sum in the log semiring.
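The definition above can be transcribed directly (a minimal sketch; the helper name `logsumexp` is ours, not a library routine):

```python
import math

def logsumexp(xs):
    """Logarithm of the sum of the exponentials of the arguments."""
    return math.log(sum(math.exp(x) for x in xs))

print(logsumexp([0.0, 0.0]))  # log(2) ≈ 0.6931
```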

Properties

The LogSumExp function domain is ℝ^n, the real coordinate space, and its range is ℝ, the real line. LSE approximates the maximum max{x_1, ..., x_n}: the larger the values x_1, ..., x_n or their deviation from one another, the better the approximation becomes. The LogSumExp function is convex, and is strictly monotonically increasing everywhere in its domain[3] (but it is not strictly convex everywhere[4]).

LSE is a smooth maximum because, if one term x_k is much larger than the rest, each remaining ratio exp(x_j − x_k) = exp(x_j)/exp(x_k) is small (it has the large exp(x_k) in the denominator), so applying the tangent line approximation log(1 + t) ≈ t to the second term one gets:

    LSE(x_1, ..., x_n) = x_k + log(1 + Σ_{j ≠ k} exp(x_j − x_k)) ≈ x_k = max{x_1, ..., x_n}.

Indeed, there are the following tight bounds (the first inequality is strict unless n = 1):

    max{x_1, ..., x_n} ≤ LSE(x_1, ..., x_n) ≤ max{x_1, ..., x_n} + log(n).

The upper bound is an equality if and only if all the arguments x_1, ..., x_n are equal.

This is because

    exp(max{x_1, ..., x_n}) ≤ exp(x_1) + ... + exp(x_n) ≤ n · exp(max{x_1, ..., x_n}).

The first inequality holds because the sum includes its maximum term and the remaining terms are positive; it is strict if n > 1, since positive numbers are then being added to the maximum term. The second holds because each of the n summands is at most exp(max{x_1, ..., x_n}). Taking logarithms of all three expressions gives the bounds above.

The lower bound is met with equality only for n = 1; otherwise it is strict, although it is approached when all but one of the arguments tend to negative infinity. The upper bound is met when all the arguments are equal.
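These bounds are straightforward to verify numerically (a minimal sketch; the helper name `logsumexp` and the sample values are ours):

```python
import math

def logsumexp(xs):
    """Logarithm of the sum of the exponentials of the arguments."""
    return math.log(sum(math.exp(x) for x in xs))

xs = [-1.0, 0.5, 3.0]
m = max(xs)
lse = logsumexp(xs)

# max <= LSE <= max + log(n)
assert m <= lse <= m + math.log(len(xs))
print(lse)
```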

Writing x = (x_1, ..., x_n), the partial derivatives are:

    ∂/∂x_i LSE(x) = exp(x_i) / (exp(x_1) + ... + exp(x_n)).

This can be calculated via logarithmic differentiation.

Expressing the partial derivatives as a vector, the gradient of LSE is the softmax function, the multivariable analog of the logistic function.
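The gradient can be checked against a finite-difference approximation of LSE (a minimal sketch; the helper names `softmax` and `logsumexp` are ours):

```python
import math

def logsumexp(xs):
    return math.log(sum(math.exp(x) for x in xs))

def softmax(xs):
    """Gradient of LogSumExp: each component is exp(x_i) / sum_j exp(x_j)."""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

xs = [0.5, -1.0, 2.0]
grad = softmax(xs)
h = 1e-6
for i in range(len(xs)):
    bumped = list(xs)
    bumped[i] += h
    fd = (logsumexp(bumped) - logsumexp(xs)) / h
    # finite difference agrees with the i-th softmax component
    assert abs(fd - grad[i]) < 1e-5
```

Note that the components of the gradient are positive and sum to 1, as expected of the softmax function.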

The convex conjugate of LogSumExp is the negative entropy.

log-sum-exp trick for log-domain calculations

The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in log probability.

Just as multiplication in the linear scale becomes simple addition in the log scale, addition in the linear scale becomes LSE in the log scale.
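This correspondence can be checked directly (a minimal sketch with arbitrary sample values of our choosing):

```python
import math

a, b = 0.3, 0.2
la, lb = math.log(a), math.log(b)

# Multiplication in the linear scale is addition in the log scale.
assert abs((la + lb) - math.log(a * b)) < 1e-12

# Addition in the linear scale is LSE in the log scale.
lse = math.log(math.exp(la) + math.exp(lb))
assert abs(lse - math.log(a + b)) < 1e-12
```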

A common purpose of using log-domain computations is to increase accuracy and avoid underflow and overflow problems when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision floating point numbers.

Unfortunately, the use of LSE directly in this case can again cause overflow/underflow problems. The following mathematically equivalent formula must therefore be used instead (especially when the accuracy of the above 'max' approximation is not sufficient):

    LSE(x_1, ..., x_n) = x* + log(exp(x_1 − x*) + ... + exp(x_n − x*)),

where

    x* = max{x_1, ..., x_n}.

Many math libraries such as IT++ provide a default routine of LSE and use this formula internally.
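A sketch of the max-shift formula in Python (the function name is ours, not a library routine):

```python
import math

def logsumexp_stable(xs):
    """LSE computed via the max-shift identity to avoid overflow."""
    x_star = max(xs)
    # exp(x - x_star) <= 1 for every term, so the sum cannot overflow
    return x_star + math.log(sum(math.exp(x - x_star) for x in xs))

# A naive log(sum(exp(x))) would overflow here; the shifted form does not.
print(logsumexp_stable([1000.0, 1000.0]))  # 1000 + log(2) ≈ 1000.6931
```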

A strictly convex log-sum-exp type function

LSE is convex but not strictly convex. We can define a strictly convex log-sum-exp type function[5] by adding an extra argument set to zero:

    LSE_0+(x_1, ..., x_n) = LSE(0, x_1, ..., x_n) = log(1 + exp(x_1) + ... + exp(x_n)).

This function is a proper Bregman generator (strictly convex and differentiable). It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
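A sketch of this variant, combined with the max-shift trick from the previous section for stability (the function name `lse0` is ours):

```python
import math

def lse0(xs):
    """Strictly convex log-sum-exp variant: LSE with an extra argument fixed at 0."""
    x_star = max(0.0, max(xs))  # include the implicit 0 argument in the shift
    return x_star + math.log(
        math.exp(-x_star) + sum(math.exp(x - x_star) for x in xs)
    )

print(lse0([0.0]))  # LSE(0, 0) = log(2) ≈ 0.6931
```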


References

  1. Zhang, Aston; Lipton, Zachary; Li, Mu; Smola, Alex. "Dive into Deep Learning, Chapter 3 Exercises". www.d2l.ai. Retrieved 27 June 2020.
  2. Nielsen, Frank; Sun, Ke (2016). "Guaranteed bounds on the Kullback-Leibler divergence of univariate mixtures using piecewise log-sum-exp inequalities". Entropy. 18: 442. arXiv:1606.05850. Bibcode:2016Entrp..18..442N. doi:10.3390/e18120442.
  3. El Ghaoui, Laurent (2017). Optimization Models and Applications.
  4. "convex analysis - About the strictly convexity of log-sum-exp function - Mathematics Stack Exchange". stackexchange.com.
  5. Nielsen, Frank; Hadjeres, Gaetan (2018). "Monte Carlo Information Geometry: The dually flat case". arXiv:1803.07225. Bibcode:2018arXiv180307225N. Cite journal requires |journal= (help)


This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.