Gibbs' inequality


In information theory, Gibbs' inequality is a statement about the mathematical entropy of a discrete probability distribution. Several other bounds on the entropy of probability distributions are derived from Gibbs' inequality, including Fano's inequality. It was first presented by J. Willard Gibbs in the 19th century.

Gibbs' inequality

Suppose that

P=\{p_{1},\ldots ,p_{n}\}

is a probability distribution. Then for any other probability distribution

Q=\{q_{1},\ldots ,q_{n}\}

the following inequality between non-negative quantities (since the p_i and q_i lie between zero and one) holds:[1]:68

-\sum _{{i=1}}^{n}p_{i}\log _{2}p_{i}\leq -\sum _{{i=1}}^{n}p_{i}\log _{2}q_{i}

with equality if and only if

p_{i}=q_{i}

for all i. Put in words, the information entropy of a distribution P is less than or equal to its cross entropy with any other distribution Q.
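The statement can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `entropy` and `cross_entropy` and the example distributions are chosen here purely for illustration, and terms with p_i = 0 are skipped (the usual convention that such terms contribute zero).

```python
import math

def entropy(p):
    """Shannon entropy of p in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Cross entropy of p relative to q in bits.

    Returns infinity if some q_i is 0 where p_i > 0.
    """
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # a zero-probability outcome contributes nothing
        if qi == 0:
            return math.inf
        total -= pi * math.log2(qi)
    return total

# Illustrative distributions (assumed for this example only).
p = [0.5, 0.25, 0.25]
q = [1 / 3, 1 / 3, 1 / 3]

h = entropy(p)            # left-hand side of Gibbs' inequality
ce = cross_entropy(p, q)  # right-hand side of Gibbs' inequality

print(h, ce, h <= ce)
```

With these values the entropy is 1.5 bits and the cross entropy is log2(3), about 1.585 bits, so the inequality holds strictly; replacing `q` by `p` makes the two sides equal.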

The difference between the two quantities is the Kullback–Leibler divergence or relative entropy, so the inequality can also be written:[2]:34

D_{{{\mathrm {KL}}}}(P\|Q)\equiv \sum _{{i=1}}^{n}p_{i}\log _{2}{\frac {p_{i}}{q_{i}}}\geq 0.

Note that the use of base-2 logarithms is optional, and allows one to refer to the quantity on each side of the inequality as an "average surprisal" measured in bits.
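As a rough illustration of how the base only changes the unit, the sketch below computes the divergence in bits and in nats. The helper name `kl_divergence`, its `base` parameter, and the two distributions are assumptions made for this example.

```python
import math

def kl_divergence(p, q, base=2.0):
    """Relative entropy D(P || Q) using logarithms to the given base."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue  # terms with p_i = 0 contribute nothing
        if qi == 0:
            return math.inf  # P puts mass where Q has none
        total += pi * math.log(pi / qi, base)
    return total

# Illustrative distributions (assumed for this example only).
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

d_bits = kl_divergence(p, q, base=2.0)     # measured in bits
d_nats = kl_divergence(p, q, base=math.e)  # measured in nats

print(d_bits, d_nats)
```

Here the divergence is 0.25 bits, and the value in nats is the same number multiplied by ln 2. Both are non-negative, as the inequality requires.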

Proof

For simplicity, we prove the statement using the natural logarithm (ln). Because

\log _{2}a={\frac {\ln a}{\ln 2}},

the choice of base only rescales both sides of the inequality by the same positive constant.

Let $I$ denote the set of all $i$ for which p_i is non-zero. Then, since $\ln x\leq x-1$ for all x > 0, with equality if and only if x=1, we have:

-\sum _{{i\in I}}p_{i}\ln {\frac {q_{i}}{p_{i}}}\geq -\sum _{{i\in I}}p_{i}\left({\frac {q_{i}}{p_{i}}}-1\right)

=-\sum _{{i\in I}}q_{i}+\sum _{{i\in I}}p_{i}

=-\sum _{{i\in I}}q_{i}+1

\geq 0

The last inequality holds because P and Q are probability distributions. Every non-zero p_i has its index in I, so the p_i with i in I sum to unity. The q_i with i in I, however, may sum to less than unity: membership in I depends only on the p_i, so some non-zero q_i may have indices outside I.
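A small numerical check of this step, using distributions assumed here for illustration: Q places half of its mass on an index where p_i is zero, so the q_i over the index set I sum to less than one.

```python
import math

# Illustrative distributions (assumed for this example only).
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

# I: the indices where p_i is non-zero.
support = [i for i, pi in enumerate(p) if pi > 0]

# Left-hand side: -sum over I of p_i * ln(q_i / p_i).
lhs = -sum(p[i] * math.log(q[i] / p[i]) for i in support)

# Lower bound obtained from ln x <= x - 1: -sum_I q_i + sum_I p_i.
bound = -sum(q[i] for i in support) + sum(p[i] for i in support)

print(lhs, bound)
```

In this case the bound is 0.5 rather than 0, because the q_i over I sum to 0.5, and the left-hand side is ln 2, about 0.693.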

We now have:

-\sum _{i\in I}p_{i}\ln {\frac {q_{i}}{p_{i}}}\geq 0

-\sum _{i\in I}p_{i}\ln q_{i}+\sum _{i\in I}p_{i}\ln p_{i}\geq 0

Adding the same quantity to both sides of an inequality preserves it. We may therefore add the negation of the second sum to both sides to get:

-\sum _{{i\in I}}p_{i}\ln q_{i}\geq -\sum _{{i\in I}}p_{i}\ln p_{i}

Since the logarithm tends to negative infinity at zero, restoring the indices for which p_i is zero requires some care. The corresponding terms have the indeterminate form 0 ln 0. An indeterminate form is typically defined as the limit of the values in its neighborhood, or, if no such limit exists, by a convention convenient to the case at hand. Here the limit exists:

\lim _{x\to 0^{+}}x\ln x=0

In this context, the usual convention is therefore to take 0 ln 0 to be zero. This gives us:

-\sum _{{i=1}}^{n}p_{i}\ln q_{i}\geq -\sum _{{i=1}}^{n}p_{i}\ln p_{i}

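Neither side changes when the indices with p_i = 0 are restored. On the right-hand side, each restored term is 0 ln 0, which is zero by our convention. On the left-hand side, each restored term is also zero: either because zero times a finite number is zero (when q_i is non-zero), or by the same convention (when q_i is also zero).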
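The behavior of x ln x near zero can be inspected numerically. The sketch below is illustrative only; the helper name `xlogx` is an assumption of this example.

```python
import math

def xlogx(x):
    """Return x * ln(x), treating the value at x = 0 as 0 by convention."""
    return 0.0 if x == 0 else x * math.log(x)

# x * ln(x) for progressively smaller positive x: the values shrink toward 0.
values = [xlogx(x) for x in (1e-2, 1e-4, 1e-8, 1e-16)]

for v in values:
    print(v)
```

The printed values are negative and their magnitudes decrease toward zero, which is consistent with assigning the value zero at x = 0.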

For equality to hold, we require:

${\frac {q_{i}}{p_{i}}}=1$ for all $i\in I$, so that the bound $\ln {\frac {q_{i}}{p_{i}}}\leq {\frac {q_{i}}{p_{i}}}-1$ holds with equality.
$\sum _{{i\in I}}q_{i}=1$, so that the final step $-\sum _{{i\in I}}q_{i}+\sum _{{i\in I}}p_{i}\geq 0$ holds with equality.

This can happen if and only if

p_{i}=q_{i}

for i = 1, ..., n.

Alternative proofs

The result can alternatively be proved using Jensen's inequality or the log sum inequality. Below we give a proof based on Jensen's inequality:

Because log is a concave function, we have that:

\sum _{i}p_{i}\log {\frac {q_{i}}{p_{i}}}\leq \log \sum _{i}p_{i}{\frac {q_{i}}{p_{i}}}=\log \sum _{i}q_{i}=0

Here the first inequality is due to Jensen's inequality, and the last equality holds because $Q$ is a probability distribution.

Further, because $\log$ is strictly concave, the equality condition of Jensen's inequality gives equality if and only if

{\frac {q_{1}}{p_{1}}}={\frac {q_{2}}{p_{2}}}=\cdots ={\frac {q_{n}}{p_{n}}}

Suppose that this common ratio is $\sigma$. Then

1=\sum _{i}q_{i}=\sum _{i}\sigma p_{i}=\sigma

where we use the fact that $P$ and $Q$ are probability distributions. Therefore equality holds exactly when $P=Q$.

Corollary

The entropy of $P$ is bounded by:[1]:68

H(p_{1},\ldots ,p_{n})\leq \log n.

The proof is immediate: set $q_{i}=1/n$ for all i in Gibbs' inequality, so that the right-hand side becomes $-\sum _{i}p_{i}\log(1/n)=\log n$.
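The bound can be spot-checked on randomly generated distributions. This is only a sanity check under assumed parameters (n = 5, 1000 trials, a fixed seed), not a proof; the helper name `entropy_bits` is likewise an assumption of this example.

```python
import math
import random

def entropy_bits(p):
    """Shannon entropy of p in bits; terms with p_i = 0 contribute 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

random.seed(0)  # fixed seed so the run is reproducible
n = 5
max_seen = 0.0
for _ in range(1000):
    weights = [random.random() for _ in range(n)]
    total = sum(weights)
    p = [w / total for w in weights]  # normalize to a probability distribution
    max_seen = max(max_seen, entropy_bits(p))

uniform = [1 / n] * n
print(max_seen, entropy_bits(uniform), math.log2(n))
```

No sampled distribution exceeds log2(n), and the uniform distribution attains it up to floating-point rounding.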

References

1. Pierre Bremaud (6 December 2012). An Introduction to Probabilistic Modeling. Springer Science & Business Media. ISBN 978-1-4612-1046-7.
2. David J. C. MacKay. Information Theory, Inference and Learning Algorithms. Cambridge University Press. ISBN 978-0-521-64298-9.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
