Total variation distance of probability measures

In probability theory, the total variation distance is a distance measure for probability distributions. It is an example of a statistical distance metric, and is sometimes called the statistical distance or variational distance.

Definition

The total variation distance between two probability measures P and Q on a sigma-algebra ${\mathcal {F}}$ of subsets of the sample space $\Omega$ is defined via^[1]

\delta (P,Q)=\sup _{A\in {\mathcal {F}}}\left|P(A)-Q(A)\right|.

Informally, this is the largest possible difference between the probabilities that the two probability distributions can assign to the same event.
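
For a small finite sample space, the supremum in the definition can be evaluated directly by enumerating every event. The following is a minimal sketch under that assumption; the function name `tv_by_events` and the example distributions `p` and `q` are illustrative and do not come from the cited sources.

```python
from itertools import combinations


def tv_by_events(p, q):
    """Brute-force sup over all events A of |P(A) - Q(A)| on a finite sample space.

    p and q map each outcome to its probability and must share the same keys.
    """
    outcomes = list(p)
    best = 0.0
    # Enumerate every subset of the sample space, including the empty event.
    for r in range(len(outcomes) + 1):
        for event in combinations(outcomes, r):
            diff = abs(sum(p[w] for w in event) - sum(q[w] for w in event))
            best = max(best, diff)
    return best


# Illustrative distributions on a three-point sample space.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}
print(tv_by_events(p, q))
```

For these two distributions the result is approximately 0.3, attained for instance by the event {a}. The enumeration visits 2^n events for n outcomes, so it is only practical for very small sample spaces.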

Properties

Relation to other distances

The total variation distance is related to the Kullback–Leibler divergence by Pinsker's inequality:

\delta (P,Q)\leq {\sqrt {{\frac {1}{2}}D_{\mathrm {KL} }(P\parallel Q)}}.
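
Pinsker's inequality can be checked numerically on a finite sample space. The sketch below assumes the Kullback–Leibler divergence is measured in nats (natural logarithm), which is the convention under which the constant 1/2 applies, and it computes the total variation distance as half the sum of absolute differences. The function names and example distributions are illustrative.

```python
import math


def total_variation(p, q):
    """Total variation distance between two pmfs given as equal-length lists."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def kl_divergence(p, q):
    """D_KL(P || Q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Illustrative distributions on a three-point sample space.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]

tv = total_variation(p, q)
bound = math.sqrt(0.5 * kl_divergence(p, q))
print(tv, bound, tv <= bound)
```

Here the distance is approximately 0.3 and the Pinsker bound is approximately 0.37, so the inequality holds with some slack.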

When the sample space $\Omega$ is countable, the total variation distance is related to the L¹ norm by the identity:^[2]

\delta (P,Q)={\frac {1}{2}}\|P-Q\|_{1}={\frac {1}{2}}\sum _{\omega \in \Omega }|P(\omega )-Q(\omega )|.
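
On a finite sample space the identity can be illustrated by comparing half the L¹ norm with the probability gap on the event where P exceeds Q, which is an event attaining the supremum in the definition. This is a sketch only; the names `tv_half_l1`, `tv_best_event`, `p` and `q` are illustrative.

```python
def tv_half_l1(p, q):
    """Half the L1 distance between two pmfs given as dicts with the same keys."""
    return 0.5 * sum(abs(p[w] - q[w]) for w in p)


def tv_best_event(p, q):
    """P(A) - Q(A) for the event A = {w : P(w) > Q(w)}."""
    a_star = [w for w in p if p[w] > q[w]]
    return sum(p[w] - q[w] for w in a_star)


# Illustrative distributions on a four-point sample space.
p = {"a": 0.4, "b": 0.1, "c": 0.3, "d": 0.2}
q = {"a": 0.1, "b": 0.4, "c": 0.25, "d": 0.25}

print(tv_half_l1(p, q), tv_best_event(p, q))
```

Both values agree up to floating-point rounding. The positive and negative parts of P − Q carry the same total mass because both measures sum to 1, which is where the factor 1/2 comes from.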

Connection to transportation theory

The total variation distance (equivalently, half the L¹ norm of $P-Q$) arises as the optimal transportation cost when the cost function is $c(x,y)={\mathbf {1} }_{x\neq y}$, that is,

{\frac {1}{2}}\|P-Q\|_{1}=\delta (P,Q)=\inf _{\pi }\,\mathbb {E} _{(x,y)\sim \pi }[{\mathbf {1} }_{x\neq y}],

where the infimum is taken over all probability distributions $\pi$ on $\Omega \times \Omega$ with marginals $P$ and $Q$, respectively.^[3]
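
On a finite sample space, a coupling that attains the infimum can be written down explicitly: place as much mass as possible on the diagonal, namely min(P(ω), Q(ω)) at each ω, and spread the leftover mass as a product of the normalized residuals. The sketch below uses this standard "maximal coupling" construction. It is one possible construction, not necessarily the one used in the cited reference, and all names are illustrative.

```python
def maximal_coupling(p, q):
    """Build a joint pmf pi[(x, y)] with marginals p and q and pi(x != y) = TV(p, q).

    p and q are dicts over the same finite set of outcomes.
    Returns the joint pmf and the total variation distance.
    """
    outcomes = list(p)
    overlap = {w: min(p[w], q[w]) for w in outcomes}
    delta = 1.0 - sum(overlap.values())  # equals the total variation distance

    pi = {(x, y): 0.0 for x in outcomes for y in outcomes}
    # Shared mass stays on the diagonal, where x == y.
    for w in outcomes:
        pi[(w, w)] = overlap[w]
    # Leftover mass is spread as a product of residuals; it never lands on the
    # diagonal because at each outcome at least one residual is zero.
    if delta > 0:
        for x in outcomes:
            for y in outcomes:
                pi[(x, y)] += (p[x] - overlap[x]) * (q[y] - overlap[y]) / delta
    return pi, delta


# Illustrative distributions on a three-point sample space.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}

pi, delta = maximal_coupling(p, q)
mismatch = sum(mass for (x, y), mass in pi.items() if x != y)
print(delta, mismatch)
```

In this example both printed values are approximately 0.3: the coupling has the required marginals and its probability of the two coordinates differing equals the total variation distance.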

References

[1] Chatterjee, Sourav. "Distances between probability measures" (PDF). UC Berkeley. Archived from the original (PDF) on July 8, 2008. Retrieved 21 June 2013.
[2] Levin, David A.; Peres, Yuval; Wilmer, Elizabeth L. (2017). Markov Chains and Mixing Times, 2nd rev. ed. AMS. Proposition 4.2, p. 48.
[3] Villani, Cédric (2009). Optimal Transport, Old and New. Springer-Verlag Berlin Heidelberg. p. 22. doi:10.1007/978-3-540-71050-9. ISBN 978-3-540-71049-3.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.