Conditional independence

In probability theory, two random events $A$ and $B$ are conditionally independent given a third event $C$ precisely if the occurrence of $A$ and the occurrence of $B$ are independent events in their conditional probability distribution given $C$ . In other words, $A$ and $B$ are conditionally independent given $C$ if and only if, given knowledge that $C$ occurs, knowledge of whether $A$ occurs provides no information on the likelihood of $B$ occurring, and knowledge of whether $B$ occurs provides no information on the likelihood of $A$ occurring.

The concept of conditional independence can be extended from random events to random variables and random vectors.

Conditional independence of events

Definition

In the standard notation of probability theory, $A$ and $B$ are conditionally independent given $C$ if and only if $\Pr(A\cap B\mid C)=\Pr(A\mid C)\Pr(B\mid C)$ . Conditional independence of $A$ and $B$ given $C$ is denoted by $(A\perp \!\!\!\perp B)\mid C$ . Formally:

(A\perp \!\!\!\perp B)\mid C\quad \iff \quad \Pr(A\cap B\mid C)=\Pr(A\mid C)\Pr(B\mid C)

(Eq.1)

or equivalently,

(A\perp \!\!\!\perp B)\mid C\quad \iff \quad \Pr(A\mid B\cap C)=\Pr(A\mid C)\quad {\text{or}}\quad \Pr(B\mid C)=0.
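
The equivalence of the two forms follows from the definition of conditional probability. As a sketch, assuming $\Pr(B\cap C)>0$ so that every conditional probability below is defined:

{\begin{aligned}\Pr(A\cap B\mid C)&={\frac {\Pr(A\cap B\cap C)}{\Pr(C)}}\\[6pt]&={\frac {\Pr(A\cap B\cap C)}{\Pr(B\cap C)}}\cdot {\frac {\Pr(B\cap C)}{\Pr(C)}}\\[6pt]&=\Pr(A\mid B\cap C)\Pr(B\mid C)\end{aligned}}

Substituting this into Eq.1 and dividing both sides by $\Pr(B\mid C)>0$ leaves $\Pr(A\mid B\cap C)=\Pr(A\mid C)$ . If instead $\Pr(B\mid C)=0$ , both sides of Eq.1 are zero and the condition holds trivially.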

Examples

The discussion on StackExchange provides a couple of useful examples. See below.[1]

Coloured boxes

Each cell represents a possible outcome. The events $\color {red}R$ , $\color {blue}B$ and $\color {gold}Y$ are represented by the areas shaded red, blue and yellow respectively. The overlap between the events $\color {red}R$ and $\color {blue}B$ is shaded purple.

The probabilities of these events are shaded areas with respect to the total area. In both examples $R$ and $B$ are conditionally independent given $Y$ because:

\Pr({\color {red}R}\cap {\color {blue}B}\mid {\color {gold}Y})=\Pr({\color {red}R}\mid {\color {gold}Y})\Pr({\color {blue}B}\mid {\color {gold}Y})

[2]

but not conditionally independent given $\left[{\text{not }}Y\right]$ because:

\Pr({\color {red}R}\cap {\color {blue}B}\mid {\text{not }}{\color {gold}Y})\not =\Pr({\color {red}R}\mid {\text{not }}{\color {gold}Y})\Pr({\color {blue}B}\mid {\text{not }}{\color {gold}Y})
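
The check above can be reproduced numerically. The following is a minimal sketch in Python, assuming a hypothetical layout of 24 equally likely cells; the cell numbering and the region outside $Y$ are invented for illustration, and only the counts inside $Y$ (twelve cells in $Y$, four in $R$, six in $B$, two in the overlap) follow the footnote.

```python
from fractions import Fraction

def prob(event, given):
    """Pr(event | given) on a finite sample space of equally likely cells."""
    return Fraction(len(event & given), len(given))

def cond_independent(a, b, c):
    """Check Eq.1: Pr(A and B | C) == Pr(A | C) * Pr(B | C)."""
    return prob(a & b, c) == prob(a, c) * prob(b, c)

# Hypothetical layout (an assumption, not the original figure).
omega = set(range(24))
Y = set(range(12))                  # yellow: 12 cells
R = {0, 1, 2, 3, 12, 13}            # red: 4 cells inside Y, 2 outside
B = {0, 1, 4, 5, 6, 7, 12, 13}      # blue: 6 cells inside Y, 2 outside
not_Y = omega - Y

print(prob(R & B, Y), prob(R, Y) * prob(B, Y))              # equal inside Y
print(prob(R & B, not_Y), prob(R, not_Y) * prob(B, not_Y))  # differ outside Y
print(cond_independent(R, B, Y), cond_independent(R, B, not_Y))
```

Exact fractions are used instead of floats so that the equality test in Eq.1 is not affected by rounding.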

Weather and delays

Let the two events be that person A gets home in time for dinner and that person B gets home in time for dinner, and let the third event be that a snow storm has hit the city. The storm lowers the probability that each of them gets home in time, but given the storm the two events are still independent of each other. That is, the knowledge that A is late does not tell you whether B will be late. (They may be living in different neighborhoods, traveling different distances, and using different modes of transportation.) However, if you have information that they live in the same neighborhood, use the same transportation, and work at the same place, then the two events are NOT conditionally independent.

Dice rolling

Conditional independence depends on the nature of the third event. If you roll two dice, one may assume that the two dice behave independently of each other. Looking at the results of one die will not tell you about the result of the second die. (That is, the two dice are independent.) If, however, the 1st die's result is a 3, and someone tells you about a third event - that the sum of the two results is even - then this extra unit of information restricts the options for the 2nd result to an odd number. In other words, two events can be independent, but NOT conditionally independent.
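
The dice example can be verified by enumerating all 36 outcomes. This is a small sketch in Python; the function names are illustrative, and the events are "the first die shows 3", "the second die is odd" and "the sum is even".

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely rolls

def pr(event, given=lambda o: True):
    """Pr(event | given) by counting outcomes."""
    cond = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in cond if event(o)), len(cond))

def first_is_3(o):
    return o[0] == 3

def second_is_odd(o):
    return o[1] % 2 == 1

def sum_is_even(o):
    return (o[0] + o[1]) % 2 == 0

def both(o):
    return first_is_3(o) and second_is_odd(o)

# Unconditionally the product rule holds, so the events are independent.
independent = pr(both) == pr(first_is_3) * pr(second_is_odd)

# Given an even sum the product rule fails.
cond_independent = (pr(both, sum_is_even)
                    == pr(first_is_3, sum_is_even) * pr(second_is_odd, sum_is_even))

print(independent, cond_independent)
```

Given an even sum, a first die of 3 forces the second die to be odd, which is why the conditional joint probability is larger than the product of the conditional marginals.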

Height and vocabulary of children

Height and vocabulary are not independent, because taller children tend to be older and older children tend to know more words; but they are conditionally independent given age.

Conditional independence of random variables

Two random variables $X$ and $Y$ are conditionally independent given a third random variable $Z$ if and only if they are independent in their conditional probability distribution given $Z$ . That is, $X$ and $Y$ are conditionally independent given $Z$ if and only if, given any value of $Z$ , the probability distribution of $X$ is the same for all values of $Y$ and the probability distribution of $Y$ is the same for all values of $X$ . Formally:

(X\perp \!\!\!\perp Y)\mid Z\quad \iff \quad F_{X,Y\,\mid \,Z\,=\,z}(x,y)=F_{X\,\mid \,Z\,=\,z}(x)\cdot F_{Y\,\mid \,Z\,=\,z}(y)\quad {\text{for all }}x,y,z

(Eq.2)

where $F_{X,Y\,\mid \,Z\,=\,z}(x,y)=\Pr(X\leq x,Y\leq y\mid Z=z)$ is the conditional cumulative distribution function of $X$ and $Y$ given $Z$ .

Two events $R$ and $B$ are conditionally independent given a σ-algebra $\Sigma$ if

\Pr(R\cap B\mid \Sigma )=\Pr(R\mid \Sigma )\Pr(B\mid \Sigma ){\text{ a.s.}}

where $\Pr(A\mid \Sigma )$ denotes the conditional expectation of the indicator function of the event $A$ , $\chi _{A}$ , given the sigma algebra $\Sigma$ . That is,

\Pr(A\mid \Sigma ):=\operatorname {E} [\chi _{A}\mid \Sigma ].

Two random variables $X$ and $Y$ are conditionally independent given a σ-algebra $\Sigma$ if the above equation holds for all $R$ in $\sigma (X)$ and $B$ in $\sigma (Y)$ .

Two random variables $X$ and $Y$ are conditionally independent given a random variable $W$ if they are independent given $\sigma (W)$ , the σ-algebra generated by $W$ . This is commonly written:

X\perp \!\!\!\perp Y\mid W

or

X\perp Y\mid W

This is read " $X$ is independent of $Y$ , given $W$ "; the conditioning applies to the whole statement: "( $X$ is independent of $Y$ ) given $W$ ".

(X\perp \!\!\!\perp Y)\mid W

If $W$ assumes a countable set of values, this is equivalent to the conditional independence of $X$ and $Y$ for the events of the form $[W=w]$ . Conditional independence of more than two events, or of more than two random variables, is defined analogously.

The following two examples show that $X\perp \!\!\!\perp Y$ neither implies nor is implied by $(X\perp \!\!\!\perp Y)\mid W$ .

First, suppose $W$ is 0 with probability 0.5 and 1 otherwise. When $W=0$ , take $X$ and $Y$ to be independent, each having the value 0 with probability 0.99 and the value 1 otherwise. When $W=1$ , $X$ and $Y$ are again independent, but this time they take the value 1 with probability 0.99. Then $(X\perp \!\!\!\perp Y)\mid W$ . But $X$ and $Y$ are dependent, because $\Pr(X=0)<\Pr(X=0\mid Y=0)$ . This is because $\Pr(X=0)=0.5$ , but if $Y=0$ then it is very likely that $W=0$ and thus that $X=0$ as well, so $\Pr(X=0\mid Y=0)>0.5$ .

For the second example, suppose $X\perp \!\!\!\perp Y$ , each taking the values 0 and 1 with probability 0.5. Let $W$ be the product $X\cdot Y$ . Then $\Pr(X=0\mid W=0)=2/3$ , but $\Pr(X=0\mid Y=0,W=0)=1/2$ , so $(X\perp \!\!\!\perp Y)\mid W$ is false. This is also an example of explaining away; see Kevin Murphy's tutorial [3] where $X$ and $Y$ take the values "brainy" and "sporty".
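
Both examples are small enough to compute exactly. The sketch below, in Python, builds each joint distribution as a dictionary and evaluates the conditional probabilities quoted above; the helper name `conditional` and the tuple layout of the outcomes are choices made for this illustration.

```python
from fractions import Fraction
from itertools import product

def conditional(joint, event, given=lambda o: True):
    """Pr(event | given) for a joint pmf stored as {outcome: probability}."""
    num = sum(p for o, p in joint.items() if event(o) and given(o))
    den = sum(p for o, p in joint.items() if given(o))
    return num / den

# Example 1: outcomes are (w, x, y); X and Y each agree with W with probability 0.99.
hi, lo = Fraction(99, 100), Fraction(1, 100)
joint1 = {}
for w, x, y in product((0, 1), repeat=3):
    px = hi if x == w else lo
    py = hi if y == w else lo
    joint1[(w, x, y)] = Fraction(1, 2) * px * py

p_x0 = conditional(joint1, lambda o: o[1] == 0)
p_x0_given_y0 = conditional(joint1, lambda o: o[1] == 0, lambda o: o[2] == 0)
print(p_x0, p_x0_given_y0)  # marginally dependent: the two values differ

# Example 2: outcomes are (x, y) with fair independent bits; W = x * y.
joint2 = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}
p_x0_given_w0 = conditional(joint2, lambda o: o[0] == 0,
                            lambda o: o[0] * o[1] == 0)
p_x0_given_y0_w0 = conditional(joint2, lambda o: o[0] == 0,
                               lambda o: o[1] == 0 and o[0] * o[1] == 0)
print(p_x0_given_w0, p_x0_given_y0_w0)  # conditionally dependent given W = 0
```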

Conditional independence of random vectors

Two random vectors $\mathbf {X} =(X_{1},\ldots ,X_{l})^{\mathrm {T} }$ and $\mathbf {Y} =(Y_{1},\ldots ,Y_{m})^{\mathrm {T} }$ are conditionally independent given a third random vector $\mathbf {Z} =(Z_{1},\ldots ,Z_{n})^{\mathrm {T} }$ if and only if they are independent in their conditional cumulative distribution given $\mathbf {Z}$ . Formally:

(\mathbf {X} \perp \!\!\!\perp \mathbf {Y} )\mid \mathbf {Z} \quad \iff \quad F_{\mathbf {X} ,\mathbf {Y} |\mathbf {Z} =\mathbf {z} }(\mathbf {x} ,\mathbf {y} )=F_{\mathbf {X} \,\mid \,\mathbf {Z} \,=\,\mathbf {z} }(\mathbf {x} )\cdot F_{\mathbf {Y} \,\mid \,\mathbf {Z} \,=\,\mathbf {z} }(\mathbf {y} )\quad {\text{for all }}\mathbf {x} ,\mathbf {y} ,\mathbf {z}

(Eq.3)

where $\mathbf {x} =(x_{1},\ldots ,x_{l})^{\mathrm {T} }$ , $\mathbf {y} =(y_{1},\ldots ,y_{m})^{\mathrm {T} }$ and $\mathbf {z} =(z_{1},\ldots ,z_{n})^{\mathrm {T} }$ and the conditional cumulative distributions are defined as follows.

{\begin{aligned}F_{\mathbf {X} ,\mathbf {Y} \,\mid \,\mathbf {Z} \,=\,\mathbf {z} }(\mathbf {x} ,\mathbf {y} )&=\Pr(X_{1}\leq x_{1},\ldots ,X_{l}\leq x_{l},Y_{1}\leq y_{1},\ldots ,Y_{m}\leq y_{m}\mid Z_{1}=z_{1},\ldots ,Z_{n}=z_{n})\\[6pt]F_{\mathbf {X} \,\mid \,\mathbf {Z} \,=\,\mathbf {z} }(\mathbf {x} )&=\Pr(X_{1}\leq x_{1},\ldots ,X_{l}\leq x_{l}\mid Z_{1}=z_{1},\ldots ,Z_{n}=z_{n})\\[6pt]F_{\mathbf {Y} \,\mid \,\mathbf {Z} \,=\,\mathbf {z} }(\mathbf {y} )&=\Pr(Y_{1}\leq y_{1},\ldots ,Y_{m}\leq y_{m}\mid Z_{1}=z_{1},\ldots ,Z_{n}=z_{n})\end{aligned}}

Uses in Bayesian inference

Let p be the proportion of voters who will vote "yes" in an upcoming referendum. In taking an opinion poll, one chooses n voters randomly from the population. For i = 1, ..., n, let X_i = 1 if the ith chosen voter will vote "yes" and X_i = 0 otherwise.

In a frequentist approach to statistical inference one would not attribute any probability distribution to p (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that X₁, ..., X_n are independent random variables.

By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to p regardless of the non-existence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that p is in any interval to which a probability is assigned. In that model, the random variables X₁, ..., X_n are not independent, but they are conditionally independent given the value of p. In particular, if a large number of the Xs are observed to be equal to 1, that would imply a high conditional probability, given that observation, that p is near 1, and thus a high conditional probability, given that observation, that the next X to be observed will be equal to 1.
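
The marginal dependence in the Bayesian model can be made concrete with a small calculation. The sketch below, in Python, assumes a uniform prior on p; that prior is an assumption made only for this illustration and is not specified in the text. Under it, the marginal probability of a particular sequence with a ones and b zeros is the integral of p^a (1 - p)^b over [0, 1], which equals a! b! / (a + b + 1)!.

```python
from fractions import Fraction
from math import factorial

def beta_integral(a, b):
    """Exact integral of p**a * (1 - p)**b over [0, 1] for non-negative integers a, b."""
    return Fraction(factorial(a) * factorial(b), factorial(a + b + 1))

def pr_sequence(ones, zeros):
    """Marginal probability of one specific 0/1 sequence, assuming a uniform prior on p."""
    return beta_integral(ones, zeros)

# Before any observation, Pr(X_1 = 1).
pr_first_is_1 = pr_sequence(1, 0)

# After observing X_1 = X_2 = X_3 = 1, Pr(X_4 = 1 | those observations).
pr_fourth_given_three_ones = pr_sequence(4, 0) / pr_sequence(3, 0)

print(pr_first_is_1, pr_fourth_given_three_ones)
```

The conditional probability exceeds the prior one, so the observations are not independent marginally, even though they are conditionally independent given p.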

Rules of conditional independence

A set of rules governing statements of conditional independence has been derived from the basic definition.[4][5]

Note: since these implications hold for any probability space, they will still hold if one considers a sub-universe by conditioning everything on another variable, say K. For example, $X\perp \!\!\!\perp Y\Rightarrow Y\perp \!\!\!\perp X$ would also mean that $X\perp \!\!\!\perp Y\mid K\Rightarrow Y\perp \!\!\!\perp X\mid K$ .

Note: below, the comma can be read as an "AND".

Symmetry

X\perp \!\!\!\perp Y\quad \Rightarrow \quad Y\perp \!\!\!\perp X

Decomposition

X\perp \!\!\!\perp A,B\quad \Rightarrow \quad {\text{ and }}{\begin{cases}X\perp \!\!\!\perp A\\X\perp \!\!\!\perp B\end{cases}}

Proof:

$p_{X,A,B}(x,a,b)=p_{X}(x)p_{A,B}(a,b)$ (meaning of $X\perp \!\!\!\perp A,B$ )
$\int _{B}\!p_{X,A,B}(x,a,b)\,db=\int _{B}\!p_{X}(x)p_{A,B}(a,b)\,db$ (ignore variable B by integrating it out)
$p_{X,A}(x,a)=p_{X}(x)p_{A}(a)$

A similar proof shows the independence of X and B.
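
The three steps of the proof can be traced on a discrete example, where the integral over b becomes a sum. The following Python sketch uses hypothetical probability tables chosen only for illustration; A and B are deliberately dependent on each other.

```python
from fractions import Fraction

# Hypothetical discrete distributions (assumptions for this example).
p_x = {0: Fraction(1, 3), 1: Fraction(2, 3)}
p_ab = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
        (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 8)}

# Step 1: X independent of (A, B) means the joint factorises.
p_xab = {(x, a, b): p_x[x] * p_ab[(a, b)] for x in p_x for (a, b) in p_ab}

# Step 2: sum out b on both sides of the factorisation.
p_xa = {}
for (x, a, b), p in p_xab.items():
    p_xa[(x, a)] = p_xa.get((x, a), 0) + p
p_a = {}
for (a, b), p in p_ab.items():
    p_a[a] = p_a.get(a, 0) + p

# Step 3: the resulting marginal of (X, A) factorises as p_X * p_A.
decomposition_holds = all(p_xa[(x, a)] == p_x[x] * p_a[a] for (x, a) in p_xa)
print(decomposition_holds)
```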

Weak union

X\perp \!\!\!\perp A,B\quad \Rightarrow \quad {\text{ and }}{\begin{cases}X\perp \!\!\!\perp A\mid B\\X\perp \!\!\!\perp B\mid A\end{cases}}

Proof:

By definition, $\Pr(X)=\Pr(X\mid A,B)$ .
Due to the property of decomposition $X\perp \!\!\!\perp B$ , $\Pr(X)=\Pr(X\mid B)$ .
Combining the above two equalities gives $\Pr(X\mid B)=\Pr(X\mid A,B)$ , which establishes $X\perp \!\!\!\perp A\mid B$ .

The second condition can be proved similarly.

Contraction

\left.{\begin{aligned}X\perp \!\!\!\perp A\mid B\\X\perp \!\!\!\perp B\end{aligned}}\right\}{\text{ and }}\quad \Rightarrow \quad X\perp \!\!\!\perp A,B

Proof:

This property can be proved by noticing $\Pr(X\mid A,B)=\Pr(X\mid B)=\Pr(X)$ , each equality of which is asserted by $X\perp \!\!\!\perp A\mid B$ and $X\perp \!\!\!\perp B$ , respectively.

Contraction-weak-union-decomposition

Putting the above three together, we have:

\left.{\begin{aligned}X\perp \!\!\!\perp A\mid B\\X\perp \!\!\!\perp B\end{aligned}}\right\}{\text{ and }}\quad \iff \quad X\perp \!\!\!\perp A,B\quad \Rightarrow \quad {\text{ and }}{\begin{cases}X\perp \!\!\!\perp A\mid B\\X\perp \!\!\!\perp B\\X\perp \!\!\!\perp B\mid A\\X\perp \!\!\!\perp A\\\end{cases}}

Intersection

For strictly positive probability distributions,[5] the following also holds:

\left.{\begin{aligned}X\perp \!\!\!\perp A\mid C,B\\X\perp \!\!\!\perp B\mid C,A\end{aligned}}\right\}{\text{ and }}\quad \Rightarrow \quad X\perp \!\!\!\perp B,A\mid C

The five rules above were termed "Graphoid Axioms" by Pearl and Paz,[6] because they hold in graphs, if $X\perp \!\!\!\perp A\mid B$ is interpreted to mean: "All paths from X to A are intercepted by the set B".[7]
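
The rules can be checked numerically on small discrete distributions. The Python sketch below uses a hypothetical helper `indep` that tests the factorisation p(l, r, g) p(g) = p(l, g) p(r, g) on every cell; both joint distributions are invented for illustration. The first satisfies $X\perp \!\!\!\perp A,B$ by construction. The second sets X = A = B, so it is not strictly positive, and it shows the intersection rule (with an empty C) failing without that requirement.

```python
from fractions import Fraction

def marginal(joint, keep):
    """Marginal pmf over the coordinate positions listed in keep."""
    out = {}
    for outcome, p in joint.items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0) + p
    return out

def indep(joint, left, right, given=()):
    """True if left is independent of right given `given` under the pmf `joint`."""
    p_lrg = marginal(joint, left + right + given)
    p_lg = marginal(joint, left + given)
    p_rg = marginal(joint, right + given)
    p_g = marginal(joint, given)
    n_l, n_r = len(left), len(right)
    for lg, p1 in p_lg.items():
        for rg, p2 in p_rg.items():
            l, g = lg[:n_l], lg[n_l:]
            r, g2 = rg[:n_r], rg[n_r:]
            if g != g2:
                continue  # only compare cells with the same conditioning value
            if p_lrg.get(l + r + g, 0) * p_g[g] != p1 * p2:
                return False
    return True

X, A, B = (0,), (1,), (2,)  # coordinate positions in each outcome (x, a, b)

# Joint 1: p(x, a, b) = p(x) p(a, b).
p_x = {0: Fraction(1, 3), 1: Fraction(2, 3)}
p_ab = {(0, 0): Fraction(1, 2), (0, 1): Fraction(1, 4),
        (1, 0): Fraction(1, 8), (1, 1): Fraction(1, 8)}
product_joint = {(x, a, b): p_x[x] * p_ab[(a, b)] for x in p_x for (a, b) in p_ab}

weak_union = (indep(product_joint, X, A, given=B)
              and indep(product_joint, X, B, given=A))
contraction = (indep(product_joint, X, A, given=B)
               and indep(product_joint, X, B)
               and indep(product_joint, X, A + B))

# Joint 2: X = A = B is one fair coin, so most outcomes have probability zero.
copies = {(0, 0, 0): Fraction(1, 2), (1, 1, 1): Fraction(1, 2)}
premises = indep(copies, X, A, given=B) and indep(copies, X, B, given=A)
conclusion = indep(copies, X, A + B)

print(weak_union, contraction, premises, conclusion)
```

In the second distribution both premises of the intersection rule hold, because X is already determined once either A or B is known, yet X is clearly not independent of the pair (A, B).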

References

[1] "Could someone explain conditional independence?". StackExchange.
[2] To see that this is the case, one needs to realise that Pr(R ∩ B | Y) is the probability of an overlap of R and B (the purple shaded area) in the Y area. Since, in the picture on the left, there are two squares where R and B overlap within the Y area, and the Y area has twelve squares, Pr(R ∩ B | Y) = 2/12 = 1/6. Similarly, Pr(R | Y) = 4/12 = 1/3 and Pr(B | Y) = 6/12 = 1/2.
[3] http://people.cs.ubc.ca/~murphyk/Bayes/bnintro.html
[4] Dawid, A. P. (1979). "Conditional Independence in Statistical Theory". Journal of the Royal Statistical Society, Series B. 41 (1): 1–31. JSTOR 2984718. MR 0535541.
[5] Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press.
[6] Pearl, Judea; Paz, Azaria (1985). "Graphoids: A Graph-Based Logic for Reasoning About Relevance Relations".
[7] Pearl, Judea (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann.

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
