Causal model

A causal model is a conceptual model that describes the causal mechanisms of a system. Causal models have found applications in signal processing and machine learning.[1]

Pearl defines a causal model as an ordered triple ⟨U, V, E⟩, where U is a set of exogenous variables whose values are determined by factors outside the model; V is a set of endogenous variables whose values are determined by factors within the model; and E is a set of structural equations that express the value of each endogenous variable as a function of the values of the other variables in U and V.[1]
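As a minimal sketch, the triple can be rendered directly in code: the exogenous variables are sampled from outside the model, and each structural equation computes one endogenous variable. The variable names and equations below are invented for illustration, not taken from any particular model.

import random

def sample_model():
    # U: exogenous variables, determined by factors outside the model
    u1 = random.gauss(0, 1)
    u2 = random.gauss(0, 1)
    # V and E: each endogenous variable is computed by a structural
    # equation from other variables in U and V
    x = u1              # x := f_x(u1)
    y = 2 * x + u2      # y := f_y(x, u2)
    return {"x": x, "y": y}

print(sample_model())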

Causality vs correlation

The history of statistics revolves around the analysis of relationships among multiple variables. Traditionally, these relationships are described as correlations: associations without any implied causal relationships. Causal models attempt to extend this framework by adding the notion of causal relationships, in which changes in one variable cause changes in others.[1]

Definition

Causal models are mathematical models representing causal relationships within an individual system or population. They facilitate inferences about causal relationships from statistical data. They can teach us a good deal about the epistemology of causation, and about the relationship between causation and probability. They have also been applied to topics of interest to philosophers, such as the logic of counterfactuals, decision theory, and the analysis of actual causation.[2]

Stanford Encyclopedia of Philosophy

History

In the late 19th century, the discipline of statistics began to take shape. After a years-long effort to identify causal rules for domains such as biological inheritance, Galton introduced the concept of mean regression (epitomized by the sophomore slump in sports) which later led him to the non-causal concept of correlation.[3]

As a positivist, Pearson expunged the notion of causality from much of science, treating it as merely an unprovable special case of association, and introduced the correlation coefficient as the metric of association. He wrote "Force as a cause of motion is exactly the same as a tree god as a cause of growth", and that causation was only a "fetish among the inscrutable arcana of modern science". Pearson founded Biometrika and the Biometrics Lab at University College London, which became the world leader in statistics.[3]

In 1908 Hardy and Weinberg solved the problem of trait stability that had led Galton to abandon causality by invoking Mendelian inheritance.[3]

In 1921 Wright's path analysis became the theoretical ancestor of causal modeling and causal graphs.[4] He developed this approach while attempting to untangle the relative impacts of heredity, development and environment on guinea pig coat patterns. He backed up his heretical claims by showing how such analyses could explain the relationship between guinea pig birth weight, in utero time and litter size. Opposition to these ideas by prominent statisticians led them to be ignored for the following 40 years (except among animal breeders). Instead scientists relied on correlations, partly at the behest of Wright's critic and leading statistician, Fisher.[3]

In the 1960s, Duncan, Blalock, Goldberger and others rediscovered path analysis. Sociologists called it structural equation modeling, but once it became a rote method, it lost its utility, leading some practitioners to reject any relationship to causality. Economists adopted the algebraic part of path analysis, calling it simultaneous equation modeling. However, economists still avoided attributing causal meaning to their equations.[3]

Sixty years after publishing his first paper on path analysis, Wright published a piece that recapitulated it, responding to a critique by Karlin et al., which objected that the method handled only linear relationships and that robust, model-free presentations of data were more revealing.[3]

Ladder of causation

Pearl's causal metamodel involves a three-level abstraction he calls the ladder of causation. The lowest level, Association (seeing/observing), entails the sensing of regularities or patterns in the input data, expressed as correlations. The middle level, Intervention (doing), predicts the effects of deliberate actions, expressed as causal relationships. The highest level, Counterfactuals (imagining), involves constructing a theory of (part of) the world that explains why specific actions have specific effects and what to do when they do not.[3]

Association

One object is associated with another if observing one changes the probability of observing the other. Example: shoppers who buy toothpaste are more likely to also buy dental floss. Mathematically:

P(floss | toothpaste) > P(floss)

where P is probability. Associations can also be measured by computing the correlation of the two events. Associations have no causal implications: one event could cause the other, the reverse could be true, or both events could be caused by some third event (an unhappy dentist shames the shopper into treating their mouth better).[3]
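As an illustration, the inequality can be checked directly against transaction records. The data below is invented; only the comparison of the conditional and marginal frequencies matters.

# Estimating P(floss | toothpaste) > P(floss) from toy transaction data
transactions = [
    {"toothpaste": True,  "floss": True},
    {"toothpaste": True,  "floss": False},
    {"toothpaste": False, "floss": False},
    {"toothpaste": True,  "floss": True},
    {"toothpaste": False, "floss": False},
]

p_floss = sum(t["floss"] for t in transactions) / len(transactions)
bought_tp = [t for t in transactions if t["toothpaste"]]
p_floss_given_tp = sum(t["floss"] for t in bought_tp) / len(bought_tp)

# Association holds when the conditional probability exceeds the marginal
print(p_floss_given_tp > p_floss)  # True for this toy data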

Intervention

This level asserts specific causal relationships between events. Causality is assessed by experimentally performing some action that affects one of the events. Example: doubling the price of toothpaste. Causality cannot be established by examining history (of price changes) because the price change may have been for some other reason that could itself affect the second event (a tariff that increases the price of both goods). Mathematically:

P(floss | do(toothpaste))

where do is an operator that signals the experimental intervention (doubling the price).[3]
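A do-expression can be estimated by simulation: fix the intervened variable and let the rest of the model run. The sketch below uses an invented model in which a tariff raises the price and also suppresses purchases, so the observational and interventional purchase rates differ.

import random

def simulate(do_price=None, n=100_000):
    purchases = 0
    for _ in range(n):
        tariff = random.random() < 0.5       # hidden common cause
        price = 3.0 if tariff else 2.0       # tariff raises the price
        if do_price is not None:
            price = do_price                 # do(price): override the equation
        # the tariff also suppresses purchases directly, confounding
        # any purely observational comparison of price and purchases
        p_buy = max(0.0, 0.9 - 0.15 * price - (0.2 if tariff else 0.0))
        purchases += random.random() < p_buy
    return purchases / n

print(simulate())               # observational purchase rate
print(simulate(do_price=4.0))   # rate under the intervention do(price = 4)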

Counterfactuals

The highest, counterfactual, level involves consideration of an alternate version of a past event. Example: what is the probability that, if we had doubled the price, the shopper would still have bought the toothpaste? Answering yes asserts the existence of a causal relationship. Models that can answer counterfactuals allow precise interventions whose consequences can be predicted. At the extreme, such models are accepted as physical laws (as in the laws of physics, e.g., inertia: a stationary object will move only if a force is applied to it).[3]
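The standard way to evaluate a counterfactual in a structural causal model is the three-step procedure of abduction (infer the exogenous terms from what was observed), action (modify the model to reflect the hypothetical intervention), and prediction (recompute the outcome). A minimal sketch, with an invented linear equation and invented numbers:

# Counterfactual via abduction-action-prediction on a toy linear SCM.
# The model y = 1.5 - 0.3 * price + u and all values are invented.
def counterfactual_outcome(observed_price, observed_y, new_price):
    # Abduction: recover the exogenous term consistent with the observation
    u = observed_y - (1.5 - 0.3 * observed_price)
    # Action: replace the price with the hypothetical value
    # Prediction: recompute the outcome under the same exogenous conditions
    return 1.5 - 0.3 * new_price + u

# "Had we doubled the price, what outcome would we have seen?"
print(counterfactual_outcome(observed_price=2.0, observed_y=1.0, new_price=4.0))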

Causal diagram

A causal diagram is a graph that displays causal relationships between variables in a causal model. A causal diagram includes a set of variables (or nodes) within the scope of the model. Each node is connected by an arrow to another node upon which it has a causal influence. The arrowhead delineates the direction of causality, e.g., an arrow connecting variables A and B with the arrowhead at B indicates a change in A causes a change in B (with an associated probability).[3]

Causal diagrams include causal loop diagrams, directed acyclic graphs, and Ishikawa diagrams.[3]

Causal diagrams are independent of the quantitative probabilities that inform them. Changes to those probabilities (e.g., due to technological improvements) do not require changes to the model.[3]

Causality

Twentieth-century definitions of causality relied purely on probabilities/associations. One event (X) was said to cause another if it raises the probability of the other (Y). Mathematically this is expressed as:

P(Y | X) > P(Y).

Such definitions are inadequate because they encompass non-causal relationships (e.g., X and Y may have a common cause). Causality is relevant to the second rung of the ladder; associations sit on the first rung and provide only evidence of causation.[3] A later definition attempted to address this ambiguity by conditioning on background factors. Mathematically:

P(Y | X, K = k) > P(Y | K = k),

where K is the set of background variables. However, the required set of background variables is indeterminate as long as probability is the only criterion.[3]

Other attempts to define causality include Granger causality, a statistical hypothesis test holding that causality (in economics) can be assessed by measuring the ability to predict the future values of one time series from prior values of another time series.[3]
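For instance, statsmodels provides a Granger-causality test. In this sketch the two series are synthetic, constructed so that lagged values of x help predict y; only the library call is real, everything else is invented for illustration.

import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
x = rng.normal(size=201)
y = np.empty(201)
y[0] = 0.0
for t in range(1, 201):
    y[t] = 0.8 * x[t - 1] + rng.normal()   # y depends on lagged x

# Tests whether the second column helps predict the first (here: x -> y)
data = np.column_stack([y[1:], x[1:]])
grangercausalitytests(data, maxlag=2)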

Model elements

Causal models have formal structures with elements with specific properties.[3]

Junction patterns

The three types of connections of three nodes are linear chains, branching forks and merging colliders.[3]

Chain

Chains are straight-line connections with arrows pointing from cause to effect, as in A → B → C. In this model, B is a mediator in that it mediates the change that A would otherwise have on C.[3]:113

Fork

In forks, one cause has multiple effects, as in A ← B → C. In such models, B is a common cause or confounder: A and C display a positive correlation that is not causal, which disappears when conditioning on B (for a specific value of B).[3]:114

Collider

In colliders, multiple causes affect one outcome, as in A → B ← C. Conditioning on B (for a specific value of B) often reveals a non-causal negative correlation between A and C. This negative correlation has been called collider bias and the "explain-away" effect.[3]:115 The correlation can be positive in the case where contributions from both A and C are necessary to affect B.[3]:197
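A short simulation makes the explain-away effect concrete: A and C are independent causes of a collider B, yet restricting attention to cases where B occurred induces a negative correlation between them. All probabilities here are invented.

import random

random.seed(0)
samples = [(random.random() < 0.5, random.random() < 0.5) for _ in range(100_000)]
data = [(a, c, a or c) for a, c in samples]   # b = a OR c (collider)

def corr(pairs):
    # correlation of two binary indicators
    n = len(pairs)
    ma = sum(a for a, _ in pairs) / n
    mc = sum(c for _, c in pairs) / n
    cov = sum((a - ma) * (c - mc) for a, c in pairs) / n
    return cov / ((ma * (1 - ma)) * (mc * (1 - mc))) ** 0.5

print(corr([(a, c) for a, c, _ in data]))        # ~0: independent overall
print(corr([(a, c) for a, c, b in data if b]))   # ~-0.5: explain-away effect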

Node types

Mediator

A mediator node modifies the effect of other causes on an outcome (as opposed to simply affecting the outcome).[3]:113

Confounder

A confounder node affects multiple outcomes, creating a positive correlation among them.[3]:114

Confounder/deconfounder

An essential element of correlational study design is to identify potentially confounding influences on the variable under study, such as demographics. These variables are controlled for to eliminate those influences. However, the correct list of confounding variables cannot be determined a priori. It is thus possible that a study may control for irrelevant variables, or even (indirectly) for the variable under study.[3]:139

Causal models offer a robust technique for identifying appropriate confounding variables. Formally, Z is a confounder if "Y is associated with Z via paths not going through X". These can often be determined using data collected for other studies. Mathematically, if

P(Y | X) ≠ P(Y | do(X))

then X and Y are confounded.[3]:151

Earlier, allegedly incorrect definitions include:[3]:152

  • "Any variable that is correlated with both X and Y."
  • Y is associated with Z among the unexposed.
  • Noncollapsibility: A difference between the "crude relative risk and the relative risk resulting after adjustment for the potential confounder."
  • Epidemiological: A variable associated with X in the population at large and associated with Y among people unexposed to X.

The latter is flawed in that, in the model

X → Z → Y,

Z matches the definition but is a mediator, not a confounder, and is an example of controlling for the outcome. In the model

X ← A → B ← C → Y,

B was traditionally considered to be a confounder, because it is associated with X and with Y but is not on a causal path nor is it a descendant of anything on a causal path. Controlling for B turns it into a confounder. This is known as M-bias.[3]:161

Backdoor adjustment

In a causal model, the method for identifying all appropriate confounders (deconfounding) is to block every noncausal path between X and Y without disrupting any causal paths.[3]:158

Definition: a backdoor path between two variables X and Y is any path from X to Y that starts with an arrow pointing to X.[3]:158

X and Y are deconfounded if every backdoor path is blocked and no controlled-for variable Z is descended from X. It is not necessary to control for any variables other than the deconfounders.[3]:158
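The definition can be illustrated by enumerating paths in a toy graph. The sketch below uses an invented four-node model (Z → X, Z → Y, X → M, M → Y) and flags each path from X to Y as causal or backdoor according to the direction of its first edge.

edges = {("Z", "X"), ("Z", "Y"), ("X", "M"), ("M", "Y")}  # invented DAG

def neighbors(node):
    # yield adjacent nodes, recording whether the edge leaves or enters node
    for a, b in edges:
        if a == node:
            yield b, "out"
        if b == node:
            yield a, "in"

def paths(src, dst, seen=()):
    # all simple paths from src to dst, ignoring edge direction
    if src == dst:
        yield []
        return
    for nxt, direction in neighbors(src):
        if nxt not in seen:
            for rest in paths(nxt, dst, seen + (src,)):
                yield [(src, nxt, direction)] + rest

for p in paths("X", "Y"):
    # a path whose first edge points INTO X is a backdoor path
    kind = "backdoor" if p[0][2] == "in" else "causal"
    print(kind, [step[:2] for step in p])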

Definition: the backdoor criterion is satisfied when all backdoor paths in a model are blocked.

When the causal model is a plausible representation of reality and the backdoor criterion is satisfied, then partial regression coefficients can be used as (causal) path coefficients (for linear relationships).[5]:223

When the backdoor criterion is satisfied, the effect of the intervention can be computed from observational data by adjusting for the deconfounder Z:

P(Y | do(X)) = Σz P(Y | X, Z = z) P(Z = z)[5]:227
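A sketch of the adjustment computation over a toy joint distribution; the counts below are invented, and the point is simply the sum over strata of Z.

from collections import Counter

# invented joint counts over binary (x, y, z)
counts = Counter({
    (0, 0, 0): 30, (0, 1, 0): 10, (1, 0, 0): 5,  (1, 1, 0): 5,
    (0, 0, 1): 5,  (0, 1, 1): 5,  (1, 0, 1): 10, (1, 1, 1): 30,
})
total = sum(counts.values())

def p(predicate):
    # probability of the event described by predicate(x, y, z)
    return sum(n for k, n in counts.items() if predicate(*k)) / total

def p_y_do_x(x, y):
    # backdoor adjustment: sum_z P(Y | X, Z=z) P(Z=z)
    result = 0.0
    for z in (0, 1):
        p_z = p(lambda xx, yy, zz: zz == z)
        p_y_given_xz = (p(lambda xx, yy, zz: xx == x and yy == y and zz == z)
                        / p(lambda xx, yy, zz: xx == x and zz == z))
        result += p_y_given_xz * p_z
    return result

print(p_y_do_x(x=1, y=1))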

Frontdoor adjustment

Definition: a frontdoor path is a direct causal path for which data is available for all variables.[3]:226

The following converts a do expression into a do-free expression by conditioning on the variables along the front-door path:[3]:226

P(Y | do(X)) = Σz P(Z = z | X) Σx′ P(Y | X = x′, Z = z) P(X = x′)

Presuming data for these observable probabilities is available, the ultimate probability can be computed without an experiment, regardless of the existence of other confounding paths and without backdoor adjustment.[3]:226
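The formula can be evaluated numerically over a toy joint distribution. All probabilities below are invented; the computation just follows the two nested sums.

# invented joint probabilities P(X = x, Z = z, Y = y); they sum to 1
joint = {
    (0, 0, 0): 0.20, (0, 0, 1): 0.05, (0, 1, 0): 0.10, (0, 1, 1): 0.05,
    (1, 0, 0): 0.05, (1, 0, 1): 0.05, (1, 1, 0): 0.10, (1, 1, 1): 0.40,
}

def p(**fix):
    # marginal/joint probability with some of x, z, y held fixed
    return sum(v for (x, z, y), v in joint.items()
               if all(dict(x=x, z=z, y=y)[k] == val for k, val in fix.items()))

def p_y_do_x(x, y):
    # front-door formula: sum_z P(z | x) * sum_x' P(y | x', z) P(x')
    total = 0.0
    for z in (0, 1):
        p_z_given_x = p(x=x, z=z) / p(x=x)
        inner = sum(p(x=xp, z=z, y=y) / p(x=xp, z=z) * p(x=xp) for xp in (0, 1))
        total += p_z_given_x * inner
    return total

print(p_y_do_x(x=1, y=1))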

Do calculus

The do calculus is the set of manipulations that are available to transform one expression into another, with the general goal of transforming expressions that contain the do operator into expressions that do not. Expressions that do not include the do operator can be estimated from observational data alone, without the need for an experimental intervention, which might be expensive, lengthy or even unethical (e.g., asking subjects to take up smoking).[3]:231 The set of rules is complete (it can be used to derive every true statement in this system).[3]:237 An algorithm can determine whether, for a given model, a solution is computable in polynomial time.[3]:238

Rules

The calculus includes three rules for the transformation of conditional probability expressions involving the do operator.

Rule 1

P(Y | do(X), Z, W) = P(Y | do(X), Z)

in the case that the variable set Z blocks all paths from W to Y once all arrows leading into X have been deleted.[3]:234 This rule permits the addition or deletion of observations.[3]:235

Rule 2

P(Y | do(X), do(Z)) = P(Y | do(X), Z)

in the case that Z satisfies the back-door criterion.[3]:234 This rule permits the replacement of an intervention with an observation or vice versa.[3]:235

Rule 3

P(Y | do(X)) = P(Y)

in the case where no causal paths connect X and Y.[3]:234 This rule permits the deletion or addition of interventions.[3]:235

Extensions

The rules do not imply that every query can have its do operators removed. In such cases, it may be possible to substitute a variable that is subject to manipulation (e.g., diet) in place of one that is not (e.g., blood cholesterol), which can then be transformed to remove the do. Example: replacing do(blood cholesterol) with do(diet).

Queries

Queries are questions asked based on a specific model. They are generally answered by performing experiments (interventions). Interventions take the form of fixing the value of one variable in a model and observing the result. Mathematically, such queries take the form (from the example):[3]:8

P(floss | do(toothpaste))

where the do operator indicates that the experiment explicitly modified the price of toothpaste. Graphically, this blocks any causal factors that would otherwise affect that variable. Diagrammatically, this erases all causal arrows pointing at the experimental variable.[3]:40

More complex queries are possible, in which the do operator is applied (the value is fixed) to multiple variables.

Independence conditions

Variables are independent if the values of one do not directly affect the values of the other. Multiple causal models can share independence conditions. For example, the models

A → B → C

and

A ← B → C

have the same independence conditions, because conditioning on B leaves A and C independent. However, the two models do not have the same meaning and can be falsified based on data. (If observations show an association between A and C after conditioning on B, then both models are incorrect.) Conversely, data cannot show which of these two models is correct, because they have the same independence conditions. Conditioning on a variable is a mechanism for conducting hypothetical experiments. Conditioning on B in the first example implies that observations for a given value of B should show no correlation between A and C. If such a correlation exists, then the model must change accordingly. Bayesian networks cannot make such distinctions, because they do not make causal assertions.[3]:129-130
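A simulation can verify that the two structures imply the same conditional independence. The parameter values below are invented; in both generative models, the frequency of C given B = 1 is the same whether or not A occurred.

import random

random.seed(1)

def chain():
    # A -> B -> C
    a = random.random() < 0.5
    b = random.random() < (0.8 if a else 0.2)
    c = random.random() < (0.7 if b else 0.3)
    return a, b, c

def fork():
    # A <- B -> C
    b = random.random() < 0.5
    a = random.random() < (0.8 if b else 0.2)
    c = random.random() < (0.7 if b else 0.3)
    return a, b, c

def cond_freq(gen, n=200_000):
    # P(C | A, B = 1) for A = 0 and A = 1; equal values indicate
    # that A and C are independent given B
    stats = {0: [0, 0], 1: [0, 0]}
    for _ in range(n):
        a, b, c = gen()
        if b:
            stats[a][0] += c
            stats[a][1] += 1
    return [stats[a][0] / stats[a][1] for a in (0, 1)]

print(cond_freq(chain))  # both entries ~0.7
print(cond_freq(fork))   # both entries ~0.7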

Bayesian network

Any causal model can be implemented as a Bayesian network. Bayesian networks can be used to provide the inverse probability of an event (given an outcome, what are the probabilities of a specific cause?). This requires preparation of a conditional probability table, showing all possible inputs and outcomes with their associated probabilities.[3]:119

For example, given a two variable model of Disease and Test (for the disease) the conditional probability table takes the form:[3]:117

Probability of a positive test for a given disease (%)

  Disease     Test positive   Test negative
  Negative         12              88
  Positive         73              27

According to this table, when a patient does not have the disease, the probability of a positive test is 12%.
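Inverting the table requires a prior for the disease. The 1% prevalence below is an invented assumption, while the 73% and 12% entries come from the table above.

# Bayes' rule: P(disease | positive test)
p_pos_given_disease = 0.73
p_pos_given_no_disease = 0.12
p_disease = 0.01  # assumed prevalence, not given in the table

p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_no_disease * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(p_disease_given_pos)  # ~0.058 under these assumptions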

While this is tractable for small problems, as the number of variables and their associated states increase, the probability table (and associated computation time) increases exponentially.[3]:121

Bayesian networks are used commercially in applications such as wireless data error correction and DNA analysis.[3]:122

References

  1. Pearl, Judea (2009-09-14). Causality. Cambridge University Press. ISBN 9781139643986.
  2. Hitchcock, Christopher (2018). "Causal Models". In Zalta, Edward N. (ed.). The Stanford Encyclopedia of Philosophy (Fall 2018 ed.). Metaphysics Research Lab, Stanford University. Retrieved 2018-09-08.
  3. Pearl, Judea; Mackenzie, Dana (2018-05-15). The Book of Why: The New Science of Cause and Effect. Basic Books. ISBN 9780465097616.
  4. Okasha, Samir (2009). "Causation in Biology". In Beebee, H.; et al. (eds.). The Oxford Handbook of Causation. pp. 707–711.
  5. Pearl, Judea; Mackenzie, Dana (2018-05-15). The Book of Why: The New Science of Cause and Effect. Basic Books. ISBN 9780465097616.

Further reading

  • Greenland, S.; Brumback, B. (2002). "An overview of relations among causal modelling methods". International Journal of Epidemiology. 31 (5): 1030–1037. doi:10.1093/ije/31.5.1030. PMID 12435780.