Overview and Essential Concepts in Causal Inference

Causal Inference is key to understanding effects of treatments, interventions and policies

Seungjun (Josh) Kim
7 min read · Aug 18, 2022

Introduction

What is causal inference? According to this article, causal inference is the process by which causes are inferred from data, as the name itself suggests. In the more specific context of public health, causal inference can be defined as a field that “focuses on exploring the rigorous assumptions, study designs, and estimation strategies that allow researchers to draw causal conclusions based on a clinical trial or observational data.”

Causal inference is not limited to the domain of public health. It is a useful tool for gauging and evaluating the effects of policies, treatments, or pretty much any kind of intervention. As buzzwords such as deep learning, transformers, and language models receive an immense spotlight from techies, the explainability and interpretability of models are becoming increasingly important. Causal inference offers considerable insight into these aspects, as complicated deep learning models often fail to explain the causes of the phenomena observed in data.

Correlation vs. Causality

Correlation is the degree to which two or more quantities are linearly associated. It is a well-known concept for anyone who has studied basic statistics, so I will skip the mathematical formulation. However, people often confuse correlation with causality. Causality refers to a link between A and B in which A directly led to B; in other words, A was the direct cause of B. Correlation, on the other hand, can still hold between two variables even when one does not lead to the other.

There is also the concept of “spurious” correlation. As the name suggests, a spurious correlation arises when causally unrelated variables happen to be highly correlated with each other over some period of time. The following is a famous example of a spurious correlation.

Figure: the divorce rate in Maine plotted over time against per capita margarine consumption. Source: http://www.tylervigen.com/spurious-correlations (CC BY 4.0)

We can observe from the line graphs above that the divorce rate in Maine over time correlates with per capita margarine consumption. Does this mean that high divorce rates cause people to eat more margarine, or vice versa? Probably not; it does not sound plausible. As this example illustrates, we should refrain from quickly judging the relationship between two or more variables. Causal inference allows us to investigate the relationship between variables more deeply, review whether the assumptions needed to make a judgment are satisfied, and then make an evaluation within certain boundaries.
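To make this concrete, here is a minimal Python sketch (with made-up numbers, not the actual Maine data) showing that two causally unrelated series that merely share a time trend can end up almost perfectly correlated.

import numpy as np

rng = np.random.default_rng(0)
years = np.arange(2000, 2010)

# Two hypothetical series that both simply trend downward over time.
divorce_rate = 5.0 - 0.1 * (years - 2000) + rng.normal(0, 0.05, len(years))
margarine_lbs = 8.0 - 0.4 * (years - 2000) + rng.normal(0, 0.2, len(years))

# Pearson correlation between the two series.
r = np.corrcoef(divorce_rate, margarine_lbs)[0, 1]
print(f"Pearson correlation: {r:.2f}")  # close to 1, yet neither causes the other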

Reverse Causality

Another challenge in defining causality is determining the “direction” of causality. Even after we are confident that variables A and B share not only a correlational relationship but some sort of causal relationship, the direction of causality may be unclear. Did A cause B, or did B cause A?

Let us consider a situation where a gym has been installed (A) in an area where a lot of people exercise and work out (B). Suppose researchers have found that these two elements (the installation of the gym and the area having a high proportion of people who actively work out) are causally linked. In this case, both directions of causality seem plausible.

Case 1) Because the area had a lot of residents who would potentially use the gym to work out, it was attractive for the gym company and therefore led the company to install the gym there.

Case 2) The installation of a gym prompted local residents, who had not been able to work out because there was no gym nearby, to exercise more.

Would someone be able to persuasively claim that one case is the right one over the other?

Some Key Concepts in Causal Inference

In causal inference, treatment is often used interchangeably with terms such as exposure and intervention. Treatment is often multi-level, but most textbooks and materials cover binary treatment since it makes the concepts easier to explain. The outcome is the observed result or phenomenon after a certain treatment has been applied.

Suppose we are interested in the causal effect of some treatment A on some outcome Y.

Examples of treatment would be:

A=1 if receive the COVID-19 vaccine; A=0 otherwise

A=1 if receive active drug; A=0 if receive placebo

Examples of outcomes would be:

Y=1 if develop diabetes in 5 years; Y=0 otherwise

Y=time until death

Potential outcomes are the outcomes we would observe under each possible treatment option. We denote them Y⁰ and Y¹, where Y^a is defined as the outcome that would be observed if treatment were set to A=a.

Example:

Treatment (A): regional (A=1) versus general (A=0) anesthesia for hip fracture surgery

Outcome (Y): major pulmonary complications

Y¹: equal to 1 if major pulmonary complications occur and 0 otherwise, if given regional anesthesia

Y⁰: equal to 1 if major pulmonary complications occur and 0 otherwise, if given general anesthesia

Counterfactual outcomes are ones that would have been observed, had the treatment been different.

If my treatment was A=1, then my counterfactual outcome is Y⁰.

If my treatment was A=0, then my counterfactual outcome is Y¹.

Example:

Did the COVID-19 vaccine prevent me from getting COVID?

I received the vaccine and did not catch COVID. My actual exposure was A=1 and my observed outcome was Y=Y¹. In this case, the counterfactual would be the situation in which I did not get the vaccine. Would I have gotten sick if I had not received it? My counterfactual exposure is A=0 and my counterfactual outcome is Y⁰.
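The point is easiest to see in a toy Python table (all values hypothetical): every person has both potential outcomes defined, but we only ever observe the one corresponding to the treatment they actually received; the other column exists only on paper.

import numpy as np
import pandas as pd

# Hypothetical potential outcomes for four people (1 = caught COVID, 0 = did not).
# In real data we would never see both Y1 and Y0; this is a god's-eye view.
df = pd.DataFrame({
    "person": ["Ann", "Ben", "Cho", "Dee"],
    "A":  [1, 0, 1, 0],   # 1 = received the vaccine, 0 = did not
    "Y1": [0, 0, 1, 0],   # outcome if vaccinated
    "Y0": [1, 0, 1, 1],   # outcome if not vaccinated
})

# What we actually observe is Y1 when A=1 and Y0 when A=0.
df["Y_observed"] = np.where(df["A"] == 1, df["Y1"], df["Y0"])
# The counterfactual outcome is the one we never get to see.
df["Y_counterfactual"] = np.where(df["A"] == 1, df["Y0"], df["Y1"])
print(df)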

“One version of treatment” refers to the assumption that there are no hidden versions of treatment. The following example will clear up what I mean by “hidden versions of treatment”.

For instance, say we are interested in the causal effect of weight on health outcomes. This can be tricky because there are various ways of manipulating weight. Some people eat less or more to control their weight; others work out more intensely to reduce it. These different methods may themselves be associated with different outcomes, distorting the pure causal effect we want to measure.

Fundamental Challenge of Causal Inference

The fundamental challenge of causal inference is that we can only observe one potential outcome for each person; we are unable to observe the counterfactual outcome. This is where “assumptions” come into play. It is the same in economics, which develops theories and models under certain assumptions (e.g., all humans are rational, a throwback to ECON 101). Similarly, researchers estimate population-level (average) causal effects based on certain assumptions and their validation.

This challenge can be articulated via the following mathematical formulation:

E(Y¹-Y⁰): Average value of Y if everyone was treated with A=1 minus the average value of Y if everyone was treated with A=0

We would ideally like to calculate the value above, but due to the fundamental challenge of not being able to observe separate outcomes for the same population under different treatments, we cannot directly calculate E(Y¹-Y⁰). The best we can calculate, using the observed data we have in hand, is:

E(Y|A=1) - E(Y|A=0)

where

E(Y|A=1): mean of Y among people with A=1

E(Y|A=0): mean of Y among people with A=0

E(Y¹): mean of Y if the whole population was treated with A=1

E(Y⁰): mean of Y if the whole population was treated with A=0

E(Y|A=1) - E(Y|A=0) is generally not equal to E(Y¹-Y⁰). Why? E(Y|A=1) - E(Y|A=0) is generally not a causal effect, because it compares two different populations of people, while E(Y¹-Y⁰) is a causal effect, because it compares what would happen if the same people were treated with A=1 versus if the same people were treated with A=0.
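A quick simulation (all numbers assumed for illustration) makes this gap visible: when a confounder X drives both who gets treated and the outcome, the naive difference E(Y|A=1) - E(Y|A=0) can even have the opposite sign of the true causal effect E(Y¹-Y⁰).

import numpy as np

rng = np.random.default_rng(42)
n = 100_000

X = rng.binomial(1, 0.5, n)                        # confounder, e.g., poor baseline health
A = rng.binomial(1, np.where(X == 1, 0.8, 0.2))    # sicker people are treated more often

# Potential outcomes: treatment lowers risk by 0.1 for everyone,
# but X raises risk regardless of treatment.
Y1 = rng.binomial(1, 0.1 + 0.4 * X)
Y0 = rng.binomial(1, 0.2 + 0.4 * X)
Y = np.where(A == 1, Y1, Y0)                       # the outcome we actually observe

true_ate = (Y1 - Y0).mean()                        # E(Y1 - Y0), knowable only in a simulation
naive_diff = Y[A == 1].mean() - Y[A == 0].mean()   # E(Y|A=1) - E(Y|A=0)
print(f"True average causal effect: {true_ate:+.3f}")    # about -0.10
print(f"Naive observed difference:  {naive_diff:+.3f}")  # about +0.14, wrong sign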

Causal Assumptions

In order to identify causal effects, some untestable assumptions are required. They are called “causal assumptions”: consistency, positivity, the Stable Unit Treatment Value Assumption (SUTVA), and ignorability.

Consistency

The potential outcome under treatment A=a, Y^a, is equal to the observed outcome if the actual treatment received is A=a.

Positivity

The positivity assumption dictates that, for every set of values of X, treatment assignment is not deterministic. If treatment were deterministic for some values of X, then we would have no observed values of Y for one of the treatment groups at those values of X. Variability in treatment assignment is important for identification.
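A simple way to eyeball positivity in practice (illustrated with a tiny hypothetical dataset below) is to check that every stratum of X contains both treated and untreated units; a stratum where the treated proportion is exactly 0 or 1 signals a violation.

import pandas as pd

# Hypothetical data: binary covariate X and treatment A for eight units.
df = pd.DataFrame({
    "X": [0, 0, 0, 0, 1, 1, 1, 1],
    "A": [0, 1, 0, 1, 1, 1, 1, 1],
})

# Proportion treated within each level of X; the X=1 stratum is all treated,
# so positivity fails there and we have no untreated outcomes to compare against.
print(df.groupby("X")["A"].mean())
print(pd.crosstab(df["X"], df["A"]))   # raw counts per (X, A) cell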

SUTVA(Stable Unit Treatment Value Assumption)

  • Units do not interfere with each other.
  • The treatment assignment of one unit does not affect the outcome of another unit.
  • Spillover and contagion are other terms for interference.
  • SUTVA allows us to write the potential outcome for a person in terms of only that person’s treatment. In other words, there should be only one version of treatment.

Ignorability (no unmeasured confounders assumption)

Given pre-treatment covariates X, treatment assignment is independent of the potential outcomes. Among people with the same values of X, we can think of treatment A as being randomly assigned.
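If these assumptions hold, one textbook way to recover E(Y¹-Y⁰) from observed data is standardization (the g-formula): estimate the mean outcome within each treatment-by-X stratum and then average over the distribution of X. Here is a minimal sketch using the same assumed simulation as in the confounding example above; it recovers the true effect of about -0.1 that the naive comparison missed.

import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 100_000
X = rng.binomial(1, 0.5, n)
A = rng.binomial(1, np.where(X == 1, 0.8, 0.2))
Y = rng.binomial(1, np.where(A == 1, 0.1, 0.2) + 0.4 * X)
df = pd.DataFrame({"X": X, "A": A, "Y": Y})

# Stratum-specific means E(Y | A=a, X=x).
strata = df.groupby(["X", "A"])["Y"].mean().unstack("A")

# Weight each stratum's treated-minus-untreated difference by P(X=x).
weights = df["X"].value_counts(normalize=True)
ate_standardized = ((strata[1] - strata[0]) * weights).sum()
print(f"Standardized estimate of E(Y1 - Y0): {ate_standardized:+.3f}")   # about -0.10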

— The End —

I plan to write more on causal inference in future posts, including how to implement causal inference design and estimation in Python and R, so stay tuned!

If you found this post helpful, consider supporting me by signing up on Medium via the following link : )

You will have access to so many useful and interesting articles and posts from not only me but also other authors!

About the Author

Data Scientist. 1st Year PhD student in Informatics at UC Irvine.

Former research area specialist at the Criminal Justice Administrative Records System (CJARS) economics lab at the University of Michigan, working on statistical report generation, automated data quality review, building data pipelines, and data standardization and harmonization. Former Data Science Intern at Spotify, Inc. (NYC).

He loves sports, working out, cooking good Asian food, watching K-dramas, making and performing music, and most importantly worshiping Jesus Christ, our Lord. Check out his website!
