Regression Discontinuity


Regression discontinuity (RD) is a quasi-experimental design that identifies causal effects under certain continuity assumptions. Consider a population of units indexed by \( i = 1, \dots, N \), each with a value of a continuous variable called the "score" or assignment variable, denoted by \( X_i \). There is a threshold \( c \) such that units with \( X_i \geq c \) receive the treatment (denoted by \( D_i = 1 \)), while those with \( X_i < c \) do not (\( D_i = 0 \)).

The goal is to estimate the causal effect of the treatment on an outcome \( Y_i \). Under the assumption that the conditional expectation functions of the potential outcomes without treatment, \( Y_i(0) \), and with treatment, \( Y_i(1) \), are continuous at \( c \), the difference between the limits of the conditional expectation of the observed outcome as \( X_i \) approaches the threshold from the right and from the left identifies the treatment effect at the threshold. Formally, if we assume that:

\[ \lim_{x \to c^-} \mathbb{E}[Y_i(0) \mid X_i = x] \quad \text{and} \quad \lim_{x \to c^+} \mathbb{E}[Y_i(1) \mid X_i = x] \]

are well defined, and that the conditional expectation functions of the potential outcomes without and with treatment are continuous at \( c \), then the local average treatment effect (LATE) at the threshold is given by:

\[ \tau_{\text{RD}} = \lim_{x \to c^+} \mathbb{E}[Y_i \mid X_i = x] - \lim_{x \to c^-} \mathbb{E}[Y_i \mid X_i = x]. \]
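
One way to see why this jump identifies the effect is sketched below. It assumes sharp assignment, so that the observed outcome can be written as \( Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0) \); this decomposition is a standard convention that is not stated explicitly above. To the right of the threshold only \( Y_i(1) \) is observed, and to the left only \( Y_i(0) \), so continuity at \( c \) gives:

\[ \lim_{x \to c^+} \mathbb{E}[Y_i \mid X_i = x] = \lim_{x \to c^+} \mathbb{E}[Y_i(1) \mid X_i = x] = \mathbb{E}[Y_i(1) \mid X_i = c], \]

\[ \lim_{x \to c^-} \mathbb{E}[Y_i \mid X_i = x] = \lim_{x \to c^-} \mathbb{E}[Y_i(0) \mid X_i = x] = \mathbb{E}[Y_i(0) \mid X_i = c]. \]

Subtracting the second line from the first yields

\[ \tau_{\text{RD}} = \mathbb{E}[Y_i(1) - Y_i(0) \mid X_i = c]. \]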

In other words, RD relies on units immediately to the left and right of the threshold being comparable, on average, in their observable and unobservable characteristics, apart from treatment assignment. The discontinuity in the conditional expectation of the outcome exactly at the threshold is therefore interpreted as the average causal effect of the treatment for units with \( X_i = c \).

This interactive panel lets you simulate data generated under an RD design. By modifying parameters such as the number of observations, the treatment effect, the threshold \( c \), the underlying slope, the standard deviation of the error, and the "neighborhood" (the range around the threshold used for estimation), you can see how the effect estimates and their statistical precision change. A wider neighborhood provides more data but may introduce bias if the underlying relationship is not adequately modeled, while a very narrow neighborhood reduces that bias at the cost of a higher-variance estimator.
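
The trade-off described above can be explored offline with a minimal Python sketch. It is not the panel's actual implementation: the function names `simulate_rd` and `rd_estimate`, the uniform distribution of the score, the linear outcome model, the local linear fit with a uniform kernel, and the homoskedastic standard error are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_rd(n=1000, tau=2.0, c=0.0, slope=1.0, sigma=1.0, seed=0):
    """Simulate a sharp RD design (hypothetical data-generating process).

    Scores are drawn uniformly on [c - 1, c + 1); the outcome is linear in
    the centered score, with a jump of size tau at the threshold c.
    """
    rng = np.random.default_rng(seed)
    x = rng.uniform(c - 1.0, c + 1.0, size=n)
    d = (x >= c).astype(float)          # sharp assignment: D = 1 if X >= c
    y = slope * (x - c) + tau * d + rng.normal(0.0, sigma, size=n)
    return x, d, y


def rd_estimate(x, y, c, h):
    """Estimate the jump at c using observations with |x - c| <= h.

    Fits separate lines on each side of the threshold by least squares
    (uniform kernel) and returns the estimated jump, a standard error
    that assumes homoskedastic errors, and the number of observations used.
    """
    keep = np.abs(x - c) <= h
    xc = x[keep] - c
    d = (xc >= 0).astype(float)
    # Columns: intercept, jump at c, left slope, change in slope on the right
    X = np.column_stack([np.ones_like(xc), d, xc, d * xc])
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    resid = y[keep] - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1]), int(keep.sum())


if __name__ == "__main__":
    x, d, y = simulate_rd()
    for h in (0.1, 0.25, 0.5, 1.0):
        est, se, n_used = rd_estimate(x, y, c=0.0, h=h)
        print(f"h={h:<5} n={n_used:<5} estimate={est:.3f} se={se:.3f}")
```

Because the simulated relationship here is exactly linear, widening the neighborhood only improves precision. To see the bias side of the trade-off, one could add curvature to the outcome in `simulate_rd` (for example, a quadratic term in the centered score) and compare estimates across values of `h`.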



[Figure: Outcome Variable by Assignment Score]

[Figure: Variation of Treatment Effect and Number of Observations by Neighborhood]