Spurious Regression vs Cointegration


In econometrics, a spurious regression arises when a linear regression model is fitted between two or more non-stationary time series that have no real relationship. Even though no economic link exists, the model can yield an artificially high coefficient of determination (R²) and misleadingly small p-values. Formally, suppose \( x_{t} \) and \( y_{t} \) are processes integrated of order 1, I(1), and the following regression is estimated:

\[ y_{t} = \beta_{0} + \beta_{1}\,x_{t} + \varepsilon_{t}, \]

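the residuals \( \varepsilon_{t} \) are themselves non-stationary unless the two series are cointegrated, so the usual t and F statistics do not follow their standard distributions and conclusions drawn from them are spurious.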

On the other hand, cointegration occurs when the series \( x_{t} \) and \( y_{t} \), although each is I(1), admit a linear combination that is stationary, I(0). In other words, if:

\[ u_{t} = y_{t} - \beta_{0} - \beta_{1}\,x_{t} \]

is a stationary process, then we say that \( x_{t} \) and \( y_{t} \) are cointegrated, and the regression is not spurious. To check the stationarity of \( u_{t} \), the Augmented Dickey-Fuller (ADF) test is applied to the estimated residuals. It tests for the presence of a unit root using a regression of the form:

\[ \Delta u_{t} = \alpha + \gamma\,u_{t-1} + \sum_{i=1}^{p} \phi_{i}\,\Delta u_{t-i} + \varepsilon_{t}. \]

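If the t-statistic on \( \gamma \) falls below the critical value, the null hypothesis of a unit root (\( \gamma = 0 \)) is rejected, and we conclude that \( u_{t} \) is stationary and, therefore, that the variables are cointegrated. Because \( u_{t} \) is estimated rather than observed, the appropriate critical values are the Engle-Granger ones, which are more negative than the standard Dickey-Fuller values.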
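The ADF regression above can be sketched directly with least squares. This is a minimal illustration, assuming NumPy is available; the function name `adf_tstat`, the default lag order `p=1`, the sample size and the seed are hypothetical choices made for the example. It returns only the t-statistic on \( \gamma \); comparing it with a critical value is left to the reader.

```python
import numpy as np

def adf_tstat(u, p=1):
    """t-statistic on gamma in
    du_t = alpha + gamma*u_{t-1} + sum_{i=1..p} phi_i*du_{t-i} + e_t."""
    u = np.asarray(u, dtype=float)
    du = np.diff(u)                       # du[k] = u[k+1] - u[k]
    y = du[p:]                            # dependent variable: du_t
    # Regressors: constant, lagged level u_{t-1}, then p lagged differences
    cols = [np.ones_like(y), u[p:-1]]
    cols += [du[p - i:len(du) - i] for i in range(1, p + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)            # OLS covariance matrix
    return coef[1] / np.sqrt(cov[1, 1])

# Illustrative inputs (assumed): a stationary series and a unit-root series
rng = np.random.default_rng(0)
white_noise = rng.standard_normal(500)
random_walk = np.cumsum(rng.standard_normal(500))
print(adf_tstat(white_noise), adf_tstat(random_walk))
```

The statistic for the stationary series should be far more negative than the one for the random walk. In practice a library routine, for example `adfuller` or `coint` from statsmodels, would typically be used, since it also supplies the critical values.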

Spurious Regression


Stochastic Data Generating Process

\[ \begin{cases} x_{t} = x_{t-1} + \epsilon_{t}, \qquad \epsilon_{t} \sim N(0,1) \\ y_{t} = y_{t-1} + \epsilon^{*}_{t}, \qquad \epsilon^{*}_{t} \sim N(0,1) \end{cases} \]

\[ y_{t} = \beta_{0} + \beta_{1}\,x_{t} + u_{t} \]
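A minimal sketch of this data generating process, assuming NumPy. The sample size, the seed, and the zero initial values are illustrative assumptions that the specification above does not fix.

```python
import numpy as np

rng = np.random.default_rng(42)    # arbitrary seed (assumption)
T = 500                            # illustrative sample size (assumption)

eps = rng.standard_normal(T)       # shocks driving x
eps_star = rng.standard_normal(T)  # shocks driving y, drawn independently

# Cumulative sums of the shocks give random walks started at zero
x = np.cumsum(eps)                 # x_t = x_{t-1} + eps_t
y = np.cumsum(eps_star)            # y_t = y_{t-1} + eps*_t
```

Since the two shock sequences are drawn independently, any relationship later found between `x` and `y` is an artifact of their common stochastic trends.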

Time Series Evolution - Spurious Regression

Ordinary Least Squares Estimation - Spurious Regression
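One way to see the problem is a small Monte Carlo experiment: regress one independent random walk on another many times and count how often the slope looks significant at the nominal 5% level. The sketch below assumes NumPy; the helper name `ols_slope_stats`, the sample size, the number of replications and the seed are hypothetical choices.

```python
import numpy as np

def ols_slope_stats(x, y):
    """Return (slope, t-statistic of the slope, R^2) for y = b0 + b1*x + u."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)                 # residual variance
    se_b1 = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return coef[1], coef[1] / se_b1, r2

rng = np.random.default_rng(0)
T, n_sims = 200, 500               # illustrative settings (assumptions)
t_stats = np.empty(n_sims)
for s in range(n_sims):
    x = np.cumsum(rng.standard_normal(T))   # independent random walks
    y = np.cumsum(rng.standard_normal(T))
    _, t_stats[s], _ = ols_slope_stats(x, y)

# Share of replications in which the slope appears significant at 5%
rejection_rate = np.mean(np.abs(t_stats) > 1.96)
print(f"share of |t| > 1.96: {rejection_rate:.2f}")
```

If the usual t-test were valid, the printed share would be close to 0.05. With unrelated random walks it should come out far higher, which is the signature of a spurious regression.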

Augmented Dickey-Fuller (ADF) Test on Residuals
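A minimal sketch of the residual-based test for the spurious case, assuming NumPy. For brevity it uses the simplest Dickey-Fuller regression with no lagged differences. The helper names, sample size and seed are hypothetical, and the critical value of about -3.34 is an approximate asymptotic 5% Engle-Granger value for two variables with a constant, stated here as an assumption.

```python
import numpy as np

def ols_residuals(x, y):
    """Residuals from the OLS regression y = b0 + b1*x + u."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ coef

def df_tstat(u):
    """t-statistic on gamma in du_t = alpha + gamma*u_{t-1} + e_t."""
    du, lag = np.diff(u), u[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    coef, *_ = np.linalg.lstsq(X, du, rcond=None)
    e = du - X @ coef
    sigma2 = e @ e / (len(du) - 2)
    return coef[1] / np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])

rng = np.random.default_rng(1)
T = 500                                  # illustrative sample size
x = np.cumsum(rng.standard_normal(T))
y = np.cumsum(rng.standard_normal(T))    # generated independently of x

u_hat = ols_residuals(x, y)
stat = df_tstat(u_hat)
EG_CRIT_5PCT = -3.34                     # approximate value (assumption)
verdict = "reject" if stat < EG_CRIT_5PCT else "fail to reject"
print(f"DF statistic = {stat:.2f}: {verdict} the unit-root null")
```

For independent random walks the test should fail to reject the unit root in roughly 95% of samples, which flags the regression as spurious.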


Cointegration


Stochastic Data Generating Process

\[ \begin{cases} x_{t} = x_{t-1} + \epsilon_{t}, \qquad \epsilon_{t} \sim N(0,1) \\ y_{t} = \beta_{0} + \beta_{1}\,x_{t} + \epsilon^{*}_{t}, \qquad \epsilon^{*}_{t} \sim N(0,1) \end{cases} \]

Time Series Evolution - Cointegration

Ordinary Least Squares Estimation - Cointegrated Data
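A minimal sketch of estimating the cointegrating regression on simulated data, assuming NumPy. The true parameter values `beta0 = 2.0` and `beta1 = 0.5`, the sample size and the seed are hypothetical; the source does not specify them.

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500                          # illustrative sample size (assumption)
beta0, beta1 = 2.0, 0.5          # hypothetical true parameters

x = np.cumsum(rng.standard_normal(T))            # x_t = x_{t-1} + eps_t
y = beta0 + beta1 * x + rng.standard_normal(T)   # y_t = beta0 + beta1*x_t + eps*_t

# OLS of y on a constant and x
X = np.column_stack([np.ones(T), x])
(b0_hat, b1_hat), *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")
```

Because `y` shares the stochastic trend of `x`, the slope estimate should land very close to the true `beta1`; in a cointegrating regression the OLS slope converges faster than in the stationary case.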

Augmented Dickey-Fuller (ADF) Test on Residuals
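A minimal sketch of the same residual-based check for the cointegrated case, assuming NumPy. The simulated errors are independent, so the example uses the simplest Dickey-Fuller regression without lagged differences. The parameter values, sample size and seed are hypothetical.

```python
import numpy as np

def df_tstat(u):
    """t-statistic on gamma in du_t = alpha + gamma*u_{t-1} + e_t."""
    du, lag = np.diff(u), u[:-1]
    X = np.column_stack([np.ones_like(lag), lag])
    coef, *_ = np.linalg.lstsq(X, du, rcond=None)
    e = du - X @ coef
    se = np.sqrt(e @ e / (len(du) - 2) * np.linalg.inv(X.T @ X)[1, 1])
    return coef[1] / se

rng = np.random.default_rng(5)
T = 500                          # illustrative sample size (assumption)
beta0, beta1 = 2.0, 0.5          # hypothetical true parameters
x = np.cumsum(rng.standard_normal(T))
y = beta0 + beta1 * x + rng.standard_normal(T)

# First step: OLS residuals from the cointegrating regression
X = np.column_stack([np.ones(T), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ coef

# Second step: unit-root test on those residuals
stat = df_tstat(u_hat)
print(f"DF t-statistic on residuals: {stat:.2f}")
```

Here the residuals behave like white noise, so the statistic should be strongly negative, well below any conventional critical value, and the unit-root null is rejected in favor of cointegration.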