Applied Science

Panel Data Model

$$ y_{it} = \beta x_{it} + \alpha_i + u_{it} $$

Where:

$ y_{it} $: Dependent variable for individual $ i $ at time period $ t $.
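$ \alpha_i $: Unobserved, time-invariant individual effect for individual $ i $.
$ \beta_k $: Coefficient of the $ k $-th covariate $ x_{itk} $ (with a single covariate, as in the equation above, simply $ \beta $ and $ x_{it} $).
$ u_{it} $: Idiosyncratic error term, which varies across both individuals and time.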

The appropriate estimator for $ \beta_k $ depends on the relationship between the model’s covariates, $ x_{itk} $, and the individual effect, $ \alpha_i $. If $ \alpha_i $ and $ x_{itk} $ are uncorrelated, Pooled Ordinary Least Squares (Pooled OLS) can be used. However, while this estimator is unbiased and consistent, it is inefficient: the composite error $ v_{it} = \alpha_i + u_{it} $ contains the same $ \alpha_i $ in every period, so the variance-covariance matrix of $ \mathbf{v}_i $ is not diagonal. Instead:

$$ E(\mathbf{v}_i \mathbf{v}_i') = \Omega = \left[ \begin{array}{cccc} \sigma^2_u + \sigma^2_c & \sigma^2_c & \cdots & \sigma^2_c \\ \sigma^2_c & \sigma^2_u + \sigma^2_c & \cdots & \sigma^2_c \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2_c & \sigma^2_c & \cdots & \sigma^2_u + \sigma^2_c \end{array} \right] $$

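Where $ Var(\alpha_i) = \sigma^2_c $, $ Var(u_{it}) = \sigma^2_u $, $ \alpha_i $ is uncorrelated with $ u_{it} $, the $ u_{it} $ are serially uncorrelated, and $ \mathbf{v}_i' = [v_{i1}, v_{i2}, \cdots, v_{iT}] $ is a row vector of dimension $ 1\times T $, so $ \Omega $ is $ T \times T $. Every pair of periods for the same individual shares the covariance $ \sigma^2_c $ through the common $ \alpha_i $.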
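
To make the structure of $ \Omega $ concrete, the following Python/NumPy sketch (an illustration, not part of the original derivation) builds $ \Omega = \sigma^2_u I_T + \sigma^2_c \iota_T \iota_T' $, where $ \iota_T $ is a $ T \times 1 $ vector of ones, and compares it with the sample covariance of simulated composite errors. The values $ T = 4 $, $ \sigma^2_u = 1 $, $ \sigma^2_c = 0.5 $ and the function name `re_error_covariance` are hypothetical choices.

```python
import numpy as np


def re_error_covariance(T: int, sigma2_u: float, sigma2_c: float) -> np.ndarray:
    """Omega = sigma2_u * I_T + sigma2_c * iota iota' (random effects error structure)."""
    iota = np.ones((T, 1))
    return sigma2_u * np.eye(T) + sigma2_c * (iota @ iota.T)


T, sigma2_u, sigma2_c = 4, 1.0, 0.5
omega = re_error_covariance(T, sigma2_u, sigma2_c)

# Monte Carlo check: simulate v_it = alpha_i + u_it for many individuals.
rng = np.random.default_rng(0)
N = 200_000
alpha = rng.normal(0.0, np.sqrt(sigma2_c), size=(N, 1))  # one draw per individual, shared over t
u = rng.normal(0.0, np.sqrt(sigma2_u), size=(N, T))      # fresh draw for every (i, t)
v = alpha + u                                             # broadcasts to shape (N, T)
omega_hat = np.cov(v, rowvar=False)                       # T x T sample covariance

print(np.round(omega, 3))
print(np.round(omega_hat, 3))
```

The diagonal entries equal $ \sigma^2_u + \sigma^2_c $ and every off-diagonal entry equals $ \sigma^2_c $, which is the pattern in the matrix above.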

An efficient estimator must account for this structure in the covariance matrix. The Generalized Least Squares (GLS) estimator does so, and in its feasible form (with $ \sigma^2_u $ and $ \sigma^2_c $ replaced by estimates) it is what the literature calls the random effects estimator.
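
A small Monte Carlo experiment can illustrate the efficiency argument. The sketch below assumes a hypothetical data-generating process with one covariate drawn independently of $ \alpha_i $, known variance components, and $ E[\alpha_i] = 0 $ (so no intercept is needed). In each replication it computes Pooled OLS and GLS with the true $ \Omega $; both should be centered on the true $ \beta = 2 $, with GLS showing the smaller spread. The helper names `pooled_ols` and `gls` are illustrative.

```python
import numpy as np


def pooled_ols(y: np.ndarray, x: np.ndarray) -> float:
    """Pooled OLS slope (no intercept) on balanced (N, T) arrays."""
    return float(np.sum(x * y) / np.sum(x * x))


def gls(y: np.ndarray, x: np.ndarray, omega: np.ndarray) -> float:
    """GLS slope: sum_i x_i' Omega^{-1} y_i / sum_i x_i' Omega^{-1} x_i."""
    omega_inv = np.linalg.inv(omega)
    num = np.einsum("it,ts,is->", x, omega_inv, y)
    den = np.einsum("it,ts,is->", x, omega_inv, x)
    return float(num / den)


# Hypothetical parameter values for the simulation.
beta, sigma2_u, sigma2_c = 2.0, 1.0, 1.0
N, T, reps = 200, 5, 500
omega = sigma2_u * np.eye(T) + sigma2_c * np.ones((T, T))

rng = np.random.default_rng(1)
ols_draws, gls_draws = [], []
for _ in range(reps):
    x = rng.normal(size=(N, T))
    alpha = rng.normal(0.0, np.sqrt(sigma2_c), size=(N, 1))  # independent of x
    u = rng.normal(0.0, np.sqrt(sigma2_u), size=(N, T))
    y = beta * x + alpha + u
    ols_draws.append(pooled_ols(y, x))
    gls_draws.append(gls(y, x, omega))

ols_draws, gls_draws = np.array(ols_draws), np.array(gls_draws)
print(f"pooled OLS: mean {ols_draws.mean():.3f}, sd {ols_draws.std():.4f}")
print(f"GLS:        mean {gls_draws.mean():.3f}, sd {gls_draws.std():.4f}")
```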

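However, if $ \alpha_i $ and $ x_{itk} $ are correlated, both Pooled OLS and the random effects estimator are biased and inconsistent, because $ \alpha_i $ is part of their error term. In this case, a simple way to estimate the coefficients $ \beta_k $ is to "remove" the individual effect. One approach is to treat each $ \alpha_i $ as a constant and estimate it as an additional parameter, i.e., to include one dummy variable per individual among the regressors. This is what we refer to as the fixed effects estimator; it gives the same $ \hat{\beta}_k $ as applying OLS to data from which each individual's mean has been subtracted.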
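
The dummy-variable version of the fixed effects estimator can be sketched in NumPy as follows. The data-generating process, in which $ x_{it} = 0.8\,\alpha_i + e_{it} $ so that the covariate is correlated with $ \alpha_i $, and all parameter values are hypothetical. Pooled OLS that ignores $ \alpha_i $ drifts away from $ \beta = 2 $, whereas the regression with one dummy per individual recovers it and also returns estimates of each $ \alpha_i $.

```python
import numpy as np

rng = np.random.default_rng(2)
beta, N, T = 2.0, 500, 5

# Hypothetical DGP: individual effects correlated with the covariate.
alpha = rng.normal(size=N)
x = 0.8 * alpha[:, None] + rng.normal(size=(N, T))
y = beta * x + alpha[:, None] + rng.normal(size=(N, T))

# Long format: one row per (i, t); ravel() is row-major, so row i*T + t is (i, t).
ids = np.repeat(np.arange(N), T)
x_long, y_long = x.ravel(), y.ravel()

# Pooled OLS ignoring alpha_i (no intercept): inconsistent under this DGP.
beta_pooled = x_long @ y_long / (x_long @ x_long)

# Dummy-variable regression: y on x plus one dummy column per individual.
dummies = np.zeros((N * T, N))
dummies[np.arange(N * T), ids] = 1.0
X = np.column_stack([x_long, dummies])
coef, *_ = np.linalg.lstsq(X, y_long, rcond=None)
beta_lsdv, alpha_hat = coef[0], coef[1:]

print(f"pooled OLS: {beta_pooled:.3f}  dummy-variable FE: {beta_lsdv:.3f}")
```

With many individuals the $ N $ dummy columns make this regression expensive, which is one reason the equivalent demeaned form is usually preferred in practice.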

Random Effects (RE) Estimation

The RE estimation assumes that the individual effects $ \alpha_i $ are random variables uncorrelated with the covariates:

$$ Cov[\alpha_i, x_{it}] = 0. $$

$$ y_{it} = \beta x_{it} + \underset{v_{it}}{\underbrace{\alpha_i + u_{it}}}. $$

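Under this assumption, the RE (feasible GLS) estimator can be obtained by applying Pooled OLS to quasi-demeaned data, in which only a fraction $ \lambda $ of each individual mean is subtracted:

$$ \tilde{y}_{it} = y_{it} - \lambda \bar{y}_{i.}, \quad \tilde{x}_{it} = x_{it} - \lambda \bar{x}_{i.}, \quad \lambda = 1 - \sqrt{\frac{\sigma^2_u}{\sigma^2_u + T\sigma^2_c}}, $$

$$ \bar{y}_{i.} = \frac{1}{T} \sum_{t=1}^{T} y_{it}, \quad \bar{x}_{i.} = \frac{1}{T} \sum_{t=1}^{T} x_{it}. $$

In practice $ \sigma^2_u $ and $ \sigma^2_c $ are unknown and are replaced by consistent estimates. With a single covariate and $ E[\alpha_i] = 0 $ (so no intercept is needed), the RE estimator is calculated as:

$$ \hat{\beta}_{RE} = \frac{\sum_{i=1}^N \sum_{t=1}^T \tilde{x}_{it} \tilde{y}_{it}}{\sum_{i=1}^N \sum_{t=1}^T \tilde{x}_{it}^2}. $$

Setting $ \lambda = 0 $ gives Pooled OLS, and $ \lambda = 1 $ gives the fully demeaned (fixed effects) estimator; $ \lambda $ approaches 1 as $ T\sigma^2_c $ grows relative to $ \sigma^2_u $.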
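
The following sketch (Python/NumPy, a hypothetical illustration) computes the random effects estimator by quasi-demeaning, subtracting $ \lambda \bar{y}_{i.} $ and $ \lambda \bar{x}_{i.} $ with $ \lambda = 1 - \sqrt{\sigma^2_u/(\sigma^2_u + T\sigma^2_c)} $. To keep it short it assumes a balanced panel stored as $ N \times T $ arrays, known variance components (a feasible estimator would estimate them first), and $ E[\alpha_i] = 0 $ so no intercept is included. The helper names `re_lambda` and `quasi_demeaned_ols` and the simulated data are illustrative.

```python
import numpy as np


def re_lambda(T: int, sigma2_u: float, sigma2_c: float) -> float:
    """Quasi-demeaning weight: 1 - sqrt(sigma2_u / (sigma2_u + T * sigma2_c))."""
    return float(1.0 - np.sqrt(sigma2_u / (sigma2_u + T * sigma2_c)))


def quasi_demeaned_ols(y: np.ndarray, x: np.ndarray, lam: float) -> float:
    """OLS slope (no intercept) on y_it - lam*ybar_i and x_it - lam*xbar_i; y, x are (N, T)."""
    y_t = y - lam * y.mean(axis=1, keepdims=True)
    x_t = x - lam * x.mean(axis=1, keepdims=True)
    return float(np.sum(x_t * y_t) / np.sum(x_t ** 2))


# Hypothetical simulation with alpha_i uncorrelated with x_it.
rng = np.random.default_rng(3)
beta, sigma2_u, sigma2_c, N, T = 2.0, 1.0, 1.0, 2000, 5
x = rng.normal(size=(N, T))
alpha = rng.normal(0.0, np.sqrt(sigma2_c), size=(N, 1))
y = beta * x + alpha + rng.normal(0.0, np.sqrt(sigma2_u), size=(N, T))

lam = re_lambda(T, sigma2_u, sigma2_c)
beta_re = quasi_demeaned_ols(y, x, lam)
print(f"lambda = {lam:.4f}, beta_RE = {beta_re:.3f}")
```

On a balanced panel this transformation reproduces GLS with $ \Omega = \sigma^2_u I_T + \sigma^2_c \iota_T \iota_T' $ ($ \iota_T $ a $ T \times 1 $ vector of ones) exactly, because $ (I_T - \lambda\,\iota_T \iota_T'/T)^2 $ is proportional to $ \Omega^{-1} $. Passing `lam=0.0` gives Pooled OLS and `lam=1.0` gives the within (fixed effects) estimator.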

Fixed Effects (FE) Estimation

If $ Cov[\alpha_i, x_{it}] \neq 0 $, the RE estimator is biased and inconsistent. In this case, the Fixed Effects (FE) estimator is preferred, because it eliminates the individual effects by transforming the data.

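Averaging the model over time for each individual gives $ \bar{y}_{i.} = \beta \bar{x}_{i.} + \alpha_i + \bar{u}_{i.} $. Subtracting this from the original equation removes $ \alpha_i $:

$$ y_{it} - \bar{y}_{i.} = \beta (x_{it} - \bar{x}_{i.}) + (u_{it} - \bar{u}_{i.}). $$

The FE (within) estimator is Pooled OLS on these demeaned data: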

$$ \hat{\beta}_{FE} = \frac{\sum_{i=1}^N \sum_{t=1}^T (x_{it} - \bar{x}_{i.})(y_{it} - \bar{y}_{i.})}{\sum_{i=1}^N \sum_{t=1}^T (x_{it} - \bar{x}_{i.})^2}. $$
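
A sketch of the within estimator on long-format data (one row per individual-period) follows. It assumes individuals are coded as integers $ 0, \dots, N-1 $ and uses `numpy.bincount` to compute the individual means, so it also works when individuals have different numbers of periods. The simulated data, with $ x_{it} $ correlated with $ \alpha_i $, and the helper name `within_estimator` are hypothetical. The second call adds a large arbitrary time-invariant shift to each individual's outcomes and shows that the estimate does not change, since demeaning removes any term that is constant over $ t $.

```python
import numpy as np


def within_estimator(y: np.ndarray, x: np.ndarray, ids: np.ndarray) -> float:
    """FE (within) slope from long-format arrays; ids are integer codes 0..N-1."""
    counts = np.bincount(ids)
    ybar = np.bincount(ids, weights=y) / counts  # individual means of y
    xbar = np.bincount(ids, weights=x) / counts  # individual means of x
    y_dm = y - ybar[ids]                         # y_it - ybar_i.
    x_dm = x - xbar[ids]                         # x_it - xbar_i.
    return float(x_dm @ y_dm / (x_dm @ x_dm))


# Hypothetical simulation: covariate correlated with alpha_i.
rng = np.random.default_rng(4)
beta, N, T = 2.0, 1000, 4
ids = np.repeat(np.arange(N), T)
alpha = rng.normal(size=N)
x = 0.8 * alpha[ids] + rng.normal(size=N * T)
y = beta * x + alpha[ids] + rng.normal(size=N * T)

beta_pooled = x @ y / (x @ x)  # ignores alpha_i, so it is inconsistent here
beta_fe = within_estimator(y, x, ids)

# Adding any time-invariant shift c_i to y leaves the FE estimate unchanged.
shift = rng.normal(scale=10.0, size=N)
beta_fe_shifted = within_estimator(y + shift[ids], x, ids)
print(f"pooled: {beta_pooled:.3f}, FE: {beta_fe:.3f}, FE after shift: {beta_fe_shifted:.3f}")
```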