In statistics, the properties of estimators are fundamental for assessing their performance when inferring population parameters. Two of the most important properties are bias and consistency.
Bias
Bias measures the systematic difference between the expected value of the estimator and the true value of the parameter being estimated. Mathematically, the bias of an estimator \( \hat{\theta} \) is defined as: \[ \text{Bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta, \] where \( \mathbb{E}[\hat{\theta}] \) is the expected value of the estimator, and \( \theta \) is the true value of the parameter. An estimator is unbiased if \( \text{Bias}(\hat{\theta}) = 0\). If there is a nonzero bias, the estimator systematically underestimates or overestimates the parameter.
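As a concrete illustration of this definition, consider the sample mean \( \bar{X} = \frac{1}{n}\sum_{i=1}^n X_i \) as an estimator of the population mean \( \mu \) for an i.i.d. sample; by linearity of expectation it is unbiased: \[ \mathbb{E}[\bar{X}] = \frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i] = \frac{1}{n}\, n\mu = \mu, \qquad \text{Bias}(\bar{X}) = \mathbb{E}[\bar{X}] - \mu = 0. \]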
Consistency
An estimator is consistent if, as the sample size increases, the estimator converges in probability to the true parameter value. This means that for any \( \epsilon > 0\), \[ \lim_{n \to \infty} P(|\hat{\theta}_n - \theta| > \epsilon) = 0, \] where \( \hat{\theta}_n \) is the estimator based on a sample of size \( n \). Consistency implies that, with enough data (i.e., a sufficiently large sample), the estimator becomes arbitrarily close to the true value.
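As a rough sketch of what this limit looks like in practice, the following Python snippet (a Monte Carlo illustration added here, not part of the original text; the function name `consistency_demo` and the chosen parameter values are purely illustrative) estimates \( P(|\hat{\theta}_n - \theta| > \epsilon) \) for the sample mean of a normal population at several sample sizes. The estimated probability should shrink toward zero as \( n \) grows, which is exactly what consistency asserts.

```python
import numpy as np

def consistency_demo(theta=5.0, sigma=2.0, eps=0.1,
                     n_values=(10, 100, 1000, 10000), reps=5000, seed=0):
    """Monte Carlo estimate of P(|theta_hat_n - theta| > eps) for the sample mean."""
    rng = np.random.default_rng(seed)
    for n in n_values:
        # Draw `reps` independent samples of size n from N(theta, sigma^2)
        # and compute the sample mean of each replication.
        samples = rng.normal(loc=theta, scale=sigma, size=(reps, n))
        theta_hat = samples.mean(axis=1)
        # Fraction of replications where the estimator misses theta by more than eps.
        prob = np.mean(np.abs(theta_hat - theta) > eps)
        print(f"n = {n:6d}  estimated P(|theta_hat - theta| > {eps}) = {prob:.3f}")

consistency_demo()
```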
Difference Between Bias and Consistency
The main difference between these two properties is the regime in which they apply. Bias is a finite-sample property: it measures whether the estimator makes a systematic error at a given sample size. Consistency, by contrast, is an asymptotic property: it describes the behavior of the estimator as the sample size tends to infinity. An estimator can therefore be biased yet consistent, provided its bias vanishes as the sample size grows (together with its variance, so that the estimator concentrates around the true value).
For example, the uncorrected sample variance estimator \[ \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2 \] is biased but consistent: its bias vanishes as \( n \to \infty \).
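A standard computation (sketched here for an i.i.d. sample with mean \( \mu \) and variance \( \sigma^2 \)) makes both claims explicit. Since \( \mathbb{E}\!\left[\sum_{i=1}^n (X_i - \bar{X})^2\right] = (n-1)\sigma^2 \), we have \[ \mathbb{E}[\hat{\sigma}^2] = \frac{n-1}{n}\,\sigma^2, \qquad \text{Bias}(\hat{\sigma}^2) = \mathbb{E}[\hat{\sigma}^2] - \sigma^2 = -\frac{\sigma^2}{n} \;\longrightarrow\; 0 \quad \text{as } n \to \infty. \]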