# Ordinary Least Squares for inference

## Bias of OLS estimators

### Expectation of OLS estimators

#### Expectation in terms of observables

We have: $$\hat{\theta }=(X^TX)^{-1}X^Ty$$
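As a quick numerical check, the closed form can be computed directly. This is a minimal sketch with made-up data (the design, coefficients, and noise are assumptions for illustration), compared against numpy's least-squares solver:

```python
import numpy as np

# Illustrative data (assumed, not from the text)
rng = np.random.default_rng(5)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=50)

# The closed form: (X^T X)^{-1} X^T y
theta_hat = np.linalg.inv(X.T @ X) @ X.T @ y

# numpy's least-squares solver should agree
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta_hat, theta_lstsq))  # True
```

In practice `np.linalg.solve(X.T @ X, X.T @ y)` or `lstsq` is preferred over forming the explicit inverse, for numerical stability.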

Let’s take the expectation.

$$E[\hat{\theta }]=E[(X^TX)^{-1}X^Ty]$$

#### Expectation in terms of errors

Let’s model $$y$$ as a function of $$X$$. Because we place no restrictions on the error terms, this is not an assumption.

$$y=X\theta +\epsilon$$.

$$E[\hat{\theta }]=E[(X^TX)^{-1}X^T(X\theta +\epsilon)]$$

$$E[\hat{\theta }]=E[(X^TX)^{-1}X^TX\theta ]+E[(X^TX)^{-1}X^T \epsilon]$$

$$E[\hat{\theta }]=\theta +E[(X^TX)^{-1}X^T \epsilon]$$

$$E[\hat{\theta }]=\theta +E[(X^TX)^{-1}X^T]E[\epsilon]+\mathrm{cov}[(X^TX)^{-1}X^T,\epsilon]$$

#### Gauss-Markov assumption: the expected error is $$0$$

$$E[\epsilon]=0$$

This means that:

$$E[\hat{\theta }]=\theta + \mathrm{cov}[(X^TX)^{-1}X^T,\epsilon]$$

#### Gauss-Markov assumption: errors and independent variables are uncorrelated

If the errors are mean-independent of $$X$$, that is $$E[\epsilon|X]=0$$, then the covariance term vanishes and therefore:

$$E[\hat{\theta }]=\theta$$

So this is an unbiased estimator, so long as the condition holds.
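Unbiasedness can be checked by simulation. This is a sketch under assumed conditions (fixed design, mean-zero errors drawn independently of $$X$$): averaging the OLS estimates over many simulated datasets should recover the true parameters.

```python
import numpy as np

# Assumed setup: fixed design, errors with E[eps] = 0 independent of X
rng = np.random.default_rng(0)
n, p, n_sims = 200, 3, 2000
theta = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, p))  # same design in every simulation

estimates = np.empty((n_sims, p))
for s in range(n_sims):
    eps = rng.normal(size=n)                          # mean-zero errors
    y = X @ theta + eps
    estimates[s] = np.linalg.solve(X.T @ X, X.T @ y)  # (X^T X)^{-1} X^T y

# The average estimate should be close to the true theta
print(np.round(estimates.mean(axis=0), 2))
```

Any single estimate is noisy, but the mean across simulations lands near $$(1, -2, 0.5)$$, consistent with $$E[\hat\theta]=\theta$$.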

## Variance of OLS estimators

### Variance of OLS estimators

#### Variance-covariance matrix

We know:

$$\hat \theta =(X^TX)^{-1}X^Ty$$

$$y=X\theta +\epsilon$$

Therefore:

$$\hat \theta =(X^TX)^{-1}X^T(X\theta +\epsilon)$$

$$\hat \theta =\theta +(X^TX)^{-1}X^T\epsilon$$

$$\hat \theta -\theta =(X^TX)^{-1}X^T\epsilon$$

$$Var [\hat \theta ]=E[(\hat \theta -\theta)(\hat \theta -\theta )^T]$$

$$Var [\hat \theta ]=E[(X^TX)^{-1}X^T\epsilon ((X^TX)^{-1}X^T\epsilon )^T]$$

$$Var [\hat \theta ]=E[(X^TX)^{-1}X^T\epsilon \epsilon^T X(X^TX)^{-1}]$$

Treating $$X$$ as fixed, we can take it outside the expectation:

$$Var [\hat \theta ]=(X^TX)^{-1}X^TE[\epsilon \epsilon^T ]X(X^TX)^{-1}$$

We write:

$$\Omega=E[\epsilon \epsilon^T]$$

$$Var [\hat \theta ]=(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}$$

Depending on how we estimate $$\Omega$$, we get different variance terms.

#### Variance under IID

If IID:

$$\Omega = I\sigma^2_{\epsilon }$$

$$Var [\hat \theta ]=(X^TX)^{-1}X^TI\sigma^2_{\epsilon } X(X^TX)^{-1}$$

$$Var [\hat \theta ]=\sigma^2_\epsilon (X^TX)^{-1}$$
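The IID variance formula can also be verified numerically. This sketch (with an assumed fixed design and homoskedastic errors) compares the empirical covariance of $$\hat\theta$$ across simulations to $$\sigma^2_\epsilon (X^TX)^{-1}$$:

```python
import numpy as np

# Assumed setup: fixed design, IID errors with known sigma
rng = np.random.default_rng(1)
n, p, n_sims, sigma = 100, 2, 5000, 2.0
X = rng.normal(size=(n, p))
theta = np.array([0.5, 1.5])
XtX_inv = np.linalg.inv(X.T @ X)

# OLS estimates across many simulated error draws
ests = np.array([
    XtX_inv @ X.T @ (X @ theta + rng.normal(scale=sigma, size=n))
    for _ in range(n_sims)
])

empirical = np.cov(ests, rowvar=False)      # sample covariance of theta-hat
theoretical = sigma**2 * XtX_inv            # sigma^2 (X^T X)^{-1}
print(np.max(np.abs(empirical - theoretical)))  # small
```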

### Heteroskedasticity-Consistent (HC) standard errors

#### Variance of OLS estimators

$$Var [\hat \theta ]=(X^TX)^{-1}X^T\Omega X(X^TX)^{-1}$$

#### Robust standard errors for heteroskedasticity

We estimate $$\Omega$$ as a diagonal matrix of squared fitted residuals:

$$\hat\Omega_{ij}=\delta_{ij}\hat\epsilon_i^2$$

These are also known as the Eicker-Huber-White standard errors, or the White correction.

These are also referred to as robust standard errors.
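This sketch implements the HC0 sandwich estimator under assumed heteroskedastic errors (the error variance here is made to grow with the regressor, an assumption chosen for illustration), alongside the naive IID formula for comparison:

```python
import numpy as np

# Assumed heteroskedastic setup: error scale grows with the regressor
rng = np.random.default_rng(2)
n, p = 500, 2
X = np.column_stack([np.ones(n), rng.uniform(1, 5, size=n)])
theta = np.array([1.0, 0.3])
eps = rng.normal(scale=X[:, 1], size=n)  # non-constant variance
y = X @ theta + eps

theta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ theta_hat
XtX_inv = np.linalg.inv(X.T @ X)

# HC0 sandwich: (X^T X)^{-1} X^T Omega-hat X (X^T X)^{-1},
# with Omega-hat = diag(resid_i^2)
meat = X.T @ (resid[:, None] ** 2 * X)
var_hc0 = XtX_inv @ meat @ XtX_inv
se_hc0 = np.sqrt(np.diag(var_hc0))

# Naive IID formula for comparison
sigma2_hat = resid @ resid / (n - p)
se_iid = np.sqrt(np.diag(sigma2_hat * XtX_inv))
print(se_hc0, se_iid)
```

Under heteroskedasticity the two sets of standard errors disagree; the HC0 version remains consistent while the IID formula does not.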

## Properties of the OLS estimator

### Maximum Likelihood Estimator (MLE) and OLS equivalence

#### The OLS estimator

$$\hat \theta_{OLS}=(X^TX)^{-1}X^Ty$$

$$E[\hat \theta_{OLS}]=\theta$$

$$Var[\hat \theta_{OLS}]=\sigma^2 (X^TX)^{-1}$$

#### The MLE estimator

$$y_i=\mathbf x_i\theta +\epsilon_i$$

$$P(y=y_i|x=x_i)=P(\epsilon_i=y_i-\mathbf x_i \theta )$$

If we assume $$\epsilon_i \sim N(0, \sigma^2_\epsilon )$$ we have:

$$P(y=y_i|x=x_i)=\dfrac{1}{\sqrt {2\pi \sigma^2_\epsilon }}e^{-\dfrac{(y_i-\mathbf x_i\theta )^2}{2\sigma_\epsilon^2}}$$

$$L(X, \theta )=\prod_{i=1}^n\dfrac{1}{\sqrt {2\pi \sigma^2_\epsilon }}e^{-\dfrac{(y_i-\mathbf x_i\theta )^2}{2\sigma_\epsilon^2}}$$

$$l(X, \theta )=\sum_{i=1}^n -\dfrac{1}{2}\ln (2\pi \sigma_\epsilon^2)-\dfrac{(y_i-\mathbf x_i\theta )^2}{2\sigma_\epsilon^2}$$

$$\dfrac{\partial l}{\partial \theta_j }=\sum_{i=1}^n x_{ij}\dfrac{y_i-\mathbf x_{i}\theta }{\sigma^2_\epsilon}$$

Setting each partial derivative to zero at the maximum:

$$\sum_{i=1}^nx_{ij}(y_i-\mathbf x_{i}\hat \theta_{MLE} )=0$$

Stacking these equations in matrix form:

$$X^T(y-X\hat \theta_{MLE} )=0$$

$$X^Ty=X^TX\hat \theta_{MLE}$$

$$\hat \theta_{MLE}=(X^TX)^{-1}X^Ty$$

#### Equivalence

If the errors are IID and normally distributed then:

$$\hat \theta_{OLS}=\hat \theta_{MLE}$$
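The equivalence can be demonstrated numerically: minimizing the Gaussian negative log-likelihood in $$\theta$$ (here by plain gradient descent on assumed data, with $$\sigma^2_\epsilon$$ fixed) recovers the OLS closed form.

```python
import numpy as np

# Assumed data for illustration
rng = np.random.default_rng(3)
n, p = 150, 3
X = rng.normal(size=(n, p))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(size=n)

# OLS closed form
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Gradient of the negative log-likelihood in theta (sigma^2 = 1):
# the theta-dependent part is sum_i (y_i - x_i theta)^2 / 2,
# whose gradient is -X^T (y - X theta)
theta = np.zeros(p)
lr = 1.0 / np.linalg.norm(X.T @ X, 2)  # step size below 1/L for convergence
for _ in range(5000):
    theta -= lr * -(X.T @ (y - X @ theta))

print(np.max(np.abs(theta - theta_ols)))  # essentially zero
```

Because $$\sigma^2_\epsilon$$ only scales the quadratic term, the minimizer in $$\theta$$ does not depend on it, which is why OLS and the Gaussian MLE coincide.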

### Gauss-Markov theorem

The Gauss-Markov theorem states that, under the following assumptions, OLS is the Best Linear Unbiased Estimator (BLUE):

- The errors have mean zero. If the model should only have errors on the upside or the downside for some reason, OLS will not provide this.
- The errors are homoscedastic (they all have the same variance). Violating this does not bias the estimates, but it does bias the variance estimates.
- The errors are uncorrelated with each other. Correlated errors suggest the model is missing structure, such as lagged variables.

If the errors are additionally normally distributed, OLS is the Best Unbiased Estimator (BUE), not merely the best linear one. For non-normally distributed errors, OLS is only guaranteed to be BLUE.
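As an illustration of what happens when the assumptions fail, this sketch (with an assumed omitted-variable setup) shows OLS becoming biased once the error is correlated with the regressor:

```python
import numpy as np

# Assumed endogeneity setup: an omitted variable z drives both x and the error
rng = np.random.default_rng(4)
n, n_sims = 300, 2000
theta = 1.0

ests = np.empty(n_sims)
for s in range(n_sims):
    z = rng.normal(size=n)               # omitted variable
    x = z + rng.normal(size=n)           # x correlated with z
    eps = 0.8 * z + rng.normal(size=n)   # so eps is correlated with x
    y = theta * x + eps
    ests[s] = (x @ y) / (x @ x)          # univariate OLS slope

print(ests.mean())  # noticeably above the true value of 1.0
```

Here the covariance term $$\mathrm{cov}[(X^TX)^{-1}X^T,\epsilon]$$ from the bias derivation is non-zero, so the mean estimate settles around $$\theta + \mathrm{cov}(x,\epsilon)/\mathrm{var}(x) = 1.4$$ rather than $$1$$.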