# Maximum Likelihood Estimation (MLE)

## Maximising the likelihood function

### Maximising the likelihood function

The likelihood function gives the probability of the observed data $$X$$, viewed as a function of the parameters $$\theta$$.

$$L(\theta ; X)=P(X|\theta )$$

We choose values for $$\theta$$ which maximise the likelihood function.

$$\arg\max_\theta P(X|\theta )$$

That is, for which values of $$\theta$$ was the observation we saw most likely?

This is a mode estimate: it picks the $$\theta$$ at the peak of the likelihood function.

### IID

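If the observations $$x_i$$ are independent and identically distributed (IID), the likelihood factorises into a product over the individual observations.

$$L(\theta ; X)=\prod_i P(x_i|\theta )$$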

### Logarithms

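We can take logarithms of the likelihood. The logarithm is strictly increasing, so the log-likelihood is maximised at the same values of $$\theta$$ as the likelihood, and the two share stationary points wherever the likelihood is positive. The logarithm is only defined for values above $$0$$, so points where the likelihood is exactly zero are excluded; these cannot be maxima unless the likelihood is zero everywhere.

We therefore look for the stationary points of the log-likelihood, where the product becomes a sum: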

$$\ln L(\theta ; X)=\ln \prod_i P(x_i|\theta )$$

$$\ln L(\theta ; X)=\sum_i \ln P(x_i|\theta )$$
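The sum of logarithms also helps numerically, because a product of many probabilities can underflow to zero in floating point. The sketch below assumes a simple model in which each observation is `1` with probability `theta` and `0` otherwise; the function names and the data are illustrative, not taken from the text.

```python
import math

def likelihood(theta, flips):
    """Product of per-observation probabilities for IID 0/1 observations."""
    result = 1.0
    for x in flips:
        result *= theta if x == 1 else 1.0 - theta
    return result

def log_likelihood(theta, flips):
    """Sum of per-observation log-probabilities; requires 0 < theta < 1."""
    return sum(math.log(theta if x == 1 else 1.0 - theta) for x in flips)

# Hypothetical data: 1000 ones and 1000 zeros.
flips = [1, 0] * 1000

raw = likelihood(0.5, flips)         # 0.5 ** 2000 underflows to 0.0
logged = log_likelihood(0.5, flips)  # 2000 * ln(0.5), roughly -1386.29
```

The raw product is no longer usable for comparing values of `theta`, while the log-likelihood remains finite and comparable.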

### Example: Coin flip

Let’s take our simple example about coins. Heads and tails are the only options, so $$P(H)+P(T)=1$$.

$$P(H|\theta )=\theta$$

$$P(T|\theta )=1-\theta$$

$$\ln L(\theta ; X)=\sum_i \ln P(x_i|\theta )$$

If we had $$5$$ heads and $$5$$ tails we would have:

$$\ln L(\theta ; X)=5\ln (\theta )+ 5\ln (1-\theta )$$

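Setting the derivative with respect to $$\theta$$ to zero:

$$\dfrac{5}{\theta }-\dfrac{5}{1-\theta }=0$$

This gives $$\theta =\dfrac{1}{2}$$.

So $$P(H)=\dfrac{1}{2}$$ is the value which makes our observation most likely.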
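The same result can be checked numerically. This is a minimal sketch that evaluates the log-likelihood for $$5$$ heads and $$5$$ tails on a grid of candidate values; the grid resolution and the function name are arbitrary choices for illustration.

```python
import math

def coin_log_likelihood(theta, heads, tails):
    """Log-likelihood of a coin with P(H) = theta after the given counts."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

# Candidate values strictly inside (0, 1), where both logarithms are defined.
grid = [i / 1000 for i in range(1, 1000)]

# Pick the candidate with the highest log-likelihood.
theta_hat = max(grid, key=lambda t: coin_log_likelihood(t, 5, 5))
# theta_hat == 0.5
```

A grid search is only a sanity check here; for the coin the stationary point can be found exactly by differentiation.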

## Results for specific distributions

### MLE of the Gaussian distribution

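The parameters are the population mean vector $$\mu$$ and the covariance matrix $$\Sigma$$.

The MLE for the mean is the sample mean.

$$\hat{\mu }=\dfrac{1}{n}\sum_i x_i$$

The MLE for the covariance matrix is the unadjusted sample covariance, which divides by $$n$$ rather than $$n-1$$ and is therefore biased.

$$\hat{\Sigma }=\dfrac{1}{n}\sum_i (x_i-\hat{\mu })(x_i-\hat{\mu })^T$$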
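As a small illustration, the sketch below computes both estimates in plain Python. The function name `gaussian_mle` and the four data points are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def gaussian_mle(rows):
    """Return (mean vector, covariance matrix) maximising the Gaussian likelihood."""
    n = len(rows)
    d = len(rows[0])
    # MLE of the mean: the sample mean of each coordinate.
    mean = [sum(row[j] for row in rows) / n for j in range(d)]
    # MLE of the covariance: centred cross-products divided by n, not n - 1.
    cov = [
        [sum((row[j] - mean[j]) * (row[k] - mean[k]) for row in rows) / n
         for k in range(d)]
        for j in range(d)
    ]
    return mean, cov

# Hypothetical data: four observations of a two-dimensional variable.
rows = [(1.0, 2.0), (3.0, 6.0), (5.0, 4.0), (7.0, 8.0)]

mean_hat, cov_hat = gaussian_mle(rows)
# mean_hat == [4.0, 5.0]
# cov_hat == [[5.0, 4.0], [4.0, 5.0]]
```

Dividing by $$n-1$$ instead would give the adjusted sample covariance, whose entries are larger by a factor of $$\dfrac{n}{n-1}$$.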

## Other

### Restricted Maximum Likelihood

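We can partition the likelihood function into parts, one of which depends only on the variance parameters. Restricted maximum likelihood estimates the variance parameters by maximising that part alone.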

### Scores

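The score is the derivative of the log-likelihood with respect to $$\theta$$. To distinguish it from other scores, we can call it the maximum likelihood score.

The MLE performs badly if the true $$\theta$$ is not where this score is $$0$$.

For example, with one-sided tails the true $$\theta$$ does not satisfy the MLE condition.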

Can we find other scores?

### Orthogonality

The score for one parameter can depend on the other parameters.
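As an illustration, assume a univariate Gaussian with mean $$\mu$$ and variance $$\sigma^2$$ and $$n$$ IID observations (an example chosen here, not taken from the notes). The log-likelihood is:

$$\ln L(\mu ,\sigma^2 ; X)=-\dfrac{n}{2}\ln (2\pi \sigma^2)-\dfrac{1}{2\sigma^2}\sum_i (x_i-\mu )^2$$

The score for each parameter is the corresponding partial derivative:

$$\dfrac{\partial \ln L}{\partial \mu }=\dfrac{1}{\sigma^2}\sum_i (x_i-\mu )$$

$$\dfrac{\partial \ln L}{\partial \sigma^2}=-\dfrac{n}{2\sigma^2}+\dfrac{1}{2\sigma^4}\sum_i (x_i-\mu )^2$$

The score for $$\mu$$ contains $$\sigma^2$$, and the score for $$\sigma^2$$ contains $$\mu$$.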