# Maximum Likelihood Estimation (MLE)

## Maximising the likelihood function

### Maximising the likelihood function

We have a likelihood function of the data.

\(L(\theta ; X)=P(X|\theta )\)

We choose values for \(\theta \) which maximise the likelihood function.

\(\arg\max_\theta P(X|\theta )\)

That is, for which values of \(\theta \) was the observation we saw most likely?

This is a mode estimate: we pick the \(\theta \) at which the likelihood function peaks.

### IID

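If the observations are independent and identically distributed (IID), the likelihood factorises into a product over the individual observations.

\(L(\theta ; X)=\prod_i P(x_i|\theta )\)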

### Logarithms

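We can take logarithms of the likelihood. The logarithm is strictly increasing, so it preserves the location of stationary points and maxima. Logarithms are only defined for values above \(0\), while probabilities can be zero, so this only applies where the likelihood is non-zero.

So we can instead look for the stationary points, at values of \(\theta \) where the likelihood is non-zero, of: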

\(\ln L(\theta ; X)=\ln \prod_i P(x_i|\theta )\)

\(\ln L(\theta ; X)=\sum_i \ln P(x_i|\theta )\)
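As a rough illustration of why the sum of logs is convenient in practice, the sketch below assumes coin-flip (Bernoulli) observations coded as `1` for heads and `0` for tails. The function names, the data and the grid of candidate values are all made up for this example. It suggests two things: on a small sample the likelihood and the log-likelihood peak at the same \(\theta \), and on a large sample the raw product underflows while the sum of logs stays usable.

```python
import math

def likelihood(theta, flips):
    """Product of per-observation probabilities (IID assumption)."""
    result = 1.0
    for x in flips:
        result *= theta if x == 1 else 1.0 - theta
    return result

def log_likelihood(theta, flips):
    """Sum of per-observation log-probabilities (IID assumption)."""
    total = 0.0
    for x in flips:
        total += math.log(theta if x == 1 else 1.0 - theta)
    return total

# Candidate values of theta, excluding 0 and 1 where the log is undefined.
grid = [i / 100 for i in range(1, 100)]

# Small hypothetical sample: both functions are maximised at the same theta.
small_flips = [1, 0, 1, 1, 0]
small_best_lik = max(grid, key=lambda t: likelihood(t, small_flips))
small_best_log = max(grid, key=lambda t: log_likelihood(t, small_flips))
print(small_best_lik, small_best_log)

# Large hypothetical sample: the product underflows to 0.0 in double
# precision, but the sum of logs is still finite and can be maximised.
flips = [1, 0] * 1000
print(likelihood(0.5, flips))
best = max(grid, key=lambda t: log_likelihood(t, flips))
print(best)
```

A grid search is used here only to keep the sketch short; any optimiser could be applied to the log-likelihood instead.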

### Example: Coin flip

Let's take our simple example about coins. Heads and tails are the only options, so \(P(H)+P(T)=1\).

\(P(H|\theta )=\theta \)

\(P(T|\theta )=1-\theta \)

\(\ln L(\theta ; X)=\sum_i \ln P(x_i|\theta )\)

If we had \(5\) heads and \(5\) tails we would have:

\(\ln L(\theta ; X)=5\ln (\theta )+ 5\ln (1-\theta )\)

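Differentiating with respect to \(\theta \) and setting the derivative to zero:

\(\dfrac{5}{\theta }-\dfrac{5}{1-\theta }=0\)

So \(\theta =P(H)=\dfrac{1}{2}\) is the value which makes our observation most likely.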
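The coin example can also be checked numerically. This is a minimal sketch, assuming we simply evaluate the log-likelihood on a grid of candidate values; `coin_log_likelihood` and the grid resolution are choices made for this illustration.

```python
import math

def coin_log_likelihood(theta, heads, tails):
    """ln L(theta; X) = heads * ln(theta) + tails * ln(1 - theta)."""
    return heads * math.log(theta) + tails * math.log(1.0 - theta)

heads, tails = 5, 5

# Candidate values strictly between 0 and 1, where the log is defined.
grid = [i / 1000 for i in range(1, 1000)]
theta_hat = max(grid, key=lambda t: coin_log_likelihood(t, heads, tails))

# Closed-form solution of the first-order condition, for comparison.
theta_closed_form = heads / (heads + tails)

print(theta_hat, theta_closed_form)  # 0.5 0.5
```

With other counts the closed form `heads / (heads + tails)` need not fall exactly on the grid, so the two values would then agree only up to the grid spacing.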

## Properties of the MLE estimator

### Asymptotic normality of the MLE

## Results for specific distributions

### MLE of the Gaussian distribution

The parameters are the population mean vector and covariance matrix.

The MLE for the mean is the sample mean.

The MLE for the covariance matrix is the unadjusted sample covariance, which divides by \(n\) rather than \(n-1\).
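The two estimators above can be sketched in plain Python. The function name `gaussian_mle` and the four data points are hypothetical, chosen so the arithmetic is easy to follow by hand; this is an illustration under those assumptions rather than a reference implementation.

```python
def gaussian_mle(rows):
    """Return (mean, covariance) maximum likelihood estimates.

    rows: list of equal-length tuples, one tuple per observation.
    The covariance divides by n (unadjusted), not n - 1.
    """
    n = len(rows)
    d = len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    cov = [
        [
            sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / n
            for k in range(d)
        ]
        for j in range(d)
    ]
    return mean, cov

# Hypothetical two-dimensional sample.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 9.0), (6.0, 5.0)]
mean_hat, cov_hat = gaussian_mle(data)

print(mean_hat)  # [3.0, 5.0]
print(cov_hat)   # [[3.5, 1.75], [1.75, 6.5]]
```

Multiplying `cov_hat` by `n / (n - 1)` would give the adjusted sample covariance. In NumPy, `np.cov` with `bias=True` uses the same divide-by-\(n\) normalisation.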

### MLE of the Poisson distribution

### MLE of the Bernoulli and binomial distributions

## Other

### Restricted Maximum Likelihood

We can partition the likelihood function so that one part involves only the variance parameters.

### Targeted Maximum Likelihood Estimation

### Scores

The score used so far can be called the maximum likelihood score.

The MLE is a poor estimate if the score is not zero at the true \(\theta \).

E.g. with one-sided tails, the true \(\theta \) does not satisfy the MLE condition.

Can we find other scores?

### Orthogonality

The score of one parameter depends on the other parameters.

If we misestimate one parameter and then estimate another, the second estimate will be poor.

We want the score not to change around bad estimates.

We want bias in the nuisance parameters not to affect the score.

Separate page: orthogonality for sets of parameters, e.g. nuisance parameters and parameters of interest.