# Likelihood functions

## Likelihood functions

### Likelihood function

We want to estimate parameters. One way of looking into this is to look at the likelihood function:

$$L(\theta ; X)=P(X|\theta )$$

The likelihood function shows the chance of the observed data being generated, given specific parameters.

If the likelihood function has a high peak in some region, this is evidence that $$\theta$$ lies in that region.

### IID

For multiple events, the likelihood function is:

$$L(\theta ; X)=P(X|\theta )$$

$$L(\theta ; X)=P(A_1 \land B_2 \land C_3 \land D_4…|\theta )$$

If the events are independent, that is, the outcome of one flip doesn’t depend on any other outcome, then:

$$L(\theta ; X)=P(A_1|\theta )\cdot P(B_2|\theta )\cdot P(C_3|\theta )\cdot P(D_4|\theta )\cdots$$

If the events are also identically distributed, that is, the chance of flipping a head doesn’t change across flips (for example the heads side doesn’t get heavier over time), then:

$$L(\theta ; X)=P(A|\theta )\cdot P(B|\theta )\cdot P(C|\theta )\cdot P(D|\theta )\cdots$$

$$L(\theta ; X)=\prod_{i=1}^n P(X_i|\theta )$$
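As a minimal sketch of this product form, take a Bernoulli coin-flip model (the function name and data here are illustrative, not from the source): under IID, the joint likelihood of the sample is just the product of the per-flip probabilities.

```python
def bernoulli_likelihood(theta, flips):
    """Likelihood of IID coin flips (1 = heads) given heads probability theta."""
    # Under IID, the joint probability factorises into a product of
    # per-observation terms: L(theta; X) = prod_i P(x_i | theta).
    L = 1.0
    for x in flips:
        L *= theta if x == 1 else (1 - theta)
    return L

flips = [1, 0, 1, 1]  # three heads, one tail
print(bernoulli_likelihood(0.75, flips))  # 0.75**3 * 0.25 = 0.10546875
```

Evaluating this over a grid of `theta` values traces out the likelihood function whose peak indicates where $$\theta$$ plausibly lies.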

## Score functions

### The score

The score is defined as the partial derivative of the log-likelihood function $$l(\theta ; X)=\log L(\theta ; X)$$ with respect to $$\theta$$.

$$V(\theta ; X)=\dfrac{\partial }{\partial \theta }l(\theta ; X)$$

$$V(\theta ; X)=\dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\partial }{\partial \theta}L(\theta; X)$$
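The score can be checked numerically: for the Bernoulli model sketched earlier (an illustrative example, not from the source), the analytic derivative of the log-likelihood should match a finite-difference approximation.

```python
import math

def log_likelihood(theta, flips):
    # l(theta; X) = sum_i log P(x_i | theta) for IID Bernoulli flips.
    return sum(math.log(theta if x == 1 else 1 - theta) for x in flips)

def score(theta, flips):
    # Analytic derivative of the log-likelihood for the Bernoulli model:
    # d/dtheta log P(x|theta) = x/theta - (1-x)/(1-theta).
    return sum(x / theta - (1 - x) / (1 - theta) for x in flips)

flips = [1, 0, 1, 1]
theta = 0.6
# Central finite difference approximates d/dtheta l(theta; X).
h = 1e-6
numeric = (log_likelihood(theta + h, flips) - log_likelihood(theta - h, flips)) / (2 * h)
print(score(theta, flips), numeric)  # both approximately 2.5
```

The two values agree to several decimal places, confirming the score is just the slope of the log-likelihood at $$\theta$$.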

### Expectation of the score

The expectation of the score, evaluated at the true value of $$\theta$$, is:

$$E[V(\theta ; X)]=\int V(\theta ; X)P(X|\theta )dX$$

$$E[V(\theta ; X)]=\int \dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\partial }{\partial \theta}L(\theta; X)P(X|\theta )dX$$

Since $$P(X|\theta )=\prod_{i=1}^nP(X_i|\theta )=L(\theta ; X)$$, the density cancels:

$$E[V(\theta ; X)]=\int \dfrac{\partial }{\partial \theta}L(\theta; X)dX$$

Exchanging the derivative and the integral, and using the fact that the likelihood integrates to $$1$$ over $$X$$:

$$E[V(\theta ; X)]=\dfrac{\partial }{\partial \theta}\int L(\theta; X)dX=\dfrac{\partial }{\partial \theta}1=0$$

So the expected value of the score at the true $$\theta$$ is $$0$$.
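For a discrete model the expectation is a finite sum, so the result can be verified exactly. A sketch for a single Bernoulli observation (illustrative names, not from the source):

```python
def single_score(theta, x):
    # Score of one Bernoulli observation: d/dtheta log P(x|theta).
    return x / theta - (1 - x) / (1 - theta)

def expected_score(theta):
    # Expectation over X ~ Bernoulli(theta), taken at the true theta:
    # the sum over the two outcomes weighted by their probabilities.
    return theta * single_score(theta, 1) + (1 - theta) * single_score(theta, 0)

print(expected_score(0.3))  # approximately 0
```

The weighted terms are $$\theta \cdot \tfrac{1}{\theta} + (1-\theta)\cdot(-\tfrac{1}{1-\theta}) = 1 - 1 = 0$$, matching the derivation above.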

### Variance of the score

The variance of the score is:

$$var [V(\theta ; X)]=E[V(\theta ; X)^2]-E[V(\theta ; X)]^2$$

Since the expectation of the score at the true $$\theta$$ is $$0$$, this reduces to:

$$var [V(\theta ; X)]=E\left[\left(\dfrac{\partial }{\partial \theta }l(\theta ; X)\right)^2\right]$$

## Fisher information

### Fisher information

The Fisher information is the variance of the score:

$$I(\theta )=E\left[\left(\dfrac{\partial }{\partial \theta }\log f(X; \theta )\right)^2 \Big|\theta \right]$$

Under regularity conditions, this equals the negative expected second derivative of the log-likelihood:

$$I(\theta )=-E\left[\dfrac{\partial^2 }{\partial \theta^2 }\log f(X; \theta ) \Big|\theta \right]$$

Because the score has expectation $$0$$, its variance is the same as the expectation of the score squared.
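For the Bernoulli model the expectation of the squared score is again a finite sum, so the known closed form $$I(\theta )=\tfrac{1}{\theta (1-\theta )}$$ can be checked exactly (an illustrative sketch, not from the source):

```python
def fisher_information(theta):
    # E[score^2] for one Bernoulli(theta) observation, computed exactly
    # by summing over the two outcomes x = 0 and x = 1.
    s1 = 1 / theta          # score at x = 1
    s0 = -1 / (1 - theta)   # score at x = 0
    return theta * s1**2 + (1 - theta) * s0**2

theta = 0.25
print(fisher_information(theta), 1 / (theta * (1 - theta)))  # both 16/3, about 5.333
```

The information blows up as $$\theta$$ approaches $$0$$ or $$1$$: extreme coins make each flip highly informative about the bias.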

### Fisher information matrix

With $$k$$ parameters, the Fisher information becomes a $$k\times k$$ matrix:

$$I(\theta )_{ij}=E\left[\left(\dfrac{\partial }{\partial \theta_i}\log f(X; \theta )\right)\left(\dfrac{\partial }{\partial \theta_j }\log f(X; \theta )\right)\Big|\theta \right]$$

### Observed Fisher information matrix

The Fisher information matrix contains information about the population.

The observed Fisher information is the negative of the Hessian of the log-likelihood, evaluated at the observed sample.

We have:

• $$l(\theta |\mathbf X)=\sum_i\ln P(\mathbf x_i|\theta )$$

• $$J(\theta^*)=-\nabla \nabla^Tl(\theta|\mathbf X )|_{\theta = \theta^*}$$

The Fisher information is the expected value of this.

$$I(\theta )=E[J(\theta)]$$
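In the scalar case the Hessian is a single second derivative, which can be estimated with a finite difference. A sketch for a Bernoulli sample (function names and data are illustrative; the closed form $$n/(\hat p(1-\hat p))$$ at the MLE is a standard Bernoulli result assumed here, not stated in the source):

```python
import math

def log_likelihood(theta, xs):
    return sum(math.log(theta if x == 1 else 1 - theta) for x in xs)

def observed_information(theta, xs, h=1e-5):
    # Negative second derivative of the log-likelihood, estimated with a
    # central finite difference (the scalar case of minus the Hessian).
    second = (log_likelihood(theta + h, xs)
              - 2 * log_likelihood(theta, xs)
              + log_likelihood(theta - h, xs)) / h**2
    return -second

xs = [1, 1, 0, 1, 0, 1, 1, 0]   # n = 8 flips, 5 heads
theta_hat = sum(xs) / len(xs)   # MLE for Bernoulli is the sample mean, 0.625
# At the MLE, the observed information for a Bernoulli sample is n / (p(1-p)).
print(observed_information(theta_hat, xs), len(xs) / (theta_hat * (1 - theta_hat)))
```

Averaging $$J(\theta )$$ over repeated samples would recover $$I(\theta )$$, matching $$I(\theta )=E[J(\theta )]$$ above.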

## Orthogonality

### Orthogonality

Two parameters are called orthogonal if their entry in the Fisher information matrix is $$0$$.

This means the parameters can be estimated separately: the MLE of one is asymptotically independent of the other.

This can be written as a moment condition:

$$E\left[\dfrac{\partial l}{\partial \theta_i}\dfrac{\partial l}{\partial \theta_j}\right]=0$$