Likelihood functions

Likelihood functions

Likelihood function

We want to estimate parameters. One way to approach this is through the likelihood function:

\(L(\theta ; X)=P(X|\theta )\)

The likelihood function gives the probability of generating the observed data, viewed as a function of the parameters.

If the likelihood has a high peak in some region, this is evidence that \(\theta \) is located in that region.
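As a minimal sketch, assuming a hypothetical experiment of 8 coin flips yielding 6 heads, the Python snippet below evaluates the binomial likelihood over a grid of candidate \(\theta \) values and reports where it peaks:

```python
import numpy as np
from math import comb

# Hypothetical data: 6 heads observed in 8 flips, with P(heads) = theta
heads, n = 6, 8
thetas = np.linspace(0.01, 0.99, 99)  # candidate parameter values

# L(theta; X) = P(X | theta), here the binomial probability of the data
L = comb(n, heads) * thetas**heads * (1 - thetas)**(n - heads)

print("theta maximising the likelihood:", thetas[np.argmax(L)])  # 0.75
```

The likelihood peaks at the sample proportion of heads, \(6/8=0.75\).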

IID

For multiple events, for example a sequence of coin flips, the likelihood function is:

\(L(\theta ; X)=P(X|\theta )\)

\(L(\theta ; X)=P(A_1 \land B_2 \land C_3 \land D_4 \land \dots |\theta )\)

If the events are independent, that is, the outcome of one flip doesn't depend on any other outcome, then:

\(L(\theta ; X)=P(A_1|\theta )\cdot P(B_2|\theta )\cdot P(C_3|\theta )\cdot P(D_4|\theta )\cdots \)

If the events are also identically distributed, that is, the chance of flipping a head doesn't change across flips (for example, the heads side doesn't get heavier over time), then:

\(L(\theta ; X)=P(A|\theta )\cdot P(B|\theta )\cdot P(C|\theta )\cdot P(D|\theta )\cdots \)

\(L(\theta ; X)=\prod_{i=1}^n P(X_i|\theta )\)
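A minimal Python sketch of this product form, using the same hypothetical flips (1 = heads):

```python
import numpy as np

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical flips: 1 = heads

def likelihood(theta, x):
    """L(theta; X) = prod_i P(X_i | theta) for IID Bernoulli flips."""
    per_flip = np.where(x == 1, theta, 1 - theta)  # P(X_i | theta) per flip
    return np.prod(per_flip)

print(likelihood(0.75, flips))
```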

Score functions

The score

The score is defined as the derivative of the log-likelihood function with respect to \(\theta \):

\(V(\theta, X)=\dfrac{\partial }{\partial \theta }l(\theta ; X) \)

Since \(l(\theta ; X)=\log L(\theta ; X)\), the chain rule gives:

\(V(\theta, X)=\dfrac{1 }{L(\theta ; X)}\dfrac{\partial }{\partial \theta }L(\theta ; X)=\dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\partial }{\partial \theta }L(\theta ; X) \)
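To make this concrete, here is a sketch for the Bernoulli model (hypothetical flips), checking the analytic score against a numerical derivative of the log-likelihood:

```python
import numpy as np

flips = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical flips: 1 = heads

def log_likelihood(theta, x):
    return np.sum(np.log(np.where(x == 1, theta, 1 - theta)))

def score(theta, x):
    # Analytic derivative of the Bernoulli log-likelihood:
    # V = (number of heads)/theta - (number of tails)/(1 - theta)
    return x.sum() / theta - (len(x) - x.sum()) / (1 - theta)

theta, eps = 0.6, 1e-6
numeric = (log_likelihood(theta + eps, flips)
           - log_likelihood(theta - eps, flips)) / (2 * eps)
print(score(theta, flips), numeric)  # agree; the score is 0 at the MLE 0.75
```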

Expectation of the score

The expectation of the score, evaluated at the true value of \(\theta \), is:

\(E[V(\theta, X)]=\int V(\theta, X)P(X|\theta )dX\)

\(E[V(\theta, X)]=\int \dfrac{1 }{L(\theta ; X)}\dfrac{\partial }{\partial \theta }L(\theta ; X)P(X|\theta )dX\)

Since \(L(\theta ; X)=P(X|\theta )\), the densities cancel:

\(E[V(\theta, X)]=\int \dfrac{\partial }{\partial \theta }P(X|\theta )dX\)

Under regularity conditions we can swap the derivative and the integral:

\(E[V(\theta, X)]=\dfrac{\partial }{\partial \theta }\int P(X|\theta )dX=\dfrac{\partial }{\partial \theta }1=0\)

So the expected value of the score at the true \(\theta \) is \(0\).
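A quick Monte Carlo check of this result, simulating many Bernoulli datasets at a hypothetical true \(\theta \) and averaging the score:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true, n, reps = 0.3, 50, 20000  # hypothetical true parameter

def score(theta, x):
    return x.sum() / theta - (len(x) - x.sum()) / (1 - theta)

# Average the score over many datasets simulated at the true theta
scores = [score(theta_true, rng.binomial(1, theta_true, n)) for _ in range(reps)]
print(np.mean(scores))  # close to 0
```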

Variance of the score

The variance of the score is:

\(var [\dfrac{\partial }{\partial \theta }l(\theta ; X) ]=E[V(\theta, X)^2]-E[V(\theta, X)]^2\)

Since the expected score is \(0\), this reduces to the expectation of the squared score:

\(var [V(\theta, X)]=E[V(\theta, X)^2]\)
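A Monte Carlo sketch of the score's variance for the same hypothetical Bernoulli setup (the quantity it converges to is named in the next section):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 50, 20000  # hypothetical true parameter

def score(t, x):
    return x.sum() / t - (len(x) - x.sum()) / (1 - t)

scores = [score(theta, rng.binomial(1, theta, n)) for _ in range(reps)]
print(np.var(scores))  # ~238 = n / (theta (1 - theta)); see the next section
```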

Fisher information

Fisher information

The Fisher information is the variance of the score:

\(I(\theta )=E[(\dfrac{\partial }{\partial \theta }\log f(X, \theta ))^2 |\theta ]\)

Under regularity conditions this equals the negative expected second derivative:

\(I(\theta )=-E[\dfrac{\partial^2 }{\partial \theta^2 }\log f(X, \theta ) |\theta ]\)

Because the score is centred around \(0\), its variance is the same as the expectation of the score squared.
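For the Bernoulli model both expressions can be checked numerically against the analytic value \(I(\theta )=n/(\theta (1-\theta ))\); a sketch with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 0.3, 50, 20000  # hypothetical true parameter
samples = [rng.binomial(1, theta, n) for _ in range(reps)]

def score(t, x):
    return x.sum() / t - (len(x) - x.sum()) / (1 - t)

def d2_loglik(t, x):
    # Second derivative of the Bernoulli log-likelihood
    return -x.sum() / t**2 - (len(x) - x.sum()) / (1 - t)**2

ev2 = np.mean([score(theta, x)**2 for x in samples])         # E[V^2]
neg_hess = -np.mean([d2_loglik(theta, x) for x in samples])  # -E[l'']
print(ev2, neg_hess, n / (theta * (1 - theta)))              # all ~238
```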

Fisher information matrix

With \(k\) parameters \(\theta =(\theta_1,\dots ,\theta_k)\), the Fisher information becomes a \(k\times k\) matrix:

\(I(\theta )_{ij}=E[(\dfrac{\partial }{\partial \theta_i}\log f(X, \theta ))(\dfrac{\partial }{\partial \theta_j }\log f(X, \theta ))|\theta ]\)
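As an illustration, assuming a normal model \(N(\mu ,\sigma^2)\) with hypothetical true parameters, the \(2\times 2\) matrix can be estimated by Monte Carlo from the per-observation scores:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0              # hypothetical true parameters
x = rng.normal(mu, sigma, 200000)

# Per-observation score components for a N(mu, sigma^2) model
d_mu = (x - mu) / sigma**2
d_sigma = -1 / sigma + (x - mu)**2 / sigma**3

# I(theta)_ij = E[(d log f / d theta_i)(d log f / d theta_j)]
I = np.array([[np.mean(d_mu * d_mu),    np.mean(d_mu * d_sigma)],
              [np.mean(d_sigma * d_mu), np.mean(d_sigma * d_sigma)]])
print(I)  # ~[[1/sigma^2, 0], [0, 2/sigma^2]] = [[0.25, 0], [0, 0.5]]
```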

Observed Fisher information matrix

The Fisher information matrix contains information about the population.

The observed Fisher information is computed from a specific sample: it is the negative of the Hessian of the log-likelihood.

We have:

  • \(l(\theta |\mathbf X)=\sum_i\ln P(\mathbf x_i|\theta )\)

  • \(J(\theta^*)=-\nabla \nabla^Tl(\theta|\mathbf X )|_{\theta = \theta^*}\)

The Fisher information is the expected value of this.

\(I(\theta )=E[J(\theta)]\)
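A minimal sketch for the Bernoulli model, computing the observed information as a numerical negative second derivative at the MLE and comparing it with the analytic form:

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # hypothetical flips: 1 = heads
n, h = len(x), x.sum()

def loglik(t):
    return h * np.log(t) + (n - h) * np.log(1 - t)

theta_star, eps = h / n, 1e-5  # MLE of theta
# Observed information: negative (numeric) second derivative at the MLE
J = -(loglik(theta_star + eps) - 2 * loglik(theta_star)
      + loglik(theta_star - eps)) / eps**2
print(J, h / theta_star**2 + (n - h) / (1 - theta_star)**2)  # both ~42.7
```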

Orthogonality

Orthogonality

Two parameters are called orthogonal if their entry in the Fisher information matrix is \(0\).

This means that the parameters can be estimated separately: the maximum likelihood estimate of one is asymptotically independent of the maximum likelihood estimate of the other.

This can be written as a moment condition:

\(E[(\dfrac{\partial }{\partial \theta_i}\log f(X, \theta ))(\dfrac{\partial }{\partial \theta_j }\log f(X, \theta ))|\theta ]=0\)
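For example, in the normal model \(N(\mu ,\sigma^2)\) the mean and standard deviation are orthogonal; a Monte Carlo sketch of the moment condition with hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.0, 2.0              # hypothetical true parameters
x = rng.normal(mu, sigma, 200000)

# Moment condition for orthogonality of mu and sigma in a normal model:
# E[(d log f / d mu)(d log f / d sigma) | theta] = 0
d_mu = (x - mu) / sigma**2
d_sigma = -1 / sigma + (x - mu)**2 / sigma**3
print(np.mean(d_mu * d_sigma))  # ~0: the off-diagonal Fisher entry vanishes
```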

Quasi-likelihood function

Quasi-likelihood function