The score, Fisher information and orthogonality

Score functions

The score

The score is defined as the derivative of the log-likelihood function with respect to \(\theta\):

\(V(\theta, X)=\dfrac{\delta }{\delta \theta }l(\theta ; X)\)

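Since the log-likelihood is the logarithm of the likelihood \(L(\theta; X)=\prod_{i=1}^nP(X_i|\theta )\), the chain rule gives:

\(V(\theta, X)=\dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\delta }{\delta \theta}L(\theta; X)\)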
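As an illustration, the two expressions for the score can be compared on a small sample. The sketch below assumes a Bernoulli model with success probability \(\theta\), which is not part of the text above; the function names and the data are hypothetical.

```python
import math

def score(theta, xs):
    """Analytic score: derivative of the Bernoulli log-likelihood in theta."""
    return sum(x / theta - (1 - x) / (1 - theta) for x in xs)

def score_via_likelihood(theta, xs, h=1e-6):
    """Score computed as (1 / L) * dL/dtheta, with a central difference for dL/dtheta."""
    def likelihood(t):
        # L(t; X) is the product of the individual probabilities P(X_i | t).
        return math.prod(t if x == 1 else 1 - t for x in xs)
    d_likelihood = (likelihood(theta + h) - likelihood(theta - h)) / (2 * h)
    return d_likelihood / likelihood(theta)

xs = [1, 0, 1, 1, 0]                     # hypothetical 0/1 observations
print(score(0.5, xs))                    # 2.0
print(score_via_likelihood(0.5, xs))     # close to 2.0, up to finite-difference error
```

The score is zero at \(\theta = 0.6\), the sample mean, which is where the log-likelihood of this sample is maximised.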

Expectation of the score

The expectation of the score, evaluated at the true value of \(\theta\), is taken over the distribution of \(X\):

\(E[V(\theta, X)]=\int V(\theta, X)\prod_{i=1}^nP(X_i|\theta ) dX\)

\(E[V(\theta, X)]=\int \dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\delta }{\delta \theta}L(\theta; X)\prod_{i=1}^nP(X_i|\theta ) dX\)

\(E[V(\theta, X)]=\int \dfrac{\delta }{\delta \theta}L(\theta; X) dX\)

Assuming that differentiation and integration can be interchanged, and using the fact that the likelihood integrates to \(1\) over \(X\):

\(E[V(\theta, X)]=\dfrac{\delta }{\delta \theta}\int L(\theta; X) dX=\dfrac{\delta }{\delta \theta}1=0\)

So the expected value of the score is \(0\).
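The zero-mean property can be checked exactly for a single Bernoulli observation, where the expectation is a sum over the two outcomes. This is a minimal sketch under that assumed model; `score_one` and `expected_score` are hypothetical helper names.

```python
def score_one(theta, x):
    """Score of a single Bernoulli observation x (0 or 1) at parameter theta."""
    return x / theta - (1 - x) / (1 - theta)

def expected_score(theta, true_theta):
    """E[V(theta, X)] when X ~ Bernoulli(true_theta), summing over both outcomes."""
    return (true_theta * score_one(theta, 1)
            + (1 - true_theta) * score_one(theta, 0))

print(expected_score(0.3, true_theta=0.3))  # approximately 0: score evaluated at the true value
print(expected_score(0.5, true_theta=0.3))  # approximately -0.8: not zero away from the true value
```

The second call shows that the result depends on evaluating the score at the true parameter value.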

Variance of the score

The variance of the score is:

\(var [\dfrac{\delta }{\delta \theta }l(\theta ; X) ]\)

\(var [\dfrac{1 }{\prod_{i=1}^nP(X_i|\theta )}\dfrac{\delta }{\delta \theta}L(\theta; X) ]\)
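A simulation can give a feel for this variance. The sketch below again assumes a Bernoulli model (an assumption made for illustration), draws many samples of size \(n\) at a fixed true \(\theta\), and compares the sample variance of the score with \(n/(\theta(1-\theta))\), the known value for that model. The function names are hypothetical.

```python
import random
import statistics

def score(theta, xs):
    """Score of a Bernoulli sample xs at parameter theta."""
    return sum(x / theta - (1 - x) / (1 - theta) for x in xs)

def simulate_scores(theta, n, reps, seed=0):
    """Draw `reps` samples of size n from Bernoulli(theta) and return their scores."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    scores = []
    for _ in range(reps):
        xs = [1 if rng.random() < theta else 0 for _ in range(n)]
        scores.append(score(theta, xs))
    return scores

theta, n = 0.3, 10
scores = simulate_scores(theta, n, reps=20000)
print(statistics.mean(scores))        # should be near 0
print(statistics.variance(scores))    # should be near the value printed below
print(n / (theta * (1 - theta)))      # theoretical variance for this model
```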

Fisher information

Fisher information

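The Fisher information is the variance of the score. Because the score is centred around \(0\), its variance is the same as the expectation of the squared score:

\(I(\theta )=E[(\dfrac{\delta }{\delta \theta }\log f(X, \theta ))^2 |\theta ]\)

Under regularity conditions, this is also equal to the negative of the expected second derivative of the log-density:

\(I(\theta )=-E[\dfrac{\delta^2 }{\delta \theta^2 }\log f(X, \theta ) |\theta ]\)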
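The squared-score form and the second-derivative form (which carries a minus sign) can be compared exactly for one Bernoulli observation, where \(\log f(x, \theta) = x\log\theta + (1-x)\log(1-\theta)\). This model is an assumption chosen for illustration, and the function names are hypothetical.

```python
def fisher_from_squared_score(theta):
    """E[(d/dtheta log f)^2], summing over the outcomes x = 1 and x = 0."""
    score_if_one = 1 / theta
    score_if_zero = -1 / (1 - theta)
    return theta * score_if_one**2 + (1 - theta) * score_if_zero**2

def fisher_from_second_derivative(theta):
    """-E[d^2/dtheta^2 log f], summing over the outcomes x = 1 and x = 0."""
    second_if_one = -1 / theta**2
    second_if_zero = -1 / (1 - theta)**2
    return -(theta * second_if_one + (1 - theta) * second_if_zero)

theta = 0.3
print(fisher_from_squared_score(theta))
print(fisher_from_second_derivative(theta))
print(1 / (theta * (1 - theta)))  # closed form for this model
```

All three printed values should agree up to floating-point rounding.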

Fisher information matrix

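With \(k\) parameters, \(\theta =(\theta_1, \dots, \theta_k)\), the Fisher information is a \(k\times k\) matrix with entries: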

\(I(\theta )_{ij}=E[(\dfrac{\delta }{\delta \theta_i}\log f(X, \theta ))(\dfrac{\delta }{\delta \theta_j }\log f(X, \theta ))|\theta ]\)
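The entries of this matrix can be approximated by averaging products of score components over simulated data. The sketch below assumes a normal model with parameters \((\mu, \sigma)\), chosen only for illustration, for which the matrix is known to be diagonal with entries \(1/\sigma^2\) and \(2/\sigma^2\). The function names are hypothetical.

```python
import random

def score_vector(x, mu, sigma):
    """Partial derivatives of log f(x; mu, sigma) for a normal density."""
    d_mu = (x - mu) / sigma**2
    d_sigma = -1 / sigma + (x - mu)**2 / sigma**3
    return [d_mu, d_sigma]

def fisher_matrix_mc(mu, sigma, reps=100000, seed=0):
    """Monte Carlo estimate of I(theta)_ij = E[s_i * s_j] at the true parameters."""
    rng = random.Random(seed)  # local generator so the run is reproducible
    k = 2
    total = [[0.0] * k for _ in range(k)]
    for _ in range(reps):
        s = score_vector(rng.gauss(mu, sigma), mu, sigma)
        for i in range(k):
            for j in range(k):
                total[i][j] += s[i] * s[j]
    return [[total[i][j] / reps for j in range(k)] for i in range(k)]

estimate = fisher_matrix_mc(mu=1.0, sigma=2.0)
for row in estimate:
    print(row)
# Theoretical matrix for sigma = 2: [[0.25, 0], [0, 0.5]]
```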

Observed Fisher information matrix

The Fisher information matrix contains information about the population.

The observed Fisher information is the negative of the Hessian of the log-likelihood, evaluated on the observed sample.

We have:

  • \(l(\theta |\mathbf X)=\sum_i\ln P(\mathbf x_i|\theta )\)

  • \(J(\theta^*)=-\nabla \nabla^Tl(\theta|\mathbf X )|_{\theta = \theta^*}\)

The Fisher information is the expected value of this.

\(I(\theta )=E[J(\theta)]\)
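With a single parameter the Hessian is just the second derivative, so the observed information can be approximated with a finite difference. This is a rough sketch assuming a Bernoulli model and a hypothetical sample; the function names and the step size `h` are illustrative choices.

```python
import math

def log_likelihood(theta, xs):
    """Bernoulli log-likelihood: sum of log P(x_i | theta)."""
    return sum(x * math.log(theta) + (1 - x) * math.log(1 - theta) for x in xs)

def observed_information(theta, xs, h=1e-4):
    """Negative second derivative of the log-likelihood, by central differences."""
    second = (log_likelihood(theta + h, xs)
              - 2 * log_likelihood(theta, xs)
              + log_likelihood(theta - h, xs)) / h**2
    return -second

xs = [1, 0, 1, 1, 0]               # hypothetical 0/1 observations
theta_hat = sum(xs) / len(xs)      # maximum likelihood estimate, 0.6
print(observed_information(theta_hat, xs))
print(len(xs) / (theta_hat * (1 - theta_hat)))  # n / (theta (1 - theta)) at theta_hat
```

For this particular model the observed information at the maximum likelihood estimate coincides with the Fisher information of the sample evaluated at the same point; in general the two differ.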

Orthogonality

Orthogonality

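Two parameters \(\theta_i\) and \(\theta_j\) are called orthogonal if their entry in the Fisher information matrix is \(0\).

This means that the maximum likelihood estimates of the two parameters are asymptotically uncorrelated, and in this sense the parameters can be estimated separately.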

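This can be written as a moment condition:

\(E[(\dfrac{\delta }{\delta \theta_i}\log f(X, \theta ))(\dfrac{\delta }{\delta \theta_j }\log f(X, \theta ))|\theta ]=0\)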
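As a worked example of such a moment condition, take a normal model with parameters \(\mu\) and \(\sigma\). This model is an assumption chosen for illustration. Its log-density and the two components of the score are:

\(\log f(X, \mu, \sigma )=-\log \sigma -\dfrac{(X-\mu )^2}{2\sigma^2}-\dfrac{1}{2}\log (2\pi )\)

\(\dfrac{\delta }{\delta \mu }\log f(X, \mu, \sigma )=\dfrac{X-\mu }{\sigma^2}\)

\(\dfrac{\delta }{\delta \sigma }\log f(X, \mu, \sigma )=-\dfrac{1}{\sigma }+\dfrac{(X-\mu )^2}{\sigma^3}\)

The expected product of the two components is:

\(E[-\dfrac{X-\mu }{\sigma^3}+\dfrac{(X-\mu )^3}{\sigma^5}]=0\)

Both terms vanish because the first and third central moments of a normal distribution are \(0\), so \(\mu\) and \(\sigma\) are orthogonal.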